Parallel processor system with asynchronous data transmission by a data producing processor to a data using processor employing controlled execution of a program by the data using processor for processing the data

- Hitachi, Ltd.

A parallel computer system includes a plurality of processors, each of which is placed in data communication with an interconnecting network. Pairs of a data signal and a data identification code, predetermined for the data signal, are received by each processor and stored in a memory. Structure is provided for reading a data signal belonging to one of the pairs having a data identification code designated by a data readout instruction.

Description
BACKGROUND OF THE INVENTION

The present invention relates to a parallel computer system composed of a plurality of processors and a host processor.

In the prior art, data transmission between the processors of a parallel computer system is accomplished by one of two methods: in the first, the two processors are synchronized with each other for the data transmission; in the second, the two processors are left asynchronous. In the second method, more specifically, the individual processors are controlled so as to perform so-called "data flow type operations", in which an operation is not started in a processor before all data necessary for the operation arrive at the processor. With the first method, the overhead of synchronizing two processors for data transmission is a problem. That is, one processor is frequently interrupted by another for synchronization, so that the utilization efficiency of each processor is degraded. With the second method, on the other hand, the difficulty of the first method can be eliminated to some extent. Since, however, an operation cannot be started before all necessary data are prepared, the steps of operation, data transmission and wait are sequential, and the overheads of the data transmission and wait steps still remain.

A third method has also been proposed for data transmission without any synchronization of the processors (e.g., Japanese Patent Laid-Open No. 49464/1985). In this example, a processor sends not only the data to be transmitted but also the addresses of the instructions requiring those data. Each time the receive processor receives data, the execution of an instruction is interrupted, and a flag indicating arrival of the necessary data is added to the instruction at the address accompanying the received data. According to this third method, instruction execution is interrupted for the above-specified processing upon each data receipt, thus obstructing the desired high-speed operations.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a parallel computer system capable of executing data transmission between processors at a high speed without any synchronization between the processors.

In order to achieve the above-specified object, according to the present invention, each processor is equipped with means for receiving from an interconnecting network and storing pairs of a data signal and a data identification code predetermined for the data signal and for reading a data signal included in one of the pairs having a data identification code designated by a data readout instruction.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram showing an embodiment of a parallel computer system according to the present invention;

FIG. 2A is a diagram showing a data packet to be used in a modified parallel computer system according to the present invention;

FIG. 2B is a diagram showing associative memory means for receiving the data packet of FIG. 2A;

FIG. 3 is a schematic block diagram showing another embodiment of the parallel computer system according to the present invention;

FIG. 4 is a diagram showing examples of an execution process list before process switching and a wait process list in the system of FIG. 3;

FIG. 5 is a diagram showing an execution process list after the process switching and a wait process list in the system of FIG. 3;

FIG. 6 is a diagram showing an example of a program for a scalar computer (for sequential processing); and

FIGS. 7a, 7b and 7c are diagrams showing plural processes for executing the respective programs of FIG. 6.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail in the following in connection with the embodiments thereof with reference to the accompanying drawings.

In FIG. 1, reference numeral 1 denotes an interconnecting network, and numeral 2 denotes a plurality of parallel processing processor elements. Each of these processor elements has an identical structure which includes a central processing unit (CPU) 3, a send unit 4, a receive unit 5, a memory 6 and a host processor 7. The present embodiment is characterized in that the send unit 4 of each processor element 2 gives a data identification code DID to data to be sent and sends it in the form of a data packet to a destination processor element, whereas the receive unit 5 has an associative memory 21 for storing a pair of the data and the data identification code included in the data packet received. The central processing unit 3 reads and executes the instructions of an instruction queue 70, which is held in the memory 6, under the control of an execution control 61. The host processor 7 is loaded with a sequential processing program written for an ordinary scalar computer and divides it into a plurality of programs to be executed in parallel by the processor elements 2. The divided programs are transferred, together with the data necessary for each program, through a line l100 to different processor elements 2 to start those processor elements 2. Moreover, after execution of the programs has ended in all of the processor elements 2, the host processor 7 reads out the resultant data from each processor element 2 and outputs them to an external device (not shown).

The parallel computer shown in FIG. 1 executes one program (which is called the "job"). For this, each processor element 2 is preset by the host processor 7 so as to execute one or more of a plurality of unit portions of the job (which are called the "processes"). The instruction queue 70 shown in FIG. 1 belongs to one of these processes.

Grouped general purpose registers 10 and floating-point registers 11 are connected to an arithmetic logic unit 80 and the memory 6. Each of the general purpose registers 10 and floating-point registers 11 can be designated by an instruction. For example, the data in one general purpose register or floating-point register designated by an instruction are operated on by the arithmetic logic unit 80, and the result is stored in one general purpose register or floating-point register designated by that instruction. The arithmetic logic unit 80 can also operate on the data in the memory 6 and send the result to the memory 6.

A specific example of the parallel processing program will be described with reference to FIG. 6 and FIGS. 7a, 7b and 7c.

FIG. 6 shows an example of a sequential processing program programmed for an ordinary scalar computer.

This is a program for calculating data elements U(1) to U(100×n). This program is divided by the host processor 7 into as many programs as the number n of the processor elements 2, so that each divided program is used for calculation of 100 data elements. These divided programs are shown in FIGS. 7a, 7b and 7c. FIG. 7a shows the program which is set in the processor element No. 1. FIG. 7c shows the program which is set in the processor element No. n. FIG. 7b shows the programs which are loaded into the processor elements No. 2 to No. n-1. Among the initial values of data elements U(1) to U(n×100) required for execution of the program in FIG. 6, a group of 100 initial values of data elements U(i×100+1) to U((i+1)×100) is loaded into a processor element No. i (i=1, 2, . . . or n).

A processor element No. i requires a data element U(100×i) calculated by a processor element No. i-1. For this requirement, the data U(100×i) have to be transferred between the two processor elements. In the programs shown in FIGS. 7a, 7b and 7c, the data sending process is performed by calling a subroutine SEND, and the data receiving process is performed by calling a function RECEIVE.

The data sending process will be described with reference to FIG. 7a. In response to the call of the subroutine SEND at 701, the data are sent to another processor element.

A first argument PE#2 of the subroutine SEND denotes the number of the destination processor element. A second argument 100 denotes a data identification code of a third argument UN(100) to be transferred. This subroutine SEND is realized by the following SEND instruction, as will be described with reference to FIG. 1.

The format of the SEND instruction is as follows, for example: ##STR1##

The "Instruction Code" indicates that the instruction is the SEND instruction and "R2" indicates the number of one general purpose register. This register stores in advance the head address A of a parameter address table 71 in the memory 6 before the execution of that instruction.

Before the execution of the instruction, areas 30 to 33 of the memory 6 are set with the number (P#) of the destination processor element, a control signal (CONT) for the transfer, the identification code (KEY) for the transfer data, and the transfer data, respectively. Moreover, the parameter address table 71 stores the addresses of those areas. Incidentally, the transfer control signal CONT indicates a data transfer mode and is used to indicate whether the data are to be sent to a single processor element or to all the processor elements.

The data identification code is predetermined for each data without direct relation to the value of the data and can be exemplified by the name or number of the data determined in advance.

When the CPU 3 decodes the SEND instruction, the address A in the general purpose register No. R2 designated by the instruction is sent to a packet generator 54 of the send unit 4 through a line l41. The packet generator 54 is constructed of a microprocessor, for example, and can operate independently of the CPU 3 after receipt of the address A. This packet generator 54 accesses the address table 71 in response to the address A and reads out the information in the areas 30 to 33 in accordance with the content of the table 71, thereby setting it as a packet in a register 12. This packet in the register 12 is sent through the output buffer 11 to the interconnecting network 1, which delivers the data identification code DID and the data to the processor element designated by that packet. Thus, the data are sent whenever an instruction in the send processor element demands it, without any need for synchronization with the receive processor element. The send processor element can execute the succeeding instructions irrespective of whether or not the sent data have been processed by the receive processor element.
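The packet assembly performed by the packet generator 54 can be illustrated with a minimal Python sketch. The dictionary standing in for the memory 6, the numeric addresses, the `Packet` class, the function name `generate_packet` and the mode string "single" are all hypothetical and chosen only for illustration; the text does not specify the packet layout beyond its four fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest_pe: int   # P#: number of the destination processor element
    cont: str      # CONT: transfer mode (single element or all elements)
    key: int       # KEY: data identification code (DID) of the data
    data: float    # the transfer data itself


def generate_packet(memory: dict, table_addr: int) -> Packet:
    """Follow the parameter address table at address A and build a packet."""
    # The table holds one address per parameter area (areas 30 to 33 in the text).
    p_addr, cont_addr, key_addr, data_addr = memory[table_addr]
    return Packet(memory[p_addr], memory[cont_addr],
                  memory[key_addr], memory[data_addr])


# Hypothetical memory image; the addresses and the data value are arbitrary.
memory = {
    0x30: 2,          # P#: send to processor element No. 2
    0x31: "single",   # CONT: assumed encoding of "single destination"
    0x32: 100,        # KEY: data identification code 100
    0x33: 3.5,        # transfer data (stand-in for UN(100))
    0xA0: (0x30, 0x31, 0x32, 0x33),  # parameter address table at address A
}
packet = generate_packet(memory, 0xA0)
```

Because the CPU hands over only the address A, the sketch mirrors the point made above: everything else is gathered by the packet generator, so the CPU is free to continue with the succeeding instructions.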

Next, the data receive instruction will be described with reference to FIG. 7c. The data are received from other processor elements in response to the call of the function RECEIVE of 702.

The argument of the function RECEIVE is used for identifying the data to be received by the data identification code. The function RECEIVE is realized by the following RECEIVE instruction, as will be described with reference to FIG. 1.

The pair of the data identification code DID and the data in an identical data packet is received by an input buffer 20. The associative memory 21 has a plurality of entries each composed of an area 22 for storing the data identification code DID, a data storing area 23 and an effective indication bit area V 24. This effective indication bit is set at "1" when effective data are written in the entry. The data and the data identification code DID received by the input buffer 20 are written in a vacant entry of the associative memory 21.

These data receiving operations are performed in parallel with the execution of instructions at the CPU 3 each time data arrive at the input buffer 20. When the CPU 3 uses the data in the associative memory 21, a specific instruction called the "RECEIVE" instruction is executed before the instruction for that use. The format of the RECEIVE instruction is as follows, for example: ##STR2## The "Instruction Code" indicates that the instruction is the RECEIVE instruction, and "R1" indicates the number of the general purpose register or floating-point register for storing the data to be read in. Whether the register is a general purpose one or a floating-point one is identified by the instruction code, for example. "R2" indicates the number of a general purpose register which has stored in advance the same identification code as the data identification code DID accompanying the data to be read in.

When the RECEIVE instruction is set in an instruction register 60, the data identification code DID stored in the general purpose register No. R2 is sent out to the associative memory 21 through a line l120. The associative memory 21 examines whether or not there is an entry whose data identification code coincides with that code and whose effective bit 24 is at "1". If there is such an entry, the associative memory 21 sends its data to the grouped general purpose registers 10 through a line l121, and the data are written in the general purpose register No. R1 designated by the RECEIVE instruction. If the entry is found, the associative memory 21 sets its effective bit V at "0" and sends out the value "1", indicating the readout end, to a condition code (C.C.) register 90 through a line l123; otherwise it sends out the value "0".

Subsequent to this RECEIVE instruction, there is prepared in advance a BRANCH instruction for discriminating the value of the condition code. More specifically, when this BRANCH instruction is set in the instruction register, the instruction readout execution control 61 examines the value of the condition code in the register 90 and makes the subsequent instruction executable if that value is "1". If the value is "0", on the contrary, the execution control 61 executes the aforementioned RECEIVE instruction again. As a result, the succeeding instructions are sequentially executed if the data having the identification code designated by the RECEIVE instruction have already been received. If, however, the coincident data identification code is not found in the associative memory 21, or the code is found but the corresponding effective bit is at "0", the aforementioned RECEIVE instruction and the BRANCH instruction are executed repetitively. The succeeding instructions are then executed after the target data have arrived at the associative memory 21.
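The interplay of the associative memory 21, the RECEIVE instruction and the BRANCH instruction can be sketched in Python as follows. This is only a software model under stated assumptions: the class and function names are hypothetical, the `deliver_next` callback stands in for the receive unit filling the memory in parallel with the CPU, and the behavior when no entry is vacant is not described in the text and is modeled here as an error.

```python
class AssociativeMemory:
    """Model of the associative memory 21: entries of [DID, data, V]."""

    def __init__(self, n_entries):
        self.entries = [[None, None, 0] for _ in range(n_entries)]

    def store(self, did, data):
        """Write an arriving (DID, data) pair into a vacant entry (V == 0)."""
        for entry in self.entries:
            if entry[2] == 0:
                entry[:] = [did, data, 1]
                return
        raise RuntimeError("no vacant entry")  # assumption: overflow is an error

    def receive(self, did):
        """RECEIVE: return (condition code, data) and clear V on a hit."""
        for entry in self.entries:
            if entry[2] == 1 and entry[0] == did:
                entry[2] = 0
                return 1, entry[1]
        return 0, None


def receive_with_branch(mem, did, deliver_next):
    """Repeat RECEIVE while the condition code is 0 (the BRANCH back)."""
    while True:
        cc, data = mem.receive(did)
        if cc == 1:
            return data
        deliver_next()  # stand-in for packets arriving in the meantime


mem = AssociativeMemory(4)
pending = [(300, 7.0), (100, 1.5)]  # packets arrive in this order


def deliver_next():
    if pending:
        mem.store(*pending.pop(0))


value = receive_with_branch(mem, 100, deliver_next)
```

In this example the data with code 300 arrive first, yet the program obtains the data with code 100 that it asked for, and the data with code 300 remain available for a later RECEIVE.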

As is apparent from the description thus far made, the data required by the program instructions can be utilized correctly and selectively irrespective of the order of receipt of the data by the associative memory 21 and which processor the data have been sent from.

When a plurality of jobs are to be executed by the parallel computer, the data packet used in the embodiment of FIG. 1 may be replaced by one to which is added the job No. (JOB#) of the job currently being executed by the CPU that outputs the data to be sent, as shown in FIG. 2A. In this case, the job No. JOB# is also stored in the associative memory 21 of FIG. 1, as shown in FIG. 2B. This modification differs from the embodiment of FIG. 1 in that the associative memory 21 searches for an entry in which both the data identification code designated by the CPU and the job No. JOB# are coincident.
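The two-key match of this modification can be sketched in Python as follows. The dictionary layout of an entry and the function name `find_entry` are hypothetical; clearing the effective bit on a hit is carried over from the embodiment of FIG. 1 as an assumption.

```python
def find_entry(entries, job, did):
    """Return the data of the valid entry matching both JOB# and DID, else None."""
    for entry in entries:
        if entry["v"] == 1 and entry["job"] == job and entry["did"] == did:
            entry["v"] = 0  # assumed: the entry is invalidated once read
            return entry["data"]
    return None


# Two jobs happen to use the same data identification code 100.
entries = [
    {"job": 1, "did": 100, "data": 1.5, "v": 1},
    {"job": 2, "did": 100, "data": 9.0, "v": 1},
]
result = find_entry(entries, job=2, did=100)
```

With the job number as a second key, the entry belonging to job 1 is left untouched even though it carries the same data identification code.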

With this modification, the data can be transferred among several programs belonging to the job bearing the same number. More specifically, the modification shown in FIGS. 2A and 2B is effective when a plurality of jobs are to be executed in the parallel computer system and each processor element 2 executes a portion of some job.

In the two embodiments thus far described, the data having an identification code coincident with the data identification code DID are retrieved. The present invention should not be limited thereto but can be extended to such an extent that the associative memory 21 is modified to make the retrieval by using the magnitudes, the plus or minus signs and so on.

Moreover, if the readout end signal is not fed from the associative memory 21 to the instruction readout execution control 61 within a predetermined time period after the end of the RECEIVE instruction, the control 61 can be modified so as to read out an instruction of another process from the memory 6 and execute it.

FIG. 3 shows an embodiment suitable especially for the modified case. In FIG. 3, the same reference numerals as those of FIG. 1 denote the identical parts. The embodiment of FIG. 3 is characterized by providing a second associative memory 42 in addition to the same associative memory 21 as that of FIG. 1.

In FIG. 3, the CPU 3 is different from the CPU of FIG. 1 in that it is equipped with a register 32 for storing the number PS# of a process being executed.

The process number PS# in that register 32 is sent to the packet generator 54 when the SEND instruction is stored in the instruction register 60 and executed, and is assembled into the data packet, which is held in the register 12.

The data packet from another processor element is received by the input buffer 20. At this time, however, the portions of the data packet other than the processor number P# are received. Each entry of the first associative memory 21 is different from that of FIG. 1 in that it has a process number (PS#) field 25 in addition to the data identification code DID field 22, the data field 23 and the effective bit field V.

When the RECEIVE instruction is held in the instruction register 60, the data identification code DID and the process number PS# are fed as the associative keys from the grouped general purpose registers and the register 32, respectively, to the first associative memory 21.

The data identification code DID and the process number PS# are also fed to the second associative memory 42 so that they are written in one entry of the same. Each entry of this associative memory 42 has an effective bit field, which is set at "1" upon each writing operation. In the first associative memory 21, a search is made for an entry which has information coincident with the two associative keys and has an effective bit V at "1". If such an entry is found, the data of that entry are sent to the grouped general purpose registers 10 through the line l121, and a value "1" indicating an end of readout of the first associative memory 21 is set in the condition code register 90 through the line l123. In response to the signal at "1" on the line l123, on the other hand, the second associative memory 42 sets at "0" the effective bit V of the entry that has just been written.

On the other hand, in case no entry having the information coincident with the two input associative keys is found in the first associative memory 21, the condition code register 90 is fed with the value "0" through the line l123. Thus, the execution of the RECEIVE instruction is ended, and the control 61 executes, as a subsequent instruction in the same process, the BRANCH instruction for judging the condition code in the register 90. The succeeding instruction is executed if the condition code is at "1". Otherwise, the process executing schedule program is executed, which connects the process being executed to a wait process list to put it into a wait state and takes another process connected to an execution process list so as to execute it.

Thus, the instruction in the process that has newly been brought into the executable state is executed.
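The decision taken after the BRANCH instruction in this embodiment can be sketched in Python as follows. The function name, the use of `collections.deque` for the two lists and the contents of a process cell (`ps`, `addr`) are hypothetical; the text does not say what happens when the execution process list is empty, so this sketch simply lets `popleft` raise an error in that case.

```python
from collections import deque


def after_receive(condition_code, current, execution_list, wait_list):
    """Pick the process to run next, based on the RECEIVE condition code."""
    if condition_code == 1:
        return current                # data were present: keep running
    wait_list.append(current)         # connect the process to the wait list
    return execution_list.popleft()   # take another executable process


# Process 1 issued a RECEIVE whose data have not arrived (condition code 0).
execution_list = deque([{"ps": 2, "addr": 0x200}])
wait_list = deque()
current = {"ps": 1, "addr": 0x104}
running = after_receive(0, current, execution_list, wait_list)
```

Unlike the embodiment of FIG. 1, the CPU does not spin on RECEIVE here; it spends the waiting time on process 2 while process 1 sits in the wait process list.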

In the receive unit 5, on the other hand, the process number PS#, the data identification code DID and the data in the data packet newly received by the input buffer 20 are stored in the first associative memory 21, and the process number PS# and the data identification code DID are fed as the associative keys to the second associative memory 42. If one entry has the data identification code DID and the process number PS# coincident with those associative keys and has the corresponding effective bit V at "1", the memory 42 informs a microprocessor 16 of the process number PS# of that entry through a line l58. The effective bit V of that entry is then set at "0".

Thus, each time a data packet is received, the second associative memory 42 detects the number of the process, if any, that is in a wait state waiting for the data in that data packet.
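The role of the second associative memory 42 can be sketched in Python as follows. The class name `WaitTable`, its method names and the unbounded list of entries are hypothetical simplifications; the actual memory has a fixed number of entries whose reuse is not modeled here.

```python
class WaitTable:
    """Model of the second associative memory 42: entries of (PS#, DID, V)."""

    def __init__(self):
        self.entries = []

    def register(self, ps, did):
        """On RECEIVE: record which process asked for which data (V = 1)."""
        entry = {"ps": ps, "did": did, "v": 1}
        self.entries.append(entry)
        return entry

    def cancel(self, entry):
        """RECEIVE hit in the first memory: clear the entry just written."""
        entry["v"] = 0

    def on_packet(self, ps, did):
        """On packet arrival: report the waiting process number, if any."""
        for entry in self.entries:
            if entry["v"] == 1 and entry["ps"] == ps and entry["did"] == did:
                entry["v"] = 0
                return entry["ps"]   # would be passed to the microprocessor 16
        return None


table = WaitTable()
table.register(ps=3, did=100)                # process 3 waits for data 100
unrelated = table.on_packet(ps=3, did=200)   # a different packet arrives
woken = table.on_packet(ps=3, did=100)       # the awaited packet arrives
```

Only the arrival of the awaited pair produces a process number; packets nobody is waiting for are simply stored in the first associative memory.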

The microprocessor 16 is programmed to perform the following steps, which control the instruction readout by the control 61 so that the process having the inputted number PS# comes into the executable state:

(1) For exclusive control with respect to the CPU 3, the CPU 3 is prohibited from accessing an execution process list (50A in FIG. 4) and a wait process list (50B in FIG. 4) in the memory 6;

(2) From the wait process list, there is retrieved a cell for holding the information on the process of the number sent from the second associative memory 42 through a line l58;

(3) The cell found at step (2) is connected to the tail of the execution process list;

(4) The aforementioned cell is deleted from the wait process list; and

(5) The prohibition of access to the above-specified two lists in the memory 6 is released.
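Steps (1) to (5) can be sketched in Python as follows. The lock standing in for the access prohibition, the use of `collections.deque` in place of the linked lists with head and tail pointers, the function name `wake_process` and the cell contents are all assumptions made for illustration.

```python
import threading
from collections import deque

list_lock = threading.Lock()  # stands in for the exclusive control with the CPU 3


def wake_process(ps, execution_list, wait_list):
    """Move the cell of process `ps` from the wait list to the execution list."""
    with list_lock:                                  # steps (1) and (5)
        cell = next((c for c in wait_list if c["ps"] == ps), None)  # step (2)
        if cell is None:
            return False                             # no such waiting process
        execution_list.append(cell)                  # step (3): connect to tail
        wait_list.remove(cell)                       # step (4): delete from wait list
        return True


# Hypothetical lists: process 1 is executable, processes 2 and 3 are waiting.
execution_list = deque([{"ps": 1, "addr": 0x100}])
wait_list = deque([{"ps": 2, "addr": 0x200}, {"ps": 3, "addr": 0x300}])
moved = wake_process(3, execution_list, wait_list)
```

After the call, process 3 sits at the tail of the execution list behind process 1, and only process 2 remains in the wait list.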

In FIG. 4, numerals 50A and 50B denote examples of the execution process list and the wait process list, respectively, before the execution of the above-itemized steps. In FIG. 5, numerals 50C and 50D denote examples of the execution process list and the wait process list, respectively, after the execution of those steps. In FIGS. 4 and 5, reference letters A, AT, W and WT denote the head and tail pointers of the execution process list and the head and tail pointers of the wait process list, respectively. Moreover, the instruction address attached to a certain process number denotes the address of an instruction to be executed of the process bearing the number.

Thus, the process of the number PS# detected by the second associative memory 42 is executed.

Claims

1. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network for transferring, via said network means, a first selected shared data signal produced by the at least one program to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal, at a first timing requested by the at least one program and in parallel to execution of the another program by the another processor;
(d2) process means connected to the receive memory portion for selectively processing a second shared data signal requested by the at least one program, among shared data signals used thereby, either at a second timing requested by the at least one program or at a third timing after said second shared data signal subsequently arrives at the receive memory portion depending upon whether or not the second data signal is present within the receive memory portion at the second timing.

2. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor of the plurality of the processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network for transferring, via said network means, a first selected shared data signal produced by the at least one program to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal, at a first timing requested by the at least one program and in parallel to execution of the another program by the another processor;
(d2) request means connected to the receive memory portion for requesting a selective readout of a second shared data signal requested by the at least one program, among shared data signals used thereby, at a second timing requested by the at least one program; and
(d3) means for controlling subsequent execution of the at least one program, depending upon whether said second shared data signal is present within the receive memory portion at the second timing.

3. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for parallely transferring data signals between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least the one of the programs to be executed by the each one processor of the plurality of processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network for transferring, via said network means, a first selected shared data signal produced by the at least one program to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal, the transferring operation being done in response to an instruction of a first kind provided within the at least one program, for selective transfer of the first shared data signal and in parallel to execution of the another program by the another processor;
(d2) request means connected to the receive memory portion for requesting a selective readout of a second shared data signal requested by the at least one program, among shared data signals used thereby, in response to an instruction of a second kind provided within the at least one program, for selective use of said second shared data signal; and
(d3) execution control means for controlling subsequent execution of the at least one program, depending upon whether said second shared data signal is present within the receive memory portion at the second timing including means for executing subsequent instructions of the at least one program, to be executed so as to process said second shared data signal after the instruction of the second kind, in case said second shared data signal is present within the receive memory portion; and
means for delaying execution of the subsequent instructions in case said second shared data signal is absent within the receive memory portion at the second timing, until after said second shared data signal subsequently becomes present.

4. A system of claim 3, wherein the delaying means includes:

means for executing other instructions of another process so as to execute processing not related to said second shared data signal, when said second shared data signal is absent from the receive memory portion at the second timing.

5. A system of claim 4, wherein the delaying means further includes;

means for executing the subsequent instructions in response to subsequent arrival of said second shared data signal at the receive memory portion.

6. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor of the plurality of processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network means for transferring, via said network means, a first shared data signal requested by the at least one program and a first data identification code (ID) predetermined for the first shared data signal, to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal;
(d2) process means connected to the receive memory portion for selectively processing a second shared data signal for which is predetermined a second data ID requested by the at least one program among shared data signals used by the at least one program, wherein the process means includes (i) request means for informing the receive memory portion of the second data ID and for requesting the receive memory portion to selectively read out said second shared data signal, and (ii) execution control means connected to the receive memory portion for controlling execution of the at least one program, depending upon whether said second shared data signal is present within the receive memory portion.

7. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor of the plurality of processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network means for transferring, via said network means, a first selected shared data signal produced by the at least one program and a first data identification code (ID) predetermined for the first shared data signal, at a first timing requested by the at least one program, to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal;
(d2) request means connected to the receive memory portion for requesting selective readout of a second shared data signal for which is predetermined a second data ID requested by the at least one program, among shared data signals used by the at least one program, at a second timing requested by the at least one program; and
(d3) execution control means connected to the receive memory portion for controlling execution of the at least one program, depending upon whether said second shared data signal is present within the receive memory portion at the second timing.
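As an informal illustration of the send means, request means and execution control means recited above, the following is a minimal sketch rather than the claimed hardware. The names `ProcessorElement`, `send` and `request` are hypothetical, the receive memory portion is modeled as a dictionary keyed by data ID, and it is assumed that a readout leaves the entry in place.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class ProcessorElement:
    """Hypothetical model of one processor and its receive memory portion."""
    name: str
    # Receive memory portion: data ID -> shared data signal (assumed one value per ID).
    receive_memory: Dict[int, Any] = field(default_factory=dict)

    def send(self, target: "ProcessorElement", data_id: int, value: Any) -> None:
        """Send means: the program picks the target and the moment of transfer.
        The (data ID, data) pair lands in the target's receive memory portion."""
        target.receive_memory[data_id] = value

    def request(self, data_id: int) -> Tuple[bool, Any]:
        """Request means: ask for the data signal with the given data ID.
        The returned flag is what execution control would act on."""
        if data_id in self.receive_memory:
            return True, self.receive_memory[data_id]
        return False, None
```

In this sketch the producer calls `send` at a timing of its own choosing and the consumer calls `request` at a timing of its own choosing; the two calls need not be coordinated, and the returned flag tells the consumer whether the data had already arrived.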

8. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor of the plurality of processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals;
(d) said execution means including;
(d1) send means connected to the network means for transferring, via said network means, a first selected shared data signal produced by the at least one program and a first data identification code (ID) predetermined for the first shared data signal, to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal, in response to an instruction of a first kind provided in the at least one program for selective transfer of the first shared data signal;
(d2) request means connected to the receive memory portion for requesting selective readout of a second shared data signal for which is predetermined a second data ID informed by the at least one program, among shared data signals used by the at least one program, in response to an instruction of a second kind provided in the at least one program for selective use of said second shared data signal;
(d3) means for executing subsequent instructions of the at least one program to be executed so as to process said second shared data signal after the instruction of the second kind, in case of presence of said second shared data signal within the receive memory portion; and
(d4) means for delaying execution of the subsequent instructions in case of absence of said second shared data signal within the receive memory portion, until after said second shared data signal subsequently becomes present.

9. A system of claim 8, wherein the delaying means includes;

means for executing other instructions of another program instead of the subsequent instructions when the second shared data signal is absent from the receive memory portion.

10. A system of claim 9, wherein the delaying means further includes;

means for executing the subsequent instructions in response to subsequent arrival of said second shared data signal at the receive memory portion.

11. A system of claim 10, further including:

memory means connected to the network means for holding data ID's informed by the request means when shared data signals assigned with the data ID's are not present in the receive memory portion and for providing the execution control means with an interrupt signal, in response to subsequent arrival of a pair of a shared data signal and a data ID which is coincident with one of the held data ID's.
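The behavior recited in claims 8 to 11 can be sketched informally as a small scheduler. This is an illustrative model under stated assumptions, not the claimed circuitry: programs are modeled as Python generators that yield the data ID they want to read, and the names `Processor`, `spawn`, `arrive`, `step`, `consumer` and `other` are all hypothetical. It is also assumed that an entry is removed from the receive memory once it has been read.

```python
from collections import deque


class Processor:
    """Hypothetical processor: receive memory, held-ID memory, ready queue."""

    def __init__(self):
        self.receive_memory = {}  # data ID -> shared data signal
        self.pending = {}         # held data ID -> suspended program (claim 11)
        self.ready = deque()      # (program, value to resume it with)
        self.log = []

    def spawn(self, program):
        self.ready.append((program, None))

    def arrive(self, data_id, value):
        """The network delivers a (data ID, data) pair. A match against a held
        ID plays the role of the interrupt that resumes the waiting program."""
        if data_id in self.pending:
            self.ready.append((self.pending.pop(data_id), value))
        else:
            self.receive_memory[data_id] = value

    def step(self):
        """Run one ready program up to its next readout request.
        Returns False when no program is ready."""
        if not self.ready:
            return False
        program, value = self.ready.popleft()
        try:
            data_id = program.send(value)  # the program requests this data ID
        except StopIteration:
            return True
        if data_id in self.receive_memory:
            # Data present: the subsequent instructions run next (d3).
            self.ready.appendleft((program, self.receive_memory.pop(data_id)))
        else:
            # Data absent: hold the ID and let another program run (d4, claim 9).
            self.pending[data_id] = program
        return True


def consumer(log):
    x = yield 42  # stands in for an instruction of the second kind, data ID 42
    log.append(("consumer", x))


def other(log):
    log.append(("other", "ran"))
    return
    yield  # unreachable; only makes this function a generator
```

If `consumer` is spawned before data ID 42 has arrived, it is set aside and `other` runs instead; a later call to `arrive(42, ...)` puts `consumer` back on the ready queue, which corresponds to resuming the subsequent instructions on arrival of the data.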

12. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) requesting selective transfer of each one of the shared data signals, by a corresponding first program which has produced the one shared data signal and at a first timing determined by the corresponding first program, so that the one shared data signal is transferred from a corresponding first one of the processors which is executing the corresponding first program to a corresponding second one of the processors, selected by the corresponding first program, which is executing a corresponding second program which uses the one shared data signal;
(c) transferring each one of the shared data signals from a corresponding first processor to a receive memory portion of a corresponding second processor in response to execution of the step of requesting transfer of the one shared data signal by a corresponding first program, and in parallel to execution of a corresponding second program;
(d) selectively processing each one of the shared data signals, by a corresponding second program being executed by a corresponding second processor and either at a second timing requested by the corresponding second program or at a third timing when the one shared data signal subsequently arrives at, and becomes available for use within, a receive memory portion of a corresponding second processor, depending upon whether or not the one shared data signal is present within the receive memory portion at the second timing.

13. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) requesting selective transfer of each one of the shared data signals, by a corresponding first program which has produced the one shared data signal and at a first timing requested by the corresponding first program, so that the one shared data signal is transferred from a corresponding first one of the processors which is executing the corresponding first program to a corresponding second one of the processors, selected by the corresponding first program, which is executing a corresponding second program which uses the one shared data signal;
(c) transferring each one of the shared data signals from a corresponding first processor to a receive memory portion of a corresponding second processor in response to execution of the step of requesting transfer of the one shared data signal by a corresponding first program and in parallel to execution of a corresponding second program;
(d) requesting selective readout of each one of the shared data signals, among shared data signals used by a corresponding second program, from a receive memory portion within a corresponding second processor executing the corresponding second program, at a second timing requested by the corresponding second program.

14. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) executing, for each one of the shared data signals, a corresponding instruction of a first kind provided for selective transfer of the one shared data signal, in a corresponding first program which has produced the one shared data signal, so that the one shared data signal is transferred from a corresponding first one of the processors to a corresponding second one of the processors;
(c) transferring each one of the shared data signals from a corresponding first processor executing a corresponding first program to a receive memory portion of a corresponding second processor, selected by the corresponding first program, executing a corresponding second program which uses the one shared data signal, in response to execution of a corresponding instruction of a first kind by a corresponding first program and in parallel to execution of a corresponding second program; and
(d) requesting selective readout of each one of the shared data signals, among shared data signals used by a corresponding second program, from a receive memory portion in a corresponding second processor executing the corresponding second program, by execution of a corresponding instruction of a second kind provided in a corresponding second program for selective use of the one shared data signal.

15. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) requesting selective transfer of each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal, by a corresponding first program which has produced the one shared data signal, so that the one shared data signal and the data ID are transferred from a corresponding first one of the processors which is executing the corresponding first program to a corresponding second one of the processors, selected by the corresponding first program, which is executing a corresponding second program which uses the one shared data signal;
(c) transferring each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal from a corresponding first processor to a receive memory portion of a corresponding second processor, in response to execution of the step of requesting transfer for the one shared data signal, by a corresponding first program and in parallel to execution of a corresponding second program; and
(d) selectively processing each one of the shared data signals by a corresponding second program, among shared data signals to be used thereby, by designating a data ID for the one data signal from the corresponding second program.

16. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) requesting selective transfer of each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal, by a corresponding first program which has produced the one shared data signal, so that the one shared data signal and the data ID are transferred from a corresponding first one of the processors which is executing the corresponding first program to a corresponding second one of the processors which is executing a corresponding second program which uses the one shared data signal;
(c) transferring each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal from a corresponding first processor to a receive memory portion of a corresponding second processor, selected by the corresponding first program being executed by the corresponding first processor, in response to execution of the step of requesting transfer for the one shared data signal, by the corresponding first program and in parallel to execution of a corresponding second program; and
(d) requesting selective readout of each one of the shared data signals from a receive memory portion within a corresponding second processor, among shared data signals to be used by a corresponding second program, by designating a data ID for the one data signal by the corresponding second program.

17. A method for executing a group of programs by means of a parallel computer system which comprises a plurality of processors each of which executes at least one of the programs, the group of programs processing a group of data signals which include shared data signals each one of which is produced by a corresponding first one of the programs and used by a corresponding second one of the programs, and each one of the processors having a local memory portion for holding at least one of the programs to be executed by the one processor and a part of the group of data signals to be used or produced by the one program, and a receive memory portion for holding ones of the shared data signals transferred from other processors to the one processor, the method comprising the steps of:

(a) executing at least plural ones of the programs in parallel by different ones of the processors;
(b) executing, for each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal, a corresponding instruction of a first kind provided in a corresponding first program for selective transfer of the one shared data signal, by a corresponding first program which has produced the one shared data signal;
(c) transferring each one of the shared data signals and a data identification code (ID) predetermined for the one shared data signal from a corresponding first processor to a receive memory portion of a corresponding second processor, selected by a corresponding first program being executed by the corresponding first processor, which is executing a corresponding second program which uses the one shared data signal, in response to execution of a corresponding instruction of the first kind and in parallel to execution of the corresponding second program; and
(d) requesting selective readout of each one of the shared data signals from a receive memory portion within a corresponding second processor, among shared data signals to be used by a corresponding second program, by designating a data ID for the one data signal by execution, by the corresponding second program, of a corresponding instruction for selective use of the one shared data signal.

18. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processor elements operable in parallel;
(b) a network connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processor elements having;
(c1) a local memory for holding at least one of the programs to be executed by the each one processor element and a part of the group of the data signals to be used or produced by the each one processor element;
(c2) a processor connected to the network and the local memory portion for executing instructions of the at least one program in accordance with a processor program control;
(c3) a receive memory connected to the network and the processor for receiving, in parallel to execution of instructions by the processor, shared data signals transferred to the each one processor element from other processor elements and for holding the shared data signals; and,
(c4) a send unit connected to the network for transferring, via said network, a selected shared data signal produced by the at least one program to another selected processor element, selected by the at least one program, which is executing another program which uses the shared data signal,
wherein the processor connected to the receive memory selectively processes the shared data signal requested by the at least one program, in accordance with a timing control determined by the processor program control, wherein other instructions are processed by the processor after the shared data signal is received in the receive memory.

19. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processor elements operable in parallel;
(b) a network connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processor elements having;
(c1) a local memory for holding at least one of the programs to be executed by the each one processor element and a part of the group of the data signals to be used or produced by the each one processor element;
(c2) a processor connected to the network and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory connected to the network and the processor for receiving, in parallel to execution of instructions by the processor, shared data signals transferred to the each one processor element from other processor elements and for holding the shared data signals;
(c4) a send unit connected to the network for transferring, via said network, a first selected shared data signal produced by the at least one program to another selected processor element, selected by the at least one program, which is executing another program which uses the first shared data signal, at a first timing requested by the at least one program and in parallel to execution of the another program by the another processor element; and,
wherein the processor connected to the receive memory selectively processes a second shared data signal in an order requested by the at least one program, among shared data signals used thereby, by examining whether a data identification code indicating the second shared data signal effectively exists in the receive memory, wherein the second shared data signal may be sent from the receive memory to the processor either at a second timing requested by the at least one program, or at a third timing after said second shared data signal subsequently arrives at the receive memory and the examining indicates a coincidence of the data identification code.

20. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processor elements operable in parallel;
(b) a network connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processor elements having;
(c1) a local memory for holding at least one of the programs to be executed by the each one processor element and a part of the group of the data signals to be used or produced by the each one processor element;
(c2) a processor connected to the network and the local memory portion for executing instructions of the at least one program in accordance with a processor program control;
(c3) a receive memory connected to the network and the processor for receiving, in parallel to execution of instructions by the processor, shared data signals transferred to the each one processor element from other processor elements and for holding the shared data signals; and,
(c4) a send unit connected to the network for transferring, via said network, a selected shared data signal produced by the at least one program to another selected processor element, selected by the at least one program, which is executing another program which uses the selected shared data signal,
wherein the processor connected to the receive memory selectively processes the selected shared data signal requested by the at least one program, in accordance with a timing control determined by the processor program control, wherein other instructions are processed by the processor after the shared data signal is received in the receive memory and said other instructions continue to be processed until the timing control determines that the shared data signal shall be processed.

21. A parallel computer system for executing a group of programs in parallel so as to process a group of data signals which include shared data signals each one of which is produced by a corresponding one of the programs and used by a corresponding another one of the programs, comprising:

(a) a plurality of processors operable in parallel;
(b) network means connected to the processors for transferring data signals in parallel between different pairs of the processors;
(c) each one of the processors having;
(c1) a local memory portion for holding at least one of the programs to be executed by the each one processor of the plurality of processors and a part of the group of the data signals to be used or produced by the each one processor;
(c2) execution means connected to the network means and the local memory portion for executing instructions of the at least one program;
(c3) a receive memory portion connected to the network means and the execution means for receiving, in parallel to execution of instructions by the execution means, shared data signals transferred to the each one processor from other processors and for holding the shared data signals, wherein the receive memory portion comprises associative memory means for holding pairs of a shared data signal and a data ID transferred from other processors;
(d) said execution means including;
(d1) send means connected to the network means for transferring, via said network means, a first shared data signal requested by the at least one program and a first data identification code (ID) predetermined for the first shared data signal, to another selected processor, selected by the at least one program, which is executing another program which uses the first shared data signal;
(d2) process means connected to the receive memory portion for selectively processing a second shared data signal for which is predetermined a second data ID requested by the at least one program among shared data signals used by the at least one program.
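The associative memory means of claim 21 can be pictured, very roughly, as a small table of (data ID, data) pairs that is searched by ID rather than by address. The sketch below is a software illustration only. The class name `AssociativeReceiveMemory`, the fixed capacity and the choice to free an entry once it has been read are assumptions made for the example and are not taken from the claims.

```python
from typing import Any, List, Optional, Tuple


class AssociativeReceiveMemory:
    """Hypothetical fixed-size store of (data ID, shared data signal) pairs."""

    def __init__(self, capacity: int = 4):
        # Each entry is None (empty) or a (data ID, shared data signal) pair.
        self.entries: List[Optional[Tuple[int, Any]]] = [None] * capacity

    def store(self, data_id: int, value: Any) -> bool:
        """Place an arriving pair in the first empty entry; False if full."""
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = (data_id, value)
                return True
        return False

    def match(self, data_id: int) -> Tuple[bool, Any]:
        """Compare the requested ID against every valid entry.
        Returns (hit, value); a hit frees the entry (an assumption)."""
        for i, entry in enumerate(self.entries):
            if entry is not None and entry[0] == data_id:
                self.entries[i] = None
                return True, entry[1]
        return False, None
```

A hardware associative memory would compare all entries at once; the loop here only mimics the result of that comparison.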
References Cited
U.S. Patent Documents
3662401 May 1972 Collins et al.
3978452 August 31, 1976 Barton et al.
4064553 December 20, 1977 Kashio
4110822 August 29, 1978 Porter et al.
4153932 May 8, 1979 Dennis et al.
4156910 May 29, 1979 Barton et al.
4325120 April 13, 1982 Colley et al.
4408273 October 4, 1983 Plow
4412303 October 25, 1983 Barnes et al.
4414624 November 8, 1983 Summer, Jr. et al.
4462075 July 24, 1984 Mori et al.
4481571 November 6, 1984 Pilat et al.
4601586 July 22, 1986 Bahr et al.
4627055 December 2, 1986 Mori et al.
4763254 August 9, 1988 Mori et al.
4789986 December 6, 1988 Koizumi et al.
4794519 December 27, 1988 Koizumi et al.
4797885 January 10, 1989 Orimo et al.
4811215 March 7, 1989 Smith
4814978 March 21, 1989 Dennis
4831512 May 16, 1989 Nakai et al.
4858101 August 15, 1989 Stewart et al.
4870571 September 26, 1989 Frink
4872165 October 3, 1989 Mori et al.
4891787 January 2, 1990 Gifford
4901229 February 13, 1990 Tashiro et al.
4901274 February 13, 1990 Maejima et al.
4905145 February 27, 1990 Sauber
5021947 June 4, 1991 Campbell et al.
5043873 August 27, 1991 Muramatsu et al.
5127104 June 1992 Dennis
Foreign Patent Documents
79-127653 October 1979 JPX
80-061836 May 1980 JPX
83-127249 July 1983 JPX
85-169966 September 1985 JPX
86-059554 March 1986 JPX
Other references
  • Agrawal, Dharma P., Advanced Computer Architecture, IEEE, pp. 51-68, 1986.
  • Amamiya et al., "Implementation and Evaluation of a List-Processing-Oriented Data Flow Machine," Conference Proceedings of the 13th International Symposium on Computer Architecture, IEEE, Jun. 2-5, 1986, pp. 10-19.
  • Agerwala et al., "Data Flow Systems," Computer, IEEE, Feb. 1982, pp. 10-12.
  • Vason P. Srini, "Architectural Comparison of Dataflow Systems," Computer, IEEE, Mar. 1986, pp. 68-87.
  • Watson et al., "Practical Data Flow Computer," Computer, IEEE, Feb. 1982, pp. 51-57.
  • Davis et al., "Data Flow Program Graphs," Computer, IEEE, Feb. 1982, pp. 26-41.
  • Kim P. Gostelow, "The U-Interpreter," Computer, IEEE, Feb. 1982, pp. 42-49.
  • William B. Ackerman, "Data Flow Languages," Computer, IEEE, Feb. 1982, pp. 15-24.
  • Greene, Maj. W., "A Review of Classification Schemes for Computer Communication Networks," vol. 10, No. 11, Nov. 1977, pp. 12-21.
  • Deutsch, J. T., et al., "MSPLICE: A Multiprocessor-based Circuit Simulator," Aug. 1984, pp. 21-24.
Patent History
Patent number: 5867679
Type: Grant
Filed: Jan 30, 1991
Date of Patent: Feb 2, 1999
Assignee: Hitachi, Ltd. (Tokyo)
Inventors: Teruo Tanaka (Hachioji), Naoki Hamanaka (Tokyo), Koichiro Omoda (Sagamihara), Shigeo Nagashima (Hachioji), Akira Muramatsu (Kawasaki), Ikuo Yoshihara (Tama), Kazuo Nakao (Sagamihara), Junji Nakagoshi (Tokyo), Kazuo Ojima (Tokyo)
Primary Examiner: William M. Treat
Attorney: Fay Sharpe Fagan Minnich & McKee Beall
Application Number: 7/647,773
Classifications
Current U.S. Class: 395/377; 395/392; 395/566; 395/581; 395/588; 395/80016; 395/80018; 395/80021; 395/80026; 395/80027; 395/676; 395/80025
International Classification: G06F 15/82