Systolic Array Processor Patents (Class 712/19)

Barrier synchronization mechanism for processors of a systolic array

Patent number: 7100021

Abstract: A mechanism synchronizes among processors of a processing engine in an intermediate network station. The processing engine is configured as a systolic array having a plurality of processors arrayed as rows and columns. The mechanism comprises a barrier synchronization mechanism that enables synchronization among processors of a column (i.e., different rows) of the systolic array. That is, the barrier synchronization function allows all participating processors within a column to reach a common point within their instruction code sequences before any of the processors proceed.
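As an illustration of the barrier behavior described in this abstract, the following is a minimal software sketch, not the patented hardware mechanism. It assumes each processor of a column can be modeled as a Python thread; the names `run_column` and `processor` are hypothetical and do not come from the patent.

```python
import threading


def run_column(num_processors=4):
    """Model one column of processors meeting at a barrier (illustrative only)."""
    barrier = threading.Barrier(num_processors)
    log = []
    log_lock = threading.Lock()

    def processor(row):
        # Work done before the common synchronization point.
        with log_lock:
            log.append(("before", row))
        # No participant proceeds until all participants have arrived.
        barrier.wait()
        # Work done after the common synchronization point.
        with log_lock:
            log.append(("after", row))

    threads = [threading.Thread(target=processor, args=(row,))
               for row in range(num_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for phase, row in run_column():
        print(phase, row)
```

In this sketch every "before" entry is logged ahead of any "after" entry, which is the ordering guarantee the abstract attributes to barrier synchronization.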

Type: Grant

Filed: October 16, 2001

Date of Patent: August 29, 2006

Assignee: Cisco Technology, Inc.

Inventors: John William Marshall, Barry S. Burns, Darren Kerr
Processor having systolic array pipeline for processing data packets

Patent number: 7069372

Abstract: A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router the data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, to determine the destination port of the router for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, logical units, etc., for performing, in one example, very long instruction word (VLIW) operations. The processor may also include a forwarding table memory, on-chip, for storing routing information, and a crossbar selectively connecting the stages of the systolic array with the forwarding table memory.

Type: Grant

Filed: June 20, 2002

Date of Patent: June 27, 2006

Assignee: Cisco Technology, Inc.

Inventors: Arthur Leung, Jr., Anthony J. Li, William L. Lynch, Sharad Mehrotra
Buffered coscheduling for parallel programming and enhanced fault tolerance

Patent number: 6993764

Abstract: A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval.

Type: Grant

Filed: June 28, 2001

Date of Patent: January 31, 2006

Assignee: The Regents of the University of California

Inventors: Fabrizio Petrini, Wu-chun Feng
Boundary synchronization mechanism for a processor of a systolic array

Patent number: 6986022

Abstract: A mechanism synchronizes instruction code executing on a processor of a processing engine in an intermediate network station. The processing engine is configured as a systolic array having a plurality of processors arrayed as rows and columns. The mechanism comprises a boundary (temporal) synchronization mechanism for cycle-based synchronization within a processor of the array. The synchronization mechanism is generally implemented using specialized synchronization micro operation codes (“opcodes”).

Type: Grant

Filed: October 16, 2001

Date of Patent: January 10, 2006

Assignee: Cisco Technology, Inc.

Inventors: John William Marshall, Barry S. Burns, Darren Kerr
Compiler synchronized multi-processor programmable logic device with direct transfer of computation results among processors

Patent number: 6915410

Abstract: A system for designing and implementing digital integrated circuits utilizing a set of synchronized sequencers that permit quick and efficient parallel processing of system level designs. The system and method converts digital schematics and hardware description language (HDL) based designs into a set of logic equations and single bit arithmetic-logic operations executed by a set of parallel operating sequencers. The system includes software for converting netlists and HDL designs into Boolean logic equations, and a compiler for distributing these logic equations between multiple sequencers. Each sequencer is comprised of a logic processor and the associated program memory for storing the executable code of the assigned Boolean logic equations and data memory for storing the results of processing of logic equations. To synchronize execution of logic equations by multiple sequencers, all program memories are addressed by one common address register.

Type: Grant

Filed: January 23, 2003

Date of Patent: July 5, 2005

Inventor: Stanley M. Hyduke
Architecture for a processor complex of an arrayed pipelined processing engine

Patent number: 6836838

Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.

Type: Grant

Filed: August 16, 2002

Date of Patent: December 28, 2004

Assignee: Cisco Technology, Inc.

Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
Single instruction multiple data massively parallel processor systems on a chip and system using same

Patent number: 6754802

Abstract: A single chip active memory includes a plurality of memory stripes, each coupled to a full word interface and one of a plurality of processing element (PE) sub-arrays. The large number of couplings between a PE sub-array and its associated memory stripe are managed by placing the PE sub-arrays so that their data paths run at right angles to the data paths of the plurality of memory stripes. The data lines exiting the memory stripes are run across the PE sub-arrays on one metal layer. At the appropriate locations, the data lines are coupled to another orthogonally oriented metal layer to complete the coupling between the memory stripe and its associated PE sub-array. The plurality of PE sub-arrays are mapped to form a large logical array, in which each PE is coupled to four other PEs. Physically distant PEs are coupled using current mode differential logical couplings and drivers to ensure good signal integrity at high operational speeds. Each PE contains a small DRAM register array.

Type: Grant

Filed: August 25, 2000

Date of Patent: June 22, 2004

Assignee: Micron Technology, Inc.

Inventor: Graham Kirsch
Channel equalisers

Patent number: 6636561

Abstract: The present invention provides a method whereby an adaptive equaliser is applied to the terminal (mobile) receivers in a cellular radio CDMA system whose purpose is to minimise the mutual interference between users sharing the same radio channel. The application is relevant to third generation cellular systems which consist of UTRA (or WBCDMA) in Europe and CDMA2000 in the USA (or future merged standards variants of these systems), and has a special application to the time domain duplex (TDD) mode of these systems. An algorithm is provided whereby the equaliser is adapted to conform to some recognised optimality criterion which is known to lead to a minimum mutual interference situation. One method is the constrained minimum output power condition for a finite impulse response (FIR) digital filter, but the technique is not limited to this criterion. The method reduces the computation load of a digital filter by selecting a sparse subset of delay line taps which are actively weighted and used as a filter.
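The computational saving from weighting only a sparse subset of delay line taps can be sketched as follows. This is an illustrative example only: the function name `sparse_fir`, the tap positions, and the weights are assumptions, and neither the tap-selection rule nor the adaptation algorithm of the patent is shown.

```python
def sparse_fir(samples, taps):
    """Filter `samples` using only the active taps.

    `taps` maps a delay (in samples) to its weight. Delays that are absent
    are treated as zero-weighted and cost no multiplications, so the work
    per output sample scales with the number of active taps rather than
    with the full delay line length.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for delay, weight in taps.items():
            if n - delay >= 0:
                acc += weight * samples[n - delay]
        output.append(acc)
    return output


if __name__ == "__main__":
    # Hypothetical example: two active taps on a delay line, at delays 0 and 3.
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(sparse_fir(impulse, {0: 0.5, 3: 0.25}))
```

Feeding a unit impulse through the filter returns the active weights at their delay positions, which makes the sparse structure easy to inspect.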

Type: Grant

Filed: June 29, 1999

Date of Patent: October 21, 2003

Assignee: Nortel Networks Limited

Inventor: John E Hudson
MIMD array of single bit processors for processing logic equations in strict sequential order

Patent number: 6578133

Abstract: A system for designing and implementing digital integrated circuits utilizing a set of synchronized sequencers that permit quick and efficient parallel processing of system level designs. The system and method converts digital schematics and hardware description language (HDL) based designs into a set of logic equations and single bit arithmetic-logic operations executed by a set of parallel operating sequencers. The system includes software for converting netlists and HDL designs into Boolean logic equations, and a compiler for distributing these logic equations between multiple sequencers. Each sequencer is comprised of a logic processor and the associated program memory for storing the executable code of the assigned Boolean logic equations and data memory for storing the results of processing of logic equations. To synchronize execution of logic equations by multiple sequencers, all program memories are addressed by one common address register.

Type: Grant

Filed: February 24, 2000

Date of Patent: June 10, 2003

Inventor: Stanley M. Hyduke
Programmable processing engine for efficiently processing transient data

Patent number: 6513108

Abstract: A programmable processing engine processes transient data within an intermediate network station of a computer network. The engine comprises an array of processing elements symmetrically arrayed as rows and columns, and embedded between input and output buffer units with a plurality of interfaces from the array to an external memory. The external memory stores non-transient data organized within data structures, such as forwarding and routing tables, for use in processing the transient data. Each processing element contains an instruction memory that allows programming of the array to process the transient data as processing element stages of baseline or extended pipelines operating in parallel.

Type: Grant

Filed: June 29, 1998

Date of Patent: January 28, 2003

Assignee: Cisco Technology, Inc.

Inventors: Darren Kerr, Kenneth Michael Key, Michael L. Wright, William E. Jennings
Apparatus and method for signal processing

Patent number: 6460127

Abstract: An associative signal processing apparatus for processing a plurality of samples of an incoming signal in parallel, the apparatus comprising: (a) an array of processors, each processor including a multiplicity of associative memory cells, the memory cells being operative to perform: (i) compare operations, in parallel, on the plurality of samples of the incoming signal; and (ii) write operations, in parallel, on the plurality of samples of the incoming signal; and (b) an I/O buffer register including a multiplicity of associative memory cells, the register being operative to: (i) input the plurality of samples of the incoming signal to the array of processors in parallel by having the I/O buffer register memory cells perform at least one associative compare operation and the array memory cells perform at least one associative write operation; and (ii) receive, in parallel, a plurality of processed samples from the array of processors by having the array memory cells perform at least one associative compare o

Type: Grant

Filed: October 26, 1998

Date of Patent: October 1, 2002

Assignee: Neomagic Israel Ltd.

Inventor: Avidan Akerib
Architecture for a process complex of an arrayed pipelined processing engine

Patent number: 6442669

Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.

Type: Grant

Filed: November 30, 2000

Date of Patent: August 27, 2002

Assignee: Cisco Technology, Inc.

Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
Massively parallel array processor

Patent number: 6405185

Abstract: Image processing for multimedia workstations is a computationally intensive task requiring special purpose hardware to meet the high speed requirements associated with the task. One type of specialized hardware that meets the high-speed computation requirements is the mesh connected computer. Such a computer becomes a massively parallel machine when an array of computers interconnected by a network is replicated in a machine. The nearest neighbor mesh computer consists of an N×N square array of Processor Elements (PEs) where each PE is connected to the North, South, East and West PEs only. The diagonal folded mesh array processor, which is called Oracle, allows the matrix transformation operation to be accomplished in one cycle by simple interchange of the data elements in the dual symmetric processor elements.

Type: Grant

Filed: March 23, 1995

Date of Patent: June 11, 2002

Assignee: International Business Machines Corporation

Inventors: Gerald George Pechanek, Stamatis Vassiliadis, Jose Guadalupe Delgado-Frias
Multi-processor system, data processing system, data processing method, and computer program

Publication number: 20020059509

Abstract: The multi-processor system comprises a plurality of cell processors for performing data processing and a BCMC for broadcasting broadcast data, including data used in data processing, to the plurality of cell processors. Each of the plurality of cell processors sorts out, from the broadcast data broadcast by the BCMC, only the data necessary for the data processing that it performs, so as to perform data processing. The BCMC obtains results of data processing of all cell processors so that they can be supplied to all cell processors as broadcast data, thus making it possible to transmit and receive the results of data processing between the cell processors and perform high-speed data processing as an entire system.

Type: Application

Filed: September 26, 2001

Publication date: May 16, 2002

Inventor: Nobuo Sasaki
System and method for power optimization in parallel units

Publication number: 20020002664

Abstract: A plurality of parallel execution units are selectively powered from a plurality of power sources, the power to each execution unit being selected based upon expected time to completion of processing within the execution unit. Maximum power is gated to execution units executing complex instructions, or time-critical instructions. Less than maximum power is gated to execution units executing simple instructions, or instructions which are not time-critical, or in response to pipeline hazards or stalls. When less than maximum power is gated to an execution unit, a step up circuit may be employed to raise the output of that execution unit to maximum power.

Type: Application

Filed: April 10, 2001

Publication date: January 3, 2002

Applicant: International Business Machines Corporation

Inventor: Mark William Kuemerle
Method and apparatus for processing a set of data values with plural processing units mask bits generated by other processing units

Patent number: 6308250

Abstract: A method and system for operating a computing system having multiple processing units. According to a new machine instruction, called the iota instruction, the computing system operates on a vector of mask bits to generate an iota vector having a sequence of values. In one form, each value of the iota vector is a sum of a series of the lower order mask bits up to and including the mask bit corresponding to the entry in the iota vector. In another form, each entry in the iota vector is a sum of a series of lower order mask bits but does not include the mask bit corresponding to the particular entry in the iota vector. In order to calculate the iota vector, the multiple processing units of the present invention communicate the mask bits to the other processing units. Advantages of the present invention include the vectorization of software loops having certain data hazards that prevented conventional compilers from vectorizing the software.
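The two forms of the iota vector described in this abstract correspond to inclusive and exclusive prefix sums over the mask bits. The sketch below is a sequential software illustration under that reading; the function names and the `pack` helper are hypothetical and are not the machine instruction itself.

```python
from itertools import accumulate


def iota_inclusive(mask):
    """Each entry sums the lower-order mask bits up to and including its own."""
    return list(accumulate(mask))


def iota_exclusive(mask):
    """Each entry sums the lower-order mask bits, excluding its own."""
    return [total - bit for total, bit in zip(accumulate(mask), mask)]


def pack(values, mask):
    """Illustrative use: compress the selected values without a loop-carried
    counter by using the exclusive iota vector as the destination index."""
    index = iota_exclusive(mask)
    out = [None] * sum(mask)
    for value, bit, dest in zip(values, mask, index):
        if bit:
            out[dest] = value
    return out


if __name__ == "__main__":
    mask = [1, 0, 1, 1, 0]
    print(iota_inclusive(mask))   # [1, 1, 2, 3, 3]
    print(iota_exclusive(mask))   # [0, 1, 1, 2, 3]
    print(pack(["a", "b", "c", "d", "e"], mask))  # ['a', 'c', 'd']
```

The `pack` helper shows why such a vector helps vectorization: the destination index of each selected element is known up front, so the conditional increment that normally serializes the loop disappears.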

Type: Grant

Filed: June 23, 1998

Date of Patent: October 23, 2001

Assignee: Silicon Graphics, Inc.

Inventor: Peter Michael Klausler
Synchronization and control system for an arrayed processing engine

Patent number: 6272621

Abstract: A synchronization and control system for an arrayed processing engine of an intermediate network station comprises sequencing circuitry that controls the processing engine. The processing engine generally includes a plurality of processing element stages arrayed as parallel pipelines. The control system further includes an input header buffer (IHB) and an output header buffer (OHB), the latter comprising circuitry for receiving current transient data processed by the pipelines and for decoding control signals to determine a destination for the processed data. One destination is a feedback path that couples the OHB to the IHB and returns the processed data to the IHB for immediate loading into an available pipeline.

Type: Grant

Filed: August 18, 2000

Date of Patent: August 7, 2001

Assignee: Cisco Technology, Inc.

Inventors: Kenneth Michael Key, Michael L. Wright, Darren Kerr, William E. Jennings
Register file having shared and local data word parts

Patent number: 6219777

Abstract: Disclosed is a register file used in a multiprocessor composition composed of a plurality of processor elements, the register file having a plurality of words and being provided for each of the plurality of processor elements, wherein: the plurality of words are divided into a word part that can be simultaneously accessed by some of the plurality of processor elements to use in common with other processor element, and a word part that can be accessed only by its own processor element.

Type: Grant

Filed: July 10, 1998

Date of Patent: April 17, 2001

Assignee: NEC Corporation

Inventor: Toshiaki Inoue
Method and apparatus for passing data among processor complex stages of a pipelined processing engine

Patent number: 6195739

Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.

Type: Grant

Filed: June 29, 1998

Date of Patent: February 27, 2001

Assignee: Cisco Technology, Inc.

Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
Synchronization and control system for an arrayed processing engine

Patent number: 6119215

Abstract: A synchronization and control system for an arrayed processing engine of an intermediate network station comprises sequencing circuitry that controls the processing engine. The processing engine generally includes a plurality of processing element stages arrayed as parallel pipelines. The control system further includes an input header buffer (IHB) and an output header buffer (OHB), the latter comprising circuitry for receiving current transient data processed by the pipelines and for decoding control signals to determine a destination for the processed data. One destination is a feedback path that couples the OHB to the IHB and returns the processed data to the IHB for immediate loading into an available pipeline.

Type: Grant

Filed: June 29, 1998

Date of Patent: September 12, 2000

Assignee: Cisco Technology, Inc.

Inventors: Kenneth Michael Key, Michael L. Wright, Darren Kerr, William E. Jennings
Manifold array processor

Patent number: 6023753

Abstract: An array processor includes processing elements arranged in clusters which are, in turn, combined in a rectangular array. Each cluster is formed of processing elements which preferably communicate with the processing elements of at least two other clusters. Additionally each inter-cluster communication path is mutually exclusive, that is, each path carries either north and west, south and east, north and east, or south and west communications. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path. That is, communications from a cluster which communicates to the north and east with another cluster may be combined in one path, thus eliminating half the wiring required for the path. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays.

Type: Grant

Filed: June 30, 1997

Date of Patent: February 8, 2000

Assignee: Billions of Operations Per Second, Inc.

Inventors: Gerald G. Pechanek, Charles W. Kurak, Jr.
Parallel prefix operations in asynchronous processors

Patent number: 5999961

Abstract: A circuit for performing prefix computation in an asynchronous digital processor by implementing a serial process and a tree process for the same prefix computation in parallel. The first output from either process is selected and used for the subsequent operation. For a prefix computation with N inputs, an average-case latency of O(log log N) can be achieved. Buffering can be used for a full-throughput operation.
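To illustrate the two processes this abstract refers to, the sketch below computes the same prefix result serially and with a logarithmic-depth scheme. It is a synchronous software model under stated assumptions: the function names are hypothetical, the log-step method shown is the generic Hillis-Steele scan rather than the patented circuit, and the asynchronous race between the two processes is not modeled.

```python
import operator


def serial_prefix(values, op):
    """Serial process: each output depends on the previous output (N - 1 steps)."""
    result = []
    for value in values:
        result.append(value if not result else op(result[-1], value))
    return result


def tree_prefix(values, op):
    """Log-depth process: about log2(N) rounds, each combining elements that
    are `step` positions apart. `op` must be associative."""
    result = list(values)
    step = 1
    while step < len(result):
        result = [
            result[i] if i < step else op(result[i - step], result[i])
            for i in range(len(result))
        ]
        step *= 2
    return result


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(serial_prefix(data, operator.add))
    print(tree_prefix(data, operator.add))
```

Both functions produce identical outputs for any associative operator, which is the property that lets a circuit accept whichever process finishes first.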

Type: Grant

Filed: September 15, 1997

Date of Patent: December 7, 1999

Assignee: California Institute of Technology

Inventors: Rajit Manohar, Alain J. Martin
Sequence information signal processor

Patent number: 5964860

Abstract: An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements.
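The data flow in this abstract can be sketched in software by letting each list position stand for one processor that holds a single stored element while the other sequence streams past. The recurrence used below is the standard Smith-Waterman local alignment recurrence, which is an assumption: the abstract does not name its scoring rule, and the `match`, `mismatch`, and `gap` values and the function name are hypothetical.

```python
def best_local_score(stored, streamed, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Position i of `prev` and `cur` models processor i, which holds
    stored[i - 1]. For each streamed element, a processor combines its own
    previous score, the score passed along from its neighbor, and the
    comparison of its stored element with the streamed element.
    """
    prev = [0] * (len(stored) + 1)
    best = 0
    for element in streamed:
        cur = [0]
        for i, held in enumerate(stored, start=1):
            diagonal = prev[i - 1] + (match if held == element else mismatch)
            score = max(0, diagonal, prev[i] + gap, cur[i - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    print(best_local_score("ACGT", "TTACGTTT"))  # 8
```

This sequential model evaluates the processors one after another; in a linear array the same dependencies would let the processors work concurrently on successive elements of the streamed sequence.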

Type: Grant

Filed: April 8, 1997

Date of Patent: October 12, 1999

Assignee: California Institute of Technology

Inventors: John C. Peterson, Edward T. Chow, Michael S. Waterman, Timothy J. Hunkapillar
