Systolic Array Processor Patents (Class 712/19)
  • Patent number: 7100021
    Abstract: A mechanism synchronizes among processors of a processing engine in an intermediate network station. The processing engine is configured as a systolic array having a plurality of processors arrayed as rows and columns. The mechanism comprises a barrier synchronization mechanism that enables synchronization among processors of a column (i.e., different rows) of the systolic array. That is, the barrier synchronization function allows all participating processors within a column to reach a common point within their instruction code sequences before any of the processors proceed.
    Type: Grant
    Filed: October 16, 2001
    Date of Patent: August 29, 2006
    Assignee: Cisco Technology, Inc.
    Inventors: John William Marshall, Barry S. Burns, Darren Kerr
  • Patent number: 7069372
    Abstract: A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router the data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, to determine the destination port of the router for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, logical units, etc., for performing, in one example, very long instruction word (vliw) operations. The processor may also include a forwarding table memory, on-chip, for storing routing information, and a cross bar selectively connecting the stages of the systolic array with the forwarding table memory.
    Type: Grant
    Filed: June 20, 2002
    Date of Patent: June 27, 2006
    Assignee: CISCO Technology, Inc.
    Inventors: Arthur Leung, Jr., Anthony J. Li, William L. Lynch, Sharad Mehrotra
  • Patent number: 6993764
    Abstract: A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: January 31, 2006
    Assignee: The Regents of the University of California
    Inventors: Fabrizio Petrini, Wu-chun Feng
  • Patent number: 6986022
    Abstract: A mechanism synchronizes instruction code executing on a processor of a processing engine in an intermediate network station. The processing engine is configured as a systolic array having a plurality of processors arrayed as rows and columns. The mechanism comprises a boundary (temporal) synchronization mechanism for cycle-based synchronization within a processor of the array. The synchronization mechanism is generally implemented using specialized synchronization micro operation codes (“opcodes”).
    Type: Grant
    Filed: October 16, 2001
    Date of Patent: January 10, 2006
    Assignee: Cisco Technology, Inc.
    Inventors: John William Marshall, Barry S. Burns, Darren Kerr
  • Patent number: 6915410
    Abstract: A system for designing and implementing digital integrated circuits utilizing a set of synchronized sequencers that permit quick and efficient parallel processing of system level designs. The system and method converts digital schematics and hardware description language (HDL) based designs into a set of logic equations and single bit arithmetic-logic operations executed by a set of parallel operating sequencers. The system includes software for converting netlists and HDL designs into Boolean logic equations, and a compiler for distributing these logic equations between multiple sequencers. Each sequencer is comprised of a logic processor and the associated program memory for storing the executable code of the assigned Boolean logic equations and data memory for storing the results of processing of logic equations. To synchronize execution of logic equations by multiple sequencers, all program memories are addressed by one common address register.
    Type: Grant
    Filed: January 23, 2003
    Date of Patent: July 5, 2005
    Inventor: Stanley M. Hyduke
  • Patent number: 6836838
    Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.
    Type: Grant
    Filed: August 16, 2002
    Date of Patent: December 28, 2004
    Assignee: Cisco Technology, Inc.
    Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
  • Patent number: 6754802
    Abstract: A single chip active memory includes a plurality of memory stripes, each coupled to a full word interface and one of a plurality of processing element (PE) sub-arrays. The large number of couplings between a PE sub-array and its associated memory stripe are managed by placing the PE sub-arrays so that their data paths run at right angle to the data paths of the plurality of memory stripes. The data lines exiting the memory stripes are run across the PE sub-arrays on one metal layer. At the appropriate locations, the data lines are coupled to another orthogonally oriented metal layer to complete the coupling between the memory stripe and its associated PE sub-array. The plurality of PE sub-arrays are mapped to form a large logical array, in which each PE is coupled to four other PEs. Physically distant PEs are coupled using current mode differential logical couplings an drivers to insure good signal integrity at high operational speeds. Each PE contains a small DRAM register array.
    Type: Grant
    Filed: August 25, 2000
    Date of Patent: June 22, 2004
    Assignee: Micron Technology, Inc.
    Inventor: Graham Kirsch
  • Patent number: 6636561
    Abstract: The present invention provides a method whereby an adaptive equaliser is applied to the terminal (mobile) receivers in a cellular radio CDMA system whose purpose is to minimise the mutual interference between users sharing the same radio channel. The application relevant to third generation cellular systems which consist of UTRA (or WBCDMA) in Europe and CDMA2000 in the USA (or future merged standards variants of these systems), and has a special application to the time domain duplex (TDD) mode of these systems. An algorithm is provided whereby the equaliser is adapted to conform to some recognised optimality criterion which is known to lead to a minimum mutual interference situation. One method is the constrained minimum output finite impulse response (FIR) digital filter power condition, but the technique is not limited to this criterion. The method reduces the computation load of a digital filter by selecting a sparse subset of delay line taps which are actively weighted and used as a filter.
    Type: Grant
    Filed: June 29, 1999
    Date of Patent: October 21, 2003
    Assignee: Nortel Networks Limited
    Inventor: John E Hudson
  • Patent number: 6578133
    Abstract: A system for designing and implementing digital integrated circuits utilizing a set of synchronized sequencers that permit quick and efficient parallel processing of system level designs. The system and method converts digital schematics and hardware description language (HDL) based designs into a set of logic equations and single bit arithmetic-logic operations executed by a set of parallel operating sequencers. The system includes software for converting netlists and HDL designs into Boolean logic equations, and a compiler for distributing these logic equations between multiple sequencers. Each sequencer is comprised of a logic processor and the associated program memory for storing the executable code of the assigned Boolean logic equations and data memory for storing the results of processing of logic equations. To synchronize execution of logic equations by multiple sequencers, all program memories are addressed by one common address register.
    Type: Grant
    Filed: February 24, 2000
    Date of Patent: June 10, 2003
    Inventor: Stanley M. Hyduke
  • Patent number: 6513108
    Abstract: A programmable processing engine processes transient data within an intermediate network station of a computer network. The engine comprises an array of processing elements symmetrically arrayed as rows and columns, and embedded between input and output buffer units with a plurality of interfaces from the array to an external memory. The external memory stores non-transient data organized within data structures, such as forwarding and routing tables, for use in processing the transient data. Each processing element contains an instruction memory that allows programming of the array to process the transient data as processing element stages of baseline or extended pipelines operating in parallel.
    Type: Grant
    Filed: June 29, 1998
    Date of Patent: January 28, 2003
    Assignee: Cisco Technology, Inc.
    Inventors: Darren Kerr, Kenneth Michael Key, Michael L. Wright, William E. Jennings
  • Patent number: 6460127
    Abstract: An associative signal processing apparatus for processing a plurality of samples of an incoming signal in parallel, the apparatus comprising: (a) an array, of processors, each processor including a multiplicity of associative memory cells, the memory cells being operative to perform: (i) compare operations, in parallel, on the plurality of samples of the incoming signal; and (ii) write operations, in parallel, on the plurality of samples of the incoming signal; and (b) an I/O buffer register including a multiplicity of associative memory cells, the register being operative to: (i) input the plurality of samples of the incoming signal to the array of processors in parallel by having the I/O buffer register memory cells perform at least one associative compare operation and the array memory cells perform at least one associative write operation; and (ii) receive, in parallel, a plurality of processed samples from the array of processors by having the array memory cells perform at least one associative compare o
    Type: Grant
    Filed: October 26, 1998
    Date of Patent: October 1, 2002
    Assignee: Neomagic Israel Ltd.
    Inventor: Avidan Akerib
  • Patent number: 6442669
    Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.
    Type: Grant
    Filed: November 30, 2000
    Date of Patent: August 27, 2002
    Assignee: Cisco Technology, Inc.
    Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
  • Patent number: 6405185
    Abstract: Image processing for multimedia workstations is a computationally intensive task requiring special purpose hardware to meet the high speed requirements associated with the task. One type of specialized hardware that meets the computation high speed requirements is the mesh connected computer. Such a computer becomes a massively parallel machine when an array of computers interconnected by a network are replicated in a machine. The nearest neighbor mesh computer consists of an N×N square array of Processor Elements(PEs) where each PE is connected to the North, South, East and West PEs only. The diagonal folded mesh array processor, which is called Oracle, allows the matrix transformation operation to be accomplished in one cycle by simple interchange of the data elements in the dual symmetric processor elements.
    Type: Grant
    Filed: March 23, 1995
    Date of Patent: June 11, 2002
    Assignee: International Business Machines Corporation
    Inventors: Gerald George Pechanek, Stamatis Vassiliadis, Jose Guadalupe Delgado-Frias
  • Publication number: 20020059509
    Abstract: The multi-processor system comprises a plurality of cell processors for performing data processing, a BCMC for broadcasting broadcast data including data used in data processing to the plurality of cell processors, each of the plurality of cell processors sorts out only data necessary for data processing that is performed by each cell processor from broadcast data broadcasted by BCMC to as to perform data processing. BCMC obtains results of data processing of all cell processors so that they can be supplied to all cell processors as broadcast data, thus making it possible to transmit and receive the results of data processing between the cell processors and perform high-speed data processing as an entire system.
    Type: Application
    Filed: September 26, 2001
    Publication date: May 16, 2002
    Inventor: Nobuo Sasaki
  • Publication number: 20020002664
    Abstract: A plurality of parallel execution units are selectively powered from a plurality of power sources, the power to each execution unit being selected based upon expected time to completion of processing within the execution unit. Maximum power is gated to execution units executing complex instructions, or time-critical instructions. Less than maximum power is gated to execution units executing simple instructions, or instructions which are not time-critical, or in response to pipeline hazards or stalls. When less than maximum power is gated to an execution unit, a step up circuit may be employed to raise the output of that execution unit to maximum power.
    Type: Application
    Filed: April 10, 2001
    Publication date: January 3, 2002
    Applicant: International Business Machines Corporation
    Inventor: Mark William Kuemerle
  • Patent number: 6308250
    Abstract: A method and system for operating a computing system having multiple processing units. According to a new machine instruction, called the iota instruction, the computing system operates on a vector of mask bits to generate an iota vector having a sequence of values. In one form, each value of the iota vector is a sum of a series of the lower order mask bits up to and including the mask bit corresponding to the entry in the iota vector. In another form, each entry in the iota vector is a sum of a series of lower order mask bits but does not include the mask bit corresponding to the particular entry in the iota vector. In order to calculate the iota vector, the multiple processing units of the present invention communicate the mask bits to the other processing units. Advantages of the present invention include the vectorization of software loops having certain data hazards that prevented conventional compilers from vectorizing the software.
    Type: Grant
    Filed: June 23, 1998
    Date of Patent: October 23, 2001
    Assignee: Silicon Graphics, Inc.
    Inventor: Peter Michael Klausler
  • Patent number: 6272621
    Abstract: A synchronization and control system for an arrayed processing engine of an intermediate network station comprises sequencing circuitry that controls the processing engine. The processing engine generally includes a plurality of processing element stages arrayed as parallel pipelines. The control system further includes an input header buffer (IHB) and an output header buffer (OHB), the latter comprising circuitry for receiving current transient data processed by the pipelines and for decoding control signals to determine a destination for the processed data. One destination is a feedback path that couples the OHB to the IHB and returns the processed data to the IHB for immediate loading into an available pipeline.
    Type: Grant
    Filed: August 18, 2000
    Date of Patent: August 7, 2001
    Assignee: Cisco Technology, Inc.
    Inventors: Kenneth Michael Key, Michael L. Wright, Darren Kerr, William E. Jennings
  • Patent number: 6219777
    Abstract: Disclosed is a register file used in a multiprocessor composition composed of a plurality of processor elements, the register file having a plurality of words and being provided for each of the plurality of processor elements, wherein: the plurality of words are divided into a word part that can be simultaneously accessed by some of the plurality of processor elements to use in common with other processor element, and a word part that can be accessed only by its own processor element.
    Type: Grant
    Filed: July 10, 1998
    Date of Patent: April 17, 2001
    Assignee: NEC Corporation
    Inventor: Toshiaki Inoue
  • Patent number: 6195739
    Abstract: A processor complex architecture facilitates accurate passing of transient data among processor complex stages of a pipelined processing engine. The processor complex comprises a central processing unit (CPU) coupled to an instruction memory and a pair of context data memory structures via a memory manager circuit. The context memories store transient “context” data for processing by the CPU in accordance with instructions stored in the instruction memory. The architecture further comprises data mover circuitry that cooperates with the context memories and memory manager to provide a technique for efficiently passing data among the stages in a manner that maintains data coherency in the processing engine. An aspect of the architecture is the ability of the CPU to operate on the transient data substantially simultaneously with the passing of that data by the data mover.
    Type: Grant
    Filed: June 29, 1998
    Date of Patent: February 27, 2001
    Assignee: Cisco Technology, Inc.
    Inventors: Michael L. Wright, Darren Kerr, Kenneth Michael Key, William E. Jennings
  • Patent number: 6119215
    Abstract: A synchronization and control system for an arrayed processing engine of an intermediate network station comprises sequencing circuitry that controls the processing engine. The processing engine generally includes a plurality of processing element stages arrayed as parallel pipelines. The control system further includes an input header buffer (IHB) and an output header buffer (OHB), the latter comprising circuitry for receiving current transient data processed by the pipelines and for decoding control signals to determine a destination for the processed data. One destination is a feedback path that couples the OHB to the IHB and returns the processed data to the IHB for immediate loading into an available pipeline.
    Type: Grant
    Filed: June 29, 1998
    Date of Patent: September 12, 2000
    Assignee: Cisco Technology, Inc.
    Inventors: Kenneth Michael Key, Michael L. Wright, Darren Kerr, William E. Jennings
  • Patent number: 6023753
    Abstract: An array processor includes processing elements arranged in clusters which are, in turn, combined in a rectangular array. Each cluster is formed of processing elements which preferably communicate with the processing elements of at least two other clusters. Additionally each inter-cluster communication path is mutually exclusive, that is, each path carries either north and west, south and east, north and east, or south and west communications. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path. That is, communications from a cluster which communicates to the north and east with another cluster may be combined in one path, thus eliminating half the wiring required for the path. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays.
    Type: Grant
    Filed: June 30, 1997
    Date of Patent: February 8, 2000
    Assignee: Billion of Operations Per Second, Inc.
    Inventors: Gerald G. Pechanek, Charles W. Kurak, Jr.
  • Patent number: 5999961
    Abstract: A circuit for performing prefix computation in an asynchronous digital processor by implementing a serial process and a tree process for the same prefix computation in parallel. The first output from either processes is selected and used for the subsequent operation. For a prefix computation with N inputs, an average-case latency of O(loglog N) can be achieved. Buffering can be used for a full-throughout operation.
    Type: Grant
    Filed: September 15, 1997
    Date of Patent: December 7, 1999
    Assignee: California Institute of Technology
    Inventors: Rajit Manohar, Alain J. Martin
  • Patent number: 5964860
    Abstract: An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements.
    Type: Grant
    Filed: April 8, 1997
    Date of Patent: October 12, 1999
    Assignee: California Institute of Technology
    Inventors: John C. Peterson, Edward T. Chow, Michael S. Waterman, Timothy J. Hunkapillar