Array Processor Operation Patents (Class 712/16)
  • Patent number: 7890735
    Abstract: A multi-threaded microprocessor (1105) for processing instructions in threads. The microprocessor (1105) includes first and second decode pipelines (1730.0, 1730.1), first and second execute pipelines (1740, 1750), and coupling circuitry (1916) operable in a first mode to couple first and second threads from the first and second decode pipelines (1730.0, 1730.1) to the first and second execute pipelines (1740, 1750) respectively, and the coupling circuitry (1916) operable in a second mode to couple the first thread to both the first and second execute pipelines (1740, 1750). Various processes of manufacture, articles of manufacture, processes and methods of operation, circuits, devices, and systems are disclosed.
    Type: Grant
    Filed: August 23, 2006
    Date of Patent: February 15, 2011
    Assignee: Texas Instruments Incorporated
    Inventor: Thang Tran
  • Patent number: 7882333
    Abstract: A method for loading microcode to a plurality of cores within a processor. The method includes loading the microcode to a first core of the plurality of cores within the processor system and generating a broadcast inter process interrupt (IPI) message via the first core. The IPI message causes other cores within the processor system to synchronize respective microcode with the microcode that is loaded into the first core. The synchronizing loads microcode to the plurality of cores without requiring independent loads of microcode to each core.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: February 1, 2011
    Assignee: Dell Products L.P.
    Inventor: Mukund Khatri
  • Patent number: 7870365
    Abstract: In some embodiments, control and data messages are transmitted non-contentiously over corresponding control and data channels of inter-processor links in a matrix of mesh-interconnected matrix processors. A data stream instruction executed by a user thread of an instruction processing pipeline of a matrix processor may initiate a data stream transfer by a hardware data switch of the matrix processor over multiple consecutive cycles over a data channel. While the data stream is being transferred, the corresponding control channel may transfer control messages non-contentiously with respect to the data stream. The control messages may be messages received from other matrix processors and/or control messages initiated by a kernel thread of the current matrix processor.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: January 11, 2011
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Publication number: 20100332793
    Abstract: The illustrative embodiments provide for a computer-implemented method for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.
    Type: Application
    Filed: December 21, 2007
    Publication date: December 30, 2010
    Inventor: Joseph John Katnic
  • Publication number: 20100318764
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for operations that are vectorizable. The vectorizable operations are examined to determine whether they should be executed at least in part in a vector atomic memory operation (AMO) functional unit attached to memory. If so, the compiled code includes vector AMO instructions.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 7849288
    Abstract: A reconfigurable circuit and control method therefor, capable of enhancing efficiency of implementation of a pipeline process in processing elements and improve processing performance. Processing elements are reconfigured to form a circuit based on configuration information and execute a prescribed process. Memory units store configuration information for the processing elements. A memory switching unit switches the plurality of memory units to store therein the configuration information on the stages of a pipeline process to be performed by the processing elements. A configuration information output unit switches the memory units to output therefrom the configuration information to the plurality of processing elements.
    Type: Grant
    Filed: October 12, 2006
    Date of Patent: December 7, 2010
    Assignee: Fujitsu Limited
    Inventors: Hisanori Fujisawa, Miyoshi Saito, Toshihiro Ozawa
  • Patent number: 7840778
    Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: November 23, 2010
    Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
  • Patent number: 7840779
    Abstract: Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
    Type: Grant
    Filed: August 22, 2007
    Date of Patent: November 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Jeremy E. Berg, Michael A. Blocksome, Brian E. Smith
  • Patent number: 7827385
    Abstract: A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer is configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. And the result for the allreduce operation is determined and stored in each receive buffer.
    Type: Grant
    Filed: August 2, 2007
    Date of Patent: November 2, 2010
    Assignee: International Business Machines Corporation
    Inventors: Gheorghe Almasi, Charles J. Archer, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 7822881
    Abstract: In a data-processing method, first result data may be obtained using a plurality of configurable coarse-granular elements, the first result data may be written into a memory that includes spatially separate first and second memory areas and that is connected via a bus to the plurality of configurable coarse-granular elements, the first result data may be subsequently read out from the memory, and the first result data may be subsequently processed using the plurality of configurable coarse-granular elements. In a first configuration, the first memory area may be configured as a write memory, and the second memory area may be configured as a read memory. Subsequent to writing to and reading from the memory in accordance with the first configuration, the first memory area may be configured as a read memory, and the second memory area may be configured as a write memory.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: October 26, 2010
    Inventors: Martin Vorbach, Robert Münch
  • Publication number: 20100257336
    Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of first cells. A second array couples to the first array and comprises a plurality of second cells. A first write port couples to the first array and the second array and writes to the first array and the second array. A first read port couples to the first array and the second array and reads from the first array and the second array. A second read port couples to the first array and reads from the first array. A second write port couples to the second read port, reads from the second read port and writes to the second array.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Saiful Islam, Mary D. Brown, Bjorn P. Christensen, Sam G. Chu, Robert A. Cordes, Maureen A. Delaney, Jafar Nahidi, Joel A. Silberman
  • Publication number: 20100235608
    Abstract: An apparatus for physical properties computation comprising an array processor. The array processor comprises of a plurality of processing elements, said processing elements arranged in a grid. A processing unit (PU) is coupled to the array processor. A local memory is coupled to the PU. The PU broadcasts data to rows of said processing elements in said grid, and performs physical computations in an order of complexity of O((?N) log N).
    Type: Application
    Filed: May 24, 2010
    Publication date: September 16, 2010
    Applicant: AiSeek Ltd.
    Inventors: Roy ARMONI, Ramon Axelrod
  • Publication number: 20100211757
    Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.
    Type: Application
    Filed: February 17, 2009
    Publication date: August 19, 2010
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Gi-Ho Park, Shin-dug Kim, Jung-wook Park, Hoon-mo Yang, Sung-bae Park
  • Patent number: 7769981
    Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: August 3, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
  • Patent number: 7756505
    Abstract: To realize a software radio processing with a reduced circuit area by hardware and software which can process transmission and reception, or synchronization and demodulation in time division. There are provided a circuit DRC that can dynamically change a configuration with a structure that can change the configuration at a high speed, a general processor, and an interface for connection with an external device such as an AD converter or a DA converter. Software radio is realized by using a software radio chip that conducts plural different processing such as transmission and reception, or synchronization and demodulation in time division. The different processing during the radio signal processing can be conducted in time division. As a result, the software radio can be realized with a circuit of a reduced area in a software radio system that allocates regions of an FPGA to the respective processing.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: July 13, 2010
    Assignee: Hitachi, Ltd.
    Inventors: Hiroshi Tanaka, Takanobu Tsunoda, Tetsuroo Honmura, Manabu Kawabe, Masashi Takada
  • Patent number: 7752421
    Abstract: A parallel-prefix broadcast for a parallel-prefix operation on a parallel computer includes: configuring, on each node, a parallel-prefix contribution buffer for storing the node's parallel-prefix contribution; configuring, on each node, a parallel-prefix results buffer for storing results of a operation, the results buffer having a position for each node that corresponds to node's rank; and repeatedly for each position in the results buffer: processing in parallel by each node, including: determining, by the node, whether the current position in the results buffer is to include the node's contribution, if the current position is not to include the contribution, contributing the identity element, and if the current position is to include the contribution, contributing the contribution, performing, by each node, the operation using the contributed identity elements and the contributed contributions, yielding a result from the operation, and storing, by each node, the result in the position in the results buffe
    Type: Grant
    Filed: April 19, 2007
    Date of Patent: July 6, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Amanda Peters, Gary R. Ricard, Albert Sidelnik, Brian E. Smith
  • Patent number: 7743235
    Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.
    Type: Grant
    Filed: June 6, 2007
    Date of Patent: June 22, 2010
    Assignee: Intel Corporation
    Inventors: Gilbert Wolrich, Matthew Adiletta, William R. Wheeler
  • Patent number: 7734895
    Abstract: An integrated circuit includes a plurality of processor core. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.
    Type: Grant
    Filed: April 28, 2006
    Date of Patent: June 8, 2010
    Assignee: Massachusetts Institute of Technology
    Inventors: Anant Agarwal, David Wentzlaff
  • Publication number: 20100131737
    Abstract: A method for generating a reflection of data in a plurality of processing elements comprises shifting the data along, for example, each row in the array until each processing element in the row has received all the data held by every other processing element in that row. Each processing element stores and outputs final data as a function of its position in the row. A similar reflection along a horizontal line can be achieved by shifting data along columns instead of rows. Also disclosed is a method for reflecting data in a matrix of processing elements about a vertical line comprising shifting data between processing elements arranged in rows. An initial count is set in each processing element according to the expression (2×Col_Index)MOD(array size). In one embodiment, a counter counts down from the initial count in each processing element as a function of the number of shifts that have peen performed. Output is selected as a function of the current count.
    Type: Application
    Filed: January 28, 2010
    Publication date: May 27, 2010
    Applicant: Micron Technology, Inc.
    Inventor: Mark Beaumont
  • Patent number: 7707388
    Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
    Type: Grant
    Filed: November 29, 2006
    Date of Patent: April 27, 2010
    Assignee: XMTT Inc.
    Inventor: Uzi Vishkin
  • Publication number: 20100100703
    Abstract: A system and a method for parallel computing for solving complex problems is envisaged. Particularly, hierarchical parallel computing system is envisaged by this invention, which is formed by multiple levels of groups, where each group consists of multiple processing elements. Each group of the parallel computing system models as processing element to its immediate upper layer. Thus, each processing element is hierarchically tagged to its immediate upper level, and a multi-level tier of groups are formed. In accordance with this invention, the parallel computing system operates by breaking any problem hierarchically, first across the groups and then within the groups. This hierarchical breakup of the problem helps in significantly improving the time required for processing a problem.
    Type: Application
    Filed: October 15, 2009
    Publication date: April 22, 2010
    Applicant: Computational Research Laboratories Ltd.
    Inventors: Chandan Basu, Mandar Nadgir, Avinash Pandey
  • Publication number: 20100070738
    Abstract: A flexible results pipeline for a processing element of a parallel processor is described. A plurality of result registers are selectively connected to each other, to processing logic of the processing element and to a neighbourhood connection register configured to receive data from and send data to other processing elements. The connections between the result registers and between the result registers and the neighbourhood connection register are selectively configurable by applied control signals.
    Type: Application
    Filed: November 18, 2009
    Publication date: March 18, 2010
    Applicant: Micron Technology, Inc.
    Inventor: GRAHAM KIRSCH
  • Patent number: 7676444
    Abstract: A search engine for selectively perform iterative compare operations between a searchable pattern and S overlapping substrings of an input string of characters includes a memory for storing a bitmap having S next success size (NSS) bits, wherein each NSS bit indicates whether an associated substring including a corresponding unique number of the input characters is to be compared with the searchable pattern in successive compare operations, and includes a compare circuit for selectively performing the successive compare operations in response to the NSS bits.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: March 9, 2010
    Assignee: NetLogic Microsystems, Inc.
    Inventors: Srinivasan Venkatachary, Pankaj Gupta
  • Patent number: 7657586
    Abstract: An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given nodes provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage.
    Type: Grant
    Filed: December 13, 2006
    Date of Patent: February 2, 2010
    Assignee: Archivas, Inc.
    Inventors: Andres Rodriguez, Jack A. Orenstein, David M. Shaw, Benjamin K. D. Bernhard
  • Patent number: 7650483
    Abstract: A data processing apparatus and method are provided for handling execution of instructions within a data processing apparatus having a plurality of processing units. Each processing unit is operable to execute a sequence of instructions so as to perform associated operations, and at least a subset of the processing units form a cluster. Instruction forwarding logic is provided which for at least one instruction executed by at least one of the processing units in the cluster causes that instruction to be executed by each of the other processing units in the cluster, for example by causing that instruction to be inserted into the sequences of instructions executed by each of the other processing units in the cluster.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: January 19, 2010
    Assignee: ARM Limited
    Inventors: Elodie Charra, Frederic Claude Marie Piry, Richard Roy Grisenthwaite, Mélanie Emanuelle Lucie Vincent, Norbert Bernard Eugéne Lataille, Jocelyn Francois Orion Jaubert, Stuart David Biles
  • Patent number: 7650484
    Abstract: An array-type computer processor including a data path unit communicating with a state control unit obtains data of a predetermined number of cooperative partial instruction codes, and operates with temporarily holding only a predetermined number of data-obtained instruction codes comprising cooperative partial instruction codes corresponding to contexts and operation states for the data path unit and the state control unit, respectively, from an external program memory which stores data of a computer program.
    Type: Grant
    Filed: February 3, 2005
    Date of Patent: January 19, 2010
    Assignees: NEC Corporation, NEC Electronics Corporation
    Inventors: Takeshi Inuo, Nobuki Kajihara, Takao Toi, Tooru Awashima, Hirokazu Kami, Taro Fujii, Kenichiro Anjo, Kouichiro Furuta, Masato Motomura
  • Publication number: 20090327653
    Abstract: A reconfigurable computing circuit for reducing amount of dummy data to be stored in data registers, which is required when the wiring is shared by the configuration information bus and scan chain. When data is to be stored in data registers and configuration registers constituting the scan chain in reconfig computing block 2010, reg setting data selecting unit 3400 selects either a value stored in reg setting data storage unit 3000 or an initial value output from data reg data generating unit 4000, based on the information stored in reg type managing unit 1100 that indicates the types of registers and the connection order of the registers in the scan chain, and outputs the selected value in sequence to the scan chain under control of scan/reconfig control unit 1000. Each register in the scan chain then shifts data stored therein to the next register in the scan chain in sequence.
    Type: Application
    Filed: April 18, 2008
    Publication date: December 31, 2009
    Inventors: Masaki MAEDA, Takahiro Ichinomiya
  • Patent number: 7636835
    Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.
    Type: Grant
    Filed: April 14, 2006
    Date of Patent: December 22, 2009
    Assignee: Tilera Corporation
    Inventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
  • Patent number: 7634159
    Abstract: An array transform system for parallel computation of a plurality of elements of an array transform includes a memory for storing an array of data elements. Each column of data elements from the memory is copied to a shifter that shifts the column of data elements in accordance with a shift value to produce a shifted column of data elements. The shifted columns of data elements are accumulated in a plurality of accumulators, with each accumulator producing an element of the array transform. A controller controls the shift value dependent upon the position of the column of data elements in the array of data elements.
    Type: Grant
    Filed: December 8, 2004
    Date of Patent: December 15, 2009
    Assignee: Motorola, Inc.
    Inventors: Malcolm R. Dwyer, James E. Crenshaw, Zhiyuan Li
  • Publication number: 20090307467
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
    Type: Application
    Filed: May 21, 2008
    Publication date: December 10, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Ahmad Faraj
  • Patent number: 7627698
    Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
    Type: Grant
    Filed: July 30, 2007
    Date of Patent: December 1, 2009
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Edward A. Wolff
  • Patent number: 7627736
    Abstract: A data processing apparatus includes a plurality of processing elements arranged in a single instruction multiple data array. The apparatus is operable to process multiple instructions streams in parallel with one another.
    Type: Grant
    Filed: May 18, 2007
    Date of Patent: December 1, 2009
    Assignee: ClearSpeed Technology plc
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7603541
    Abstract: A method is disclosed for achieving synchronization in an array of semi-synchronous devices. A processor array has an array of processor elements, wherein each of said processor elements comprises a cycle counter, and a master processor element is able to transmit control command signals to each of the other processor elements. Each processor element is such that, on receipt of a control command signal, it acts on that signal only when its cycle counter reaches a predetermined value, and the master processor element is such that it transmits control command signals only when its cycle counter takes a value which is within a predetermined range, or “safe window”. By appropriate setting of the “safe window”, it can be guaranteed that, when the master processor element transmits a control command signal to each of the other processor elements, those command control signals are acted upon at corresponding times within the other processor elements.
    Type: Grant
    Filed: December 12, 2003
    Date of Patent: October 13, 2009
    Assignee: Picochip Designs Limited
    Inventors: John Matthew Nolan, Roger Paul Dealtry
  • Publication number: 20090249028
    Abstract: The present invention relates to a processor that, as its main feature, has an internal raster of ALUs, with the help of which sequential programs are executed. The connections between the ALUs are automatically created at runtime dynamically by means of multiplexers. A central decoding and configuration unit that creates configuration data for the ALU grid from a stream of conventional assembler commands at runtime is responsible for creating the connections. In addition to the ALU grid, a special unit for the execution of memory accesses and another unit for the processing of branch instructions are provided. The novel architecture that is the foundation of the processor makes efficient execution of both control flow- and data flow-oriented tasks possible.
    Type: Application
    Filed: June 12, 2007
    Publication date: October 1, 2009
    Inventor: Sascha Uhrig
  • Patent number: 7596678
    Abstract: A transpose of data appearing in a plurality of processing elements comprises shifting the data along diagonals of the plurality of processing elements until the processing elements in the diagonal have received the data held by every other processing element in that diagonal. Shifting along diagonals can be accomplished by executing pairs of horizontal and vertical shifts in the x-y directions or pairs of shifts in perpendicular directions, e.g., x-z. Each processing element stores data as its final output data as a function of the processing element's position. In one embodiment, an initial count is either loaded into each processing element or calculated locally based on the processing element's location. The initial count may be given by one of the following expressions: (x+y+1)MOD(array size); (C+R+1)MOD(array size); (C+y+1); or MOD(array size); or (x+R+1)MOD(array size).
    Type: Grant
    Filed: October 20, 2003
    Date of Patent: September 29, 2009
    Assignee: Micron Technology, Inc.
    Inventor: Mark Beaumont
  • Publication number: 20090228683
    Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.
    Type: Application
    Filed: March 13, 2009
    Publication date: September 10, 2009
    Applicant: ClearSpeed Technology plc
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7581079
    Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.
    Type: Grant
    Filed: March 26, 2006
    Date of Patent: August 25, 2009
    Inventor: Gerald George Pechanek
  • Publication number: 20090210652
    Abstract: There is provided a method for routing a plurality of signals in a processor array, the processor array comprising a plurality of processor elements interconnected by a network of switches, each signal having a respective source processor element and at least one destination processor element in the processor array, the method comprising (i) identifying a signal from the plurality of unrouted signals to route; (ii) identifying a candidate route from the source processor element to the destination processor element, the candidate route using a first plurality of switches; (iii) evaluating the candidate route by determining whether there are offset values that allow the signal to be routed through the first plurality of switches; and (iv) attempting to route the signal using one of the offset values identified in step (iii).
    Type: Application
    Filed: February 9, 2009
    Publication date: August 20, 2009
    Inventors: Andrew William George DULLER, William Philip ROBBINS
  • Patent number: 7577820
    Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor including a storage module, wherein the processor is configured to process multiple streams of instructions, a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles, and coupling circuitry configured to couple data resulting from processing an instruction from at least one of the streams of instructions to the storage module and to the switch.
    Type: Grant
    Filed: April 14, 2006
    Date of Patent: August 18, 2009
    Assignee: Tilera Corporation
    Inventors: David Wentzlaff, Anant Agarwal
  • Patent number: 7577821
    Abstract: An integrated circuit device comprising a data processing block including a first matrix and a second matrix is disclosed. The first matrix and the second matrix respectively include a plurality of types of operation units and a wiring group for connecting the plurality of types of operation units, a configuration of data flow with the plurality of types of operation units being changeable by changing a route of the wiring group for data supplying to the plurality of types of operation units. One of the plurality of types of operation units is a delay type operation unit that include a data path suited to processing for delaying a transfer time of data.
    Type: Grant
    Filed: February 1, 2007
    Date of Patent: August 18, 2009
    Assignee: IPFlex Inc.
    Inventors: Kenji Ikeda, Hiroshi Shimura, Tomoyoshi Sato
  • Publication number: 20090204788
    Abstract: Disclosed is an array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.
    Type: Application
    Filed: January 9, 2009
    Publication date: August 13, 2009
    Applicant: Theseus Research, Inc.
    Inventor: Karl M. Fant
  • Patent number: 7574582
    Abstract: There is disclosed a processor array, which achieves an approximately constant latency. Communications to and from the farthest array elements are suitably pipelined for the distance, while communications to and from closer array elements are deliberately “over-pipelined” such that the latency to all end-point elements is the same number of clock cycles. The processor array has a plurality of primary buses, each connected to a primary bus driver, and each having a respective plurality of primary bus nodes thereon; respective pluralities of secondary buses, connected to said primary bus nodes; a plurality of processor elements, each connected to one of the secondary buses; and delay elements associated with the primary bus nodes, for delaying communications with processor elements connected to different ones of the secondary buses by different amounts, in order to achieve a degree of synchronization between operation of said processor elements.
    Type: Grant
    Filed: January 26, 2004
    Date of Patent: August 11, 2009
    Assignee: Picochip Designs Limited
    Inventor: John Matthew Nolan
  • Patent number: 7574581
    Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing free-running, scan registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.
    Type: Grant
    Filed: April 28, 2003
    Date of Patent: August 11, 2009
    Assignee: International Business Machines Corporation
    Inventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
  • Patent number: 7571300
    Abstract: A memory system includes a plurality of memory blocks, each having a dedicated local arithmetic logic unit (ALU). A data value having a plurality of bytes is stored such that each of the bytes is stored in a corresponding one of the memory blocks. In a read-modify-write operation, each byte of the data value is read from the corresponding memory block, and is provided to the corresponding ALU. Similarly, each byte of a modify data value is provided to a corresponding ALU on a memory data bus. Each ALU combines the read byte with the modify byte to create a write byte. Because the write bytes are all generated locally within the ALUs, long signal delay paths are avoided. Each ALU also generates two possible carry bits in parallel, and then uses the actual received carry bit to select from the two possible carry bits.
    Type: Grant
    Filed: January 8, 2007
    Date of Patent: August 4, 2009
    Assignee: Integrated Device Technologies, Inc.
    Inventor: Tak Kwong Wong
  • Patent number: 7565287
    Abstract: Techniques for implementing vocoders in parallel digital signal processors are described. A preferred approach is implemented in conjunction with the BOPS® Manifold Array (ManArray™) processing architecture so that in an array of N parallel processing elements, N channels of voice communication are processed in parallel. Techniques for forcing vocoder processing of one data-frame to take the same number of cycles are described. Improved throughput and lower clock rates can be achieved.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: July 21, 2009
    Assignee: Altera Corporation
    Inventors: Ali Soheil Sadri, Navin Jaffer, Anissim A. Silivra, Bin Huang, Matthew Plonski
  • Publication number: 20090164752
    Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, eg to provide cache.
    Type: Application
    Filed: August 11, 2005
    Publication date: June 25, 2009
    Applicant: CLEARSPEED TECHNOLOGY PLC
    Inventor: Ray McConnell
  • Publication number: 20090132785
    Abstract: A SIMD processor responds to a single min/max instruction to find the minimum or maximum valued data unit in an array of data units. The determined minimum/maximum value and an associated index value thereto may be output. Alternatively, the value of a data unit in another array may be output at a corresponding location. A further single instruction executable by the SIMD processor, may be applied to results obtained using such a single min/max instruction, to allow such instructions to operate on two dimensional arrays.
    Type: Application
    Filed: September 5, 2008
    Publication date: May 21, 2009
    Applicant: ATI Technologies ULC
    Inventors: Richard J. Selvaggi, Larry A. Pearlstein
  • Patent number: 7536493
    Abstract: The aspects of the present invention provide a computer implemented method, an apparatus, and a computer usable program code for identifying a service processor with current setting information. First stored information for a service processor is retrieved from a first memory in the first service processor. Second stored information for the service processor is retrieved from a second memory in a data processing system to which the service processor is connected. The first stored information is compared with the second stored information to form a comparison as to whether the service processor was previously connected to the data processing system and to determine whether the service processor was a last service processor to fully communicate with system firmware in the data processing system. The service processor has current setting information if the service processor was previously connected to the data processing system and was the last service processor to fully communicate with the system firmware.
    Type: Grant
    Filed: April 12, 2006
    Date of Patent: May 19, 2009
    Assignee: International Business Machines Corporation
    Inventors: Gary Dean Anderson, Joseph Donald Armstrong, Brent William Jacobs, Ajay Kumar Mahajan
  • Patent number: 7529917
    Abstract: A processor including a coarse grained array including a plurality of function units and a plurality of register files, wherein a loop to be executed by the coarse grained array is split into a plurality of sub-loops, and when an interrupt request occurs while executing the sub-loop in the coarse grained array, the interrupt request is processed after the executing of the sub-loop is completed.
    Type: Grant
    Filed: September 13, 2006
    Date of Patent: May 5, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Soo Jung Ryu, Jeong Wook Kim, Dong-Hoon Yoo, Hee Seok Kim
  • Patent number: RE40741
    Abstract: A system and method for synchronization of video raster display outputs from multiple PC graphics subsystems to facilitate synchronized output onto multiple displays are disclosed. The system and method allow multiple graphics subsystems, in a single or multiple chassis, to be used to provide multiple synchronized view ports of a single 3D database or a wide desktop with reduced inter-monitor artifacts and interference.
    Type: Grant
    Filed: November 14, 2005
    Date of Patent: June 16, 2009
    Assignee: Quantum 3D
    Inventors: Alan C. Simmonds, Paul M. Slade