Array Processor Operation Patents (Class 712/16)
-
Patent number: 7908462Abstract: The current invention provides a virtual world simulation system capable of hosting with massive amount of concurrent players by integrating commodity parallel co-processors into servers. The current invention proposes novel parallel processing algorithms to make use of commodity parallel co-processors like a graphic processing unit (GPU) or any specialized hardware with parallel architecture design like a field-programmable gate array (FPGA), to accelerate virtual world simulation.Type: GrantFiled: June 9, 2010Date of Patent: March 15, 2011Assignee: Zillians IncorporatedInventor: Mu Chi Sung
-
Patent number: 7908409Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.Type: GrantFiled: August 6, 2009Date of Patent: March 15, 2011Assignee: Altera CorporationInventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 7904615Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).Type: GrantFiled: February 16, 2006Date of Patent: March 8, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore
-
Patent number: 7904695Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A slot sequencer (42) in each of the computers produces a timing pulse to cause the computer (12) to execute a next instruction. However, when the present instruction is a read or write type instruction, the slot sequencer does not produce the pulse until an acknowledge signal (86) starts it. The acknowledge signal (86) is produced when it is recognized that the communication has been completed by the other computer (12).Type: GrantFiled: February 16, 2006Date of Patent: March 8, 2011Assignee: VNS Portfolio LLCInventor: Charles H. Moore
-
Patent number: 7890735Abstract: A multi-threaded microprocessor (1105) for processing instructions in threads. The microprocessor (1105) includes first and second decode pipelines (1730.0, 1730.1), first and second execute pipelines (1740, 1750), and coupling circuitry (1916) operable in a first mode to couple first and second threads from the first and second decode pipelines (1730.0, 1730.1) to the first and second execute pipelines (1740, 1750) respectively, and the coupling circuitry (1916) operable in a second mode to couple the first thread to both the first and second execute pipelines (1740, 1750). Various processes of manufacture, articles of manufacture, processes and methods of operation, circuits, devices, and systems are disclosed.Type: GrantFiled: August 23, 2006Date of Patent: February 15, 2011Assignee: Texas Instruments IncorporatedInventor: Thang Tran
-
Patent number: 7882333Abstract: A method for loading microcode to a plurality of cores within a processor. The method includes loading the microcode to a first core of the plurality of cores within the processor system and generating a broadcast inter process interrupt (IPI) message via the first core. The IPI message causes other cores within the processor system to synchronize respective microcode with the microcode that is loaded into the first core. The synchronizing loads microcode to the plurality of cores without requiring independent loads of microcode to each core.Type: GrantFiled: November 5, 2007Date of Patent: February 1, 2011Assignee: Dell Products L.P.Inventor: Mukund Khatri
-
Patent number: 7870365Abstract: In some embodiments, control and data messages are transmitted non-contentiously over corresponding control and data channels of inter-processor links in a matrix of mesh-interconnected matrix processors. A data stream instruction executed by a user thread of an instruction processing pipeline of a matrix processor may initiate a data stream transfer by a hardware data switch of the matrix processor over multiple consecutive cycles over a data channel. While the data stream is being transferred, the corresponding control channel may transfer control messages non-contentiously with respect to the data stream. The control messages may be messages received from other matrix processors and/or control messages initiated by a kernel thread of the current matrix processor.Type: GrantFiled: July 7, 2008Date of Patent: January 11, 2011Assignee: OvicsInventors: Sorin C Cismas, Ilie Garbacea
-
Publication number: 20100332793Abstract: The illustrative embodiments provide for a computer-implemented method for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.Type: ApplicationFiled: December 21, 2007Publication date: December 30, 2010Inventor: Joseph John Katnic
-
Publication number: 20100318764Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for operations that are vectorizable. The vectorizable operations are examined to determine whether they should be executed at least in part in a vector atomic memory operation (AMO) functional unit attached to memory. If so, the compiled code includes vector AMO instructions.Type: ApplicationFiled: June 12, 2009Publication date: December 16, 2010Applicant: Cray Inc.Inventor: Terry D. Greyzck
-
Patent number: 7849288Abstract: A reconfigurable circuit and control method therefor, capable of enhancing efficiency of implementation of a pipeline process in processing elements and improve processing performance. Processing elements are reconfigured to form a circuit based on configuration information and execute a prescribed process. Memory units store configuration information for the processing elements. A memory switching unit switches the plurality of memory units to store therein the configuration information on the stages of a pipeline process to be performed by the processing elements. A configuration information output unit switches the memory units to output therefrom the configuration information to the plurality of processing elements.Type: GrantFiled: October 12, 2006Date of Patent: December 7, 2010Assignee: Fujitsu LimitedInventors: Hisanori Fujisawa, Miyoshi Saito, Toshihiro Ozawa
-
Patent number: 7840778Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.Type: GrantFiled: August 31, 2006Date of Patent: November 23, 2010Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
-
Patent number: 7840779Abstract: Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.Type: GrantFiled: August 22, 2007Date of Patent: November 23, 2010Assignee: International Business Machines CorporationInventors: Charles J. Archer, Jeremy E. Berg, Michael A. Blocksome, Brian E. Smith
-
Patent number: 7827385Abstract: A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer is configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. And the result for the allreduce operation is determined and stored in each receive buffer.Type: GrantFiled: August 2, 2007Date of Patent: November 2, 2010Assignee: International Business Machines CorporationInventors: Gheorghe Almasi, Charles J. Archer, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 7822881Abstract: In a data-processing method, first result data may be obtained using a plurality of configurable coarse-granular elements, the first result data may be written into a memory that includes spatially separate first and second memory areas and that is connected via a bus to the plurality of configurable coarse-granular elements, the first result data may be subsequently read out from the memory, and the first result data may be subsequently processed using the plurality of configurable coarse-granular elements. In a first configuration, the first memory area may be configured as a write memory, and the second memory area may be configured as a read memory. Subsequent to writing to and reading from the memory in accordance with the first configuration, the first memory area may be configured as a read memory, and the second memory area may be configured as a write memory.Type: GrantFiled: October 7, 2005Date of Patent: October 26, 2010Inventors: Martin Vorbach, Robert Münch
-
Publication number: 20100257336Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of first cells. A second array couples to the first array and comprises a plurality of second cells. A first write port couples to the first array and the second array and writes to the first array and the second array. A first read port couples to the first array and the second array and reads from the first array and the second array. A second read port couples to the first array and reads from the first array. A second write port couples to the second read port, reads from the second read port and writes to the second array.Type: ApplicationFiled: April 3, 2009Publication date: October 7, 2010Applicant: International Business Machines CorporationInventors: Saiful Islam, Mary D. Brown, Bjorn P. Christensen, Sam G. Chu, Robert A. Cordes, Maureen A. Delaney, Jafar Nahidi, Joel A. Silberman
-
Publication number: 20100235608Abstract: An apparatus for physical properties computation comprising an array processor. The array processor comprises of a plurality of processing elements, said processing elements arranged in a grid. A processing unit (PU) is coupled to the array processor. A local memory is coupled to the PU. The PU broadcasts data to rows of said processing elements in said grid, and performs physical computations in an order of complexity of O((?N) log N).Type: ApplicationFiled: May 24, 2010Publication date: September 16, 2010Applicant: AiSeek Ltd.Inventors: Roy ARMONI, Ramon Axelrod
-
Publication number: 20100211757Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.Type: ApplicationFiled: February 17, 2009Publication date: August 19, 2010Applicant: Samsung Electronics Co., Ltd.Inventors: Gi-Ho Park, Shin-dug Kim, Jung-wook Park, Hoon-mo Yang, Sung-bae Park
-
Patent number: 7769981Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.Type: GrantFiled: March 11, 2008Date of Patent: August 3, 2010Assignee: Electronics and Telecommunications Research InstituteInventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
-
Patent number: 7756505Abstract: To realize a software radio processing with a reduced circuit area by hardware and software which can process transmission and reception, or synchronization and demodulation in time division. There are provided a circuit DRC that can dynamically change a configuration with a structure that can change the configuration at a high speed, a general processor, and an interface for connection with an external device such as an AD converter or a DA converter. Software radio is realized by using a software radio chip that conducts plural different processing such as transmission and reception, or synchronization and demodulation in time division. The different processing during the radio signal processing can be conducted in time division. As a result, the software radio can be realized with a circuit of a reduced area in a software radio system that allocates regions of an FPGA to the respective processing.Type: GrantFiled: October 3, 2005Date of Patent: July 13, 2010Assignee: Hitachi, Ltd.Inventors: Hiroshi Tanaka, Takanobu Tsunoda, Tetsuroo Honmura, Manabu Kawabe, Masashi Takada
-
Patent number: 7752421Abstract: A parallel-prefix broadcast for a parallel-prefix operation on a parallel computer includes: configuring, on each node, a parallel-prefix contribution buffer for storing the node's parallel-prefix contribution; configuring, on each node, a parallel-prefix results buffer for storing results of a operation, the results buffer having a position for each node that corresponds to node's rank; and repeatedly for each position in the results buffer: processing in parallel by each node, including: determining, by the node, whether the current position in the results buffer is to include the node's contribution, if the current position is not to include the contribution, contributing the identity element, and if the current position is to include the contribution, contributing the contribution, performing, by each node, the operation using the contributed identity elements and the contributed contributions, yielding a result from the operation, and storing, by each node, the result in the position in the results buffeType: GrantFiled: April 19, 2007Date of Patent: July 6, 2010Assignee: International Business Machines CorporationInventors: Charles J. Archer, Amanda Peters, Gary R. Ricard, Albert Sidelnik, Brian E. Smith
-
Patent number: 7743235Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.Type: GrantFiled: June 6, 2007Date of Patent: June 22, 2010Assignee: Intel CorporationInventors: Gilbert Wolrich, Matthew Adiletta, William R. Wheeler
-
Patent number: 7734895Abstract: An integrated circuit includes a plurality of processor core. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.Type: GrantFiled: April 28, 2006Date of Patent: June 8, 2010Assignee: Massachusetts Institute of TechnologyInventors: Anant Agarwal, David Wentzlaff
-
Publication number: 20100131737Abstract: A method for generating a reflection of data in a plurality of processing elements comprises shifting the data along, for example, each row in the array until each processing element in the row has received all the data held by every other processing element in that row. Each processing element stores and outputs final data as a function of its position in the row. A similar reflection along a horizontal line can be achieved by shifting data along columns instead of rows. Also disclosed is a method for reflecting data in a matrix of processing elements about a vertical line comprising shifting data between processing elements arranged in rows. An initial count is set in each processing element according to the expression (2×Col_Index)MOD(array size). In one embodiment, a counter counts down from the initial count in each processing element as a function of the number of shifts that have peen performed. Output is selected as a function of the current count.Type: ApplicationFiled: January 28, 2010Publication date: May 27, 2010Applicant: Micron Technology, Inc.Inventor: Mark Beaumont
-
Patent number: 7707388Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.Type: GrantFiled: November 29, 2006Date of Patent: April 27, 2010Assignee: XMTT Inc.Inventor: Uzi Vishkin
-
Publication number: 20100100703Abstract: A system and a method for parallel computing for solving complex problems is envisaged. Particularly, hierarchical parallel computing system is envisaged by this invention, which is formed by multiple levels of groups, where each group consists of multiple processing elements. Each group of the parallel computing system models as processing element to its immediate upper layer. Thus, each processing element is hierarchically tagged to its immediate upper level, and a multi-level tier of groups are formed. In accordance with this invention, the parallel computing system operates by breaking any problem hierarchically, first across the groups and then within the groups. This hierarchical breakup of the problem helps in significantly improving the time required for processing a problem.Type: ApplicationFiled: October 15, 2009Publication date: April 22, 2010Applicant: Computational Research Laboratories Ltd.Inventors: Chandan Basu, Mandar Nadgir, Avinash Pandey
-
Publication number: 20100070738Abstract: A flexible results pipeline for a processing element of a parallel processor is described. A plurality of result registers are selectively connected to each other, to processing logic of the processing element and to a neighbourhood connection register configured to receive data from and send data to other processing elements. The connections between the result registers and between the result registers and the neighbourhood connection register are selectively configurable by applied control signals.Type: ApplicationFiled: November 18, 2009Publication date: March 18, 2010Applicant: Micron Technology, Inc.Inventor: GRAHAM KIRSCH
-
Patent number: 7676444Abstract: A search engine for selectively perform iterative compare operations between a searchable pattern and S overlapping substrings of an input string of characters includes a memory for storing a bitmap having S next success size (NSS) bits, wherein each NSS bit indicates whether an associated substring including a corresponding unique number of the input characters is to be compared with the searchable pattern in successive compare operations, and includes a compare circuit for selectively performing the successive compare operations in response to the NSS bits.Type: GrantFiled: March 21, 2007Date of Patent: March 9, 2010Assignee: NetLogic Microsystems, Inc.Inventors: Srinivasan Venkatachary, Pankaj Gupta
-
Patent number: 7657586Abstract: An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given nodes provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage.Type: GrantFiled: December 13, 2006Date of Patent: February 2, 2010Assignee: Archivas, Inc.Inventors: Andres Rodriguez, Jack A. Orenstein, David M. Shaw, Benjamin K. D. Bernhard
-
Patent number: 7650484Abstract: An array-type computer processor including a data path unit communicating with a state control unit obtains data of a predetermined number of cooperative partial instruction codes, and operates with temporarily holding only a predetermined number of data-obtained instruction codes comprising cooperative partial instruction codes corresponding to contexts and operation states for the data path unit and the state control unit, respectively, from an external program memory which stores data of a computer program.Type: GrantFiled: February 3, 2005Date of Patent: January 19, 2010Assignees: NEC Corporation, NEC Electronics CorporationInventors: Takeshi Inuo, Nobuki Kajihara, Takao Toi, Tooru Awashima, Hirokazu Kami, Taro Fujii, Kenichiro Anjo, Kouichiro Furuta, Masato Motomura
-
Patent number: 7650483Abstract: A data processing apparatus and method are provided for handling execution of instructions within a data processing apparatus having a plurality of processing units. Each processing unit is operable to execute a sequence of instructions so as to perform associated operations, and at least a subset of the processing units form a cluster. Instruction forwarding logic is provided which for at least one instruction executed by at least one of the processing units in the cluster causes that instruction to be executed by each of the other processing units in the cluster, for example by causing that instruction to be inserted into the sequences of instructions executed by each of the other processing units in the cluster.Type: GrantFiled: November 3, 2006Date of Patent: January 19, 2010Assignee: ARM LimitedInventors: Elodie Charra, Frederic Claude Marie Piry, Richard Roy Grisenthwaite, Mélanie Emanuelle Lucie Vincent, Norbert Bernard Eugéne Lataille, Jocelyn Francois Orion Jaubert, Stuart David Biles
-
Publication number: 20090327653Abstract: A reconfigurable computing circuit for reducing amount of dummy data to be stored in data registers, which is required when the wiring is shared by the configuration information bus and scan chain. When data is to be stored in data registers and configuration registers constituting the scan chain in reconfig computing block 2010, reg setting data selecting unit 3400 selects either a value stored in reg setting data storage unit 3000 or an initial value output from data reg data generating unit 4000, based on the information stored in reg type managing unit 1100 that indicates the types of registers and the connection order of the registers in the scan chain, and outputs the selected value in sequence to the scan chain under control of scan/reconfig control unit 1000. Each register in the scan chain then shifts data stored therein to the next register in the scan chain in sequence.Type: ApplicationFiled: April 18, 2008Publication date: December 31, 2009Inventors: Masaki MAEDA, Takahiro Ichinomiya
-
Patent number: 7636835Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.Type: GrantFiled: April 14, 2006Date of Patent: December 22, 2009Assignee: Tilera CorporationInventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
-
Patent number: 7634159Abstract: An array transform system for parallel computation of a plurality of elements of an array transform includes a memory for storing an array of data elements. Each column of data elements from the memory is copied to a shifter that shifts the column of data elements in accordance with a shift value to produce a shifted column of data elements. The shifted columns of data elements are accumulated in a plurality of accumulators, with each accumulator producing an element of the array transform. A controller controls the shift value dependent upon the position of the column of data elements in the array of data elements.Type: GrantFiled: December 8, 2004Date of Patent: December 15, 2009Assignee: Motorola, Inc.Inventors: Malcolm R. Dwyer, James E. Crenshaw, Zhiyuan Li
-
Publication number: 20090307467Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.Type: ApplicationFiled: May 21, 2008Publication date: December 10, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Ahmad Faraj
-
Patent number: 7627698Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.Type: GrantFiled: July 30, 2007Date of Patent: December 1, 2009Assignee: Altera CorporationInventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 7627736Abstract: A data processing apparatus includes a plurality of processing elements arranged in a single instruction multiple data array. The apparatus is operable to process multiple instructions streams in parallel with one another.Type: GrantFiled: May 18, 2007Date of Patent: December 1, 2009Assignee: ClearSpeed Technology plcInventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7603541Abstract: A method is disclosed for achieving synchronization in an array of semi-synchronous devices. A processor array has an array of processor elements, wherein each of said processor elements comprises a cycle counter, and a master processor element is able to transmit control command signals to each of the other processor elements. Each processor element is such that, on receipt of a control command signal, it acts on that signal only when its cycle counter reaches a predetermined value, and the master processor element is such that it transmits control command signals only when its cycle counter takes a value which is within a predetermined range, or “safe window”. By appropriate setting of the “safe window”, it can be guaranteed that, when the master processor element transmits a control command signal to each of the other processor elements, those command control signals are acted upon at corresponding times within the other processor elements.Type: GrantFiled: December 12, 2003Date of Patent: October 13, 2009Assignee: Picochip Designs LimitedInventors: John Matthew Nolan, Roger Paul Dealtry
-
Publication number: 20090249028Abstract: The present invention relates to a processor that, as its main feature, has an internal raster of ALUs, with the help of which sequential programs are executed. The connections between the ALUs are automatically created at runtime dynamically by means of multiplexers. A central decoding and configuration unit that creates configuration data for the ALU grid from a stream of conventional assembler commands at runtime is responsible for creating the connections. In addition to the ALU grid, a special unit for the execution of memory accesses and another unit for the processing of branch instructions are provided. The novel architecture that is the foundation of the processor makes efficient execution of both control flow- and data flow-oriented tasks possible.Type: ApplicationFiled: June 12, 2007Publication date: October 1, 2009Inventor: Sascha Uhrig
-
Patent number: 7596678Abstract: A transpose of data appearing in a plurality of processing elements comprises shifting the data along diagonals of the plurality of processing elements until the processing elements in the diagonal have received the data held by every other processing element in that diagonal. Shifting along diagonals can be accomplished by executing pairs of horizontal and vertical shifts in the x-y directions or pairs of shifts in perpendicular directions, e.g., x-z. Each processing element stores data as its final output data as a function of the processing element's position. In one embodiment, an initial count is either loaded into each processing element or calculated locally based on the processing element's location. The initial count may be given by one of the following expressions: (x+y+1)MOD(array size); (C+R+1)MOD(array size); (C+y+1); or MOD(array size); or (x+R+1)MOD(array size).Type: GrantFiled: October 20, 2003Date of Patent: September 29, 2009Assignee: Micron Technology, Inc.Inventor: Mark Beaumont
-
Publication number: 20090228683Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.Type: ApplicationFiled: March 13, 2009Publication date: September 10, 2009Applicant: ClearSpeed Technology plcInventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7581079Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.Type: GrantFiled: March 26, 2006Date of Patent: August 25, 2009Inventor: Gerald George Pechanek
-
Publication number: 20090210652Abstract: There is provided a method for routing a plurality of signals in a processor array, the processor array comprising a plurality of processor elements interconnected by a network of switches, each signal having a respective source processor element and at least one destination processor element in the processor array, the method comprising (i) identifying a signal from the plurality of unrouted signals to route; (ii) identifying a candidate route from the source processor element to the destination processor element, the candidate route using a first plurality of switches; (iii) evaluating the candidate route by determining whether there are offset values that allow the signal to be routed through the first plurality of switches; and (iv) attempting to route the signal using one of the offset values identified in step (iii).Type: ApplicationFiled: February 9, 2009Publication date: August 20, 2009Inventors: Andrew William George DULLER, William Philip ROBBINS
-
Patent number: 7577820Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor including a storage module, wherein the processor is configured to process multiple streams of instructions, a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles, and coupling circuitry configured to couple data resulting from processing an instruction from at least one of the streams of instructions to the storage module and to the switch.Type: GrantFiled: April 14, 2006Date of Patent: August 18, 2009Assignee: Tilera CorporationInventors: David Wentzlaff, Anant Agarwal
-
Patent number: 7577821Abstract: An integrated circuit device comprising a data processing block including a first matrix and a second matrix is disclosed. The first matrix and the second matrix respectively include a plurality of types of operation units and a wiring group for connecting the plurality of types of operation units, a configuration of data flow with the plurality of types of operation units being changeable by changing a route of the wiring group for data supplying to the plurality of types of operation units. One of the plurality of types of operation units is a delay type operation unit that include a data path suited to processing for delaying a transfer time of data.Type: GrantFiled: February 1, 2007Date of Patent: August 18, 2009Assignee: IPFlex Inc.Inventors: Kenji Ikeda, Hiroshi Shimura, Tomoyoshi Sato
-
Publication number: 20090204788Abstract: Disclosed is an array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.Type: ApplicationFiled: January 9, 2009Publication date: August 13, 2009Applicant: Theseus Research, Inc.Inventor: Karl M. Fant
-
Patent number: 7574581Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing free-running, scan registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.Type: GrantFiled: April 28, 2003Date of Patent: August 11, 2009Assignee: International Business Machines CorporationInventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
-
Patent number: 7574582Abstract: There is disclosed a processor array, which achieves an approximately constant latency. Communications to and from the farthest array elements are suitably pipelined for the distance, while communications to and from closer array elements are deliberately “over-pipelined” such that the latency to all end-point elements is the same number of clock cycles. The processor array has a plurality of primary buses, each connected to a primary bus driver, and each having a respective plurality of primary bus nodes thereon; respective pluralities of secondary buses, connected to said primary bus nodes; a plurality of processor elements, each connected to one of the secondary buses; and delay elements associated with the primary bus nodes, for delaying communications with processor elements connected to different ones of the secondary buses by different amounts, in order to achieve a degree of synchronization between operation of said processor elements.Type: GrantFiled: January 26, 2004Date of Patent: August 11, 2009Assignee: Picochip Designs LimitedInventor: John Matthew Nolan
-
Patent number: 7571300Abstract: A memory system includes a plurality of memory blocks, each having a dedicated local arithmetic logic unit (ALU). A data value having a plurality of bytes is stored such that each of the bytes is stored in a corresponding one of the memory blocks. In a read-modify-write operation, each byte of the data value is read from the corresponding memory block, and is provided to the corresponding ALU. Similarly, each byte of a modify data value is provided to a corresponding ALU on a memory data bus. Each ALU combines the read byte with the modify byte to create a write byte. Because the write bytes are all generated locally within the ALUs, long signal delay paths are avoided. Each ALU also generates two possible carry bits in parallel, and then uses the actual received carry bit to select from the two possible carry bits.Type: GrantFiled: January 8, 2007Date of Patent: August 4, 2009Assignee: Integrated Device Technologies, Inc.Inventor: Tak Kwong Wong
-
Patent number: 7565287Abstract: Techniques for implementing vocoders in parallel digital signal processors are described. A preferred approach is implemented in conjunction with the BOPS® Manifold Array (ManArray™) processing architecture so that in an array of N parallel processing elements, N channels of voice communication are processed in parallel. Techniques for forcing vocoder processing of one data-frame to take the same number of cycles are described. Improved throughput and lower clock rates can be achieved.Type: GrantFiled: December 20, 2005Date of Patent: July 21, 2009Assignee: Altera CorporationInventors: Ali Soheil Sadri, Navin Jaffer, Anissim A. Silivra, Bin Huang, Matthew Plonski
-
Publication number: 20090164752Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, eg to provide cache.Type: ApplicationFiled: August 11, 2005Publication date: June 25, 2009Applicant: CLEARSPEED TECHNOLOGY PLCInventor: Ray McConnell