Array Processor Operation Patents (Class 712/16)

Application specific (Class 712/17)

Data flow array processor (Class 712/18)

Systolic array processor (Class 712/19)

Multimode (e.g., mimd to simd, etc.) (Class 712/20)

Multiple instruction, multiple data (mimd) (Class 712/21)

Single instruction, multiple data (simd) (Class 712/22)

Virtual world simulation systems and methods utilizing parallel coprocessors, and computer program products thereof

Patent number: 7908462

Abstract: The current invention provides a virtual world simulation system capable of hosting a massive number of concurrent players by integrating commodity parallel co-processors into servers. The current invention proposes novel parallel processing algorithms that use commodity parallel co-processors, such as a graphics processing unit (GPU) or any specialized hardware with a parallel architecture such as a field-programmable gate array (FPGA), to accelerate virtual world simulation.

Type: Grant

Filed: June 9, 2010

Date of Patent: March 15, 2011

Assignee: Zillians Incorporated

Inventor: Mu Chi Sung
Methods and apparatus for providing data transfer control

Patent number: 7908409

Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.

Type: Grant

Filed: August 6, 2009

Date of Patent: March 15, 2011

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Edward A. Wolff
Asynchronous computer communication

Patent number: 7904615

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).

Type: Grant

Filed: February 16, 2006

Date of Patent: March 8, 2011

Assignee: VNS Portfolio LLC

Inventor: Charles H. Moore
Asynchronous power saving computer

Patent number: 7904695

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A slot sequencer (42) in each of the computers produces a timing pulse to cause the computer (12) to execute a next instruction. However, when the present instruction is a read or write type instruction, the slot sequencer does not produce the pulse until an acknowledge signal (86) starts it. The acknowledge signal (86) is produced when it is recognized that the communication has been completed by the other computer (12).

Type: Grant

Filed: February 16, 2006

Date of Patent: March 8, 2011

Assignee: VNS Portfolio LLC

Inventor: Charles H. Moore
Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture

Patent number: 7890735

Abstract: A multi-threaded microprocessor (1105) for processing instructions in threads. The microprocessor (1105) includes first and second decode pipelines (1730.0, 1730.1), first and second execute pipelines (1740, 1750), and coupling circuitry (1916) operable in a first mode to couple first and second threads from the first and second decode pipelines (1730.0, 1730.1) to the first and second execute pipelines (1740, 1750) respectively, and the coupling circuitry (1916) operable in a second mode to couple the first thread to both the first and second execute pipelines (1740, 1750). Various processes of manufacture, articles of manufacture, processes and methods of operation, circuits, devices, and systems are disclosed.

Type: Grant

Filed: August 23, 2006

Date of Patent: February 15, 2011

Assignee: Texas Instruments Incorporated

Inventor: Thang Tran
Architectural enhancements to CPU microcode load mechanism using inter processor interrupt messages

Patent number: 7882333

Abstract: A method for loading microcode to a plurality of cores within a processor. The method includes loading the microcode to a first core of the plurality of cores within the processor system and generating a broadcast inter-processor interrupt (IPI) message via the first core. The IPI message causes other cores within the processor system to synchronize respective microcode with the microcode that is loaded into the first core. The synchronizing loads microcode to the plurality of cores without requiring independent loads of microcode to each core.

Type: Grant

Filed: November 5, 2007

Date of Patent: February 1, 2011

Assignee: Dell Products L.P.

Inventor: Mukund Khatri
Matrix of processors with data stream instruction execution pipeline coupled to data switch linking to neighbor units by non-contentious command channel / data channel

Patent number: 7870365

Abstract: In some embodiments, control and data messages are transmitted non-contentiously over corresponding control and data channels of inter-processor links in a matrix of mesh-interconnected matrix processors. A data stream instruction executed by a user thread of an instruction processing pipeline of a matrix processor may initiate a data stream transfer by a hardware data switch of the matrix processor over multiple consecutive cycles over a data channel. While the data stream is being transferred, the corresponding control channel may transfer control messages non-contentiously with respect to the data stream. The control messages may be messages received from other matrix processors and/or control messages initiated by a kernel thread of the current matrix processor.

Type: Grant

Filed: July 7, 2008

Date of Patent: January 11, 2011

Assignee: Ovics

Inventors: Sorin C Cismas, Ilie Garbacea
METHOD FOR SCHEDULING START-UP AND SHUT-DOWN OF MAINFRAME APPLICATIONS USING TOPOGRAPHICAL RELATIONSHIPS

Publication number: 20100332793

Abstract: The illustrative embodiments provide for a computer-implemented method for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.

Type: Application

Filed: December 21, 2007

Publication date: December 30, 2010

Inventor: Joseph John Katnic
SYSTEM AND METHOD FOR MANAGING PROCESSOR-IN-MEMORY (PIM) OPERATIONS

Publication number: 20100318764

Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for operations that are vectorizable. The vectorizable operations are examined to determine whether they should be executed at least in part in a vector atomic memory operation (AMO) functional unit attached to memory. If so, the compiled code includes vector AMO instructions.

Type: Application

Filed: June 12, 2009

Publication date: December 16, 2010

Applicant: Cray Inc.

Inventor: Terry D. Greyzck
Alternately selecting memory units to store and retrieve configuration information in respective areas for a plurality of processing elements to perform pipelined processes

Patent number: 7849288

Abstract: A reconfigurable circuit and control method therefor, capable of enhancing the efficiency with which a pipeline process is implemented in processing elements and of improving processing performance. Processing elements are reconfigured to form a circuit based on configuration information and execute a prescribed process. Memory units store configuration information for the processing elements. A memory switching unit switches the plurality of memory units to store therein the configuration information on the stages of a pipeline process to be performed by the processing elements. A configuration information output unit switches the memory units to output therefrom the configuration information to the plurality of processing elements.

Type: Grant

Filed: October 12, 2006

Date of Patent: December 7, 2010

Assignee: Fujitsu Limited

Inventors: Hisanori Fujisawa, Miyoshi Saito, Toshihiro Ozawa
Processor cluster architecture and associated parallel processing methods

Patent number: 7840778

Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.

Type: Grant

Filed: August 31, 2006

Date of Patent: November 23, 2010

Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
Line-plane broadcasting in a data communications network of a parallel computer

Patent number: 7840779

Abstract: Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.

Type: Grant

Filed: August 22, 2007

Date of Patent: November 23, 2010

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Jeremy E. Berg, Michael A. Blocksome, Brian E. Smith
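
The three-phase broadcast described in the abstract of patent 7840779 (line, then plane, then volume) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the network is modeled as a plain X by Y by Z grid of coordinates, and the function name `line_plane_broadcast` and its return values are hypothetical, not taken from the patent.

```python
from itertools import product


def line_plane_broadcast(dims, root, message):
    """Simulate a line-plane broadcast on a 3D grid of compute nodes.

    dims is (X, Y, Z); root is the (x, y, z) coordinate of the broadcasting
    node. Returns a dict mapping each node to the message it received and a
    list with the number of point-to-point sends in each of the three phases.
    """
    X, Y, Z = dims
    rx, ry, rz = root
    received = {root: message}
    sends = []

    # Phase 1: the root sends along its axis of the first dimension (a line).
    count = 0
    for x in range(X):
        if x != rx:
            received[(x, ry, rz)] = message
            count += 1
    sends.append(count)

    # Phase 2: every node on that line sends along the second dimension,
    # which fills a plane.
    count = 0
    for x in range(X):
        for y in range(Y):
            if y != ry:
                received[(x, y, rz)] = received[(x, ry, rz)]
                count += 1
    sends.append(count)

    # Phase 3: every node in that plane sends along the third dimension,
    # which fills the whole grid.
    count = 0
    for x, y in product(range(X), range(Y)):
        for z in range(Z):
            if z != rz:
                received[(x, y, z)] = received[(x, y, rz)]
                count += 1
    sends.append(count)

    return received, sends
```

In this model every node other than the root receives the message exactly once, so the three phase counts always sum to the number of nodes minus one.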
Effecting a broadcast with an allreduce operation on a parallel computer

Patent number: 7827385

Abstract: A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. The result of the allreduce operation is determined and stored in each receive buffer.

Type: Grant

Filed: August 2, 2007

Date of Patent: November 2, 2010

Assignee: International Business Machines Corporation

Inventors: Gheorghe Almasi, Charles J. Archer, Joseph D. Ratterman, Brian E. Smith
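
The idea in the abstract of patent 7827385 rests on the identity x OR 0 = x: if only the logical root contributes real data and every other node contributes zeros, a bitwise-OR allreduce leaves the root's data in every receive buffer. The following sketch models that with lists of non-negative integers. The function name and the in-process "nodes" are illustrative assumptions; no real combining network or message-passing library is involved.

```python
from functools import reduce
from operator import or_


def broadcast_via_allreduce(root_rank, root_data, num_nodes):
    """Return the receive buffer of every node after an OR-allreduce.

    root_data is a list of non-negative integers held by the logical root.
    """
    # Send buffers: the root contributes its elements, all others inject zeros.
    send_buffers = [
        list(root_data) if rank == root_rank else [0] * len(root_data)
        for rank in range(num_nodes)
    ]

    # Element-wise bitwise OR across all nodes. Because x | 0 == x, each
    # combined element equals the root's element.
    combined = [reduce(or_, column) for column in zip(*send_buffers)]

    # An allreduce leaves the same result in every node's receive buffer.
    return [list(combined) for _ in range(num_nodes)]
```

For example, `broadcast_via_allreduce(2, [5, 0, 255], 4)` returns four copies of `[5, 0, 255]`, one per node.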
Process for automatic dynamic reloading of data flow processors (DFPs) and units with two- or three-dimensional programmable cell architectures (FPGAs, DPGAs, and the like)

Patent number: 7822881

Abstract: In a data-processing method, first result data may be obtained using a plurality of configurable coarse-granular elements, the first result data may be written into a memory that includes spatially separate first and second memory areas and that is connected via a bus to the plurality of configurable coarse-granular elements, the first result data may be subsequently read out from the memory, and the first result data may be subsequently processed using the plurality of configurable coarse-granular elements. In a first configuration, the first memory area may be configured as a write memory, and the second memory area may be configured as a read memory. Subsequent to writing to and reading from the memory in accordance with the first configuration, the first memory area may be configured as a read memory, and the second memory area may be configured as a write memory.

Type: Grant

Filed: October 7, 2005

Date of Patent: October 26, 2010

Inventors: Martin Vorbach, Robert Münch
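
The role swap between the two memory areas in the abstract of patent 7822881 resembles what is commonly called ping-pong or double buffering. A minimal sketch, assuming two Python lists stand in for the spatially separate memory areas; the class `PingPongMemory` and the helper `two_pass` are hypothetical names introduced only for illustration.

```python
class PingPongMemory:
    """Two memory areas that alternate between write and read roles."""

    def __init__(self):
        self.areas = [[], []]
        self.write_index = 0  # first configuration: area 0 is the write memory

    @property
    def read_index(self):
        return 1 - self.write_index

    def write(self, value):
        self.areas[self.write_index].append(value)

    def read_all(self):
        return list(self.areas[self.read_index])

    def swap(self):
        # The area just written becomes the read memory; the other area is
        # cleared and becomes the write memory.
        self.write_index = 1 - self.write_index
        self.areas[self.write_index] = []


def two_pass(data):
    """Square each input, then add one, passing results through the memory."""
    mem = PingPongMemory()
    for x in data:              # first pass writes result data to area 0
        mem.write(x * x)
    mem.swap()                  # area 0 is now read memory, area 1 write memory
    for x in mem.read_all():    # second pass reads area 0 and writes area 1
        mem.write(x + 1)
    mem.swap()
    return mem.read_all()
```

Calling `two_pass([1, 2, 3])` returns `[2, 5, 10]`: the squares pass through one area while the incremented values are written to the other.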
Dependency Matrix with Reduced Area and Power Consumption

Publication number: 20100257336

Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of first cells. A second array couples to the first array and comprises a plurality of second cells. A first write port couples to the first array and the second array and writes to the first array and the second array. A first read port couples to the first array and the second array and reads from the first array and the second array. A second read port couples to the first array and reads from the first array. A second write port couples to the second read port, reads from the second read port and writes to the second array.

Type: Application

Filed: April 3, 2009

Publication date: October 7, 2010

Applicant: International Business Machines Corporation

Inventors: Saiful Islam, Mary D. Brown, Bjorn P. Christensen, Sam G. Chu, Robert A. Cordes, Maureen A. Delaney, Jafar Nahidi, Joel A. Silberman
METHOD AND APPARATUS FOR GAME PHYSICS CONCURRENT COMPUTATIONS

Publication number: 20100235608

Abstract: An apparatus for physical properties computation comprising an array processor. The array processor comprises a plurality of processing elements, said processing elements arranged in a grid. A processing unit (PU) is coupled to the array processor. A local memory is coupled to the PU. The PU broadcasts data to rows of said processing elements in said grid, and performs physical computations in an order of complexity of O((√N) log N).

Type: Application

Filed: May 24, 2010

Publication date: September 16, 2010

Applicant: AiSeek Ltd.

Inventors: Roy ARMONI, Ramon Axelrod
Systolic Data Processing Apparatus and Method

Publication number: 20100211757

Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.

Type: Application

Filed: February 17, 2009

Publication date: August 19, 2010

Applicant: Samsung Electronics Co., Ltd.

Inventors: Gi-Ho Park, Shin-dug Kim, Jung-wook Park, Hoon-mo Yang, Sung-bae Park
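
The instruction pipelining in the abstract of publication 20100211757 can be pictured as instructions marching through the PE array one position per cycle, with a new instruction entering the first PE each cycle. The sketch below is an assumption-laden model: instructions are plain Python callables, each PE holds a single data value, and the name `run_systolic` is hypothetical.

```python
def run_systolic(program, data):
    """Run `program` (a list of value -> value callables) on a PE array.

    data holds one value per PE. Returns the final values and, for each
    cycle, the list of PE indices that executed an instruction.
    """
    num_pes = len(data)
    values = list(data)
    pipeline = [None] * num_pes  # instruction currently held by each PE
    trace = []

    total_cycles = len(program) + num_pes - 1
    for cycle in range(total_cycles):
        # Instructions advance one PE; the control unit feeds PE 0.
        incoming = program[cycle] if cycle < len(program) else None
        pipeline = [incoming] + pipeline[:-1]

        # Each PE applies the instruction it holds to its own data.
        active = []
        for pe, instr in enumerate(pipeline):
            if instr is not None:
                values[pe] = instr(values[pe])
                active.append(pe)
        trace.append(active)

    return values, trace
```

With a two-instruction program (add one, then double) and three PEs, PE p executes instruction j in cycle j + p, so the whole program finishes in four cycles.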
Row of floating point accumulators coupled to respective PEs in uppermost row of PE array for performing addition operation

Patent number: 7769981

Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.

Type: Grant

Filed: March 11, 2008

Date of Patent: August 3, 2010

Assignee: Electronics and Telecommunications Research Institute

Inventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
Semiconductor integrated circuit and a software radio device

Patent number: 7756505

Abstract: To realize a software radio processing with a reduced circuit area by hardware and software which can process transmission and reception, or synchronization and demodulation in time division. There are provided a circuit DRC that can dynamically change a configuration with a structure that can change the configuration at a high speed, a general processor, and an interface for connection with an external device such as an AD converter or a DA converter. Software radio is realized by using a software radio chip that conducts plural different processing such as transmission and reception, or synchronization and demodulation in time division. The different processing during the radio signal processing can be conducted in time division. As a result, the software radio can be realized with a circuit of a reduced area in a software radio system that allocates regions of an FPGA to the respective processing.

Type: Grant

Filed: October 3, 2005

Date of Patent: July 13, 2010

Assignee: Hitachi, Ltd.

Inventors: Hiroshi Tanaka, Takanobu Tsunoda, Tetsuroo Honmura, Manabu Kawabe, Masashi Takada
Parallel-prefix broadcast for a parallel-prefix operation on a parallel computer

Patent number: 7752421

Abstract: A parallel-prefix broadcast for a parallel-prefix operation on a parallel computer includes: configuring, on each node, a parallel-prefix contribution buffer for storing the node's parallel-prefix contribution; configuring, on each node, a parallel-prefix results buffer for storing results of an operation, the results buffer having a position for each node that corresponds to the node's rank; and repeatedly for each position in the results buffer: processing in parallel by each node, including: determining, by the node, whether the current position in the results buffer is to include the node's contribution, if the current position is not to include the contribution, contributing the identity element, and if the current position is to include the contribution, contributing the contribution, performing, by each node, the operation using the contributed identity elements and the contributed contributions, yielding a result from the operation, and storing, by each node, the result in the position in the results buffe

Type: Grant

Filed: April 19, 2007

Date of Patent: July 6, 2010

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Amanda Peters, Gary R. Ricard, Albert Sidelnik, Brian E. Smith
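
The per-position loop in the abstract of patent 7752421 can be sketched as follows. One assumption is made explicit: a position is taken to include a node's contribution when the node's rank is less than or equal to that position (an inclusive prefix), which the abstract does not spell out. The function name and the list-based "nodes" are likewise illustrative only.

```python
from functools import reduce
import operator


def parallel_prefix_broadcast(contributions, op=operator.add, identity=0):
    """Return the results buffer of every node after the prefix operation.

    contributions[rank] is the value held by the node with that rank.
    """
    num_nodes = len(contributions)
    # One results buffer per node, with one position per rank.
    results = [[None] * num_nodes for _ in range(num_nodes)]

    for position in range(num_nodes):
        # Each node offers its contribution if this position should include
        # it (assumed: rank <= position), otherwise the identity element.
        offered = [
            contributions[rank] if rank <= position else identity
            for rank in range(num_nodes)
        ]
        # Every node performs the same operation over all offered values...
        combined = reduce(op, offered, identity)
        # ...and stores the result at the current position.
        for rank in range(num_nodes):
            results[rank][position] = combined

    return results
```

For contributions `[3, 1, 4, 1]` with addition, every node ends up with the results buffer `[3, 4, 8, 9]`.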
Processor having a dedicated hash unit integrated within

Patent number: 7743235

Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.

Type: Grant

Filed: June 6, 2007

Date of Patent: June 22, 2010

Assignee: Intel Corporation

Inventors: Gilbert Wolrich, Matthew Adiletta, William R. Wheeler
Configuring sets of processor cores for processing instructions

Patent number: 7734895

Abstract: An integrated circuit includes a plurality of processor cores. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.

Type: Grant

Filed: April 28, 2006

Date of Patent: June 8, 2010

Assignee: Massachusetts Institute of Technology

Inventors: Anant Agarwal, David Wentzlaff
Method for Manipulating Data in a Group of Processing Elements To Perform a Reflection of the Data

Publication number: 20100131737

Abstract: A method for generating a reflection of data in a plurality of processing elements comprises shifting the data along, for example, each row in the array until each processing element in the row has received all the data held by every other processing element in that row. Each processing element stores and outputs final data as a function of its position in the row. A similar reflection along a horizontal line can be achieved by shifting data along columns instead of rows. Also disclosed is a method for reflecting data in a matrix of processing elements about a vertical line comprising shifting data between processing elements arranged in rows. An initial count is set in each processing element according to the expression (2×Col_Index)MOD(array size). In one embodiment, a counter counts down from the initial count in each processing element as a function of the number of shifts that have been performed. Output is selected as a function of the current count.

Type: Application

Filed: January 28, 2010

Publication date: May 27, 2010

Applicant: Micron Technology, Inc.

Inventor: Mark Beaumont
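
The shift-and-select idea in the abstract of publication 20100131737 can be illustrated for a single row. The abstract gives the initial count as (2×Col_Index) MOD (array size) but does not state its indexing base or shift direction, so this sketch does not reproduce that counter. Instead it uses a selection rule derived here for 0-based columns and a right circular shift: processing element `col` keeps the value it holds after `(2 * col + 1) % n` shifts. The function name is hypothetical.

```python
def reflect_row(row):
    """Reverse a row using only circular shifts and per-PE selection."""
    n = len(row)
    held = list(row)        # value currently sitting in each PE
    output = [None] * n

    # Assumed rule: after k right shifts PE `col` holds row[(col - k) % n],
    # and it wants row[n - 1 - col], which gives k = (2 * col + 1) % n.
    select_at = [(2 * col + 1) % n for col in range(n)]

    for shifts in range(n):
        for col in range(n):
            if shifts == select_at[col]:
                output[col] = held[col]
        # Circular shift right: each PE passes its value to its right neighbour.
        held = [held[-1]] + held[:-1]

    return output
```

For `[1, 2, 3, 4]` the result is `[4, 3, 2, 1]`; a reflection about a horizontal line would apply the same procedure along columns.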
Computer memory architecture for hybrid serial and parallel computing systems

Patent number: 7707388

Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.

Type: Grant

Filed: November 29, 2006

Date of Patent: April 27, 2010

Assignee: XMTT Inc.

Inventor: Uzi Vishkin
System For Parallel Computing

Publication number: 20100100703

Abstract: A system and a method for parallel computing for solving complex problems are envisaged. In particular, this invention envisages a hierarchical parallel computing system formed by multiple levels of groups, where each group consists of multiple processing elements. Each group of the parallel computing system is modeled as a processing element to its immediate upper layer. Thus, each processing element is hierarchically tagged to its immediate upper level, and a multi-level tier of groups is formed. In accordance with this invention, the parallel computing system operates by breaking any problem hierarchically, first across the groups and then within the groups. This hierarchical breakup of the problem helps in significantly improving the time required for processing a problem.

Type: Application

Filed: October 15, 2009

Publication date: April 22, 2010

Applicant: Computational Research Laboratories Ltd.

Inventors: Chandan Basu, Mandar Nadgir, Avinash Pandey
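
The two-level breakup in the abstract of publication 20100100703 (first across groups, then within each group) can be sketched with a toy problem. Summation is used purely as a stand-in workload, and the helper names `split` and `hierarchical_sum` are hypothetical; the sketch runs sequentially and only shows how the work is partitioned.

```python
def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def hierarchical_sum(items, num_groups, pes_per_group):
    """Sum `items` by dividing work across groups, then within each group."""
    group_totals = []
    for group_chunk in split(items, num_groups):
        # Within a group, each processing element handles one sub-chunk.
        pe_partials = [sum(chunk) for chunk in split(group_chunk, pes_per_group)]
        # To the level above, the whole group looks like one processing element.
        group_totals.append(sum(pe_partials))
    return sum(group_totals)
```

For example, `hierarchical_sum(list(range(10)), 3, 2)` returns 45, the same as a flat sum.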
FLEXIBLE RESULTS PIPELINE FOR PROCESSING ELEMENT

Publication number: 20100070738

Abstract: A flexible results pipeline for a processing element of a parallel processor is described. A plurality of result registers are selectively connected to each other, to processing logic of the processing element and to a neighbourhood connection register configured to receive data from and send data to other processing elements. The connections between the result registers and between the result registers and the neighbourhood connection register are selectively configurable by applied control signals.

Type: Application

Filed: November 18, 2009

Publication date: March 18, 2010

Applicant: Micron Technology, Inc.

Inventor: GRAHAM KIRSCH
Iterative compare operations using next success size bitmap

Patent number: 7676444

Abstract: A search engine for selectively performing iterative compare operations between a searchable pattern and S overlapping substrings of an input string of characters includes a memory for storing a bitmap having S next success size (NSS) bits, wherein each NSS bit indicates whether an associated substring including a corresponding unique number of the input characters is to be compared with the searchable pattern in successive compare operations, and includes a compare circuit for selectively performing the successive compare operations in response to the NSS bits.

Type: Grant

Filed: March 21, 2007

Date of Patent: March 9, 2010

Assignee: NetLogic Microsystems, Inc.

Inventors: Srinivasan Venkatachary, Pankaj Gupta
Policy-based management of a redundant array of independent nodes

Patent number: 7657586

Abstract: An archive cluster application runs in a distributed manner across a redundant array of independent nodes. Each node preferably runs a complete archive cluster application instance. A given node provides a data repository, which stores up to a large amount (e.g., a terabyte) of data, while also acting as a portal that enables access to archive files. Each symmetric node has a set of software processes, e.g., a request manager, a storage manager, a metadata manager, and a policy manager. The request manager manages requests to the node for data (i.e., file data), the storage manager manages data read/write functions from a disk associated with the node, and the metadata manager facilitates metadata transactions and recovery across the distributed database. The policy manager implements one or more policies, which are operations that determine the behavior of an “archive object” within the cluster. The archive cluster application provides object-based storage.

Type: Grant

Filed: December 13, 2006

Date of Patent: February 2, 2010

Assignee: Archivas, Inc.

Inventors: Andres Rodriguez, Jack A. Orenstein, David M. Shaw, Benjamin K. D. Bernhard
Array-type computer processor with reduced instruction storage

Patent number: 7650484

Abstract: An array-type computer processor including a data path unit communicating with a state control unit obtains data of a predetermined number of cooperative partial instruction codes, and operates with temporarily holding only a predetermined number of data-obtained instruction codes comprising cooperative partial instruction codes corresponding to contexts and operation states for the data path unit and the state control unit, respectively, from an external program memory which stores data of a computer program.

Type: Grant

Filed: February 3, 2005

Date of Patent: January 19, 2010

Assignees: NEC Corporation, NEC Electronics Corporation

Inventors: Takeshi Inuo, Nobuki Kajihara, Takao Toi, Tooru Awashima, Hirokazu Kami, Taro Fujii, Kenichiro Anjo, Kouichiro Furuta, Masato Motomura
Execution of instructions within a data processing apparatus having a plurality of processing units

Patent number: 7650483

Abstract: A data processing apparatus and method are provided for handling execution of instructions within a data processing apparatus having a plurality of processing units. Each processing unit is operable to execute a sequence of instructions so as to perform associated operations, and at least a subset of the processing units form a cluster. Instruction forwarding logic is provided which for at least one instruction executed by at least one of the processing units in the cluster causes that instruction to be executed by each of the other processing units in the cluster, for example by causing that instruction to be inserted into the sequences of instructions executed by each of the other processing units in the cluster.

Type: Grant

Filed: November 3, 2006

Date of Patent: January 19, 2010

Assignee: ARM Limited

Inventors: Elodie Charra, Frederic Claude Marie Piry, Richard Roy Grisenthwaite, Mélanie Emanuelle Lucie Vincent, Norbert Bernard Eugéne Lataille, Jocelyn Francois Orion Jaubert, Stuart David Biles
RECONFIGURABLE COMPUTING CIRCUIT

Publication number: 20090327653

Abstract: A reconfigurable computing circuit for reducing amount of dummy data to be stored in data registers, which is required when the wiring is shared by the configuration information bus and scan chain. When data is to be stored in data registers and configuration registers constituting the scan chain in reconfig computing block 2010, reg setting data selecting unit 3400 selects either a value stored in reg setting data storage unit 3000 or an initial value output from data reg data generating unit 4000, based on the information stored in reg type managing unit 1100 that indicates the types of registers and the connection order of the registers in the scan chain, and outputs the selected value in sequence to the scan chain under control of scan/reconfig control unit 1000. Each register in the scan chain then shifts data stored therein to the next register in the scan chain in sequence.

Type: Application

Filed: April 18, 2008

Publication date: December 31, 2009

Inventors: Masaki MAEDA, Takahiro Ichinomiya
Coupling data in a parallel processing environment

Patent number: 7636835

Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor, and a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles. The integrated circuit further comprises one or more interface modules including circuitry to transfer data to and from a device external to the tiles; and a sub-port routing network including circuitry to route data between a port of a switch and a plurality of sub-ports coupled to one or more interface modules.

Type: Grant

Filed: April 14, 2006

Date of Patent: December 22, 2009

Assignee: Tilera Corporation

Inventors: Carl G. Ramey, David Wentzlaff, Anant Agarwal
System and method for parallel computation of an array transform

Patent number: 7634159

Abstract: An array transform system for parallel computation of a plurality of elements of an array transform includes a memory for storing an array of data elements. Each column of data elements from the memory is copied to a shifter that shifts the column of data elements in accordance with a shift value to produce a shifted column of data elements. The shifted columns of data elements are accumulated in a plurality of accumulators, with each accumulator producing an element of the array transform. A controller controls the shift value dependent upon the position of the column of data elements in the array of data elements.

Type: Grant

Filed: December 8, 2004

Date of Patent: December 15, 2009

Assignee: Motorola, Inc.

Inventors: Malcolm R. Dwyer, James E. Crenshaw, Zhiyuan Li
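The shift-and-accumulate scheme in the abstract of patent 7634159 can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `shifted_column_sums`, the circular (wraparound) shift, and the caller-supplied `shift_for_column` rule are all hypothetical and are not taken from the patent, which does not specify them in the abstract.

```python
def shifted_column_sums(array, shift_for_column):
    """Accumulate circularly shifted columns; one accumulator per row.

    array: list of rows (rectangular).
    shift_for_column: hypothetical controller rule mapping a column
    index to a shift value (an assumption for illustration).
    """
    rows = len(array)
    cols = len(array[0])
    accumulators = [0] * rows
    for j in range(cols):
        # Copy one column out of "memory".
        column = [array[i][j] for i in range(rows)]
        # The shift value depends on the column's position in the array.
        s = shift_for_column(j) % rows
        shifted = column[s:] + column[:s]  # shifted[i] == column[(i + s) % rows]
        # Each accumulator adds its element of the shifted column.
        for i in range(rows):
            accumulators[i] += shifted[i]
    return accumulators


if __name__ == "__main__":
    data = [[1, 2], [3, 4]]
    print(shifted_column_sums(data, lambda j: j))
```

With a shift of zero for every column the accumulators hold plain row sums; a position-dependent shift makes each accumulator sum along a slanted line through the array instead.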
Performing An Allreduce Operation On A Plurality Of Compute Nodes Of A Parallel Computer

Publication number: 20090307467

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.

Type: Application

Filed: May 21, 2008

Publication date: December 10, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Ahmad Faraj
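The two-stage allreduce described in publication 20090307467 can be sketched as follows. This is a minimal single-process model, assuming summation as the reduction operator and assuming that logical ring k consists of core k on every node; the function name `hierarchical_allreduce` is hypothetical, and no real message passing is performed.

```python
def hierarchical_allreduce(contributions):
    """contributions[node][core] -> same shape, every entry the grand total.

    Assumptions for illustration: the reduction is a sum, and ring k
    links core k of each compute node.
    """
    num_nodes = len(contributions)
    cores_per_node = len(contributions[0])

    # Stage 1: a global allreduce on each logical ring.
    ring_results = [
        sum(contributions[n][k] for n in range(num_nodes))
        for k in range(cores_per_node)
    ]
    # After stage 1, core k on every node holds ring_results[k].
    per_core = [list(ring_results) for _ in range(num_nodes)]

    # Stage 2: a local allreduce across the cores of each node.
    return [[sum(node_values)] * cores_per_node for node_values in per_core]


if __name__ == "__main__":
    print(hierarchical_allreduce([[1, 2], [3, 4], [5, 6]]))
```

Splitting the work this way means each ring only reduces one core's worth of data across nodes, and the final combination happens locally on each node.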
Methods and apparatus for providing data transfer control

Patent number: 7627698

Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.

Type: Grant

Filed: July 30, 2007

Date of Patent: December 1, 2009

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Edward A. Wolff
Thread manager to control an array of processing elements

Patent number: 7627736

Abstract: A data processing apparatus includes a plurality of processing elements arranged in a single instruction multiple data array. The apparatus is operable to process multiple instruction streams in parallel with one another.

Type: Grant

Filed: May 18, 2007

Date of Patent: December 1, 2009

Assignee: ClearSpeed Technology plc

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Array synchronization with counters

Patent number: 7603541

Abstract: A method is disclosed for achieving synchronization in an array of semi-synchronous devices. A processor array has an array of processor elements, wherein each of said processor elements comprises a cycle counter, and a master processor element is able to transmit control command signals to each of the other processor elements. Each processor element is such that, on receipt of a control command signal, it acts on that signal only when its cycle counter reaches a predetermined value, and the master processor element is such that it transmits control command signals only when its cycle counter takes a value which is within a predetermined range, or “safe window”. By appropriate setting of the “safe window”, it can be guaranteed that, when the master processor element transmits a control command signal to each of the other processor elements, those control command signals are acted upon at corresponding times within the other processor elements.

Type: Grant

Filed: December 12, 2003

Date of Patent: October 13, 2009

Assignee: Picochip Designs Limited

Inventors: John Matthew Nolan, Roger Paul Dealtry
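The "safe window" idea in the abstract of patent 7603541 can be explored with a toy timing model. This is a sketch under simplifying assumptions that are not from the patent: all cycle counters share one time base and wrap at `period`, each processor element has a fixed command delivery delay, and the names `act_time` and `safe_window` are hypothetical.

```python
def act_time(send_time, delay, period, act_value=0):
    """Cycle at which a processor element acts on a command.

    The element waits after arrival until its cycle counter
    (modelled as time % period) reaches act_value.
    """
    arrival = send_time + delay
    wait = (act_value - arrival) % period
    return arrival + wait


def safe_window(delays, period, act_value=0):
    """Master counter values at which sending makes all elements act together."""
    window = []
    for master_count in range(period):
        times = {act_time(master_count, d, period, act_value) for d in delays}
        if len(times) == 1:  # every element acts on the same cycle
            window.append(master_count)
    return window


if __name__ == "__main__":
    print(safe_window(delays=[1, 3], period=8))
```

Sending outside the window lets the command reach the nearest element before the counter wraps and the farthest element after it, so the two would act one full period apart.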
PROCESSOR WITH INTERNAL RASTER OF EXECUTION UNITS

Publication number: 20090249028

Abstract: The present invention relates to a processor that, as its main feature, has an internal raster of ALUs, with the help of which sequential programs are executed. The connections between the ALUs are automatically created at runtime dynamically by means of multiplexers. A central decoding and configuration unit that creates configuration data for the ALU grid from a stream of conventional assembler commands at runtime is responsible for creating the connections. In addition to the ALU grid, a special unit for the execution of memory accesses and another unit for the processing of branch instructions are provided. The novel architecture that is the foundation of the processor makes efficient execution of both control flow- and data flow-oriented tasks possible.

Type: Application

Filed: June 12, 2007

Publication date: October 1, 2009

Inventor: Sascha Uhrig
Method of shifting data along diagonals in a group of processing elements to transpose the data

Patent number: 7596678

Abstract: A transpose of data appearing in a plurality of processing elements comprises shifting the data along diagonals of the plurality of processing elements until the processing elements in the diagonal have received the data held by every other processing element in that diagonal. Shifting along diagonals can be accomplished by executing pairs of horizontal and vertical shifts in the x-y directions or pairs of shifts in perpendicular directions, e.g., x-z. Each processing element stores data as its final output data as a function of the processing element's position. In one embodiment, an initial count is either loaded into each processing element or calculated locally based on the processing element's location. The initial count may be given by one of the following expressions: (x+y+1)MOD(array size); (C+R+1)MOD(array size); (C+y+1)MOD(array size); or (x+R+1)MOD(array size).

Type: Grant

Filed: October 20, 2003

Date of Patent: September 29, 2009

Assignee: Micron Technology, Inc.

Inventor: Mark Beaumont
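The diagonal-shift transpose in the abstract of patent 7596678 can be modelled in a few lines. This is a sketch only: it assumes a square array with wraparound edges, and it uses a selection rule of its own, (row - column) MOD (array size), to decide when each element latches its output. That rule is an assumption chosen for this model and is not one of the initial-count expressions listed in the abstract; the function names are hypothetical.

```python
def shift_diagonal(grid):
    """One diagonal shift, built from a vertical and a horizontal shift."""
    n = len(grid)
    # Vertical shift: every row moves down by one (with wraparound).
    down = [grid[(r - 1) % n] for r in range(n)]
    # Horizontal shift: every column moves left by one (with wraparound).
    return [[down[r][(c + 1) % n] for c in range(n)] for r in range(n)]


def transpose_by_diagonal_shifts(matrix):
    n = len(matrix)
    moving = [row[:] for row in matrix]
    result = [[None] * n for _ in range(n)]
    for step in range(n):
        for r in range(n):
            for c in range(n):
                # Position-dependent rule: latch on the step at which the
                # value from the mirrored position (c, r) is passing by.
                if (r - c) % n == step:
                    result[r][c] = moving[r][c]
        moving = shift_diagonal(moving)
    return result


if __name__ == "__main__":
    print(transpose_by_diagonal_shifts([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

After k shifts the element at (r, c) holds the value that started at (r - k, c + k), so the value from (c, r) arrives exactly when k equals (r - c) MOD n.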
PARALLEL DATA PROCESSING APPARATUS

Publication number: 20090228683

Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instruction streams, each instruction stream having a plurality of instruction items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.

Type: Application

Filed: March 13, 2009

Publication date: September 10, 2009

Applicant: ClearSpeed Technology plc

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Processor composed of memory nodes that execute memory access instructions and cooperate with execution nodes to execute function instructions

Patent number: 7581079

Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.

Type: Grant

Filed: March 26, 2006

Date of Patent: August 25, 2009

Inventor: Gerald George Pechanek
SIGNAL ROUTING IN PROCESSOR ARRAYS

Publication number: 20090210652

Abstract: There is provided a method for routing a plurality of signals in a processor array, the processor array comprising a plurality of processor elements interconnected by a network of switches, each signal having a respective source processor element and at least one destination processor element in the processor array, the method comprising (i) identifying a signal from the plurality of unrouted signals to route; (ii) identifying a candidate route from the source processor element to the destination processor element, the candidate route using a first plurality of switches; (iii) evaluating the candidate route by determining whether there are offset values that allow the signal to be routed through the first plurality of switches; and (iv) attempting to route the signal using one of the offset values identified in step (iii).

Type: Application

Filed: February 9, 2009

Publication date: August 20, 2009

Inventors: Andrew William George DULLER, William Philip ROBBINS
Managing data in a parallel processing environment

Patent number: 7577820

Abstract: An integrated circuit comprises a plurality of tiles. Each tile comprises a processor including a storage module, wherein the processor is configured to process multiple streams of instructions, a switch including switching circuitry to forward data received over data paths from other tiles to the processor and to switches of other tiles, and to forward data received from the processor to switches of other tiles, and coupling circuitry configured to couple data resulting from processing an instruction from at least one of the streams of instructions to the storage module and to the switch.

Type: Grant

Filed: April 14, 2006

Date of Patent: August 18, 2009

Assignee: Tilera Corporation

Inventors: David Wentzlaff, Anant Agarwal
IC containing matrices of plural type operation units with configurable routing wiring group and plural delay operation units bridging two wiring groups

Patent number: 7577821

Abstract: An integrated circuit device comprising a data processing block including a first matrix and a second matrix is disclosed. The first matrix and the second matrix respectively include a plurality of types of operation units and a wiring group for connecting the plurality of types of operation units, a configuration of data flow with the plurality of types of operation units being changeable by changing a route of the wiring group for data supplying to the plurality of types of operation units. One of the plurality of types of operation units is a delay type operation unit that includes a data path suited to processing for delaying a transfer time of data.

Type: Grant

Filed: February 1, 2007

Date of Patent: August 18, 2009

Assignee: IPFlex Inc.

Inventors: Kenji Ikeda, Hiroshi Shimura, Tomoyoshi Sato
Programmable pipeline array

Publication number: 20090204788

Abstract: Disclosed is an array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.

Type: Application

Filed: January 9, 2009

Publication date: August 13, 2009

Applicant: Theseus Research, Inc.

Inventor: Karl M. Fant
Cross-chip communication mechanism in distributed node topology to access free-running scan registers in clock-controlled components

Patent number: 7574581

Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing free-running scan registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or a write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.

Type: Grant

Filed: April 28, 2003

Date of Patent: August 11, 2009

Assignee: International Business Machines Corporation

Inventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
Processor array including delay elements associated with primary bus nodes

Patent number: 7574582

Abstract: There is disclosed a processor array, which achieves an approximately constant latency. Communications to and from the farthest array elements are suitably pipelined for the distance, while communications to and from closer array elements are deliberately “over-pipelined” such that the latency to all end-point elements is the same number of clock cycles. The processor array has a plurality of primary buses, each connected to a primary bus driver, and each having a respective plurality of primary bus nodes thereon; respective pluralities of secondary buses, connected to said primary bus nodes; a plurality of processor elements, each connected to one of the secondary buses; and delay elements associated with the primary bus nodes, for delaying communications with processor elements connected to different ones of the secondary buses by different amounts, in order to achieve a degree of synchronization between operation of said processor elements.

Type: Grant

Filed: January 26, 2004

Date of Patent: August 11, 2009

Assignee: Picochip Designs Limited

Inventor: John Matthew Nolan
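The "over-pipelining" described in the abstract of patent 7574582 amounts to padding every path up to the worst-case latency. A minimal sketch, assuming latencies are whole clock cycles and that each bus node is identified by a label; the function name `padding_delays` and the node labels are hypothetical.

```python
def padding_delays(path_latency):
    """Extra delay cycles per node so all end points see equal latency.

    path_latency: mapping of node label -> natural pipeline latency
    in clock cycles (illustrative values, not from the patent).
    """
    worst = max(path_latency.values())
    # Closer nodes get more added delay; the farthest node gets none.
    return {node: worst - latency for node, latency in path_latency.items()}


if __name__ == "__main__":
    natural = {"near": 1, "mid": 3, "far": 5}
    extra = padding_delays(natural)
    print(extra)
    # Natural latency plus padding is the same for every node.
    print({node: natural[node] + extra[node] for node in natural})
```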
Modular distributive arithmetic logic unit

Patent number: 7571300

Abstract: A memory system includes a plurality of memory blocks, each having a dedicated local arithmetic logic unit (ALU). A data value having a plurality of bytes is stored such that each of the bytes is stored in a corresponding one of the memory blocks. In a read-modify-write operation, each byte of the data value is read from the corresponding memory block, and is provided to the corresponding ALU. Similarly, each byte of a modify data value is provided to a corresponding ALU on a memory data bus. Each ALU combines the read byte with the modify byte to create a write byte. Because the write bytes are all generated locally within the ALUs, long signal delay paths are avoided. Each ALU also generates two possible carry bits in parallel, and then uses the actual received carry bit to select from the two possible carry bits.

Type: Grant

Filed: January 8, 2007

Date of Patent: August 4, 2009

Assignee: Integrated Device Technologies, Inc.

Inventor: Tak Kwong Wong
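The per-byte carry handling in the abstract of patent 7571300 resembles a carry-select scheme, which can be sketched in software. The model below assumes the "combine" step is an unsigned addition and that bytes are ordered least significant first; both are assumptions for illustration, and the function names are hypothetical.

```python
def byte_add_both(a, b):
    """Compute a byte-wide sum for both possible carry-in values.

    Returns ((byte, carry_out) for carry-in 0,
             (byte, carry_out) for carry-in 1).
    """
    s0 = a + b
    s1 = a + b + 1
    return (s0 & 0xFF, s0 >> 8), (s1 & 0xFF, s1 >> 8)


def add_bytewise(read_bytes, modify_bytes):
    """Add two multi-byte values, least significant byte first."""
    # Every local ALU prepares both outcomes independently
    # (in hardware this would happen in parallel).
    candidates = [byte_add_both(a, b) for a, b in zip(read_bytes, modify_bytes)]
    carry = 0
    write_bytes = []
    for candidate in candidates:
        # The actual incoming carry only selects a precomputed result.
        byte, carry = candidate[carry]
        write_bytes.append(byte)
    return write_bytes, carry


if __name__ == "__main__":
    print(add_bytewise([0xFF, 0x00], [0x01, 0x00]))
```

Because both results are ready before the carry arrives, the only serial step left between neighbouring ALUs is a selection rather than a full addition.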
Methods and apparatus for efficient vocoder implementations

Patent number: 7565287

Abstract: Techniques for implementing vocoders in parallel digital signal processors are described. A preferred approach is implemented in conjunction with the BOPS® Manifold Array (ManArray™) processing architecture so that in an array of N parallel processing elements, N channels of voice communication are processed in parallel. Techniques for forcing vocoder processing of one data-frame to take the same number of cycles are described. Improved throughput and lower clock rates can be achieved.

Type: Grant

Filed: December 20, 2005

Date of Patent: July 21, 2009

Assignee: Altera Corporation

Inventors: Ali Soheil Sadri, Navin Jaffer, Anissim A. Silivra, Bin Huang, Matthew Plonski
PROCESSOR MEMORY SYSTEM

Publication number: 20090164752

Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller, e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, e.g., to provide cache.

Type: Application

Filed: August 11, 2005

Publication date: June 25, 2009

Applicant: CLEARSPEED TECHNOLOGY PLC

Inventor: Ray McConnell
