Array Processor Operation Patents (Class 712/16)

Application specific (Class 712/17)

Data flow array processor (Class 712/18)

Systolic array processor (Class 712/19)

Multimode (e.g., mimd to simd, etc.) (Class 712/20)

Multiple instruction, multiple data (mimd) (Class 712/21)

Single instruction, multiple data (simd) (Class 712/22)

Multi-threaded packet processing

Patent number: 8934332

Abstract: A system is disclosed for concurrently processing order sensitive data packets. A first data packet from a plurality of sequentially ordered data packets is directed to a first offload engine. A second data packet from the plurality of sequentially ordered data packets is directed to a second offload engine, wherein the second data packet is sequentially subsequent to the first data packet. The second offload engine receives information from the first offload engine, wherein the information reflects that the first offload engine is processing the first data packet. Based on the information received at the second offload engine, the second offload engine processes the second data packet so that critical events in the processing of the first data packet by the first offload engine occur prior to critical events in the processing of the second data packet by the second offload engine.

Type: Grant

Filed: February 29, 2012

Date of Patent: January 13, 2015

Assignee: International Business Machines Corporation

Inventors: Ronald E. Fuhs, Scott M. Willenborg
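
The ordering rule in this abstract (engines work on consecutive packets at the same time, but each packet's critical events must follow those of the packet before it) can be illustrated with a minimal Python sketch. It is not the patented mechanism. The function name `process_in_order`, the use of one thread per "engine", `threading.Event` as the "information" passed between engines, and uppercasing as stand-in packet work are all illustrative assumptions.

```python
import threading

def process_in_order(packets):
    """Hypothetical sketch: one thread ('engine') per packet. Non-critical work
    runs concurrently, but each packet's critical event (appending to a shared
    commit log) happens only after the previous packet's critical event."""
    commit_log = []
    log_lock = threading.Lock()
    # done[i] is set once packet i has passed its critical event; it stands in
    # for the information one offload engine sends to the next.
    done = [threading.Event() for _ in packets]

    def engine(i, packet):
        work = packet.upper()          # non-critical work, fully parallel
        if i > 0:
            done[i - 1].wait()         # wait for the predecessor's critical event
        with log_lock:
            commit_log.append(work)    # critical event, now in sequence order
        done[i].set()                  # let the successor engine proceed

    threads = [threading.Thread(target=engine, args=(i, p))
               for i, p in enumerate(packets)]
    # Start in reverse to show that start order does not decide commit order.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return commit_log

print(process_in_order(["a", "b", "c", "d"]))  # ['A', 'B', 'C', 'D']
```

Only the critical section is serialized; everything before the `wait()` overlaps freely, which is the point of splitting order-sensitive packets across engines.
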
Efficient complex multiplication and fast fourier transform (FFT) implementation on the ManArray architecture

Patent number: 8904152

Abstract: Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.

Type: Grant

Filed: May 26, 2011

Date of Patent: December 2, 2014

Assignee: Altera Corporation

Inventors: Nikos P. Pitsianis, Gerald George Pechanek, Ricardo Rodriguez
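
For readers unfamiliar with how complex multiplication sits inside an FFT, here is a generic radix-2 Cooley-Tukey sketch in Python. It shows the textbook algorithm only, not the ManArray VLIW/PE implementation or its instructions; `complex_mul` is a hypothetical helper that spells out the four real multiplies and two adds that a dedicated complex-multiply instruction would fuse.

```python
import cmath

def complex_mul(a, b):
    # (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br):
    # four real multiplies and two real adds/subtracts.
    return complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        t = complex_mul(twiddle, odd[k])   # the butterfly's complex multiply
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

Every butterfly needs one complex multiply, which is why keeping a complex-multiply pipeline busy across loop iterations matters for FFT throughput.
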
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Patent number: 8892850

Abstract: Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

Type: Grant

Filed: January 17, 2011

Date of Patent: November 18, 2014

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Bob R. Cernohous, Joseph D. Ratterman, Brian E. Smith
CONFIGURABLE LOGIC INTEGRATED CIRCUIT HAVING A MULTIDIMENSIONAL STRUCTURE OF CONFIGURABLE ELEMENTS

Publication number: 20140337601

Abstract: An array processor composed of processor cells that are programmed by a controlling unit, and that are reprogrammed when a cell has finished its current data processing operation, even while other cells continue to process data with their current programming.

Type: Application

Filed: May 13, 2014

Publication date: November 13, 2014

Applicant: PACT XPP TECHNOLOGIES AG

Inventors: Martin Vorbach, Armin Nuckel
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Patent number: 8886916

Abstract: Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

Type: Grant

Filed: November 8, 2012

Date of Patent: November 11, 2014

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Bob R. Cernohous, Joseph D. Ratterman, Brian E. Smith
Memory controller with inter-core interference detection

Patent number: 8880809

Abstract: Embodiments are described for a method for controlling access to memory in a processor-based system comprising monitoring a number of interference events, such as bank contentions, bus contentions, row-buffer conflicts, and increased write-to-read turnaround time caused by a first core in the processor-based system that causes a delay in access to the memory by a second core in the processor-based system; deriving a control signal based on the number of interference events; and transmitting the control signal to one or more resources of the processor-based system to reduce the number of interference events from an original number of interference events.

Type: Grant

Filed: October 29, 2012

Date of Patent: November 4, 2014

Assignee: Advanced Micro Devices Inc.

Inventors: Gabriel Loh, James O'Connor
Thread synchronization in a multi-thread, multi-flow network communications processor architecture

Patent number: 8874878

Abstract: Described embodiments provide a packet classifier for a network processor that generates tasks corresponding to each received packet. The packet classifier includes a scheduler to generate contexts corresponding to tasks received by the packet classifier from processing modules of the network processor. The packet classifier processes threads of instructions, each thread of instructions corresponding to a context received from the scheduler, and each thread associated with a data flow. A thread status table has N entries to track up to N active threads. Each status entry includes a valid status indicator, a sequence value, a thread indicator and a flow indicator. A sequence counter generates a sequence value for each data flow of each thread and is incremented when processing of a thread is started, and is decremented when a thread is completed. Instructions are processed in the order in which the threads were started for each data flow.

Type: Grant

Filed: November 28, 2012

Date of Patent: October 28, 2014

Assignee: LSI Corporation

Inventors: Deepak Mital, James Clee, Jerry Pirog
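
The thread status table can be modelled in a few lines of Python. This is a hypothetical sketch, not LSI's design: the class and method names are invented, and the interpretation of the sequence value as "number of earlier, still-active threads in the same flow" (so a thread may proceed when its value reaches zero) is an assumption consistent with the increment-on-start, decrement-on-completion behaviour the abstract describes.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    valid: bool = False
    seq: int = 0      # earlier-started, still-active threads in the same flow
    thread: int = -1
    flow: int = -1

class ThreadStatusTable:
    """Hypothetical N-entry table that orders threads within each data flow."""

    def __init__(self, n):
        self.entries = [Entry() for _ in range(n)]
        self.flow_count = {}              # per-flow sequence counter

    def start(self, thread, flow):
        slot = next((i for i, e in enumerate(self.entries) if not e.valid), None)
        if slot is None:
            raise RuntimeError("all N entries are in use")
        seq = self.flow_count.get(flow, 0)
        self.flow_count[flow] = seq + 1   # increment when a thread starts
        self.entries[slot] = Entry(True, seq, thread, flow)

    def may_run(self, thread):
        # Ordered instructions issue only when no earlier thread of the
        # same flow is still active.
        return any(e.valid and e.thread == thread and e.seq == 0
                   for e in self.entries)

    def complete(self, thread):
        done = next((e for e in self.entries if e.valid and e.thread == thread), None)
        if done is None:
            raise KeyError(thread)
        done.valid = False
        self.flow_count[done.flow] -= 1   # decrement when a thread completes
        for e in self.entries:
            if e.valid and e.flow == done.flow and e.seq > done.seq:
                e.seq -= 1                # threads behind it move up one place
```

Threads in different flows never block each other; only threads sharing a flow indicator are serialized, in the order they were started.
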
Identifying Logical Planes Formed Of Compute Nodes Of A Subcommunicator In A Parallel Computer

Publication number: 20140281374

Abstract: In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.

Type: Application

Filed: March 12, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
Parallel processing using multi-core processor

Patent number: 8830829

Abstract: Disclosed are methods, systems, paradigms and structures for processing data packets in a communication network by a multi-core network processor. The network processor includes a plurality of multi-threaded core processors and special purpose processors for processing the data packets atomically, and in parallel. An ingress module of the network processor stores the incoming data packets in the memory and adds them to an input queue. The network processor processes a data packet by performing a set of network operations on the data packet in a single thread of a core processor. The special purpose processors perform a subset of the set of network operations on the data packet atomically. An egress module retrieves the processed data packets from a plurality of output queues based on a quality of service (QoS) associated with the output queues, and forwards the data packets towards their destination addresses.

Type: Grant

Filed: December 2, 2013

Date of Patent: September 9, 2014

Assignee: Unbound Networks, Inc.

Inventors: Damon Finney, Ashok Mathur
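
The abstract says the egress module pulls packets from several output queues "based on a quality of service (QoS)" but does not name a policy. As an illustration only, the sketch below assumes weighted round robin; the function name and the per-queue weights are hypothetical.

```python
from collections import deque

def weighted_round_robin(queues, weights):
    """Hypothetical egress sketch: in each round, queue i may send up to
    weights[i] packets. Weighted round robin is an assumed QoS policy."""
    if len(weights) != len(queues) or any(w < 1 for w in weights):
        raise ValueError("need one weight >= 1 per queue")
    sent = []
    while any(queues):                    # deques are falsy when empty
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break
                sent.append(q.popleft())  # forward toward the destination
    return sent

print(weighted_round_robin([deque(["a1", "a2", "a3"]), deque(["b1", "b2"])], [2, 1]))
# ['a1', 'a2', 'b1', 'a3', 'b2']
```

Unlike strict priority, weighted round robin cannot starve a low-priority queue, at the cost of some latency for the high-priority one.
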
Processing system with interspersed processors and communication elements having improved wormhole routing

Patent number: 8832413

Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.

Type: Grant

Filed: May 29, 2013

Date of Patent: September 9, 2014

Assignee: Coherent Logix, Incorporated

Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
Asynchronous computer communication

Patent number: 8825924

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another, it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets the corresponding write line (20) high, data is transferred on the data lines (22). When both the read line (18) and the corresponding write line (20) go low, both communicating computers (12) know that the communication is complete. An acknowledge line (72) goes high to restart the computers (12).

Type: Grant

Filed: March 4, 2011

Date of Patent: September 2, 2014

Assignee: Array Portfolio LLC

Inventor: Charles H. Moore
SIMD parallel computer system, SIMD parallel computing method, and control program

Patent number: 8769244

Abstract: The processing load is efficiently equalized. Each processing element of an SIMD parallel computer system includes a data storage module that stores data being processed or transferred, a number-of-data-sets storage device that stores the number of data sets, and a front data storage device that stores the front data. Each processing element further includes a control processor that compares the number of data sets stored in another processing element with the number stored in its own processing element, and issues a data distribution leveling instruction. This instruction designates an action for updating the contents of the data storage module, the number-of-data-sets storage device, and the front data storage device according to a rule determined from the comparison between its own processing element and the other processing elements, and an action for moving the data stored in the other processing element to its own processing element.

Type: Grant

Filed: April 8, 2009

Date of Patent: July 1, 2014

Assignee: NEC Corporation

Inventor: Shorin Kyo
Multi-threaded software-programmable framework for high-performance scalable and modular datapath designs

Patent number: 8761188

Abstract: In the provided architecture, one or more multi-threaded processors may be combined with hardware blocks. The resulting combination allows for data packets to undergo a processing sequence having the flexibility of software programmability with the high-performance of dedicated hardware. For example, a multi-threaded processor can control the high-level tasks of a processing sequence, while the computationally intensive events (e.g., signal processing filters, matrix operations, etc.) are handled by dedicated hardware blocks.

Type: Grant

Filed: April 30, 2008

Date of Patent: June 24, 2014

Assignee: Altera Corporation

Inventors: Anargyros Krikelis, Martin Roberts
CONCURRENT MULTIPLE INSTRUCTION ISSUE OF NON-PIPELINED INSTRUCTIONS USING NON-PIPELINED OPERATION RESOURCES IN ANOTHER PROCESSING CORE

Publication number: 20140164734

Abstract: A method and circuit arrangement utilize inactive non-pipelined operation resources in one processing core of a multi-core processing unit to execute non-pipelined instructions on behalf of another processing core in the same processing unit. Adjacent processing cores in a processing unit may be coupled together such that, for example, when one processing core's non-pipelined execution sequencer is busy, that processing core may issue into another processing core's non-pipelined execution sequencer if that other processing core's non-pipelined execution sequencer is idle, thereby providing intermittent concurrent execution of multiple non-pipelined instructions within each individual processing core.

Type: Application

Filed: December 6, 2012

Publication date: June 12, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Methods and apparatus for scalable array processor interrupt detection and response

Patent number: 8751772

Abstract: Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel, scalable, pipelined array processor containing multiple processing elements (PEs) and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE, depending on the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported, and low-latency interrupt processing is also provided for embedded systems where real-time signal processing is required. Further, a hierarchical interrupt structure allows a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.

Type: Grant

Filed: June 13, 2013

Date of Patent: June 10, 2014

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Patrick R. Marchand, Gerald George Pechanek, Larry D. Larsen
System and apparatus for group floating-point inflate and deflate operations

Patent number: 8683182

Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Grant

Filed: June 11, 2012

Date of Patent: March 25, 2014

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
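
A "group" operation treats one register as several packed data elements and returns a catenated result. The Python sketch below shows only a group integer add (no floating-point inflate/deflate) on an assumed 64-bit register split into 16-bit lanes; the function name and widths are illustrative, not the patented instruction set.

```python
def group_add(a, b, lane_bits=16, reg_bits=64):
    """Hypothetical sketch: add two registers lane by lane, discard carries
    between lanes, and return the catenated per-lane results."""
    if reg_bits % lane_bits:
        raise ValueError("register width must be a multiple of lane width")
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(reg_bits // lane_bits):
        shift = lane * lane_bits
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (s & mask) << shift     # wrap inside the lane, no carry-out
    return result

print(f"{group_add(0x0001_FFFF_0002_0003, 0x0001_0001_0002_0003):016x}")
# 0002000000040006
```

The `0xFFFF + 0x0001` lane wraps to zero without disturbing its neighbour, which is what distinguishes a partitioned add from an ordinary 64-bit add.
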
Task Switching and Inter-task Communications for Multi-core Processors

Publication number: 20140075154

Abstract: The invention provides hardware based techniques for switching processing tasks of software programs for execution on a multi-core processor. Invented techniques involve a hardware logic based controller for assigning, adaptive to program processing loads, tasks for processing by cores of a multi-core fabric as well as configuring a set of multiplexers to appropriately interconnect cores of the fabric and program task specific segments at fabric memories, to arrange efficient inter-task communication as well as transferring of activating and de-activating task memory images among the multi-core fabric. The invention thereby provides an efficient, hardware-automated runtime operating system for multi-core processors, minimizing any need to use processing capacity of the cores for traditional operating system software functions.

Type: Application

Filed: September 3, 2013

Publication date: March 13, 2014

Inventor: Mark Henrik Sandstrom
Placement and routing for a multiplexer-based interconnection network

Patent number: 8665727

Abstract: A computer-implemented method is described for determining cost in a non-blocking routing network that provides routing functionality using a single level of a plurality of multiplexers in each row of the routing network. The method includes assigning a respective numerical value, represented by bits, to each row of the routing network. A number of bits that differ between the respective numerical values of each pair of rows of the routing network indicates a number of row traversals necessary to traverse from a first row of the pair to a second row of the pair. A signal routing cost is computed from the number of bits that differ between the respective numerical values of the first row and the second row of the routing network. The calculated signal routing cost is provided to a placement module.

Type: Grant

Filed: June 21, 2010

Date of Patent: March 4, 2014

Assignee: Xilinx, Inc.

Inventor: Stephen M. Trimberger
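
The cost function described above is the Hamming distance between the numbers assigned to two rows. A minimal Python sketch, assuming each row is simply numbered by its index and that a placement is a dict from block name to row (both hypothetical choices for illustration):

```python
def routing_cost(row_a, row_b):
    """Row traversals between two rows = number of differing bits in
    their assigned numerical values (their Hamming distance)."""
    return bin(row_a ^ row_b).count("1")

def placement_cost(nets, row_of):
    """Hypothetical placement metric: total routing cost over
    (source, sink) block pairs, given each block's row."""
    return sum(routing_cost(row_of[src], row_of[dst]) for src, dst in nets)

print(routing_cost(0b0101, 0b0110))                                     # 2
print(placement_cost([("A", "B"), ("A", "C")], {"A": 0, "B": 3, "C": 1}))  # 3
```

Because the cost is a single XOR and bit count, a placer can evaluate it cheaply for every candidate move.
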
Architecture and programming in a parallel processing environment with switch-interconnected processors

Patent number: 8656141

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.

Type: Grant

Filed: December 13, 2005

Date of Patent: February 18, 2014

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
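
The three restructuring steps in the abstract (drop new tasks for the branch, drain queued tasks, remove the branch's nodes) can be sketched on a small tree in Python. The `Node` class, the `stop_at` parameter, and the rule of pruning a parent scheduler only once it has no remaining children are assumptions for illustration, not the traffic manager's actual logic.

```python
class Node:
    """Hypothetical scheduler/queue node in a scheduling hierarchy."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.tasks = []          # only leaf queues hold tasks in this sketch
        self.draining = False
        if parent is not None:
            parent.children.append(self)

def enqueue(queue, task):
    """Accept a task unless its branch is being restructured."""
    if queue.draining:
        return False             # step 1: drop tasks that arrive meanwhile
    queue.tasks.append(task)
    return True

def drain_and_remove(queue, stop_at):
    """Drain the queue, then prune the branch up to (not including) stop_at."""
    queue.draining = True
    drained = list(queue.tasks)  # step 2: drain every queued task
    queue.tasks.clear()
    node = queue
    while node is not stop_at and node.parent is not None:
        parent = node.parent
        parent.children.remove(node)   # step 3: remove the node
        if parent.children:            # parent still serves other queues
            break
        node = parent
    return drained
```

Marking the branch as draining before flushing it ensures no task can slip into a queue that is about to be removed.
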
Parallel processing using multi-core processor

Patent number: 8625422

Abstract: Disclosed are methods, systems, paradigms and structures for processing data packets in a communication network by a multi-core network processor. The network processor includes a plurality of multi-threaded core processors and special purpose processors for processing the data packets atomically, and in parallel. An ingress module of the network processor stores the incoming data packets in the memory and adds them to an input queue. The network processor processes a data packet by performing a set of network operations on the data packet in a single thread of a core processor. The special purpose processors perform a subset of the set of network operations on the data packet atomically. An egress module retrieves the processed data packets from a plurality of output queues based on a quality of service (QoS) associated with the output queues, and forwards the data packets towards their destination addresses.

Type: Grant

Filed: March 5, 2013

Date of Patent: January 7, 2014

Assignee: Unbound Networks

Inventors: Damon Finney, Ashok Mathur
Dynamic reconfigurable circuit with a plurality of processing elements, data network, configuration memory, and immediate value network

Patent number: 8607029

Abstract: A dynamic reconfigurable circuit including a plurality of processing elements each provided with an arithmetic data input port, a configuration data input port and an output port, a data network that is coupled to the arithmetic data input ports and the output ports of the plurality of processing elements, a configuration memory that is coupled via a configuration path to the configuration data input port of a first processor element being at least one of the plurality of processing elements, and an immediate value network that is independent from the data network and that is coupled to the configuration data input port of a second processor element being at least one of the plurality of processing elements. An internal register of a third processor element is coupled to the immediate value network so that data stored in the internal register can be outputted to the immediate value network.

Type: Grant

Filed: December 16, 2008

Date of Patent: December 10, 2013

Assignee: Fujitsu Semiconductor Limited

Inventor: Shin-ichi Sutou
Condensed router headers with low latency output port calculation

Patent number: 8572353

Abstract: Communicating among cores in a computing system comprising a plurality of cores, each core comprising a processor and a switch, includes: routing a packet from an origin core to a destination core over a route including multiple cores; and at each core in the route before the destination core, routing the packet to the next core in the route according to a respective symbol in a sequence of multiple symbols. The respective symbol has a first symbol value indicating a single likely direction and the respective symbol has a second symbol value indicating multiple less likely directions.

Type: Grant

Filed: September 20, 2010

Date of Patent: October 29, 2013

Assignee: Tilera Corporation

Inventors: Ian Rudolf Bratt, Carl G. Ramey, Matthew Mattina
Methods and Apparatus For Attaching Application Specific Functions Within An Array Processor

Publication number: 20130283007

Abstract: A multi-node video signal processor (VSPN) is described that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N-node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.

Type: Application

Filed: June 11, 2013

Publication date: October 24, 2013

Applicant: ALTERA CORPORATION

Inventors: Gerald George Pechanek, Mihailo M. Stojancic
Performing a vector collective operation on a parallel computer having a plurality of compute nodes

Patent number: 8549259

Abstract: Systems, methods and articles of manufacture are disclosed for performing a vector collective operation on a parallel computing system that includes multiple compute nodes and a network connecting the compute nodes that includes an ALU. A collective operation may be performed to determine displacements for the vector collective operation. Descriptors for the vector collective operation may be generated based on the displacements. The vector collective operation may then be performed using the descriptors.

Type: Grant

Filed: September 15, 2010

Date of Patent: October 1, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
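
In vector collectives (such as an allgather with variable-length contributions), each node's displacement is the exclusive prefix sum of the counts of the nodes before it. The Python sketch below simulates the three steps in the abstract (collective on counts, descriptors from displacements, then the vector operation) on a single machine. The function names and the (offset, length) descriptor format are hypothetical; in the patent, the collective runs over a network containing an ALU.

```python
from itertools import accumulate

def displacements(counts):
    """Exclusive prefix sum: node i's block starts after all blocks of nodes < i."""
    return list(accumulate(counts, initial=0))[:-1]

def vector_allgather(contributions):
    """Hypothetical single-process simulation of a vector allgather."""
    counts = [len(c) for c in contributions]     # collective on counts
    displs = displacements(counts)               # scan -> displacements
    descriptors = list(zip(displs, counts))      # (offset, length) per node
    result = [None] * sum(counts)
    for (offset, length), data in zip(descriptors, contributions):
        result[offset:offset + length] = data    # place each node's block
    return result

print(displacements([3, 1, 2]))                   # [0, 3, 4]
print(vector_allgather([[1, 2, 3], [4], [5, 6]])) # [1, 2, 3, 4, 5, 6]
```

`accumulate(..., initial=0)` requires Python 3.8 or later.
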
Configurable processing apparatus and system thereof

Patent number: 8549258

Abstract: A configurable processing apparatus includes a plurality of processing units, at least one instruction synchronization control circuit, and at least one configuration memory. Each processing unit has a stall-output signal generating circuit that outputs a stall-output signal indicating that an unexpected stall has occurred in the processing unit. Each processing unit also has a stall-in signal, through which an external circuit can control whether the processing unit is stalled. The instruction synchronization control circuit generates the stall-in signals for the processing units in response to the contents of the configuration memory and the stall-output signals of the processing units, so as to determine the operation modes and instruction synchronization of the processing units.

Type: Grant

Filed: February 7, 2010

Date of Patent: October 1, 2013

Assignee: Industrial Technology Research Institute

Inventors: Tzu-Fang Lee, Chien-Hong Lin, Jing-Shan Liang, Chi-Lung Wang
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
Processor cluster architecture and associated parallel processing methods

Patent number: 8489857

Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.

Type: Grant

Filed: November 5, 2010

Date of Patent: July 16, 2013

Assignee: Schism Electronics, L.L.C.

Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
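
The task control processor's job (wait until every processor assigned to a task has finished the current code page, then load the next page) behaves like a barrier with an action. A minimal Python sketch, assuming each "code page" is a function of the processor id and using `threading.Barrier` in place of the task control processor; `run_cluster` is a hypothetical name.

```python
import threading

def run_cluster(code_pages, n_procs):
    """Hypothetical sketch: every processor runs the current page; the
    barrier's action 'loads' the next page only after all have finished."""
    trace = []
    lock = threading.Lock()
    current = {"page": 0}

    def load_next_page():        # runs once, before any waiter is released
        current["page"] += 1

    barrier = threading.Barrier(n_procs, action=load_next_page)

    def processor(pid):
        while current["page"] < len(code_pages):
            page = current["page"]
            result = code_pages[page](pid)
            with lock:
                trace.append((page, pid, result))
            barrier.wait()       # wait for the rest of the cluster

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace
```

Because the barrier action completes before any thread is released, every processor sees the new page number on its next loop check, so no processor starts page k+1 while another is still on page k.
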
Methods and apparatus for scalable array processor interrupt detection and response

Patent number: 8489858

Abstract: Hardware and software techniques for interrupt detection and response are provided in a scalable pipelined array processor environment. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel, scalable, pipelined array processor containing multiple processing elements (PEs) and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE, depending on the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported, and low-latency interrupt processing is also provided for embedded systems where real-time signal processing is required. Further, a hierarchical interrupt structure allows a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.

Type: Grant

Filed: March 12, 2012

Date of Patent: July 16, 2013

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Patrick R. Marchand, Gerald George Pechanek, Larry D. Larsen
Processing array data on SIMD multi-core processor architectures

Patent number: 8484276

Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized on SIMD multi-core processor architectures.

Type: Grant

Filed: March 18, 2009

Date of Patent: July 9, 2013

Assignee: International Business Machines Corporation

Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
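
The conversion to "SIMD format" can be pictured with a short Python sketch: each block takes s consecutive rows and interleaves them so that element j of all s rows sits contiguously, ready to be loaded as one s-wide vector. The function name and the exact ordering within a block are assumptions for illustration; the patent's layout may differ in detail.

```python
def to_simd_format(matrix, s):
    """Hypothetical sketch: convert a row-major matrix (list of equal-length
    rows) into blocks that interleave s rows element by element."""
    n_rows = len(matrix)
    if s <= 0 or n_rows % s:
        raise ValueError("row count must be a positive multiple of s")
    n_cols = len(matrix[0]) if matrix else 0
    out = []
    for base in range(0, n_rows, s):      # one block per group of s rows
        for j in range(n_cols):
            for r in range(s):            # s-wide vector: column j of s rows
                out.append(matrix[base + r][j])
    return out

print(to_simd_format([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], 2))
# [0, 3, 1, 4, 2, 5, 6, 9, 7, 10, 8, 11]
```

With this layout, one vector instruction applied to each consecutive group of s elements advances s independent 1D FFTs (one per row) at once.
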
METHOD AND APPARATUS FOR USING A PREVIOUS COLUMN POINTER TO READ ENTRIES IN AN ARRAY OF A PROCESSOR

Publication number: 20130166876

Abstract: A method and apparatus are described for using a previous column pointer to read a subset of entries of an array in a processor. The array may have a plurality of rows and columns of entries, and each entry in the subset may reside on a different row of the array. A previous column pointer may be generated for each of the rows of the array based on a plurality of bits indicating the number of valid entries in the subset to be read, the previous column pointer indicating whether each entry is in a current column or a previous column. The entries in the subset may be read and re-ordered, and invalid entries in the subset may be replaced with nulls. The valid entries and nulls may then be output.

Type: Application

Filed: December 21, 2011

Publication date: June 27, 2013

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Srikanth Arekapudi, Shloke Hajela
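One way to picture the per-row "previous column" flag is a queue stored column by column across the rows of the array, so a run of consecutive entries can wrap from one column into the next. The sketch below rests on that assumption, which is ours rather than the abstract's: entry k lives at row k % rows, column k // rows, and the "current" column is taken to be the one holding the last entry of the subset. The helpers `previous_column_bits` and `read_subset` are hypothetical.

```python
def previous_column_bits(start, count, rows):
    """Per-row flags for reading `count` consecutive entries starting at index `start`.

    Assumed layout: entry k sits at row k % rows, column k // rows.
    Returns (prev_bits, valid, current_col); prev_bits[r] is True when row r's
    entry lies in the column before the current one.
    """
    if not 0 < count <= rows:
        raise ValueError("each entry must land on a different row")
    end = start + count - 1
    current_col = end // rows            # column holding the newest entry
    prev_bits = [False] * rows
    valid = [False] * rows
    for k in range(start, end + 1):
        r = k % rows
        valid[r] = True
        prev_bits[r] = (k // rows) != current_col
    return prev_bits, valid, current_col


def read_subset(array, start, count):
    """Read one entry per row, re-order oldest-first, and pad invalid rows with None."""
    rows, cols = len(array), len(array[0])
    prev_bits, valid, current_col = previous_column_bits(start, count, rows)
    per_row = []
    for r in range(rows):
        if not valid[r]:
            per_row.append(None)         # invalid entries are replaced with nulls
        else:
            col = (current_col - (1 if prev_bits[r] else 0)) % cols
            per_row.append(array[r][col])
    first_row = start % rows             # rotate so the oldest entry comes first
    return [per_row[(first_row + i) % rows] for i in range(rows)]
```

With a 4-row, 2-column array filled as entry k at `array[k % 4][k // 4]`, reading three entries from index 2 flags rows 2 and 3 as "previous column" and returns 2, 3, 4 followed by one null.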
Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Patent number: 8468323

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another, it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The sleeping computer (12) can be awaiting data or instructions. In the case of instructions, the sleeping computer (12) can be waiting either to store the instructions or to execute them immediately. In the latter case, the instructions are placed in an instruction register (30a) when they are received and are executed from there, without first being placed into memory. The instructions can include a micro-loop (100) that is capable of performing a series of operations repeatedly.

Type: Grant

Filed: March 21, 2011

Date of Patent: June 18, 2013

Assignee: ARRAY Portfolio LLC

Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
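The "sleep until the partner is ready" handshake can be modeled in software as a rendezvous channel. This is a rough software analogy, not the patented circuit: Python threads stand in for the computers, `threading.Condition.wait` stands in for going to sleep, and the class `RendezvousPort` is hypothetical. It assumes exactly one writer and one reader per port, as on a point-to-point link between neighbors.

```python
import threading


class RendezvousPort:
    """One-slot link between two neighbors: each side sleeps until the other is ready.

    Assumes a single writer and a single reader, like a point-to-point link.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._full = False     # writer has offered a value
        self._taken = False    # reader has accepted it

    def write(self, value):
        with self._cond:
            while self._full:              # previous transfer still pending
                self._cond.wait()
            self._value = value
            self._full = True
            self._taken = False
            self._cond.notify_all()        # wake a sleeping reader
            while not self._taken:         # sleep until the reader completes the transaction
                self._cond.wait()

    def read(self):
        with self._cond:
            while not self._full:          # sleep until a writer offers data
                self._cond.wait()
            value = self._value
            self._full = False
            self._taken = True
            self._cond.notify_all()        # wake the sleeping writer
            return value
```

Neither side polls: a writer that arrives first blocks inside `write` until the reader shows up, and vice versa, which mirrors the power-saving idea of idling instead of spinning while waiting for a neighbor.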
Signal processing apparatus with signal control units and processor units operating based on different threads

Patent number: 8464025

Abstract: A signal processing apparatus is provided that can raise processing capability for processing that involves access to a storage means. Stream control units (SCUs) 203_0 to 203_3 access data in an external memory system or in local memories 204_0 to 204_3 according to a thread, under control of a host processor. Processor unit (PU) arrays 202_0 to 202_3 perform image processing in a thread different from that of the SCUs 203_0 to 203_3.

Type: Grant

Filed: May 22, 2006

Date of Patent: June 11, 2013

Assignee: Sony Corporation

Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
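The split between a data-movement thread and a compute thread can be sketched as a producer/consumer pair. This is a simplified software model under our own assumptions: one stream-control thread copies fixed-size tiles from "external memory" into a bounded queue standing in for local memory, and one processor thread consumes them. The names `stream_control`, `processor_unit`, and `run_pipeline`, the tile size, and the doubling "processing" step are all hypothetical placeholders.

```python
import queue
import threading


def stream_control(external_memory, local_memory, tile_size):
    """Hypothetical SCU thread: stage tiles from external memory into local memory."""
    for i in range(0, len(external_memory), tile_size):
        local_memory.put(external_memory[i:i + tile_size])  # blocks while local memory is full
    local_memory.put(None)                                   # end-of-stream marker


def processor_unit(local_memory, results):
    """Hypothetical PU thread: process tiles as soon as the SCU has staged them."""
    while True:
        tile = local_memory.get()
        if tile is None:
            break
        results.extend(2 * x for x in tile)                  # stand-in for image processing


def run_pipeline(external_memory, tile_size=4, local_slots=2):
    local_memory = queue.Queue(maxsize=local_slots)          # e.g. double buffering
    results = []
    scu = threading.Thread(target=stream_control,
                           args=(external_memory, local_memory, tile_size))
    pu = threading.Thread(target=processor_unit, args=(local_memory, results))
    scu.start()
    pu.start()
    scu.join()
    pu.join()
    return results
```

Because the two threads only meet at the bounded queue, memory access for the next tile can overlap with processing of the current one, which is the kind of gain the abstract attributes to running SCUs and PUs on different threads.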
Dynamic atomic bitsets

Patent number: 8417733

Abstract: Embodiments of the present invention provide techniques, including systems, methods, and computer readable medium, for dynamic atomic bitsets. A dynamic atomic bitset is a data structure that provides a bitset that can grow or shrink in size as required. The dynamic atomic bitset is non-blocking, wait-free, and thread-safe.

Type: Grant

Filed: March 4, 2010

Date of Patent: April 9, 2013

Assignee: Oracle International Corporation

Inventor: Nathan Reynolds
CACHE AND/OR SOCKET SENSITIVE MULTI-PROCESSOR CORES BREADTH-FIRST TRAVERSAL

Publication number: 20130086354

Abstract: Methods, apparatuses, and storage devices associated with cache- and/or socket-sensitive breadth-first iterative traversal of a graph by parallel threads are disclosed. In embodiments, a vertices visited array (VIS) may be employed to track graph vertices visited. VIS may be partitioned into VIS sub-arrays, taking into consideration the size of the last-level cache (LLC), to reduce the likelihood of evictions. In embodiments, potential boundary vertices arrays (PBV) may be employed to store potential boundary vertices for the next iteration, for vertices being visited in the current iteration. The number of PBV generated for each thread may take into consideration the number of sockets over which the processor cores employed are distributed. In various embodiments, the threads may be load balanced; further, data locality awareness may be considered to reduce inter-socket communication, and/or lock-free and atomic-free update operations may be employed. Other embodiments may be disclosed or claimed.

Type: Application

Filed: September 27, 2012

Publication date: April 4, 2013

Inventors: Nadathur Rajagopalan Satish, Changkyu Kim, Jatin Chhugani, Jason D. Sewall
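The role of a partitioned VIS array and per-partition PBV lists can be illustrated with a sequential level-synchronous BFS. This sketch is our own simplification: it runs in one thread, buckets each candidate neighbor by the VIS sub-array that owns it (an assumed reading of how PBV relates to the partitions), and then checks and updates VIS one partition at a time. In a parallel version each partition could go to a different thread, so VIS updates would need neither locks nor atomics. The function name `partitioned_bfs` and the `part_size` parameter are hypothetical.

```python
def partitioned_bfs(adj, source, part_size):
    """Level-synchronous BFS with a partitioned visited array (VIS).

    adj: adjacency lists; part_size: vertices per VIS sub-array (chosen to fit a cache).
    Returns the BFS depth of every vertex, or -1 if unreachable.
    """
    n = len(adj)
    vis = bytearray(n)                                  # VIS: one visited flag per vertex
    num_parts = (n + part_size - 1) // part_size
    depth = [-1] * n
    vis[source] = 1
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # Phase 1: gather potential boundary vertices (PBV), bucketed by VIS partition.
        pbv = [[] for _ in range(num_parts)]
        for u in frontier:
            for v in adj[u]:
                pbv[v // part_size].append(v)
        # Phase 2: each bucket touches only its own VIS sub-array.
        level += 1
        next_frontier = []
        for bucket in pbv:
            for v in bucket:
                if not vis[v]:
                    vis[v] = 1
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```

For example, on the graph 0-1, 0-2, 1-3, 2-3, 3-4 with an isolated vertex 5 and `part_size=2`, the depths come out as 0, 1, 1, 2, 3, -1.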
Computer architecture for a mobile communication platform

Patent number: 8370605

Abstract: A system includes first and second processors, first and second graphics processing units (GPUs), one or more peripheral devices, a switch matrix, and processor-readable memory. The switch matrix comprises programmable data paths between the processors, the GPUs, and the peripheral devices. Software encoded in the processor-readable memory includes a first operating system (OS) executed by the first processor, a second OS executed by the second processor, a matrix scheduling engine, and a media interface switch (MIS) engine. The first OS boots faster than the second OS. The matrix scheduling engine runs on both OSs and configures the data paths in the switch matrix to couple the processors and the GPUs, and to couple the processors and the peripheral devices. The MIS engine runs on the operating systems, detects the presence of the peripheral devices, and configures the data paths in the switch matrix to couple the processors and the peripheral devices.

Type: Grant

Filed: November 11, 2009

Date of Patent: February 5, 2013

Assignee: Sunman Engineering, Inc.

Inventors: Allen Nejah, Gholam Reza Golshan, George W. Harvey
Manifold Array Processor

Publication number: 20130019082

Abstract: An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays; rather, it is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communication paths, which eliminates transpose operation latency. Each PE may have a single transmit port and a single receive port, so the individual PEs are decoupled from the array topology.

Type: Application

Filed: September 14, 2012

Publication date: January 17, 2013

Applicant: ALTERA CORPORATION

Inventors: Gerald G. Pechanek, Charles W. Kurak, JR.
Reducing Power Consumption Of Uncore Circuitry Of A Processor

Publication number: 20120311360

Abstract: In one embodiment, a multi-core processor includes multiple cores and an uncore, where the uncore includes various logic units including a cache memory, a router, and a power control unit (PCU). The PCU can clock gate at least one of the logic units and the cache memory when the multi-core processor is in a low power state, thereby reducing dynamic power consumption.

Type: Application

Filed: May 31, 2011

Publication date: December 6, 2012

Inventors: Srikanth Balasubramanian, Tessil Thomas, Satish Shrimali, Baskaran Ganesan
NOVEL MASSIVELY PARALLEL SUPERCOMPUTER

Publication number: 20120311299

Abstract: A novel massively parallel supercomputer at the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The processors within a single node can work individually or simultaneously on any combination of computation or communication, as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communication throughput and minimize latency.

Type: Application

Filed: August 3, 2012

Publication date: December 6, 2012

Applicant: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
Matrix processor proxy systems and methods

Patent number: 8327114

Abstract: In some embodiments, processor-to-processor and/or broadcast proxies are designated in a microprocessor matrix comprising a plurality of mesh-interconnected matrix processors when default processor-to-processor or broadcast routing algorithms used by data switches within the matrix to route messages would not deliver the messages to all intended recipients. The broadcast proxies broadcast messages within individual non-overlapping broadcast domains of the matrix. P-to-P and broadcast proxies may be designated as part of a boot-time testing/initialization sequence. Improving system fault tolerance allows improving semiconductor processing yields, which may be of particular significance in relatively large integrated circuits including large numbers of relatively-complex matrix processors.

Type: Grant

Filed: July 7, 2008

Date of Patent: December 4, 2012

Assignee: Ovics

Inventors: Sorin C Cismas, Ilie Garbacea
Dynamic atomic arrays

Patent number: 8312053

Abstract: Embodiments of the present invention provide techniques, including systems, methods, and computer readable medium, for dynamic atomic arrays. A dynamic atomic array is a data structure that provides an array that can grow or shrink in size as required. The dynamic atomic array is non-blocking, wait-free, and thread-safe. The dynamic atomic array may be used to provide arrays of any primitive data type as well as complex types, such as objects.

Type: Grant

Filed: September 11, 2009

Date of Patent: November 13, 2012

Assignee: Oracle International Corporation

Inventor: Nathan Reynolds
Algebra operation method, apparatus, and storage medium thereof

Patent number: 8276116

Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.

Type: Grant

Filed: June 7, 2007

Date of Patent: September 25, 2012

Assignee: Canon Kabushiki Kaisha

Inventor: Yasuhiro Nakahara
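The idea of recording algebra operations as a sequence object and evaluating them only when a substitute operator is called resembles the expression-template or lazy-evaluation technique. The sketch below is a Python analogy under stated assumptions: Python cannot overload plain assignment, so a hypothetical `assign` method plays the role of the substitute operator, and the classes `Expr`, `Vec`, and `BinOp` are illustrative names, not from the patent.

```python
class Expr:
    """Base class: arithmetic operators build a deferred operation tree."""

    def __add__(self, other):
        return BinOp("+", self, other)

    def __mul__(self, other):
        return BinOp("*", self, other)


class Vec(Expr):
    """Concrete object whose values are only updated by assign()."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def at(self, i):
        return self.values[i]

    def assign(self, expr):
        # Stand-in for the "substitute operator": the whole recorded sequence
        # is evaluated here, element by element, in a single pass.
        self.values = [expr.at(i) for i in range(len(expr))]
        return self


class BinOp(Expr):
    """Recorded (not yet evaluated) element-wise operation."""

    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

    def __len__(self):
        return len(self.lhs)

    def at(self, i):
        a, b = self.lhs.at(i), self.rhs.at(i)
        return a + b if self.op == "+" else a * b
```

Writing `c.assign(a + b * a)` first builds a `BinOp` tree without touching any data; the arithmetic runs only inside `assign`, one element at a time, so no temporary vector is created for `b * a`.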
Apparatus and method for performing re-arrangement operations on data

Patent number: 8200948

Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.

Type: Grant

Filed: December 4, 2007

Date of Patent: June 12, 2012

Assignee: ARM Limited

Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
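The key point of the abstract is that one re-arrangement instruction covers a whole family of operations, with the element width supplied at run time as a scalar parameter. The sketch below uses an interleave ("zip") of two registers as the example family member; that choice, the byte-string model of a register, and the function name `zip_registers` are our assumptions for illustration.

```python
def zip_registers(reg_a, reg_b, elem_bytes):
    """Interleave the elements of two equal-size registers.

    reg_a, reg_b: register contents modeled as bytes objects.
    elem_bytes: element width, supplied at run time (as a scalar unit might provide it).
    Returns the interleaved result split back into two registers (low, high).
    """
    if len(reg_a) != len(reg_b) or len(reg_a) % elem_bytes:
        raise ValueError("registers must match in size and hold whole elements")
    elems_a = [reg_a[i:i + elem_bytes] for i in range(0, len(reg_a), elem_bytes)]
    elems_b = [reg_b[i:i + elem_bytes] for i in range(0, len(reg_b), elem_bytes)]
    interleaved = b"".join(x + y for x, y in zip(elems_a, elems_b))
    half = len(reg_a)
    return interleaved[:half], interleaved[half:]
```

The same call yields a byte-wise interleave with `elem_bytes=1` and a halfword-wise interleave with `elem_bytes=2`, so one instruction encoding can serve several data layouts instead of needing a separate opcode per width.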
Method and apparatus for decoding multithreaded instructions of a microprocessor

Patent number: 8195921

Abstract: A microprocessor capable of decoding a plurality of instructions associated with a plurality of threads is disclosed. The microprocessor may comprise a first array comprising a first plurality of microcode operations associated with an instruction from within the plurality of instructions, the first array capable of delivering a first predetermined number of microcode operations from the first plurality of microcode operations. The microprocessor may further comprise a second array comprising a second plurality of microcode operations, the second array capable of providing one or more of the second plurality of microcode operations in the event that the instruction decodes into more than the first predetermined number of microcode operations. The microprocessor may further comprise an arbiter coupled between the first and second arrays, where the arbiter may determine which thread from the plurality of threads accesses the second array.

Type: Grant

Filed: July 9, 2008

Date of Patent: June 5, 2012

Assignee: Oracle America, Inc.

Inventors: Robert Golla, Manish Shah
Message routing scheme for an array having a switch with address comparing component and message routing component

Patent number: 8185719

Abstract: Each processor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction; and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.

Type: Grant

Filed: November 18, 2010

Date of Patent: May 22, 2012

Assignee: XMOS Limited

Inventor: Michael David May
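The routing rule in the abstract reduces to "forward along the direction mapped to the most significant component that differs". A minimal sketch follows, assuming addresses are tuples of binary components ordered most significant first and, for the simulation only, a hypercube in which direction i flips component i; both `route` and `deliver` are hypothetical helpers.

```python
def route(local_addr, dest_addr, direction_map):
    """Return the output direction for a message, or None if it has arrived.

    Addresses are tuples ordered from most to least significant component;
    direction_map[i] is the direction mapped to component i.
    """
    for i, (mine, theirs) in enumerate(zip(local_addr, dest_addr)):
        if mine != theirs:
            return direction_map[i]   # most significant non-matching component decides
    return None                       # full match: deliver locally


def deliver(src, dest):
    """Simulate hop-by-hop routing in a binary hypercube (direction i flips bit i)."""
    node, path = src, [src]
    while True:
        d = route(node, dest, list(range(len(node))))
        if d is None:
            return path
        node = node[:d] + (1 - node[d],) + node[d + 1:]
        path.append(node)
```

Each switch needs only a comparison and a lookup, with no routing table: a message from (0, 1, 1) to (1, 1, 0) first corrects the most significant component, then the least significant one, arriving in two hops.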
Accelerating Generic Loop Iterators Using Speculative Execution

Publication number: 20120110302

Abstract: A method, a system, and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and spawns a speculative iterator thread at the start of the basic code block that precedes the code block initiating a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs ahead of the execution of the nested loop and calculates indices into the corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single-dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility dequeues an entry from the queue and uses it to access the array over which the ELI utility iterates.

Type: Application

Filed: November 2, 2010

Publication date: May 3, 2012

Applicant: IBM Corporation

Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
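The flattening idea (a helper thread precomputes every multidimensional index into a queue so the main loop becomes one flat loop) can be sketched as below. This is an illustration under our own assumptions: `itertools.product` generates the row-major index order, a sentinel marks the end of the queue, and the names `iterator_thread` and `flattened_sum` are hypothetical. In CPython the threads do not run truly in parallel, so the sketch shows the structure rather than the speedup.

```python
import itertools
import queue
import threading

_DONE = object()  # sentinel: no more indices


def iterator_thread(shape, q):
    """Helper thread: run ahead and enqueue every index of the nested loop."""
    for idx in itertools.product(*(range(n) for n in shape)):
        q.put(idx)
    q.put(_DONE)


def flattened_sum(array_get, shape):
    """Single flat loop that dequeues precomputed indices instead of nesting loops."""
    q = queue.Queue()
    helper = threading.Thread(target=iterator_thread, args=(shape, q), daemon=True)
    helper.start()
    total = 0
    while True:
        idx = q.get()
        if idx is _DONE:
            break
        total += array_get(idx)
    helper.join()
    return total
```

For a 2 x 3 grid, `flattened_sum(lambda ij: grid[ij[0]][ij[1]], (2, 3))` visits (0, 0), (0, 1), ..., (1, 2) in row-major order, exactly as the original nested loop would.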
Circuit and method for parallel perforation in speed rate matching

Publication number: 20120096238

Abstract: The present invention discloses a circuit and a method for parallel perforation in rate matching, which can reduce the perforation processing delay to satisfy the requirements of Long Term Evolution (LTE). Both the circuit and the method can use three selector arrays and three register groups. Specifically, the first selector array is configured to remove null bits from the input data and output the remaining data to the first register group; the second selector array is configured to combine the first register group and the third register group and then output the combined data to the second register group. During the combination, the valid data in the third register group are selected first, followed by the data in the first register group. When the second register group is full, its data are output externally as the result of the perforation processing.

Type: Application

Filed: June 29, 2010

Publication date: April 19, 2012

Applicant: ZTE CORPORATION

Inventor: Ziyu Wen
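The data flow through the three register groups can be modeled cycle by cycle. This is a behavioral sketch, not the circuit: input arrives as fixed-width chunks with `None` marking null bits, and the role we give the third selector array (holding leftover valid bits for the next cycle) is an assumption, since the abstract does not describe it. The function name `parallel_puncture` is hypothetical.

```python
def parallel_puncture(chunks, width):
    """Remove null bits (None) from fixed-width input chunks and repack into full words.

    Returns (output_words, leftover_bits).
    """
    outputs = []
    reg3 = []                                       # third register group: leftover valid bits
    for chunk in chunks:
        reg1 = [b for b in chunk if b is not None]  # first selector array: drop nulls
        combined = reg3 + reg1                      # second selector array: reg3 data first
        # reg3 holds fewer than `width` bits and reg1 at most `width`,
        # so at most one full word is emitted per cycle.
        while len(combined) >= width:
            outputs.append(combined[:width])        # second register group full: output it
            combined = combined[width:]
        reg3 = combined                             # assumed third-selector role: keep remainder
    return outputs, reg3
```

Because a whole chunk is filtered per step rather than one bit at a time, the number of steps grows with the number of chunks instead of the number of bits, which is where the latency reduction claimed for the parallel approach would come from.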
Methods and apparatus for scalable array processor interrupt detection and response

Patent number: 8161267

Abstract: Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processor containing multiple processing elements (PEs) and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE, dependent upon the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported, and low-latency interrupt processing is also provided for embedded systems where real-time signal processing is required. Further, a hierarchical interrupt structure is used, allowing a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.

Type: Grant

Filed: November 30, 2010

Date of Patent: April 17, 2012

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Patrick R. Marchand, Gerald George Pechanek, Larry D. Larsen
Sequentially propagating instructions of thread through serially coupled PEs for concurrent processing respective thread on different data and synchronizing upon branch

Patent number: 8151090

Abstract: A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.

Type: Grant

Filed: February 17, 2009

Date of Patent: April 3, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Gi-Ho Park, Shin-Dug Kim, Jung-Wook Park, Hoon-Mo Yang, Sung-Bae Park
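The abstract's pipeline (the control unit feeds a new instruction into the first PE each cycle, and every PE passes its instruction to the next while applying it to its own data) can be simulated as follows. This sketch models instructions as Python functions and omits the branch synchronization named in the title; `run_systolic` is a hypothetical name.

```python
def run_systolic(program, pe_data):
    """Simulate a linear systolic PE array executing one thread on different data.

    program: list of instructions (functions of one argument), issued one per cycle to PE 0.
    pe_data: initial data held by each PE.
    """
    n = len(pe_data)
    acc = list(pe_data)                    # each PE operates on its own data
    in_flight = [None] * n                 # instruction currently held by each PE
    cycles = len(program) + n - 1          # last instruction needs n - 1 cycles to reach the last PE
    for cycle in range(cycles):
        # Shift every instruction one PE down the chain; the control unit feeds PE 0.
        issued = program[cycle] if cycle < len(program) else None
        in_flight = [issued] + in_flight[:-1]
        for pe, instr in enumerate(in_flight):
            if instr is not None:
                acc[pe] = instr(acc[pe])
    return acc
```

Each PE sees the whole program in order, just one cycle later than its predecessor, so all PEs compute the same thread on their own data: with the program "add 1, then double" and data 1, 2, 3, the PEs finish with 4, 6, and 8 after four cycles.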
Array-type processor having plural processor elements controlled by a state control unit

Patent number: 8151089

Abstract: A multiplicity of processor elements that are arranged in rows and columns individually execute data processing in accordance with instruction codes that are individually set as data and supply event data as output. A state control unit is composed of a plurality of units that successively switch the instruction codes of the multiplicity of processor elements in accordance with a computer program and the event data, these state control units communicating with each other to realize linked operation as necessary. An event distributing means distributes event data to this plurality of state control units that intercommunicate to realize linked operation, whereby the plurality of state control units can realize linked operation to control a large-scale state transition.

Type: Grant

Filed: October 29, 2003

Date of Patent: April 3, 2012

Assignee: Renesas Electronics Corporation

Inventors: Taro Fujii, Koichiro Furuta, Masato Motomura, Kenichiro Anjo, Yoshikazu Yabe, Toru Awashima, Takao Toi, Noritsugu Nakamura
