Array Processor Element Interconnection Patents (Class 712/11)

Cube or hypercube (Class 712/12)

Partitioning (Class 712/13)

Processing element memory (Class 712/14)

Reconfiguring (Class 712/15)

Asynchronous computer communication

Patent number: 8825924

Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another, it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets the corresponding write line (20) high, data is transferred on the data lines (22). When both the read line (18) and the corresponding write line (20) go low, both communicating computers (12) know that the communication is complete. An acknowledge line (72) goes high to restart the computers (12).

Type: Grant

Filed: March 4, 2011

Date of Patent: September 2, 2014

Assignee: Array Portfolio LLC

Inventor: Charles H. Moore
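The handshake in this abstract is a blocking rendezvous: whichever computer arrives first sleeps until its partner is ready, and the transfer completes only when both sides have participated. The Python sketch below models that behavior in software, assuming a `threading.Condition` can stand in for the read, write, data, and acknowledge lines. The `Port` class and its methods are hypothetical names, and a thread model like this says nothing about the hardware's actual power savings.

```python
import threading


class Port:
    """Hypothetical software model of one read/write/data/acknowledge line set."""

    def __init__(self):
        self._cond = threading.Condition()
        self._write_high = False  # stands in for write line (20)
        self._data = None         # stands in for data lines (22)
        self._ack = False         # stands in for acknowledge line (72)

    def write(self, value):
        """Raise the write line, then sleep until the reader acknowledges."""
        with self._cond:
            self._data = value
            self._write_high = True
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._ack)
            self._ack = False  # consume the acknowledge before returning

    def read(self):
        """Calling read() models raising the read line; sleep until a write is pending."""
        with self._cond:
            self._cond.wait_for(lambda: self._write_high)
            value = self._data
            # Write line drops low; the read line drops when read() returns.
            self._write_high = False
            self._ack = True  # acknowledge wakes the sleeping writer
            self._cond.notify_all()
            return value


if __name__ == "__main__":
    port = Port()
    writer = threading.Thread(target=port.write, args=(42,))
    writer.start()
    print(port.read())  # 42
    writer.join()
```

Either side may arrive first: a reader that calls `read()` before any write simply waits, which mirrors the "goes to sleep until the other computer is ready" behavior described above.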
Programming a multi-processor system

Patent number: 8826228

Abstract: A computer-implemented method for creating a program for a multi-processor system comprising a plurality of interspersed processors and memories. A user may specify or create source code using a programming language. The source code specifies a plurality of tasks and communication of data among the plurality of tasks. However, the source code need not (and preferably does not) explicitly specify 1) which physical processor will execute each task or 2) which communication mechanism to use among the plurality of tasks. The method then creates machine language instructions based on the source code, wherein the machine language instructions are designed to execute on the plurality of processors. Creation of the machine language instructions comprises assigning tasks for execution on respective processors and selecting communication mechanisms between the processors based on the location of the respective processors and the required data communication to satisfy system requirements.

Type: Grant

Filed: March 27, 2007

Date of Patent: September 2, 2014

Assignee: Coherent Logix, Incorporated

Inventors: John Mark Beardslee, Michael B. Doerr, Tommy K. Eng
ARRAY OF PROCESSOR CORE CIRCUITS WITH REVERSIBLE TIERS

Publication number: 20140244971

Abstract: Embodiments of the invention relate to an array of processor core circuits with reversible tiers. One embodiment comprises multiple tiers of core circuits and multiple switches for routing packets between the core circuits. Each tier comprises at least one core circuit. Each switch comprises multiple router channels for routing packets in different directions relative to the switch, and at least one routing circuit configured for reversing a logical direction of at least one router channel.

Type: Application

Filed: February 28, 2013

Publication date: August 28, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
Data processing device and method

Patent number: 8812820

Abstract: A data processing device comprising a multidimensional array of coarse grained logic elements processing data and operating at a first clock rate and communicating with one another and/or other elements via busses and/or communication lines operated at a second clock rate is disclosed, wherein the first clock rate is higher than the second and wherein the coarse grained logic elements comprise storage means for storing data needed to be processed.

Type: Grant

Filed: February 19, 2009

Date of Patent: August 19, 2014

Assignee: Pact XPP Technologies AG

Inventors: Martin Vorbach, Alexander Thomas
System and method for remotely configuring semiconductor functional circuits

Patent number: 8768642

Abstract: Systems and methods of the present invention facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is performed in which a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, the validity of the permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.

Type: Grant

Filed: December 18, 2003

Date of Patent: July 1, 2014

Assignee: Nvidia Corporation

Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
Transferring data in a parallel processing environment

Patent number: 8745604

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.

Type: Grant

Filed: February 25, 2008

Date of Patent: June 3, 2014

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal
Configuring routing in mesh networks

Patent number: 8737392

Abstract: A processor includes a plurality of processor tiles, each tile including a processor core, and an interconnection network interconnects the processor cores and enables transfer of data among the processor cores. The interconnection network has a plurality of dimensions in which an ordering of dimensions for routing data is configurable.

Type: Grant

Filed: October 21, 2011

Date of Patent: May 27, 2014

Assignee: Tilera Corporation

Inventors: Liewei Bao, Ian Rudolf Bratt
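Dimension-ordered routing moves a packet along one dimension until that coordinate matches the destination, then along the next dimension; the abstract's point is that the order of dimensions is configurable rather than fixed. The Python sketch below assumes a mesh without wraparound links and passes the dimension order as a per-call `order` argument. The function name and this way of supplying the configuration are illustrative assumptions, not the patent's mechanism.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def dimension_ordered_route(src: Coord, dst: Coord, order: Sequence[int]) -> List[Coord]:
    """Return every hop from src to dst, fully correcting one dimension before the next.

    `order` lists dimension indices in routing sequence, e.g. (0, 1) for
    X-then-Y or (1, 0) for Y-then-X on a 2D mesh.
    """
    if sorted(order) != list(range(len(src))):
        raise ValueError("order must be a permutation of the dimension indices")
    path = [tuple(src)]
    current = list(src)
    for dim in order:
        step = 1 if dst[dim] > current[dim] else -1
        while current[dim] != dst[dim]:
            current[dim] += step
            path.append(tuple(current))
    return path


# X-then-Y and Y-then-X reach the same tile over different intermediate tiles.
print(dimension_ordered_route((0, 0), (2, 1), (0, 1)))
print(dimension_ordered_route((0, 0), (2, 1), (1, 0)))
```

Changing only `order` yields a different but equally minimal path. This is why making the order configurable can matter, for example for deadlock avoidance or for steering traffic around congested regions.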
Massively parallel supercomputer

Patent number: 8667049

Abstract: A novel massively parallel supercomputer capable of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communication throughput and minimize latency.

Type: Grant

Filed: August 3, 2012

Date of Patent: March 4, 2014

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
Electronic chip and integrated circuit including a split routing unit having first-level routers for intra-layer transmissions and second-level routers for inter-layer transmissions and transmissions to the processing units

Patent number: 8667251

Abstract: This electronic chip includes functional modules each including a single processing unit and a single routing unit (110E) connected to one another, and connections, called routing connections, each of which has at least one end connected to the routing unit of a functional module, where the routing connections connect between themselves the routing units of the functional modules so as to allow routing of data between the processing units of the functional modules. The routing unit (110E) of at least one functional module, called a split routing unit, includes two routers (112E, 114E), called respectively a first-level router and a second-level router, which are connected to one another, where the first-level router is moreover connected to at least two routing connections, and where the second-level router is moreover connected to the processing unit of this functional module and connected to at least one other routing connection.

Type: Grant

Filed: February 15, 2011

Date of Patent: March 4, 2014

Assignee: Commissariat a l'Energie Atomique et aux Energies Alternatives

Inventors: Walid Lafi, Didier Lattard
Architecture and programming in a parallel processing environment with switch-interconnected processors

Patent number: 8656141

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.

Type: Grant

Filed: December 13, 2005

Date of Patent: February 18, 2014

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
Adjustment of threads for execution based on over-utilization of a domain in a multi-processor system by sub-dividing parallelizable group of threads to sub-domains

Patent number: 8631415

Abstract: Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.

Type: Grant

Filed: August 25, 2009

Date of Patent: January 14, 2014

Assignee: NetApp, Inc.

Inventors: Gokul Nadathur, Manpreet Singh, Grace Ho
Parallel processing using multi-core processor

Patent number: 8625422

Abstract: Disclosed are methods, systems, paradigms and structures for processing data packets in a communication network by a multi-core network processor. The network processor includes a plurality of multi-threaded core processors and special purpose processors for processing the data packets atomically, and in parallel. An ingress module of the network processor stores the incoming data packets in the memory and adds them to an input queue. The network processor processes a data packet by performing a set of network operations on the data packet in a single thread of a core processor. The special purpose processors perform a subset of the set of network operations on the data packet atomically. An egress module retrieves the processed data packets from a plurality of output queues based on a quality of service (QoS) associated with the output queues, and forwards the data packets towards their destination addresses.

Type: Grant

Filed: March 5, 2013

Date of Patent: January 7, 2014

Assignee: Unbound Networks

Inventors: Damon Finney, Ashok Mathur
Computing device, calculating method, and program product

Patent number: 8612507

Abstract: A computing device includes: a deciding unit which, in computation of values of nodes on a lattice in the direction in which m, the horizontal-axis coordinate of the lattice, increases, decides dummy nodes to be added to m = n−1, so as to enable values of nodes on m = n to be calculated by adding the dummy nodes to m = n−1 and executing a vector operation through the use of the SIMD function by using values of nodes on m = n−1 and values of the added dummy nodes; an adding unit adding the dummy nodes decided by the deciding unit to m = n−1; and a calculating unit calculating the values of the nodes present on m = n by executing the vector operation through the use of the SIMD function by using the values of the nodes on m = n−1 and the values of the dummy nodes added by the adding unit.

Type: Grant

Filed: April 16, 2010

Date of Patent: December 17, 2013

Assignee: NS Solutions Corporation

Inventor: Hiroki Takeshita
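The abstract does not state the lattice type or the update rule. For illustration only, assume a binomial-style lattice in which column m = n has one more node than column m = n−1, and each new node is a weighted sum of two neighbors on the previous column. Padding column n−1 with a zero-valued dummy node at each end then lets the entire new column be computed with one vectorized expression and no per-node edge cases. NumPy typically executes such whole-array arithmetic with SIMD-friendly loops. The weights `p` and `q` and the function name are hypothetical.

```python
import numpy as np


def next_column(prev: np.ndarray, p: float, q: float) -> np.ndarray:
    """Compute the len(prev) + 1 node values on column m = n from column m = n-1.

    Assumed rule (illustrative only): new[j] = p * prev[j-1] + q * prev[j],
    with out-of-range neighbors supplied by zero-valued dummy nodes.
    """
    # Add one dummy node at each end of column n-1.
    padded = np.concatenate(([0.0], prev, [0.0]))
    # One vector operation over all nodes instead of a loop with boundary checks.
    return p * padded[:-1] + q * padded[1:]


column = np.array([1.0])
for _ in range(3):
    column = next_column(column, 1.0, 1.0)
print(column)  # [1. 3. 3. 1.]
```

With `p = q = 1` the columns reproduce rows of Pascal's triangle. This makes it easy to check that the dummy nodes contribute nothing beyond closing off the boundaries.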
Dynamic reconfigurable circuit with a plurality of processing elements, data network, configuration memory, and immediate value network

Patent number: 8607029

Abstract: A dynamic reconfigurable circuit including a plurality of processing elements each provided with an arithmetic data input port, a configuration data input port and an output port, a data network that is coupled to the arithmetic data input ports and the output ports of the plurality of processing elements, a configuration memory that is coupled via a configuration path to the configuration data input port of a first processor element being at least one of the plurality of processing elements, and an immediate value network that is independent from the data network and that is coupled to the configuration data input port of a second processor element being at least one of the plurality of processing elements. An internal register of a third processor element is coupled to the immediate value network so that data stored in the internal register can be outputted to the immediate value network.

Type: Grant

Filed: December 16, 2008

Date of Patent: December 10, 2013

Assignee: Fujitsu Semiconductor Limited

Inventor: Shin-ichi Sutou
Methods and apparatus for providing bit-reversal and multicast functions utilizing DMA controller

Patent number: 8601176

Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.

Type: Grant

Filed: July 10, 2012

Date of Patent: December 3, 2013

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Nikos P. Pitsianis, Kevin Coopman
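A radix-2 FFT consumes or produces data in bit-reversed order: element i is exchanged with the element whose index has the same bits in reverse. In this patent a DMA unit performs the reordering across processing-element memories, concurrently with the FFT computation. The plain-Python sketch below shows only the radix-2 index permutation itself. It does not model distribution across PEs, and it does not model the digit reversal needed for radix 4 and 8. Both function names are hypothetical.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (radix-2 FFT ordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the current lowest bit
        index >>= 1
    return result


def bit_reversed_order(data):
    """Return a copy of `data` permuted into bit-reversed index order."""
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or 1 << bits != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]


print(bit_reversed_order(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```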
Massively parallel processing core with plural chains of processing elements and respective smart memory storing select data received from each chain

Patent number: 8583896

Abstract: Systems and methods for massively parallel processing on an accelerator that includes a plurality of processing cores. Each processing core includes multiple processing chains configured to perform parallel computations, each of which includes a plurality of interconnected processing elements. The cores further include multiple smart memory blocks configured to store and process data, each memory block accepting the output of one of the plurality of processing chains. The cores communicate with at least one off-chip memory bank.

Type: Grant

Filed: July 26, 2010

Date of Patent: November 12, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Srihari Cadambi, Abhinandan Majumdar, Michela Becchi, Srimat Chakradhar, Hans Peter Graf
Area efficient arrangement of interface devices within an integrated circuit

Patent number: 8549257

Abstract: An integrated circuit is disclosed that comprises: a core comprising logic circuitry; and a plurality of interface devices for transmitting signals to and from the core, the plurality of interface devices comprising two types of interface devices, one type being a power interface device for delivering power to the core and a second type being a signal interface device for transmitting data signals between the core and devices external to the integrated circuit; wherein the plurality of interface devices are arranged in two rows, an outer row towards an outer edge of the core and an inner row within the outer row, closer to a centre of the core, the inner row comprising one of the two types of interface devices and the outer row comprising the other of the two types of interface devices.

Type: Grant

Filed: January 10, 2011

Date of Patent: October 1, 2013

Assignee: ARM Limited

Inventors: Vikas Mishra, Bingda Brandon Wang
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
Integrated circuit with coupled processing cores

Patent number: 8516179

Abstract: A processing system on an integrated circuit includes a group of processing cores. A group of dedicated random access memories are severally coupled to one of the group of processing cores or shared among the group. A star bus couples the group of processing cores and random access memories. Additional layer(s) of star bus may couple many such clusters to each other and to an off-chip environment.

Type: Grant

Filed: November 30, 2004

Date of Patent: August 20, 2013

Assignee: Digital RNA, LLC

Inventor: Joel Henry Hinrichs
Mixed torus and hypercube multi-rank tensor expansion method

Patent number: 8510535

Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves long-range communication and global operations. Moreover, this expansion method can achieve better scalability and flexibility for a parallel system of a given size.

Type: Grant

Filed: June 19, 2008

Date of Patent: August 13, 2013

Assignee: Shanghai Redneurons Co., Ltd

Inventors: Yuefan Deng, Peng Zhang
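The abstract builds on two conventional topologies but does not give the rule by which supernodes combine them. The Python sketch below therefore shows only the two standard neighbor rules the method starts from. In a torus, a node links to the nodes one step away in each dimension, with wraparound. In a binary hypercube, a node links to every node whose address differs in exactly one bit. How the expansion interfaces mix these rules is not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple


def torus_neighbors(coord: Tuple[int, ...], dims: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Nearest neighbors in a torus: +/-1 in each dimension, wrapping at the edges."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            moved = list(coord)
            moved[axis] = (moved[axis] + step) % size
            neighbor = tuple(moved)
            # Skip duplicates (size 2) and self-links (size 1).
            if neighbor != coord and neighbor not in neighbors:
                neighbors.append(neighbor)
    return neighbors


def hypercube_neighbors(node: int, dimension: int) -> List[int]:
    """Neighbors in a binary hypercube: flip exactly one address bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


print(torus_neighbors((0, 0), (4, 4)))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
print(hypercube_neighbors(0b101, 3))    # [4, 7, 1]
```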
DATA PROCESSING APPARATUS AND METHOD FOR DECODING PROGRAM INSTRUCTIONS IN ORDER TO GENERATE CONTROL SIGNALS FOR PROCESSING CIRCUITRY OF THE DATA PROCESSING APPARATUS

Publication number: 20130198487

Abstract: A data processing apparatus and method for accessing operands stored within a set of registers. Instruction decoder circuitry, responsive to program instructions, generates register access control signals identifying for each program instruction which registers in the register set are to be accessed by the processing circuitry when performing the processing operation specified by that program instruction. The set of registers are logically arranged as a plurality of register groups, with each register in the set being a member of more than one register group. Each program instruction includes a register specifier field, and instruction decoder circuitry is responsive to each program instruction to determine a selected register group, and to determine one or more selected members of that selected register group. The instruction decoder circuitry then outputs register access control signals identifying the register corresponding to each selected member of the selected register group.

Type: Application

Filed: February 1, 2012

Publication date: August 1, 2013

Applicant: The Regents of the University of Michigan

Inventors: Joseph M. Pusdesris, Trevor N. Mudge, Thomas D. Manville
Network on chip with a low latency, high bandwidth application messaging interconnect

Patent number: 8490110

Abstract: Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controls communications between an IP block and memory, each network interface controller controls inter-IP block communications through routers, and each IP block is also adapted to the network by a low latency, high bandwidth application messaging interconnect comprising an inbox and an outbox.

Type: Grant

Filed: February 15, 2008

Date of Patent: July 16, 2013

Assignee: International Business Machines Corporation

Inventors: Russell D. Hoover, Jon K. Kriegel, Eric O. Mejdrich, Robert A. Shearer
Profiler for optimizing processor architecture and application

Patent number: 8490066

Abstract: A profiler which provides information to optimize an application specific architecture processor and a program for the processor is provided.

Type: Grant

Filed: March 29, 2007

Date of Patent: July 16, 2013

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dong-Hoon Yoo, Soo-Jung Ryu, Jeong-Wook Kim, Hong-Seok Kim, Hee Seok Kim
Processing array data on SIMD multi-core processor architectures

Patent number: 8484276

Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in a conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized in SIMD multi-core processor architectures.

Type: Grant

Filed: March 18, 2009

Date of Patent: July 9, 2013

Assignee: International Business Machines Corporation

Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
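One plausible reading of "each block interleaves s rows" is element-level interleaving: within a block, column c of rows r through r+s−1 is stored contiguously, so a single s-wide vector load retrieves the same position from s different rows. The NumPy sketch below implements that reading. It assumes the row count is a multiple of s (padding is not modeled), and the function name is hypothetical. The patent's exact layout may differ.

```python
import numpy as np


def to_simd_blocks(matrix: np.ndarray, s: int) -> np.ndarray:
    """Convert a row-major 2D array into a flat sequence of s-row interleaved blocks."""
    rows, cols = matrix.shape
    if rows % s != 0:
        raise ValueError("row count must be a multiple of s (padding not modeled)")
    # (blocks, s, cols) -> (blocks, cols, s): within a block, the row index
    # becomes the fastest-varying axis, so each run of s values is one column slice.
    blocks = matrix.reshape(rows // s, s, cols).transpose(0, 2, 1)
    return np.ascontiguousarray(blocks).reshape(-1)


m = np.arange(8).reshape(4, 2)  # rows: [0 1], [2 3], [4 5], [6 7]
print(to_simd_blocks(m, 2))     # [0 2 1 3 4 6 5 7]
```

After conversion, each consecutive group of s values can feed one SIMD lane per row, which lets s independent 1D row FFTs advance in lockstep.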
Methods and apparatus for attaching application specific functions within an array processor

Patent number: 8484444

Abstract: A multi-node video signal processor (VSPN) is described that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N-node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.

Type: Grant

Filed: March 1, 2011

Date of Patent: July 9, 2013

Assignee: Altera Corporation

Inventors: Gerald George Pechanek, Mihailo Stojancic
Automatically creating parallel iterative program code in a data flow program

Patent number: 8478967

Abstract: System and method for automatically parallelizing iterative functionality in a data flow program. A data flow program is stored that includes a first data flow program portion, where the first data flow program portion is iterative. Program code implementing a plurality of second data flow program portions is automatically generated based on the first data flow program portion, where each of the second data flow program portions is configured to execute a respective one or more iterations. The plurality of second data flow program portions are configured to execute at least a portion of iterations concurrently during execution of the data flow program. Execution of the plurality of second data flow program portions is functionally equivalent to sequential execution of the iterations of the first data flow program portion.

Type: Grant

Filed: June 1, 2009

Date of Patent: July 2, 2013

Assignee: National Instruments Corporation

Inventors: Adam L. Bordelon, Robert E. Dye, Haoran Yi, Mary E. Fletcher
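The key requirement in this abstract is that the generated parallel portions be functionally equivalent to running the iterations sequentially. A rough text-language analogue is to split the iteration space into contiguous chunks, run the chunks concurrently, and reassemble the results in iteration order. The sketch below assumes the iterations have no loop-carried dependencies and uses Python's `concurrent.futures`. It is not the patent's data flow code generation, and the function name and chunking policy are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_iterations_in_parallel(body: Callable[[int], int], n: int, workers: int = 4) -> List[int]:
    """Run body(0) .. body(n-1) in concurrent chunks, returning results in iteration order.

    Only valid when iterations are independent (no loop-carried dependency);
    under that condition the output equals a sequential loop's output.
    """
    if n <= 0:
        return []
    chunk = -(-n // workers)  # ceiling division, so at most `workers` chunks
    ranges = [range(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    def run_chunk(r: range) -> List[int]:
        return [body(i) for i in r]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields chunk results in submission order, preserving iteration order.
        chunk_results = list(pool.map(run_chunk, ranges))
    return [value for part in chunk_results for value in part]


print(run_iterations_in_parallel(lambda i: i * i, 6, workers=3))  # [0, 1, 4, 9, 16, 25]
```

Threads keep the example simple. For CPU-bound pure-Python loop bodies, a `ProcessPoolExecutor` with a picklable, module-level function would be needed to gain real speedup.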
Signal processing apparatus with signal control units and processor units operating based on different threads

Patent number: 8464025

Abstract: A signal processing apparatus is provided that raises processing capability for processing that involves access to a storage means. Stream control units (SCUs) 203_0 to 203_3 access data in an external memory system or in local memories 204_0 to 204_3 according to a thread, under control of a host processor. Processor unit (PU) arrays 202_0 to 202_3 perform image processing using a thread different from the thread of the SCUs 203_0 to 203_3.

Type: Grant

Filed: May 22, 2006

Date of Patent: June 11, 2013

Assignee: Sony Corporation

Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
Communication method

Patent number: 8453003

Abstract: A communication method is provided to reduce the overhead of inter-processor synchronization for a communication phase in collective communication and to speed up the collective communication. Each processor in a parallel computer starts a previous process before a collective communication phase in which the processors communicate with one another simultaneously through an inter-processor network. Each processor executes a synchronization command in advance, when a portion of the previous process lasting a predetermined time t remains. The inter-processor synchronization control section transmits a synchronization completion notice to each processor when a synchronization condition is met. During this period, each processor continues executing the previous process in parallel. The processors then enter the collective communication phase.

Type: Grant

Filed: April 9, 2008

Date of Patent: May 28, 2013

Assignee: NEC Corporation

Inventor: Yasushi Kanoh
Method and apparatus for distributing network traffic processing on a multiprocessor computer

Patent number: 8447803

Abstract: An intelligent network interface card (INIC) or communication processing device (CPD) works with a host computer for data communication. The device provides a fast-path that avoids protocol processing for most messages, greatly accelerating data transfer and offloading time-intensive processing tasks from the host CPU. The host retains a fallback processing capability for messages that do not fit fast-path criteria, with the device providing assistance such as validation even for slow-path messages, and messages being selected for either fast-path or slow-path processing. A context for a connection is defined that allows the device to move data, free of headers, directly to or from a destination or source in the host. The context can be passed back to the host for message processing by the host. The device contains specialized hardware circuits that are much faster at their specific tasks than a general purpose CPU.

Type: Grant

Filed: May 14, 2003

Date of Patent: May 21, 2013

Assignee: Alacritech, Inc.

Inventors: Laurence B. Boucher, Stephen E. J. Blightman, Peter K. Craft, David A. Higgen, Clive M. Philbrick, Daryl D. Starr
Interconnection network connecting operation-configurable nodes according to one or more levels of adjacency in multiple dimensions of communication in a multi-processor and a neural processor

Patent number: 8443169

Abstract: A Wings array system for communicating between nodes using store and load instructions is described. Couplings between nodes are made according to a 1 to N adjacency of connections in each dimension of a G×H matrix of nodes, where G ≥ N and H ≥ N and N is a positive odd integer. Also, a 3D Wings neural network processor is described as a 3D G×H×K network of neurons, each neuron with an N×N×N array of synaptic weight values stored in coupled memory nodes, where G ≥ N, H ≥ N, K ≥ N, and N is determined from a 1 to N adjacency of connections used in the G×H×K network. Further, a hexagonal processor array is organized according to an INFORM coordinate system having axes at 60 degree spacing. Nodes communicate on row paths parallel to an FM dimension of communication, column paths parallel to an IO dimension of communication, and diagonal paths parallel to an NR dimension of communication.

Type: Grant

Filed: February 28, 2011

Date of Patent: May 14, 2013

Inventor: Gerald George Pechanek
Method and system for implementing efficient locking to facilitate parallel processing of IC designs

Patent number: 8438512

Abstract: Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation (EDA) tools, such as layout processing tools. Examples of EDA layout processing tools are placement and routing tools. Efficient locking mechanisms are described that facilitate parallel processing and minimize blocking.

Type: Grant

Filed: August 30, 2011

Date of Patent: May 7, 2013

Assignee: Cadence Design Systems, Inc.

Inventors: David Cross, Eric Nequist
LOW LATENCY MASSIVE PARALLEL DATA PROCESSING DEVICE

Publication number: 20130111188

Abstract: A data processing device comprising a multidimensional array of ALUs having at least two dimensions, in each of which the number of ALUs is greater than or equal to 2, adapted to process data without register-caused latency between at least some of the ALUs in the corresponding array.

Type: Application

Filed: February 14, 2011

Publication date: May 2, 2013

Inventors: Martin Vorbach, Frank May
Information handling system including a multiple compute element processor with distributed data on-ramp data-off ramp topology

Patent number: 8429382

Abstract: A symmetric multi-processing (SMP) processor includes a primary interconnect trunk for communication of information between multiple compute elements situated along the primary interconnect trunk. The processor also includes a secondary interconnect trunk that may be oriented perpendicular to the primary interconnect trunk. The processor distributes data on-ramps and data off-ramps across the data lanes of a data trunk of the primary interconnect trunk to enable communication with compute elements and other structures both on-chip and off-chip.

Type: Grant

Filed: April 30, 2008

Date of Patent: April 23, 2013

Assignee: International Business Machines Corporation

Inventors: Robert Alan Cargnoni, Gary Alan Gorman, Charles Francis Marino, Julie Ann Rosser
Processor core stacking for efficient collaboration

Patent number: 8417917

Abstract: A mechanism is provided for improving the performance and efficiency of multi-core processors. A system controller in a data processing system determines an operational function for each primary processor core in a set of primary processor cores in a primary processor core logic layer and for each secondary processor core in a set of secondary processor cores in a secondary processor core logic layer, thereby forming a set of determined operational functions. The system controller then generates an initial configuration, based on the set of determined operational functions, for initializing the set of primary processor cores and the set of secondary processor cores in the three-dimensional processor core architecture. The initial configuration indicates how at least one primary processor core of the set of primary processor cores collaborates with at least one secondary processor core of the set of secondary processor cores.

Type: Grant

Filed: September 30, 2009

Date of Patent: April 9, 2013
Switch-based parallel distributed cache architecture for memory access on reconfigurable computing platforms

Patent number: 8375395

Abstract: A computing architecture comprises a plurality of processing elements to perform data processing calculations, a plurality of memory elements to store the data processing results, and a reconfigurable interconnect network to couple the processing elements to the memory elements. The reconfigurable interconnect network includes a switching element, a control element, a plurality of processor interface units, a plurality of memory interface units, and a plurality of application control units. In various embodiments, the processing elements and the interconnect network may be implemented in a field-programmable gate array.

Type: Grant

Filed: January 3, 2008

Date of Patent: February 12, 2013

Assignee: L3 Communications Integrated Systems, L.P.

Inventors: Deepak Prasanna, Matthew Pascal DeLaquil
Mechanism for process migration on a massively parallel computer

Patent number: 8370844

Abstract: Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across a full collection of compute nodes present in a given parallel system executing a parallel task. Migrating a process from one compute node to another may be useful to address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to cure network congestion resulting from a poorly mapped task or when a compute node is predicted to experience a hardware failure.

Type: Grant

Filed: September 12, 2007

Date of Patent: February 5, 2013

Assignee: International Business Machines Corporation

Inventors: Charles Jens Archer, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik
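
The state update described in the migration abstract can be pictured as every compute node holding its own copy of a rank-to-node table, the kind of mapping that MPI state data carries. The Python sketch below is a hypothetical simplification, not IBM's mechanism: a migration is treated as complete only once every node's copy agrees on the new location, and a node flagged as predicted to fail is never accepted as a target. `ComputeNode`, `rank_table` and `migrate_rank` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    node_id: int
    predicted_failure: bool = False
    # Hypothetical per-node copy of communication-library state: rank -> node_id.
    rank_table: dict[int, int] = field(default_factory=dict)


def migrate_rank(nodes: list[ComputeNode], rank: int, target_id: int) -> None:
    """Move `rank` to node `target_id` and update every node's copy of the table."""
    by_id = {node.node_id: node for node in nodes}
    target = by_id.get(target_id)
    if target is None or target.predicted_failure:
        raise ValueError(f"node {target_id} is not a valid migration target")
    # Update the full collection of nodes, not just source and destination,
    # so that later point-to-point messages addressed to `rank` reach the new node.
    for node in nodes:
        node.rank_table[rank] = target_id
```

Updating every copy, rather than only the two nodes involved, reflects the abstract's emphasis on the full collection of compute nodes executing the parallel task.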
RECONFIGURABLE PROCESSOR ARCHITECTURE

Publication number: 20120331268

Abstract: A reconfigurable data processor architecture. The processor architecture includes: a first plurality of data processing elements, each having a respective synchronization unit, a data link structure adapted for dynamically interconnecting a number of the data processing elements, at least one configuration register, and at least one control unit in operative connection with the configuration register for controlling the contents thereof, wherein, based on the contents, the first plurality of data processing elements is adapted for temporarily constituting at runtime at least one group of one or more of said data processing elements from said first plurality of data processing elements dynamically via the data link structure. The synchronization units are adapted for synchronizing data processing by individual data processing elements within the group. The first plurality of data processing elements may be reconfigurably grouped and thus adapted to various data processing tasks at runtime.

Type: Application

Filed: February 28, 2011

Publication date: December 27, 2012

Applicant: KARLSRUHER INSTITUT FÜR TECHNOLOGIE

Inventors: Ralf König, Timo Stripf, Jürgen Becker
Matrix processor proxy systems and methods

Patent number: 8327114

Abstract: In some embodiments, processor-to-processor and/or broadcast proxies are designated in a microprocessor matrix comprising a plurality of mesh-interconnected matrix processors when default processor-to-processor or broadcast routing algorithms used by data switches within the matrix to route messages would not deliver the messages to all intended recipients. The broadcast proxies broadcast messages within individual non-overlapping broadcast domains of the matrix. P-to-P and broadcast proxies may be designated as part of a boot-time testing/initialization sequence. Improving system fault tolerance allows improving semiconductor processing yields, which may be of particular significance in relatively large integrated circuits including large numbers of relatively-complex matrix processors.

Type: Grant

Filed: July 7, 2008

Date of Patent: December 4, 2012

Assignee: Ovics

Inventors: Sorin C Cismas, Ilie Garbacea
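
The proxy idea in the Ovics abstract can be illustrated on a small 2D mesh. The Python sketch below assumes dimension-order (X then Y) routing as the default algorithm and a known set of faulty nodes; the abstract does not name the routing algorithm, and broadcast domains are not modelled. A processor-to-processor proxy is needed only when the default route from source to destination crosses a faulty node, and a candidate proxy qualifies if both default legs (source to proxy, proxy to destination) avoid faults. All names are hypothetical.

```python
from __future__ import annotations

Coord = tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> list[Coord]:
    """Nodes visited by dimension-order routing: move along X first, then Y."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def route_ok(src: Coord, dst: Coord, faulty: set[Coord]) -> bool:
    """True if the default route touches no faulty node."""
    return not any(node in faulty for node in xy_route(src, dst))


def choose_proxy(src: Coord, dst: Coord, nodes: list[Coord], faulty: set[Coord]) -> Coord | None:
    """Return None if the default route works, else a proxy whose two legs both avoid faults."""
    if route_ok(src, dst, faulty):
        return None
    for proxy in nodes:
        if proxy in faulty or proxy in (src, dst):
            continue
        if route_ok(src, proxy, faulty) and route_ok(proxy, dst, faulty):
            return proxy
    raise LookupError(f"no single proxy can deliver from {src} to {dst}")


if __name__ == "__main__":
    mesh = [(x, y) for x in range(3) for y in range(3)]
    print(choose_proxy((0, 0), (2, 0), mesh, {(1, 0)}))  # (0, 1)
```

Running this selection once per source/destination pair during start-up mirrors the abstract's point that proxies may be designated as part of a boot-time testing/initialization sequence, so a chip with a few defective processors can still be used.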
Scalable bus-based on-chip interconnection networks

Patent number: 8307116

Abstract: The present disclosure generally relates to systems for routing data across a multinodal network. Example systems include a multinodal array having a plurality of nodes and a plurality of physical communication channels connecting the nodes. At least one of the physical communication channels may be configured to route data from a first node to two or more other destination nodes of the plurality of nodes. The present disclosure also generally relates to methods for routing data across a multinodal network and computer accessible mediums having stored thereon computer executable instructions for performing techniques for routing data across a multinodal network.

Type: Grant

Filed: June 19, 2009

Date of Patent: November 6, 2012

Assignee: Board of Regents of the University of Texas System

Inventors: Stephen W. Keckler, Boris Grot
Algebra operation method, apparatus, and storage medium thereof

Patent number: 8276116

Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.

Type: Grant

Filed: June 7, 2007

Date of Patent: September 25, 2012

Assignee: Canon Kabushiki Kaisha

Inventor: Yasuhiro Nakahara
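
The deferred evaluation in the Canon abstract resembles what C++ programmers know as expression templates: arithmetic on vector objects builds a record of the operations instead of computing them, and the work happens only when the result is assigned. The Python sketch below shows that general technique, not the patented method. Because Python cannot overload `=`, a hypothetical `assign` method stands in for the substitute operator, and all class names are illustrative.

```python
from __future__ import annotations

import operator
from typing import Callable


class Expr:
    """Base class: arithmetic builds an expression tree instead of computing."""

    def __add__(self, other: object) -> BinOp:
        return BinOp(operator.add, self, as_expr(other))

    def __sub__(self, other: object) -> BinOp:
        return BinOp(operator.sub, self, as_expr(other))

    def __mul__(self, other: object) -> BinOp:
        return BinOp(operator.mul, self, as_expr(other))

    def at(self, i: int) -> float:
        raise NotImplementedError

    def size(self) -> int | None:
        raise NotImplementedError


class Scalar(Expr):
    def __init__(self, value: float) -> None:
        self.value = value

    def at(self, i: int) -> float:
        return self.value

    def size(self) -> int | None:
        return None  # a scalar broadcasts to any length


class BinOp(Expr):
    def __init__(self, op: Callable[[float, float], float], left: Expr, right: Expr) -> None:
        self.op, self.left, self.right = op, left, right

    def at(self, i: int) -> float:
        # Evaluate one element of the whole tree; nothing is cached or materialised.
        return self.op(self.left.at(i), self.right.at(i))

    def size(self) -> int | None:
        left = self.left.size()
        return left if left is not None else self.right.size()


class Vec(Expr):
    def __init__(self, data: list[float]) -> None:
        self.data = list(data)

    def at(self, i: int) -> float:
        return self.data[i]

    def size(self) -> int | None:
        return len(self.data)

    def assign(self, expr: Expr) -> Vec:
        """Stand-in for the substitute operator: evaluate the whole tree in one pass."""
        n = expr.size()
        if n is None:
            raise ValueError("expression has no vector operand")
        self.data = [expr.at(i) for i in range(n)]
        return self


def as_expr(value: object) -> Expr:
    """Wrap plain numbers so they can appear in an expression tree."""
    return value if isinstance(value, Expr) else Scalar(value)
```

For example, with `a = Vec([1, 2, 3])` and `b = Vec([4, 5, 6])`, the expression `a + b * 2` is only a `BinOp` tree; `Vec([0, 0, 0]).assign(a + b * 2).data` then computes `[9, 12, 15]` element by element, without materialising the intermediate `b * 2`.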
EFFICIENT AND SCALABLE MULTI-VALUE PROCESSOR AND SUPPORTING CIRCUITS

Publication number: 20120216013

Abstract: Briefly, an efficient and scalable processor device is disclosed that uses multi-value voltages for operands, results, and signaling. An array of cells is arranged in rows and columns, and one or more multi-value operands are used to select a cell from the array. A row driver may be used to select a row of cells, and a column driver is used to select a particular column from the selected row. Once a particular cell is selected, a voltage value associated with that cell is passed as an output, which is typically a multi-value result. The multi-value processor is constructed such that the row driver and column driver can be substantially identical, and have a structure that enables significant circuit reuse, provides substantial reduction in size for a circuit layout, has increased layout symmetry, simple scalability, and advantageous power conservation.

Type: Application

Filed: October 25, 2011

Publication date: August 23, 2012

Inventor: Benjamin J. Cooper
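
The row/column selection in the multi-value processor abstract behaves like a two-operand lookup table. The Python sketch below models voltage levels as integers 0 to radix − 1 and stores one result level per cell; it uses the same decoder for the row and column operands to echo the abstract's point that the two drivers can be substantially identical. The table contents (ternary addition modulo 3) and every name are illustrative assumptions, not taken from the application.

```python
from __future__ import annotations


def decode(level: int, radix: int) -> list[bool]:
    """One-hot select lines for a multi-value operand; shared by row and column drivers."""
    if not 0 <= level < radix:
        raise ValueError(f"operand level {level} outside 0..{radix - 1}")
    return [line == level for line in range(radix)]


def read_cell(cells: list[list[int]], a: int, b: int) -> int:
    """Select a row with operand a, a column with operand b, and return that cell's level."""
    radix = len(cells)
    row_select = decode(a, radix)
    col_select = decode(b, radix)
    row = next(r for r, selected in zip(cells, row_select) if selected)
    return next(value for value, selected in zip(row, col_select) if selected)


# Hypothetical cell contents: ternary addition modulo 3.
TERNARY_ADD = [[(r + c) % 3 for c in range(3)] for r in range(3)]
```

Swapping in a different table changes the operation without changing either driver, which is one way to see why reusing the same decoder structure for rows and columns simplifies the layout.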
SEQUENTIAL PROCESSOR COMPRISING AN ALU ARRAY

Publication number: 20120216012

Abstract: The present invention discloses a single-chip sequential processor comprising at least one ALU-Block, wherein said sequential processor is capable of maintaining its op-codes while processing data so as to overcome the necessity of requiring a new instruction in every clock cycle.

Type: Application

Filed: October 15, 2009

Publication date: August 23, 2012

Applicant: HYPERION CORE, INC.

Inventors: Martin Vorbach, Frank May, Markus Weinhardt
Array processor with two parallel processing paths of multipliers and ALUs with idle operation capability controlled by portions of opcode including indication of valid output

Patent number: 8250337

Abstract: General purpose array processing techniques including processing methods and apparatus. Processors may include parallel processing paths designed with reusable computational components such as multipliers, multiplexers, and ALUs. Flow of data through the paths and operations performed may be controlled based on opcodes. Processors may be shared, scalable, and configured to perform matrix operations. In particular, such operation may be useful for physical sections of MIMO-OFDM communication systems.

Type: Grant

Filed: April 27, 2007

Date of Patent: August 21, 2012

Assignee: Qualcomm Incorporated

Inventor: Garret Webster Shih
Methods and apparatus for providing bit-reversal and multicast functions utilizing DMA controller

Patent number: 8244931

Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.

Type: Grant

Filed: August 8, 2011

Date of Patent: August 14, 2012

Assignee: Altera Corporation

Inventors: Edwin Franklin Barry, Nikos P. Pitsianis, Kevin Coopman
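
Bit-reversed ordering, and its generalisation to digit-reversed ordering, is the reordering that radix-2, 4 and 8 FFTs need on their input or output. The Python sketch below computes that permutation for a length that is a power of the radix and then splits the result into equal contiguous blocks, one per PE memory. It only illustrates the address arithmetic: the block-per-PE layout and all names are assumptions, and the patent performs the transfer with a DMA unit concurrently with FFT computation rather than in software.

```python
from __future__ import annotations

from typing import TypeVar

T = TypeVar("T")


def digit_reverse(index: int, radix: int, num_digits: int) -> int:
    """Reverse the base-`radix` digits of `index` (bit reversal when radix == 2)."""
    result = 0
    for _ in range(num_digits):
        index, digit = divmod(index, radix)
        result = result * radix + digit
    return result


def digit_reversed_order(data: list[T], radix: int) -> list[T]:
    """Return data permuted into digit-reversed order; len(data) must be a power of radix."""
    if radix < 2:
        raise ValueError("radix must be at least 2")
    size, digits = 1, 0
    while size < len(data):
        size *= radix
        digits += 1
    if size != len(data):
        raise ValueError("length must be a power of the radix")
    out: list[T] = list(data)
    for i, value in enumerate(data):
        out[digit_reverse(i, radix, digits)] = value
    return out


def split_across_pes(data: list[T], num_pes: int) -> list[list[T]]:
    """Hypothetical layout: contiguous, equal-sized blocks, one per PE-local memory."""
    if num_pes < 1 or len(data) % num_pes:
        raise ValueError("data length must be a multiple of the PE count")
    block = len(data) // num_pes
    return [data[p * block:(p + 1) * block] for p in range(num_pes)]


if __name__ == "__main__":
    order = digit_reversed_order(list(range(8)), 2)
    print(order)                       # [0, 4, 2, 6, 1, 5, 3, 7]
    print(split_across_pes(order, 2))  # [[0, 4, 2, 6], [1, 5, 3, 7]]
```

Because digit reversal is its own inverse, the same address function serves both for distributing data into reversed order and for collecting it back into natural order.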
Processor Architecture With Switch Matrices For Transferring Data Along Buses

Publication number: 20120191945

Abstract: There is described a processor architecture, comprising: a plurality of first bus pairs, each first bus pair including a respective first bus running in a first direction (for example, left to right) and a respective second bus running in a second direction opposite to the first direction (for example, right to left); a plurality of second bus pairs, each second bus pair including a respective third bus running in a third direction (for example, downwards) and a respective fourth bus running in a fourth direction opposite to the third direction (for example, upwards), the third and fourth buses intersecting the first and second buses; a plurality of switch matrices, each switch matrix located at an intersection of a first and a second pair of buses; a plurality of elements arranged in an array, each element being arranged to receive data from a respective first or second bus, and transfer data to a respective first or second bus.

Type: Application

Filed: July 5, 2011

Publication date: July 26, 2012

Inventors: Anthony Peter John Claydon, Anne Patricia Claydon
Apparatus, system and method for configuration of adaptive integrated circuitry having heterogeneous computational elements

Patent number: 8225073

Abstract: The present invention concerns configuration of a new category of integrated circuitry for adaptive or reconfigurable computing. The preferred adaptive computing engine (ACE) IC includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability.

Type: Grant

Filed: March 6, 2009

Date of Patent: July 17, 2012

Assignee: QST Holdings LLC

Inventors: Paul L. Master, Stephen J. Smith, John Watson
Area efficient arrangement of interface devices within an integrated circuit

Publication number: 20120179893

Abstract: An integrated circuit is disclosed that comprises: a core comprising logic circuitry; a plurality of interface devices for transmitting signals to and from the processing core, the plurality of interface devices comprising two types of interface devices: one type being a power interface device for delivering power to the core; and a second type being a signal interface device for transmitting data signals between the core and devices external to the integrated circuit; wherein the plurality of interface devices are arranged in two rows, an outer row towards an outer edge of the core and an inner row within the outer row closer to a centre of the core, the inner row comprising one of the two types of interface devices and the outer row comprising the other of the two types of interface devices.

Type: Application

Filed: January 10, 2011

Publication date: July 12, 2012

Applicant: ARM LIMITED

Inventors: Vikas Mishra, Bingda Brandon Wang
Reduction operations in a synchronous parallel thread processing system with disabled execution threads

Patent number: 8200940

Abstract: A system and method for successfully performing reduction operations in a multi-threaded SIMD (single-instruction multiple-data) system while one or more threads are disabled allows for the reduction operations to be performed without a performance penalty compared with performing the same operation with all of the threads enabled. The source data for each intermediate computation of the reduction operation is remapped by a configurable crossbar as needed to avoid using invalid data from the disabled threads. The remapping function is transparent to the user and enables correct execution of order invariant reduction operations and order dependent prefix-reduction operations.

Type: Grant

Filed: June 30, 2008

Date of Patent: June 12, 2012

Assignee: NVIDIA Corporation

Inventor: John Erik Lindholm
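
The effect of remapping around disabled threads can be shown with a scalar model of one SIMD group. In the Python sketch below, the crossbar is modelled simply as gathering the values of enabled lanes in lane order before a pairwise tree reduction or a Hillis-Steele inclusive prefix scan; keeping lane order is what lets order-dependent (associative but non-commutative) operators still give the right answer. This is a simplification under stated assumptions: the patent remaps the sources of each intermediate step in hardware, whereas this sketch compacts once up front, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def gather_enabled(values: Sequence[T], enabled: Sequence[bool]) -> list[T]:
    """Crossbar stand-in: keep only enabled lanes, preserving lane order."""
    return [value for value, on in zip(values, enabled) if on]


def masked_reduce(values: Sequence[T], enabled: Sequence[bool],
                  op: Callable[[T, T], T], identity: T) -> T:
    """Pairwise tree reduction over enabled lanes; disabled lanes never contribute."""
    level = gather_enabled(values, enabled) or [identity]
    while len(level) > 1:
        if len(level) % 2:
            level.append(identity)  # pad on the right so every step combines pairs
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def masked_inclusive_scan(values: Sequence[T], enabled: Sequence[bool],
                          op: Callable[[T, T], T]) -> list[Optional[T]]:
    """Hillis-Steele inclusive scan over enabled lanes; disabled lanes report None."""
    level = gather_enabled(values, enabled)
    step = 1
    while step < len(level):
        # Left operand covers earlier lanes, so operand order is preserved.
        level = [level[i] if i < step else op(level[i - step], level[i])
                 for i in range(len(level))]
        step *= 2
    results = iter(level)
    return [next(results) if on else None for on in enabled]
```

With lanes `[1, 1000, 3, 4, 1000, 6, 7, 8]` and lanes 1 and 4 disabled, an addition reduction gives 29 and the inclusive scan gives `[1, None, 4, 8, None, 14, 21, 29]`; the placeholder 1000s in the disabled lanes never reach either result.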
I/O and memory bus system for DFPS and units with two- or multi-dimensional programmable cell architectures

Patent number: 8195856

Abstract: A general bus system is provided which combines a number of internal lines and leads them as a bundle to the terminals. The bus system control is predefined and does not require any influence by the programmer. Any number of memories, peripherals or other units can be connected to the bus system (for cascading).

Type: Grant

Filed: July 21, 2010

Date of Patent: June 5, 2012

Inventors: Martin Vorbach, Robert Münch
