Array Processor Element Interconnection Patents (Class 712/11)
  • Patent number: 8825924
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: September 2, 2014
    Assignee: Array Portfolio LLC
    Inventor: Charles H. Moore
  • Patent number: 8826228
    Abstract: A computer-implemented method for creating a program for a multi-processor system comprising a plurality of interspersed processors and memories. A user may specify or create source code using a programming language. The source code specifies a plurality of tasks and communication of data among the plurality of tasks. However, the source code may not (and preferably is not required to) 1) explicitly specify which physical processor will execute each task and 2) explicitly specify which communication mechanism to use among the plurality of tasks. The method then creates machine language instructions based on the source code, wherein the machine language instructions are designed to execute on the plurality of processors. Creation of the machine language instructions comprises assigning tasks for execution on respective processors and selecting communication mechanisms between the processors based on location of the respective processors and required data communication to satisfy system requirements.
    Type: Grant
    Filed: March 27, 2007
    Date of Patent: September 2, 2014
    Assignee: Coherent Logix, Incorporated
    Inventors: John Mark Beardslee, Michael B. Doerr, Tommy K. Eng
  • Publication number: 20140244971
    Abstract: Embodiments of the invention relate to an array of processor core circuits with reversible tiers. One embodiment comprises multiple tiers of core circuits and multiple switches for routing packets between the core circuits. Each tier comprises at least one core circuit. Each switch comprises multiple router channels for routing packets in different directions relative to the switch, and at least one routing circuit configured for reversing a logical direction of at least one router channel.
    Type: Application
    Filed: February 28, 2013
    Publication date: August 28, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Patent number: 8812820
    Abstract: A data processing device comprising a multidimensional array of coarse grained logic elements processing data and operating at a first clock rate and communicating with one another and/or other elements via busses and/or communication lines operated at a second clock rate is disclosed, wherein the first clock rate is higher than the second and wherein the coarse grained logic elements comprise storage means for storing data needed to be processed.
    Type: Grant
    Filed: February 19, 2009
    Date of Patent: August 19, 2014
    Assignee: Pact XPP Technologies AG
    Inventors: Martin Vorbach, Alexander Thomas
  • Patent number: 8768642
    Abstract: The present invention systems and methods facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is engaged in wherein a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, validity of permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 1, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
  • Patent number: 8745604
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: June 3, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8737392
    Abstract: A processor includes a plurality of processor tiles, each tile including a processor core, and an interconnection network interconnects the processor cores and enables transfer of data among the processor cores. The interconnection network has a plurality of dimensions in which an ordering of dimensions for routing data is configurable.
    Type: Grant
    Filed: October 21, 2011
    Date of Patent: May 27, 2014
    Assignee: Tilera Corporation
    Inventors: Liewei Bao, Ian Rudolf Bratt
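As a rough illustration of what a configurable dimension ordering means for routing, the sketch below walks a packet across a simple mesh, exhausting one dimension at a time in a caller-chosen order. The function name `route`, the coordinate tuples, and the plain mesh model are assumptions made for this example; they are not taken from the patent.

```python
def route(src, dst, order):
    """Return the list of hops from src to dst on a mesh.

    src, dst: coordinate tuples such as (x, y).
    order:    sequence of dimension indices giving the routing order,
              e.g. (0, 1) for X-then-Y or (1, 0) for Y-then-X.
    Hypothetical model: one hop moves one step along one dimension.
    """
    pos = list(src)
    path = [tuple(pos)]
    for dim in order:
        # Finish all movement in this dimension before starting the next.
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path
```

Calling `route((0, 0), (2, 1), (0, 1))` and `route((0, 0), (2, 1), (1, 0))` reaches the same destination over different intermediate tiles, which is the effect a configurable ordering would have.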
  • Patent number: 8667049
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and a plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximize packet communications throughput and minimize latency.
    Type: Grant
    Filed: August 3, 2012
    Date of Patent: March 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampap, Philip Heidlberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Patent number: 8667251
    Abstract: This electronic chip includes functional modules each including a single processing unit and a single routing unit (110E) connected to one another, and connections, called routing connections, each of which has at least one end connected to the routing unit of a functional module, where the routing connections connect between themselves the routing units of the functional modules so as to allow routing of data between the processing units of the functional modules. The routing unit (110E) of at least one functional module, called a split routing unit, includes two routers (112E, 114E), called respectively a first-level router and a second-level router, which are connected to one another, where the first-level router is moreover connected to at least two routing connections, and where the second-level router is moreover connected to the processing unit of this functional module and connected to at least one other routing connection.
    Type: Grant
    Filed: February 15, 2011
    Date of Patent: March 4, 2014
    Assignee: Commissariat a l'Energie Atomique et aux Energies Alternatives
    Inventors: Walid Lafi, Didier Lattard
  • Patent number: 8656141
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: February 18, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8631415
    Abstract: Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.
    Type: Grant
    Filed: August 25, 2009
    Date of Patent: January 14, 2014
    Assignee: NetApp, Inc.
    Inventors: Gokul Nadathur, Manpreet Singh, Grace Ho
  • Patent number: 8625422
    Abstract: Disclosed are methods, systems, paradigms and structures for processing data packets in a communication network by a multi-core network processor. The network processor includes a plurality of multi-threaded core processors and special purpose processors for processing the data packets atomically, and in parallel. An ingress module of the network processor stores the incoming data packets in the memory and adds them to an input queue. The network processor processes a data packet by performing a set of network operations on the data packet in a single thread of a core processor. The special purpose processors perform a subset of the set of network operations on the data packet atomically. An egress module retrieves the processed data packets from a plurality of output queues based on a quality of service (QoS) associated with the output queues, and forwards the data packets towards their destination addresses.
    Type: Grant
    Filed: March 5, 2013
    Date of Patent: January 7, 2014
    Assignee: Unbound Networks
    Inventors: Damon Finney, Ashok Mathur
  • Patent number: 8612507
    Abstract: A computing device includes: a deciding unit which, in computation of values of nodes on a lattice in a direction where a value of m representing a horizontal axis coordinate of the lattice increases, decides dummy nodes to be added to m=n−1, so as to enable values of nodes on m=n to be calculated by adding the dummy nodes to m=n−1 and executing a vector operation through the use of the SIMD function by using values of nodes on m=n−1 and values of the added dummy nodes; an adding unit adding the dummy nodes decided by the deciding unit to m=n−1; and a calculating unit calculating the values of the nodes present on m=n by executing the vector operation through the use of the SIMD function by using the values of the nodes on m=n−1 and the values of the dummy nodes added by the adding unit.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: December 17, 2013
    Assignee: NS Solutions Corporation
    Inventor: Hiroki Takeshita
  • Patent number: 8607029
    Abstract: A dynamic reconfigurable circuit including a plurality of processing elements each provided with an arithmetic data input port, a configuration data input port and an output port, a data network that is coupled to the arithmetic data input ports and the output ports of the plurality of processing elements, a configuration memory that is coupled via a configuration path to the configuration data input port of a first processor element being at least one of the plurality of processing elements, and an immediate value network that is independent from the data network and that is coupled to the configuration data input port of a second processor element being at least one of the plurality of processing elements. An internal register of a third processor element is coupled to the immediate value network so that data stored in the internal register can be outputted to the immediate value network.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: December 10, 2013
    Assignee: Fujitsu Semiconductor Limited
    Inventor: Shin-ichi Sutou
  • Patent number: 8601176
    Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: December 3, 2013
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Nikos P. Pitsianis, Kevin Coopman
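The bit-reversed addressing mentioned in the abstract above can be sketched in a few lines for the radix-2 case. This is only an illustration of the index permutation itself; the helper names `bit_reverse` and `bit_reversed_order` are hypothetical, and nothing here models the DMA unit or the distribution across multiple PEs.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(data):
    """Return a copy of `data` permuted into bit-reversed index order.

    The length must be a power of two (radix-2 FFT assumption).
    """
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]
```

For an 8-element input, index 1 (binary 001) is swapped with index 4 (binary 100), index 3 (011) with index 6 (110), and so on.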
  • Patent number: 8583896
    Abstract: Systems and methods for massively parallel processing on an accelerator that includes a plurality of processing cores. Each processing core includes multiple processing chains configured to perform parallel computations, each of which includes a plurality of interconnected processing elements. The cores further include multiple smart memory blocks configured to store and process data, each memory block accepting the output of one of the plurality of processing chains. The cores communicate with at least one off-chip memory bank.
    Type: Grant
    Filed: July 26, 2010
    Date of Patent: November 12, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srihari Cadambi, Abhinandan Majumdar, Michela Becchi, Srimat Chakradhar, Hans Peter Graf
  • Patent number: 8549257
    Abstract: An integrated circuit is disclosed that comprises: a core comprising logic circuitry; a plurality of interface devices for transmitting signals to and from the processing core, the plurality of interface devices comprising two types of interface devices: one type being a power interface device for delivering power to the core; and a second type being a signal interface device for transmitting data signals between the core and devices external to the integrated circuit; wherein the plurality of interface devices are arranged in two rows, an outer row towards an outer edge of the core and an inner row within the outer row closer to a centre of the core, the inner row comprising one of the two types of interface devices and the outer row comprising an other of the two types of interface devices.
    Type: Grant
    Filed: January 10, 2011
    Date of Patent: October 1, 2013
    Assignee: ARM Limited
    Inventors: Vikas Mishra, Bingda Brandon Wang
  • Patent number: 8532288
    Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
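To make the idea of feeding a modular multiplier in blocks smaller than the full operand width more concrete, here is a minimal software sketch of blockwise (Horner-style) modular multiplication. It is an assumption-laden illustration: the function name `modmul_blocks` and the `block_bits` parameter are hypothetical, and the serially connected Processing Elements and their on-the-fly partitioning are not modelled.

```python
def modmul_blocks(a, b, n, block_bits=8):
    """Compute (a * b) % n while consuming b in blocks of block_bits bits."""
    if n <= 0 or b < 0 or block_bits <= 0:
        raise ValueError("n and block_bits must be positive, b non-negative")
    mask = (1 << block_bits) - 1
    # Split the multiplier into fixed-width blocks, least significant first.
    blocks = []
    while b:
        blocks.append(b & mask)
        b >>= block_bits
    result = 0
    # Horner's rule: process the most significant block first, reducing
    # modulo n after every block so intermediates stay bounded.
    for block in reversed(blocks):
        result = ((result << block_bits) + a * block) % n
    return result
```

Changing `block_bits` changes how many steps are needed but not the result, which is the property that lets one datapath serve different key sizes.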
  • Patent number: 8516179
    Abstract: A processing system on an integrated circuit includes a group of processing cores. A group of dedicated random access memories are severally coupled to one of the group of processing cores or shared among the group. A star bus couples the group of processing cores and random access memories. Additional layer(s) of star bus may couple many such clusters to each other and to an off-chip environment.
    Type: Grant
    Filed: November 30, 2004
    Date of Patent: August 20, 2013
    Assignee: Digital RNA, LLC
    Inventor: Joel Henry Hinrichs
  • Patent number: 8510535
    Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The said expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves the long-range communication and global operations. Affirmatively, this expansion method can achieve better scalability and flexibility for the parallel system for a given system size.
    Type: Grant
    Filed: June 19, 2008
    Date of Patent: August 13, 2013
    Assignee: Shanghai Redneurons Co., Ltd
    Inventors: Yuefan Deng, Peng Zhang
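For readers unfamiliar with the two conventional topologies that the expansion method builds on, the following sketch computes a node's neighbours in a torus and in a hypercube. It does not implement the mixed multi-rank tensor expansion or the supernodes described in the abstract; the function names and the coordinate and integer node labels are assumptions for illustration.

```python
def torus_neighbors(coord, dims):
    """Neighbours of `coord` in a torus with the given size per dimension.

    Each dimension wraps around, so every node has two neighbours per axis.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for delta in (-1, 1):
            n = list(coord)
            n[axis] = (coord[axis] + delta) % size
            neighbors.append(tuple(n))
    return neighbors


def hypercube_neighbors(node, dimension):
    """Neighbours of integer-labelled `node` in a hypercube.

    Two nodes are adjacent when their labels differ in exactly one bit.
    """
    return [node ^ (1 << bit) for bit in range(dimension)]
```

A corner node such as `(0, 0)` in a 4×4 torus still has four neighbours because of the wraparound links, while a node in a d-dimensional hypercube always has exactly d.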
  • Publication number: 20130198487
    Abstract: A data processing apparatus and method for accessing operands stored within a set of registers. Instruction decoder circuitry, responsive to program instructions, generates register access control signals identifying for each program instruction which registers in the register set are to be accessed by the processing circuitry when performing the processing operation specified by that program instruction. The set of registers are logically arranged as a plurality of register groups, with each register in the set being a member of more than one register group. Each program instruction includes a register specifier field, and instruction decoder circuitry is responsive to each program instruction to determine a selected register group, and to determine one or more selected members of that selected register group. The instruction decoder circuitry then outputs register access control signals identifying the register corresponding to each selected member of the selected register group.
    Type: Application
    Filed: February 1, 2012
    Publication date: August 1, 2013
    Applicant: The Regents of the University of Michigan
    Inventors: Joseph M. PUSDESRIS, Trevor N. MUDGE, Thomas D. MANVILLE
  • Patent number: 8490110
    Abstract: Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, each network interface controller controlling inter-IP block communications through routers, with each IP block also adapted to the network by a low latency, high bandwidth application messaging interconnect comprising an inbox and an outbox.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: July 16, 2013
    Assignee: International Business Machines Corporation
    Inventors: Russell D. Hoover, Jon K. Kriegel, Eric O. Mejdrich, Robert A. Shearer
  • Patent number: 8490066
    Abstract: A profiler which provides information to optimize an application specific architecture processor and a program for the processor is provided.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: July 16, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-Hoon Yoo, Soo-Jung Ryu, Jeong-Wook Kim, Hong-Seok Kim, Hee Seok Kim
  • Patent number: 8484276
    Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in a conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized in SIMD multi-core processor architectures.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: July 9, 2013
    Assignee: International Business Machines Corporation
    Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
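One plausible reading of the row interleaving described in the abstract above is sketched below: rows are grouped into blocks of s, and within each block the elements are emitted column by column so that the same column of s rows sits side by side. The exact memory layout used in the patent may differ; the function name `to_simd_format` and the list-of-lists input are assumptions for this example.

```python
def to_simd_format(rows, s):
    """Flatten row-major `rows` into blocks that interleave s rows each.

    Within a block, element j of each of the s rows is stored contiguously,
    so an s-wide vector operation can touch s rows at once.
    """
    if s <= 0 or len(rows) % s != 0:
        raise ValueError("row count must be a positive multiple of s")
    out = []
    for start in range(0, len(rows), s):
        block = rows[start:start + s]
        # zip(*block) yields one tuple per column across the s rows.
        for column in zip(*block):
            out.extend(column)
    return out
```

With four rows of three elements and s = 2, the first block holds rows 0 and 1 interleaved and the second block holds rows 2 and 3.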
  • Patent number: 8484444
    Abstract: A multi-node video signal processor (VSPN) is described that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.
    Type: Grant
    Filed: March 1, 2011
    Date of Patent: July 9, 2013
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Mihailo Stojancic
  • Patent number: 8478967
    Abstract: System and method for automatically parallelizing iterative functionality in a data flow program. A data flow program is stored that includes a first data flow program portion, where the first data flow program portion is iterative. Program code implementing a plurality of second data flow program portions is automatically generated based on the first data flow program portion, where each of the second data flow program portions is configured to execute a respective one or more iterations. The plurality of second data flow program portions are configured to execute at least a portion of iterations concurrently during execution of the data flow program. Execution of the plurality of second data flow program portions is functionally equivalent to sequential execution of the iterations of the first data flow program portion.
    Type: Grant
    Filed: June 1, 2009
    Date of Patent: July 2, 2013
    Assignee: National Instruments Corporation
    Inventors: Adam L. Bordelon, Robert E. Dye, Haoran Yi, Mary E. Fletcher
  • Patent number: 8464025
    Abstract: A signal processing apparatus able to raise a processing capability in processing accompanying access to a storing means is provided. Stream control units (SCU) 203_0 to 203_3 access data at an external memory system or local memories 204_0 to 204_3 according to a thread under control from a host processor. Processor units (PU) arrays 202_0 to 202_3 perform image processing by a different thread from the thread of the SCUs 203_0 to 203_3.
    Type: Grant
    Filed: May 22, 2006
    Date of Patent: June 11, 2013
    Assignee: Sony Corporation
    Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
  • Patent number: 8453003
    Abstract: A communication method is provided to reduce an overhead of inter-processor synchronization for a communication phase in collective communication and to speed up the collective communication. Each of the processors in a parallel computer starts a previous process before a collective communication phase in which communications are performed at a same time among the processors through an inter-processor network. Each processor executes a synchronization command in advance at a time when a portion of the previous process for a predetermined time t is left. The inter-processor synchronization control section transmits a synchronization completion notice to each processor, if a synchronization condition is met. For the period, each processor executes the previous process in parallel. Then, the plurality of processors enter the collective communication phase.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: May 28, 2013
    Assignee: NEC Corporation
    Inventor: Yasushi Kanoh
  • Patent number: 8447803
    Abstract: An intelligent network interface card (INIC) or communication processing device (CPD) works with a host computer for data communication. The device provides a fast-path that avoids protocol processing for most messages, greatly accelerating data transfer and offloading time-intensive processing tasks from the host CPU. The host retains a fallback processing capability for messages that do not fit fast-path criteria, with the device providing assistance such as validation even for slow-path messages, and messages being selected for either fast-path or slow-path processing. A context for a connection is defined that allows the device to move data, free of headers, directly to or from a destination or source in the host. The context can be passed back to the host for message processing by the host. The device contains specialized hardware circuits that are much faster at their specific tasks than a general purpose CPU.
    Type: Grant
    Filed: May 14, 2003
    Date of Patent: May 21, 2013
    Assignee: Alacritech, Inc.
    Inventors: Laurence B. Boucher, Stephen E. J. Blightman, Peter K. Craft, David A. Higgen, Clive M. Philbrick, Daryl D. Starr
  • Patent number: 8443169
    Abstract: A Wings array system for communicating between nodes using store and load instructions is described. Couplings between nodes are made according to a 1 to N adjacency of connections in each dimension of a G×H matrix of nodes, where G≥N and H≥N and N is a positive odd integer. Also, a 3D Wings neural network processor is described as a 3D G×H×K network of neurons, each neuron with an N×N×N array of synaptic weight values stored in coupled memory nodes, where G≥N, H≥N, K≥N, and N is determined from a 1 to N adjacency of connections used in the G×H×K network. Further, a hexagonal processor array is organized according to an INFORM coordinate system having axes at 60 degree spacing. Nodes communicate on row paths parallel to an FM dimension of communication, column paths parallel to an IO dimension of communication, and diagonal paths parallel to an NR dimension of communication.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: May 14, 2013
    Inventor: Gerald George Pechanek
  • Patent number: 8438512
    Abstract: Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation (EDA) tools, such as layout processing tools. Examples of EDA layout processing tools are placement and routing tools. Efficient locking mechanisms are described for facilitating parallel processing and to minimize blocking.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: May 7, 2013
    Assignee: Cadence Design Systems, Inc.
    Inventors: David Cross, Eric Nequist
  • Publication number: 20130111188
    Abstract: Data processing device comprising a multidimensional array of ALUs, having at least two dimensions where the number of ALUs in the dimension is greater than or equal to 2, adapted to process data without register caused latency between at least some of the ALUs in the corresponding array.
    Type: Application
    Filed: February 14, 2011
    Publication date: May 2, 2013
    Inventors: Martin Vorbach, Frank May
  • Patent number: 8429382
    Abstract: A symmetric multi-processing (SMP) processor includes a primary interconnect trunk for communication of information between multiple compute elements situated along the primary interconnect trunk. The processor also includes a secondary interconnect trunk that may be oriented perpendicular with respect to the primary interconnect trunk. The processor distributes data on-ramps and data off-ramps across the data lanes of a data trunk of the primary interconnect trunk to enable communication with compute elements and other structures both on-chip and off-chip.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: April 23, 2013
    Assignee: International Business Machines Corporation
    Inventors: Robert Alan Cargnoni, Gary Alan Gorman, Charles Francis Marino, Julie Ann Rosser
  • Patent number: 8417917
    Abstract: A mechanism is provided for improving the performance and efficiency of multi-core processors. A system controller in a data processing system determines an operational function for each primary processor core in a set of primary processor cores in a primary processor core logic layer and for each secondary processor core in a set of secondary processor cores in a secondary processor core logic layer, thereby forming a set of determined operational functions. The system controller then generates an initial configuration, based on the set of determined operational functions, for initializing the set of primary processor cores and the set of secondary processor cores in the three-dimensional processor core architecture. The initial configuration indicates how at least one primary processor core of the set of primary processor cores collaborates with at least one secondary processor core of the set of secondary processor cores.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: April 9, 2013
  • Patent number: 8375395
    Abstract: A computing architecture comprises a plurality of processing elements to perform data processing calculations, a plurality of memory elements to store the data processing results, and a reconfigurable interconnect network to couple the processing elements to the memory elements. The reconfigurable interconnect network includes a switching element, a control element, a plurality of processor interface units, a plurality of memory interface units, and a plurality of application control units. In various embodiments, the processing elements and the interconnect network may be implemented in a field-programmable gate array.
    Type: Grant
    Filed: January 3, 2008
    Date of Patent: February 12, 2013
    Assignee: L3 Communications Integrated Systems, L.P.
    Inventors: Deepak Prasanna, Matthew Pascal DeLaquil
  • Patent number: 8370844
    Abstract: Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across a full collection of compute nodes present in a given parallel system executing a parallel task. Migrating a process from one compute node to another may be useful to address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to cure network congestion resulting from a poorly mapped task or when a compute node is predicted to experience a hardware failure.
    Type: Grant
    Filed: September 12, 2007
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charles Jens Archer, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik
  • Publication number: 20120331268
    Abstract: A reconfigurable data processor architecture. The processor architecture includes: a first plurality of data processing elements, each having a respective synchronization unit, a data link structure adapted for dynamically interconnecting a number of the data processing elements, at least one configuration register, and at least one control unit in operative connection with the configuration register for controlling a contents thereof, wherein, based on the contents, the first plurality of data processing elements is adapted for temporarily constituting at runtime at least one group of one or more of said data processing elements from said first plurality of data processing elements dynamically via the data link structure. The synchronization units are adapted for synchronizing data processing by individual data processing elements within the group. The first plurality of data processing elements may be reconfigurably grouped and thus adapted to various data processing tasks at runtime.
    Type: Application
    Filed: February 28, 2011
    Publication date: December 27, 2012
    Applicant: KARLSRUHER INSTITUT FUR TECHNOLOGIE
    Inventors: Ralf Konig, Timo Stripf, Jurgen Becker
  • Patent number: 8327114
    Abstract: In some embodiments, processor-to-processor and/or broadcast proxies are designated in a microprocessor matrix comprising a plurality of mesh-interconnected matrix processors when default processor-to-processor or broadcast routing algorithms used by data switches within the matrix to route messages would not deliver the messages to all intended recipients. The broadcast proxies broadcast messages within individual non-overlapping broadcast domains of the matrix. P-to-P and broadcast proxies may be designated as part of a boot-time testing/initialization sequence. Improving system fault tolerance allows improving semiconductor processing yields, which may be of particular significance in relatively large integrated circuits including large numbers of relatively-complex matrix processors.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: December 4, 2012
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Patent number: 8307116
    Abstract: The present disclosure generally relates to systems for routing data across a multinodal network. Example systems include a multinodal array having a plurality of nodes and a plurality of physical communication channels connecting the nodes. At least one of the physical communication channels may be configured to route data from a first node to two or more other destination nodes of the plurality of nodes. The present disclosure also generally relates to methods for routing data across a multinodal network and computer accessible mediums having stored thereon computer executable instructions for performing techniques for routing data across a multinodal network.
    Type: Grant
    Filed: June 19, 2009
    Date of Patent: November 6, 2012
    Assignee: Board of Regents of the University of Texas System
    Inventors: Stephen W. Keckler, Boris Grot
  • Patent number: 8276116
    Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.
    Type: Grant
    Filed: June 7, 2007
    Date of Patent: September 25, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventor: Yasuhiro Nakahara
  • Publication number: 20120216013
    Abstract: Briefly, an efficient and scalable processor device is disclosed that uses multi-value voltages for operands, results, and signaling. An array of cells is arranged in rows and columns, and one or more multi-value operands are used to select a cell from the array. A row driver may be used to select a row of cells, and a column driver is used to select a particular column from the selected row. Once a particular cell is selected, a voltage value associated with that cell is passed as an output, which is typically a multi-value result. The multi-value processor is constructed such that the row driver and column driver can be substantially identical, and have a structure that enables significant circuit reuse, provides substantial reduction in size for a circuit layout, has increased layout symmetry, simple scalability, and advantageous power conservation.
    Type: Application
    Filed: October 25, 2011
    Publication date: August 23, 2012
    Inventor: Benjamin J. Cooper
  • Publication number: 20120216012
    Abstract: The present invention discloses a single chip sequential processor comprising at least one ALU-Block wherein said sequential processor is capable of maintaining its op-codes while processing data such as to overcome the necessity of requiring a new instruction in every clock cycle.
    Type: Application
    Filed: October 15, 2009
    Publication date: August 23, 2012
    Applicant: HYPERION CORE, INC.
    Inventors: Martin Vorbach, Frank May, Markus Weinhardt
  • Patent number: 8250337
    Abstract: General purpose array processing techniques including processing methods and apparatus. Processors may include parallel processing paths designed with reusable computational components such as multipliers, multiplexers, and ALUs. Flow of data through the paths and operations performed may be controlled based on opcodes. Processors may be shared, scalable, and configured to perform matrix operations. In particular, such operation may be useful for physical sections of MIMO-OFDM communication systems.
    Type: Grant
    Filed: April 27, 2007
    Date of Patent: August 21, 2012
    Assignee: Qualcomm Incorporated
    Inventor: Garret Webster Shih
  • Patent number: 8244931
    Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: August 14, 2012
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Nikos P. Pitsianis, Kevin Coopman
  • Publication number: 20120191945
    Abstract: There is described a processor architecture, comprising: a plurality of first bus pairs, each first bus pair including a respective first bus running in a first direction (for example, left to right) and a respective second bus running in a second direction opposite to the first direction (for example right to left); a plurality of second bus pairs, each second bus pair including a respective third bus running in a third direction (for example downwards) and a respective fourth bus running in a fourth direction opposite to the third direction (for example upwards), the third and fourth buses intersecting the first and second buses; a plurality of switch matrices, each switch matrix located at an intersection of a first and a second pair of buses; a plurality of elements arranged in an array, each element being arranged to receive data from a respective first or second bus, and transfer data to a respective first or second bus.
    Type: Application
    Filed: July 5, 2011
    Publication date: July 26, 2012
    Inventors: Anthony Peter John Claydon, Anne Patricia Claydon
  • Patent number: 8225073
    Abstract: The present invention concerns configuration of a new category of integrated circuitry for adaptive or reconfigurable computing. The preferred adaptive computing engine (ACE) IC includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability.
    Type: Grant
    Filed: March 6, 2009
    Date of Patent: July 17, 2012
    Assignee: QST Holdings LLC
    Inventors: Paul L. Master, Stephen J. Smith, John Watson
  • Publication number: 20120179893
    Abstract: An integrated circuit is disclosed that comprises: a core comprising logic circuitry; a plurality of interface devices for transmitting signals to and from the processing core, the plurality of interface devices comprising two types of interface devices: one type being a power interface device for delivering power to the core; and a second type being a signal interface device for transmitting data signals between the core and devices external to the integrated circuit; wherein the plurality of interface devices are arranged in two rows, an outer row towards an outer edge of the core and an inner row within the outer row closer to a centre of the core, the inner row comprising one of the two types of interface devices and the outer row comprising the other of the two types of interface devices.
    Type: Application
    Filed: January 10, 2011
    Publication date: July 12, 2012
    Applicant: ARM LIMITED
    Inventors: Vikas Mishra, Bingda Brandon Wang
  • Patent number: 8200940
    Abstract: A system and method for successfully performing reduction operations in a multi-threaded SIMD (single-instruction multiple-data) system while one or more threads are disabled allows for the reduction operations to be performed without a performance penalty compared with performing the same operation with all of the threads enabled. The source data for each intermediate computation of the reduction operation is remapped by a configurable crossbar as needed to avoid using invalid data from the disabled threads. The remapping function is transparent to the user and enables correct execution of order invariant reduction operations and order dependent prefix-reduction operations.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: June 12, 2012
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 8195856
    Abstract: A general bus system is provided which combines a number of internal lines and leads them as a bundle to the terminals. The bus system control is predefined and does not require any influence by the programmer. Any number of memories, peripherals or other units can be connected to the bus system (for cascading).
    Type: Grant
    Filed: July 21, 2010
    Date of Patent: June 5, 2012
    Inventors: Martin Vorbach, Robert Münch