Cube Or Hypercube Patents (Class 712/12)
  • Publication number: 20130232319
    Abstract: A disclosed information processing system includes 2^n nodes that are connected in a manner of an n-dimensional hypercube, wherein n is a natural number equal to or greater than 3. A channel is provided between each node of specific nodes satisfying a predetermined condition among the 2^n nodes and each of the specific nodes other than the node, and the predetermined condition is a condition that all numerical values of digits other than a last digit and a second last digit of a node number are the same, in a case where an n-digit binary node number is assigned to each of the 2^n nodes so that a Hamming distance between directly connected nodes is 1.
    Type: Application
    Filed: April 15, 2013
    Publication date: September 5, 2013
    Applicant: Fujitsu Limited
    Inventor: Toru Kono
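The node-numbering condition in the abstract above can be illustrated with a short sketch. This is one reading of the abstract, not the patented design: it assumes nodes are numbered 0 to 2^n - 1, that hypercube neighbours differ in exactly one bit, and that every set of nodes sharing all digits except the last two forms a fully connected group. The function names are hypothetical.

```python
from itertools import combinations


def hypercube_edges(n):
    """Pairs (a, b), a < b, of n-bit node numbers at Hamming distance 1."""
    edges = set()
    for a in range(2 ** n):
        for bit in range(n):
            b = a ^ (1 << bit)  # flip one bit to reach a direct neighbour
            if a < b:
                edges.add((a, b))
    return edges


def specific_groups(n):
    """Group nodes whose numbers agree on every digit except the last two."""
    groups = {}
    for node in range(2 ** n):
        groups.setdefault(node >> 2, []).append(node)
    return list(groups.values())


def extra_channels(n):
    """Channels inside each group that the plain hypercube does not provide."""
    edges = hypercube_edges(n)
    extra = set()
    for group in specific_groups(n):
        for a, b in combinations(group, 2):
            if (a, b) not in edges:
                extra.add((a, b))
    return extra
```

Under these assumptions, each group of four nodes gains two channels (between the pairs whose last two digits are 00/11 and 01/10), so a 3-dimensional hypercube with 12 edges gains 4 extra channels.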
  • Patent number: 8510535
    Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The said expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves the long-range communication and global operations. Affirmatively, this expansion method can achieve better scalability and flexibility for the parallel system for a given system size.
    Type: Grant
    Filed: June 19, 2008
    Date of Patent: August 13, 2013
    Assignee: Shanghai Redneurons Co., Ltd
    Inventors: Yuefan Deng, Peng Zhang
  • Patent number: 8438512
    Abstract: Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation (EDA) tools, such as layout processing tools. Examples of EDA layout processing tools are placement and routing tools. Efficient locking mechanisms are described for facilitating parallel processing and minimizing blocking.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: May 7, 2013
    Assignee: Cadence Design Systems, Inc.
    Inventors: David Cross, Eric Nequist
  • Patent number: 8438404
    Abstract: The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs to delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: May 7, 2013
    Assignee: International Business Machines Corporation
    Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
  • Patent number: 8433816
    Abstract: A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.
    Type: Grant
    Filed: May 16, 2008
    Date of Patent: April 30, 2013
    Assignee: Silicon Graphics International Corp.
    Inventors: Martin M. Deneroff, Gregory M. Thorson, Randal S. Passint
  • Patent number: 8411809
    Abstract: A cosite interference cancellation system is provided for improved rejection of a signal coupled from a transmission antenna into a local receive antenna in the presence of local multipath. The cosite interference cancellation system and associated method advantageously provide improved signal rejection by continuously controlling (adjusting) a matching time delay to reduce cosite interference.
    Type: Grant
    Filed: January 4, 2012
    Date of Patent: April 2, 2013
    Assignee: BAE Systems Information and Electronic Systems Integration Inc.
    Inventor: Raymond J. Lackey
  • Patent number: 8370844
    Abstract: Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across a full collection of compute nodes present in a given parallel system executing a parallel task. Migrating a process from one compute node to another may be useful to address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to cure network congestion resulting from a poorly mapped task or when a compute node is predicted to experience a hardware failure.
    Type: Grant
    Filed: September 12, 2007
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charles Jens Archer, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik
  • Publication number: 20120272040
    Abstract: A computer program product for generating and implementing a three-dimensional (3D) computer processing chip stack plan. The computer readable program code includes computer readable program code configured for receiving system requirements from a plurality of clients, identifying common processing structures and technologies from the system requirements, and assigning the common processing structures and technologies to at least one layer in the 3D computer processing chip stack plan. The computer readable program code is also configured for identifying uncommon processing structures and technologies from the system requirements and assigning the uncommon processing structures and technologies to a host layer in the 3D computer processing chip stack plan. The computer readable program code is further configured for determining placement and wiring of the uncommon structures on the host layer, storing placement information in the plan, and transmitting the plan to manufacturing equipment.
    Type: Application
    Filed: June 28, 2012
    Publication date: October 25, 2012
  • Patent number: 8218911
    Abstract: An image processing apparatus which applies processes to input image data is disclosed. The image processing apparatus includes a first processing section which applies processes to the image data by a specific calculating device, and a second processing section which applies processes to the image data by a general-purpose calculating program. The input image data are multilevel image data. The first processing section includes an image data binarizing unit for forming binary image data from the multilevel image data, and a multilevel image data processing section for applying a calculation process to the multilevel image data. The second processing section includes a binary image data processing section for applying a calculation process to the binary image data formed by the image data binarizing unit.
    Type: Grant
    Filed: December 18, 2007
    Date of Patent: July 10, 2012
    Assignee: Ricoh Company, Ltd.
    Inventor: Makoto Odamaki
  • Patent number: 8169440
    Abstract: A method of processing data relating to geometrical primitives is disclosed. Each of the primitives has a plurality of vertices. The method uses a plurality of processing elements in parallel with one another, and comprises assigning respective vertex data to the processing elements, on each processing element, and in parallel with one another, performing at least one processing step on vertex data to produce processed vertex data, and transferring processed vertex data between processing elements so as to assemble primitive data.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: May 1, 2012
    Assignee: Rambus Inc.
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 8169434
    Abstract: An octree GPU construction system and method for constructing a complete octree data structure on a graphics processing unit (GPU). Embodiments of the octree GPU construction system and method first define a complete octree data structure as forming a complete partition of the 3-D space and including vertex, edge, face, and node arrays, and neighborhood information. Embodiments of the octree GPU construction system and method input a point cloud and construct a node array. Next, neighboring nodes are computed for each of the nodes in the node array by using at least two pre-computed look-up tables (such as a parent look-up table and a child look-up table). Embodiments of the octree GPU construction system and method then use the neighboring nodes and neighborhood information to compute a vertex array, edge array, and face array by determining owner information and self-ownership information based on the neighboring nodes.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: May 1, 2012
    Assignee: Microsoft Corporation
    Inventors: Kun Zhou, Minmin Gong, Baining Guo
  • Publication number: 20120102299
    Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.
    Type: Application
    Filed: December 30, 2011
    Publication date: April 26, 2012
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 8156311
    Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.
    Type: Grant
    Filed: November 27, 2010
    Date of Patent: April 10, 2012
    Inventor: Gerald George Pechanek
  • Patent number: 8151088
    Abstract: A plurality of processor tiles are provided, each processor tile including a processor core. An interconnection network interconnects the processor cores and enables transfer of data among the processor cores. The interconnection network has a plurality of dimensions and is configurable to transmit data from an initial processor core or an input/output device to an intermediate processor core based on a first dimension ordering policy, and from the intermediate processor core to a destination processor core. The first dimension ordering policy specifies an ordering of the dimensions of the interconnection network when routing data through the interconnection network.
    Type: Grant
    Filed: July 8, 2008
    Date of Patent: April 3, 2012
    Assignee: Tilera Corporation
    Inventors: Liewei Bao, Ian Rudolf Bratt
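The dimension ordering policy in the abstract above can be illustrated with a minimal sketch. It assumes a mesh without wraparound, nodes addressed by integer coordinates, and a policy given as a tuple of dimension indices; the function name and these choices are hypothetical and are not taken from the patent.

```python
def dimension_ordered_route(src, dst, order):
    """Return the hop-by-hop path from src to dst on a mesh.

    The offset in each dimension is fully corrected before moving on to the
    next dimension, visiting dimensions in the sequence given by `order`.
    """
    path = [tuple(src)]
    cur = list(src)
    for dim in order:
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step  # one hop along the current dimension
            path.append(tuple(cur))
    return path
```

For example, routing from (0, 0) to (2, 1) with order (0, 1) travels along the first dimension and then the second, while order (1, 0) reverses that. Routing through an intermediate core, as the abstract describes, would chain two such calls, each with its own ordering policy.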
  • Patent number: 8135940
    Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
    Type: Grant
    Filed: March 15, 2011
    Date of Patent: March 13, 2012
    Assignee: Micron Technology, Inc.
    Inventor: Mark Beaumont
  • Patent number: 8132031
    Abstract: A method, apparatus, and program product optimize power consumption in a parallel computing system that includes a plurality of computing nodes by selectively throttling performance of selected nodes to effectively slow down the completion of quicker executing parts of a workload of the computing system when those parts are dependent upon or otherwise associated with the completion of other, slower executing parts of the same workload. Parts of the workload are executed on the computing nodes, including concurrently executing a first part on a first computing node and a second part on a second computing node. The first node is selectively throttled during execution of the first part to decrease power consumption of the first node and conform a completion time for the first node in completing the first part of the workload with a completion time for the second node in completing the second part.
    Type: Grant
    Filed: March 17, 2009
    Date of Patent: March 6, 2012
    Assignee: International Business Machines Corporation
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Patent number: 8094764
    Abstract: A cosite interference cancellation system is provided for improved rejection of a signal coupled from a transmission antenna into a local receive antenna in the presence of local multipath. The cosite interference cancellation system and associated method advantageously provide improved signal rejection by continuously controlling (adjusting) a matching time delay to reduce cosite interference.
    Type: Grant
    Filed: December 3, 2008
    Date of Patent: January 10, 2012
    Assignee: BAE Systems Information and Electronic Systems Integration Inc.
    Inventor: Raymond J. Lackey
  • Publication number: 20110219208
    Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC).
    Type: Application
    Filed: January 10, 2011
    Publication date: September 8, 2011
    Applicant: International Business Machines Corporation
    Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
  • Patent number: 8015390
    Abstract: A flight control system includes an output device, a first processor, and a second processor. The second processor is dissimilar to the first processor. The flight control system also includes a first arbitration device coupled to the first processor and a second arbitration device coupled to the second processor. The second arbitration device is configured to coordinate transaction synchronization with the first arbitration device and the first arbitration device is configured to coordinate transaction synchronization with the second arbitration device. A comparator processor is coupled to the first arbitration device and the second arbitration device. The comparator processor is configured to compare transaction synchronized outputs of the first and second processors and the comparator processor effectuates a command to the output device if the comparison is valid.
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: September 6, 2011
    Assignee: Rockwell Collins, Inc.
    Inventors: James J. Corcoran, Eric J. Danielson, Samir S. Hemaidan, John W. Roltgen, James E. Sisson, Mark A. Kovalan, Mark C. Singer
  • Patent number: 7970735
    Abstract: A data processing and analysis system is provided. The system includes an analysis engine that queries one or more components of data. A rules component specifies a relationship between at least one dimension of the data with respect to at least one other dimension of the data in order to facilitate an analysis of the data. In one example, the analysis engine is provided as an online analytical processing component.
    Type: Grant
    Filed: March 20, 2006
    Date of Patent: June 28, 2011
    Assignee: Microsoft Corporation
    Inventors: Thierry D'Hers, Bala Atur, Marius Dumitru
  • Patent number: 7958183
    Abstract: A mechanism for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: June 7, 2011
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20110131391
    Abstract: A technique for manufacturing a three-dimensional integrated circuit includes stacking a memory unit on a first die that includes a first computational unit. In this case, the memory unit is included in a second die. A second computational unit that is included in a third die is stacked on the second die. Sets of vertical vias that extend through the first, second, and third dies are connected to connect components of the first and second computational units and the memory unit. Multiplexers of the first and second computational units are configured to selectively couple the components to different ones of the sets of vertical vias responsive to respective control words for each of the first and third dies.
    Type: Application
    Filed: November 23, 2010
    Publication date: June 2, 2011
    Applicant: International Business Machines Corporation
    Inventors: Harry S. Barowski, Tim Niggemeier
  • Patent number: 7913062
    Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
    Type: Grant
    Filed: October 20, 2003
    Date of Patent: March 22, 2011
    Assignee: Micron Technology, Inc.
    Inventor: Mark Beaumont
  • Patent number: 7908422
    Abstract: A system and method for single hop, processor-to-processor communication in a multiprocessing system over a plurality of crossbars are disclosed. Briefly described, one embodiment is a multiprocessing system comprising a plurality of processors having a plurality of high-bandwidth point-to-point links; a plurality of processor clusters, each processor cluster having a predefined number of the processors residing therein; and a plurality of crossbars, one of the crossbars coupling each of the processors of one of the plurality of processor clusters to each of the processors of another of the plurality of processor clusters, such that all processors are coupled to each of the other processors, and such that the number of crossbars is equal to [X*(X-1)/2], wherein X equals the number of processor clusters.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: March 15, 2011
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Gary B. Gostin, Mark E. Shaw
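The crossbar count in the abstract above follows from placing one crossbar between every unordered pair of clusters. The sketch below only checks that arithmetic; the function name and the cluster numbering are assumptions for illustration.

```python
from itertools import combinations


def crossbar_pairs(num_clusters):
    """One crossbar per unordered pair of distinct processor clusters."""
    return list(combinations(range(num_clusters), 2))


def crossbar_count(num_clusters):
    """Closed form X*(X-1)/2 for the number of cluster pairs."""
    return num_clusters * (num_clusters - 1) // 2
```

With four clusters, the pairs are (0, 1), (0, 2), (0, 3), (1, 2), (1, 3) and (2, 3), which matches 4*3/2 = 6 crossbars.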
  • Patent number: 7889725
    Abstract: A computer cluster arranged at a lattice point in a lattice-like interconnection network contains four nodes and an internal communication network. Two nodes can transmit packets to adjacent computer clusters located along the X direction, and the two other nodes can transmit packets to adjacent computer clusters located along the Y direction. Each node directly transmits a packet to an adjacent computer cluster in the direction in which the node can transmit packets, when the destination of the packet is located in the direction. When the destination of a packet to be transmitted from a node is not located in the direction in which the receiving node can transmit packets, the node transfers the packet to one of the other nodes through the internal communication network for transmitting the packet to the destination of the packet through the one of the other nodes.
    Type: Grant
    Filed: March 27, 2007
    Date of Patent: February 15, 2011
    Assignee: Fujitsu Limited
    Inventor: Yuichiro Ajima
  • Patent number: 7886128
    Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.
    Type: Grant
    Filed: June 3, 2009
    Date of Patent: February 8, 2011
    Inventor: Gerald George Pechanek
  • Patent number: 7840779
    Abstract: Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
    Type: Grant
    Filed: August 22, 2007
    Date of Patent: November 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Jeremy E. Berg, Michael A. Blocksome, Brian E. Smith
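The three phases of the line-plane broadcast described above can be sketched as a small simulation. It assumes a 3D mesh with integer coordinates and models only which node receives the message in which phase, not the actual network hops; the function name is hypothetical.

```python
def line_plane_broadcast(dims, root):
    """Map every node of a 3D mesh to the phase in which it gets the message.

    Phase 0 is the broadcasting node itself; phases 1 to 3 correspond to the
    sends along the first, second, and third dimensions.
    """
    size_x, size_y, size_z = dims
    _, root_y, root_z = root
    received = {tuple(root): 0}

    # Phase 1: the root sends along the axis of the first dimension.
    line = [(x, root_y, root_z) for x in range(size_x)]
    for node in line:
        received.setdefault(node, 1)

    # Phase 2: each node on that line sends along the second dimension.
    plane = [(x, y, root_z) for (x, _, _) in line for y in range(size_y)]
    for node in plane:
        received.setdefault(node, 2)

    # Phase 3: each node on that plane sends along the third dimension.
    for (x, y, _) in plane:
        for z in range(size_z):
            received.setdefault((x, y, z), 3)

    return received
```

In a 2 x 3 x 4 mesh, all 24 nodes are reached: 1 node in phase 1, 4 in phase 2, and 18 in phase 3, in addition to the root.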
  • Patent number: 7673118
    Abstract: The present invention brings to the multiprocessor what vectorization brought to the single processor. It provides tools to speed communication similar to those that have traditionally been used to speed computation; namely, the capability to program optimal communication algorithms on an architecture that can replicate their performance in terms of wall clock time. In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit that orchestrates traffic between the network and registers that communicate directly with comparable registers in neighboring processors. Communication tasks are performed out of these registers like computational tasks on a vector uniprocessor. The architecture is balanced and the hardware/software combination is scalable to any number of processors.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: March 2, 2010
    Inventor: Paul N. Swarztrauber
  • Publication number: 20100023728
    Abstract: A method and system for transposing a multi-dimensional array for a multi-processor system having a main memory for storing the multi-dimensional array and a local memory is provided. One implementation involves partitioning the multi-dimensional array into a number of equally sized portions in the local memory, in each processor performing a transpose function including a logical transpose on one of said portions and then a physical transpose of said portion, and combining the transposed portions and storing back in their original place in the main memory.
    Type: Application
    Filed: July 25, 2008
    Publication date: January 28, 2010
    Applicant: International Business Machines Corporation
    Inventors: Ahmed H.M.R. El-Mahdy, Ali A. El-Moursy, Hisham ElShishiny
  • Patent number: 7581087
    Abstract: Techniques for debugging a multicore system with synchronous stop and resume capabilities are described. In one design, an apparatus (e.g., an ASIC) includes first and second processing cores. During debugging, the first or second processing core receives a software command to stop operation and generates a first hardware signal indicating the stop. The other processing core receives the first hardware signal and stops operation. Both processing cores stop at approximately the same time based on the first hardware signal. Thereafter, the first or second processing core receives another software command to resume operation and generates a second hardware signal indicating resumption of operation. The other processing core receives the second hardware signal and resumes operation. Both processing cores resume at approximately the same time based on the second hardware signal. The first and second hardware signals may come from the same or different processing cores.
    Type: Grant
    Filed: February 22, 2006
    Date of Patent: August 25, 2009
    Assignee: QUALCOMM Incorporated
    Inventor: Johnny Kallacheril John
  • Patent number: 7574581
    Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing free-running, scan registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.
    Type: Grant
    Filed: April 28, 2003
    Date of Patent: August 11, 2009
    Assignee: International Business Machines Corporation
    Inventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
  • Patent number: 7526631
    Abstract: A processor book designed to support both commercial workloads and technical workloads based on a dynamic or static mechanism of reconfiguring the external wiring interconnect. The processor book is configured as a building block for commercial workload processing systems with external connector buses (ECBs). The processor book is also provided with routing logic to enable the ECBs to be utilized for either book-to-book routing or routing within the same processor book. A table specific wiring scheme is provided for coupling the ECBs running off the chips of one MCM to the chips of the second MCM on the processor book so that the chips of the first MCM are connected directly to the chips of a second MCM that is logically furthest away and vice versa. Once the wiring of the ECBs is completed according to the wiring scheme, the operational and functional characteristics reflect those of a processor book configured for technical workloads.
    Type: Grant
    Filed: April 28, 2003
    Date of Patent: April 28, 2009
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Vicente Enrique Chung, Jody Bern Joyner, Jerry Don Lewis
  • Publication number: 20090024829
    Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The said expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves the long-range communication and global operations. Affirmatively, this expansion method can achieve better scalability and flexibility for the parallel system for a given system size.
    Type: Application
    Filed: June 19, 2008
    Publication date: January 22, 2009
    Inventors: Yuefan Deng, Peng Zhang
  • Publication number: 20090006808
    Abstract: A novel massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. Novel use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
    Type: Application
    Filed: June 26, 2007
    Publication date: January 1, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, George Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Shawn Hall, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Todd Takken
  • Patent number: 7441098
    Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: October 21, 2008
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
  • Publication number: 20080209163
    Abstract: A processor book designed to support both commercial workloads and technical workloads based on a dynamic or static mechanism of reconfiguring the external wiring interconnect. The processor book is configured as a building block for commercial workload processing systems with external connector buses (ECBs). The processor book is also provided with routing logic to enable the ECBs to be utilized for either book-to-book routing or routing within the same processor book. A table specific wiring scheme is provided for coupling the ECBs running off the chips of one MCM to the chips of the second MCM on the processor book so that the chips of the first MCM are connected directly to the chips of a second MCM that is logically furthest away and vice versa. Once the wiring of the ECBs is completed according to the wiring scheme, the operational and functional characteristics reflect those of a processor book configured for technical workloads.
    Type: Application
    Filed: May 9, 2008
    Publication date: August 28, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ravi Kumar Arimilli, Vicente Enrique Chung, Jody Bern Joyner, Jerry Don Lewis
  • Patent number: 7185226
    Abstract: A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: February 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara, Todd E. Takken
  • Patent number: 7103639
    Abstract: The present invention flexibly manages the formation of a partition from a plurality of independently executing cells (discrete hardware entities comprising system resources) in preparation for the instantiation of an operating system instance upon the partition. Specifically, the invention manages configuration activities that occur to transition from having individual cells acting independently, and having cells rendezvous, to having cells become interdependent to continue operations as a partition. The invention manages the partitioning forming process such that no single point of failure disrupts the process. Instead, the invention is implemented as a distributed application wherein individual cells independently execute instructions based upon respective copies of the complex profile (a “map” of the complex configuration). Also, the invention adapts to a degree of delay associated with certain cells becoming ready to join the formation or rendezvous process.
    Type: Grant
    Filed: December 5, 2000
    Date of Patent: September 5, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Andrew C. Walton, Guy L. Kuntz
  • Patent number: 7043562
    Abstract: Irregularities are provided in at least one dimension of a torus or mesh network for lower average path length and lower maximum channel load while increasing tolerance for omitted end-around connections. In preferred embodiments, all nodes supported on each backplane are connected in a single cycle which includes nodes on opposite sides of lower dimension tori. The cycles in adjacent backplanes hop different numbers of nodes.
    Type: Grant
    Filed: June 9, 2003
    Date of Patent: May 9, 2006
    Assignee: Avici Systems, Inc.
    Inventors: William J. Dally, William F. Mann, Philip P. Carvey
  • Patent number: 6973559
    Abstract: A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.
    Type: Grant
    Filed: September 29, 1999
    Date of Patent: December 6, 2005
    Assignee: Silicon Graphics, Inc.
    Inventors: Martin M. Deneroff, Gregory M. Thorson, Randal S. Passint
  • Patent number: 6898657
    Abstract: A multi-processor arrangement having an interprocessor communication path between each of every possible pair of processors, in addition to I/O paths to and from the arrangement, having signal processing functions configurably embedded in series with the communication paths and/or the I/O paths. Each processor is provided with a local memory which can be accessed by the local processor as well as by the other processors via the communications paths. This allows for efficient data movement from one processor's local memory to another processor's local memory, such as commonly done during signal processing corner turning operations. Configurable signal processing logic may be configured to host one or more signal processing functions which allow data to be autonomously accessed from the processor local memories, processed, and re-deposited in a local memory.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: May 24, 2005
    Assignee: Tera Force Technology Corp.
    Inventor: Winthrop W. Smith
  • Patent number: 6873287
    Abstract: The present invention relates to a method and an arrangement suitable for embedded signal processing, comprising a number of computational units (100), each computational unit comprising a number of processing elements (20) capable of working independently and transmitting data simultaneously. Said computational units are arranged in clusters, work independently, and transmit data simultaneously, and said processing elements (20) are globally and regularly inter-connected optically in a hypercube topology and transformed into a planar waveguide.
    Type: Grant
    Filed: November 1, 2001
    Date of Patent: March 29, 2005
    Assignee: Telefonaktiebolaget LM Ericsson
    Inventor: Håkan Forsberg
  • Patent number: 6769056
    Abstract: A manifold array topology includes processing elements, nodes, memories or the like arranged in clusters. Clusters are connected by cluster switch arrangements which advantageously allow changes of organization without physical rearrangement of processing elements. A significant reduction in the typical number of interconnections for preexisting arrays is also achieved. Fast, efficient and cost effective processing and communication result with the added benefit of ready scalability.
    Type: Grant
    Filed: September 24, 2002
    Date of Patent: July 27, 2004
    Assignee: PTS Corporation
    Inventors: Edwin F. Barry, Thomas L. Drabenstott, Gerald G. Pechanek, Nikos P. Pitsianis
  • Patent number: 6754892
    Abstract: A process for packing an instruction word including providing a word value representing an instruction word into which an operation is to be fit be equal to some initial value having a plurality of portions representing constraints, operating on the initial value of the value word with operation class values having a plurality of portions representing constraints of a new operation as the new operation is attempted to be fit into the instruction to affect the processor word value in a manner to indicate when the limit of any constraint for the instruction is reached, and determining a violation of any constraint to determine that the new operation does not fit the format.
    Type: Grant
    Filed: December 15, 1999
    Date of Patent: June 22, 2004
    Assignee: Transmeta Corporation
    Inventor: Stephen C. Johnson
  • Patent number: 6754735
    Abstract: A processing system includes a processing device and a host processor operatively coupled to the processing device via a system bus, and implements a scatter gather data transfer technique. The host processor is configurable to control the transfer of information to or from scattered or non-contiguous memory locations in a memory associated with the processing device, utilizing a data structure comprising a single descriptor. An information transfer bandwidth of the system bus is thereby more efficiently utilized than if a separate descriptor were used for transfer of information involving each of the non-contiguous memory locations.
    Type: Grant
    Filed: December 21, 2001
    Date of Patent: June 22, 2004
    Assignee: Agere Systems Inc.
    Inventors: Prachi Kale, Stephen H. Miller, Abraham Prasad, Narender R. Vangati
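    The single-descriptor idea in the abstract above can be illustrated with a minimal sketch. It assumes a descriptor is simply a list of (offset, length) pairs covering the non-contiguous regions; that layout and the names `gather` and `scatter` are hypothetical, chosen for illustration, and are not taken from the patent.

```python
def gather(memory, descriptor):
    """Collect the non-contiguous regions named by one descriptor into one buffer.

    `descriptor` is an assumed layout: a list of (offset, length) pairs.
    """
    out = bytearray()
    for offset, length in descriptor:
        out += memory[offset:offset + length]
    return bytes(out)


def scatter(memory, descriptor, data):
    """Write one contiguous buffer out to the regions named by the descriptor."""
    total = sum(length for _, length in descriptor)
    if len(data) != total:
        raise ValueError("data length must match the descriptor's total length")
    pos = 0
    for offset, length in descriptor:
        memory[offset:offset + length] = data[pos:pos + length]
        pos += length


memory = bytearray(b"abcdefghij")
descriptor = [(0, 2), (5, 3)]          # two separate regions, one descriptor
collected = gather(memory, descriptor)
scatter(memory, descriptor, b"12345")
```

    Here one descriptor drives the whole transfer, so the caller issues a single request rather than one request per region.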
  • Patent number: 6741552
    Abstract: Generally speaking, the cell switching architecture of the present invention offers a powerful, simple, and in many ways elegant solution to the problem of providing cost-effective, high-bandwidth, fault-tolerant cell switching. The architecture is based on a network of switching elements connected in a hypercube topology to form a switch fabric. The generalized hypercube is D dimensional, where D≧3 when all radices in the radix set are 2 and D≧2 when at least one of the radices is greater than 2. A fully-populated switch is fully symmetric: each switching element has the same number and kind of connections to both its neighbors and to the outside world as every other switching element. In an exemplary embodiment, each switching element is connected to one data source and one data sink, e.g., a Utopia bus or other broadband connection. In the same exemplary embodiment, links between switching elements are bidirectional and synchronous, operating in accordance with a Cell Exchange Cycle (CEC).
    Type: Grant
    Filed: February 12, 1998
    Date of Patent: May 25, 2004
    Assignee: PMC Sierra International, Inc.
    Inventors: Carl McCrosky, Jeff S. Roe, Ian G. Barrett, Ken Sailor
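    As a rough illustration of the topology named in the abstract above, the sketch below uses the textbook definition of a generalized hypercube: each node has a D-digit mixed-radix address, and two nodes are linked when their addresses differ in exactly one digit. The function names are hypothetical, and the patent's own addressing scheme may differ.

```python
def neighbors(node, radices):
    """Return every node whose address differs from `node` in exactly one digit."""
    result = []
    for dim, radix in enumerate(radices):
        for digit in range(radix):
            if digit != node[dim]:
                result.append(node[:dim] + (digit,) + node[dim + 1:])
    return result


def degree(radices):
    """Number of links per node: (radix - 1) summed over all dimensions."""
    return sum(radix - 1 for radix in radices)


binary_cube = (2, 2, 2)      # D = 3, all radices 2: an ordinary 3-cube
mixed = (3, 4)               # D = 2 with radices greater than 2
cube_links = neighbors((0, 0, 0), binary_cube)
mixed_links = neighbors((0, 0), mixed)
```

    Every node has the same degree under this definition, which matches the abstract's remark that a fully populated switch is fully symmetric.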
  • Patent number: 6680915
    Abstract: A router, which is basically a point-to-point communication router, is devised for BUS-like communication between processors. Therefore, it is named ‘Virtual Bus’. One processor is connected to one router, and the routers can be connected in a one-dimensional array or in two-dimensional arrays. In the case of two-dimensional arrays, there are row and column router controllers. The method of communication consists of two phases: Firstly, the path between the source processor and the destination processor is established by sending a set-up message. Secondly, messages are transferred without intervention of the intermediate routers between the source and destination processors. The idea is that the intermediate routers are set to by-passing mode during the set-up phase. That is, the routers in by-passing mode just relay the incoming messages to their output ports without any interruption. Therefore the virtual bus can guarantee high-speed communication between processors.
    Type: Grant
    Filed: May 18, 1999
    Date of Patent: January 20, 2004
    Assignee: Korea Advanced Institute of Science and Technology
    Inventors: Kyu Ho Park, Jong Hyuk Choi, Bong Wan Kim
  • Publication number: 20030212877
    Abstract: Irregularities are provided in at least one dimension of a torus or mesh network for lower average path length and lower maximum channel load while increasing tolerance for omitted end-around connections. In preferred embodiments, all nodes supported on each backplane are connected in a single cycle which includes nodes on opposite sides of lower dimension tori. The cycles in adjacent backplanes hop different numbers of nodes.
    Type: Application
    Filed: June 9, 2003
    Publication date: November 13, 2003
    Applicant: Avici Systems, Inc.
    Inventors: William J. Dally, William F. Mann, Philip P. Carvey
  • Publication number: 20030172247
    Abstract: A method is described for providing performance metrics stored in an array of at least three-dimensions. The method includes receiving at least one metric criteria associated with a performance metric. The method also includes determining a list of array elements. The list represents a portion of the array including the at least one metric criteria. The method further includes sorting the list of array elements according to predetermined ordering criteria to identify a best match of the at least one metric criteria. A system and article of manufacture are also described for providing performance metrics stored in an array of at least three dimensions.
    Type: Application
    Filed: July 8, 2002
    Publication date: September 11, 2003
    Applicant: Computer Associates Think, Inc.
    Inventors: Christopher Bayer, Nigel Trousdale
  • Patent number: 6609189
    Abstract: The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: August 19, 2003
    Assignee: Yale University
    Inventors: Bradley C. Kuszmaul, Dana Sue Henry-Kuszmaul
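    The ALU-assignment example in the abstract above can be modelled sequentially. The sketch below is only a reference model of what a cyclic prefix computes over a wrap-around instruction window; it is not the parallel prefix circuit the patent describes, and the names `any_older_requesting` and `grant_oldest` are hypothetical.

```python
def any_older_requesting(requests, oldest):
    """For each window slot, report whether any older instruction requests an ALU.

    `requests[i]` is True if slot i wants an ALU; `oldest` is the index of the
    oldest instruction. Age order runs from `oldest` and wraps around the end.
    """
    n = len(requests)
    out = [False] * n
    seen = False
    for step in range(n):
        i = (oldest + step) % n
        out[i] = seen                  # exclusive prefix: only strictly older slots
        seen = seen or requests[i]     # OR is the associative operator here
    return out


def grant_oldest(requests, oldest):
    """Grant the single ALU to the oldest requesting slot, if any."""
    older = any_older_requesting(requests, oldest)
    return [req and not blocked for req, blocked in zip(requests, older)]


requests = [True, False, True, False]
older = any_older_requesting(requests, oldest=2)
grants = grant_oldest(requests, oldest=2)
```

    Because OR is associative, the same result can in principle be computed by a tree-structured prefix network instead of this linear scan.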