Cube Or Hypercube Patents (Class 712/12)

3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS TO ENABLE RELIABLE OPERATION OF PROCESSORS AT SPEEDS ABOVE SPECIFIED LIMITS

Publication number: 20140006852

Abstract: A three-dimensional (3-D) processor system includes a first processor chip and a second processor chip in a stacked configuration. The first processor chip includes a first processor having a first set of state registers. The second processor chip includes a second processor having a second set of state registers that corresponds to the first set of state registers. The first and second processors are connected through vertical connections between the first and second processor chips. A mode control circuit operates the processor system in one of a plurality of operating modes. In one mode of operation, the first processor is active and the second processor is inactive, and the first processor operates at a speed greater than a maximum safe speed of the first processor, and the first processor uses the second set of state registers of the second processor to checkpoint a state of the first processor.

Type: Application

Filed: June 28, 2012

Publication date: January 2, 2014

Applicant: International Business Machines Corporation

Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan K. Kailas
3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS FOR MULTIMODAL OPERATION OF SAME

Publication number: 20130283005

Abstract: Three-dimensional (3-D) processor devices are provided, which are constructed by connecting processors in a stacked configuration. For instance, a semiconductor device includes a first processor chip comprising one or more processors, a second processor chip comprising one or more processors, and a plurality of input/output ports. The first and second processor chips are connected in a stacked configuration and commonly share the plurality of input/output ports. Methods are also provided to selectively operate the semiconductor device in one of a plurality of operating modes to control power of the semiconductor device.

Type: Application

Filed: April 20, 2012

Publication date: October 24, 2013

Applicant: International Business Machines Corporation

Inventor: Philip G. Emma
3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS FOR MULTIMODAL OPERATION OF SAME

Publication number: 20130283006

Abstract: Three-dimensional (3-D) processor structures are provided which are constructed by connecting processors in a stacked configuration. For example, a processor system includes a first processor chip comprising a first processor, and a second processor chip comprising a second processor. The first and second processor chips are connected in a stacked configuration with the first and second processors connected through vertical connections between the first and second processor chips. The processor system further includes a mode control circuit to selectively configure the first and second processors of the first and second processor chips to operate in one of a plurality of operating modes, wherein the processors can be selectively configured to operate independently, to aggregate resources, to share resources, and/or be combined to form a single processor image.

Type: Application

Filed: September 4, 2012

Publication date: October 24, 2013

Applicant: International Business Machines Corporation

Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan Kunjunny Kailas
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
INFORMATION PROCESSING SYSTEM, ROUTING METHOD AND PROGRAM

Publication number: 20130232319

Abstract: A disclosed information processing system includes 2^n nodes that are connected in the manner of an n-dimensional hypercube, wherein n is a natural number equal to or greater than 3. A channel is provided between each node of specific nodes satisfying a predetermined condition among the 2^n nodes and each of the specific nodes other than the node, and the predetermined condition is a condition that all numerical values of digits other than a last digit and a second last digit of a node number are the same, in a case where an n-digit binary node number is assigned to each of the 2^n nodes so that a Hamming distance between directly connected nodes is 1.

Type: Application

Filed: April 15, 2013

Publication date: September 5, 2013

Applicant: FUJITSU LIMITED

Inventor: Toru KONO
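
The node-numbering condition in the abstract above (publication 20130232319) can be illustrated with a minimal sketch. The function names are hypothetical, and reading "digits other than a last digit and a second last digit ... are the same" as "nodes that share their high-order n-2 bits" is an assumption about the abstract's wording, not the patent's own implementation.

```python
from itertools import combinations

def hypercube_neighbors(node, n):
    """Nodes at Hamming distance 1 from `node` in an n-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(n)]

def extra_channel_groups(n):
    """Group the 2**n nodes by their high-order n-2 bits (assumed reading)."""
    groups = {}
    for node in range(2 ** n):
        groups.setdefault(node >> 2, []).append(node)
    return list(groups.values())

def added_channels(n):
    """Pairs inside a group that are not already direct hypercube neighbors."""
    extra = []
    for group in extra_channel_groups(n):
        for a, b in combinations(group, 2):
            if bin(a ^ b).count("1") > 1:  # Hamming distance greater than 1
                extra.append((a, b))
    return extra
```

Under this reading, each group of four nodes becomes fully connected, which adds two channels per group on top of the ordinary hypercube links.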
Mixed torus and hypercube multi-rank tensor expansion method

Patent number: 8510535

Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves long-range communication and global operations. This expansion method can achieve better scalability and flexibility for the parallel system for a given system size.

Type: Grant

Filed: June 19, 2008

Date of Patent: August 13, 2013

Assignee: Shanghai Redneurons Co., Ltd

Inventors: Yuefan Deng, Peng Zhang
Method and system for implementing efficient locking to facilitate parallel processing of IC designs

Patent number: 8438512

Abstract: Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation (EDA) tools, such as layout processing tools. Examples of EDA layout processing tools are placement and routing tools. Efficient locking mechanisms are described for facilitating parallel processing and minimizing blocking.

Type: Grant

Filed: August 30, 2011

Date of Patent: May 7, 2013

Assignee: Cadence Design Systems, Inc.

Inventors: David Cross, Eric Nequist
Main processing element for delegating virtualized control threads controlling clock speed and power consumption to groups of sub-processing elements in a system such that a group of sub-processing elements can be designated as pseudo main processing element

Patent number: 8438404

Abstract: The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs to delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.

Type: Grant

Filed: September 30, 2008

Date of Patent: May 7, 2013

Assignee: International Business Machines Corporation

Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
Network topology for a scalable multiprocessor system

Patent number: 8433816

Abstract: A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.

Type: Grant

Filed: May 16, 2008

Date of Patent: April 30, 2013

Assignee: Silicon Graphics International Corp.

Inventors: Martin M. Deneroff, Gregory M. Thorson, Randal S. Passint
Variable time delay control structure for channel matching

Patent number: 8411809

Abstract: A cosite interference cancellation system is provided for improved rejection of a signal coupled from a transmission antenna into a local receive antenna in the presence of local multipath. The cosite interference cancellation system and associated method advantageously provide improved signal rejection by continuously controlling (adjusting) a matching time delay to reduce cosite interference.

Type: Grant

Filed: January 4, 2012

Date of Patent: April 2, 2013

Assignee: BAE Systems Information and Electronic Systems Integration Inc.

Inventor: Raymond J. Lackey
Mechanism for process migration on a massively parallel computer

Patent number: 8370844

Abstract: Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across a full collection of compute nodes present in a given parallel system executing a parallel task. Migrating a process from one compute node to another may be useful to address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to cure network congestion resulting from a poorly mapped task or when a compute node is predicted to experience a hardware failure.

Type: Grant

Filed: September 12, 2007

Date of Patent: February 5, 2013

Assignee: International Business Machines Corporation

Inventors: Charles Jens Archer, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik
Enhanced Modularity in Heterogeneous 3D Stacks

Publication number: 20120272040

Abstract: A computer program product for generating and implementing a three-dimensional (3D) computer processing chip stack plan. The computer readable program code includes computer readable program code configured for receiving system requirements from a plurality of clients, identifying common processing structures and technologies from the system requirements, and assigning the common processing structures and technologies to at least one layer in the 3D computer processing chip stack plan. The computer readable program code is also configured for identifying uncommon processing structures and technologies from the system requirements and assigning the uncommon processing structures and technologies to a host layer in the 3D computer processing chip stack plan. The computer readable program code is further configured for determining placement and wiring of the uncommon structures on the host layer, storing placement information in the plan, and transmitting the plan to manufacturing equipment.

Type: Application

Filed: June 28, 2012

Publication date: October 25, 2012
Image processing apparatus and image processing method

Patent number: 8218911

Abstract: An image processing apparatus which applies processes to input image data is disclosed. The image processing apparatus includes a first processing section which applies processes to the image data by a specific calculating device, and a second processing section which applies processes to the image data by a general-purpose calculating program. The input image data are multilevel image data. The first processing section includes an image data binarizing unit for forming binary image data from the multilevel image data, and a multilevel image data processing section for applying a calculation process to the multilevel image data. The second processing section includes a binary image data processing section for applying a calculation process to the binary image data formed by the image data binarizing unit.

Type: Grant

Filed: December 18, 2007

Date of Patent: July 10, 2012

Assignee: Ricoh Company, Ltd.

Inventor: Makoto Odamaki
Parallel data processing apparatus

Patent number: 8169440

Abstract: A method of processing data relating to geometrical primitives is disclosed. Each of the primitives has a plurality of vertices. The method uses a plurality of processing elements in parallel with one another, and comprises assigning respective vertex data to the processing elements, on each processing element, and in parallel with one another, performing at least one processing step on vertex data to produce processed vertex data, and transferring processed vertex data between processing elements so as to assemble primitive data.

Type: Grant

Filed: May 29, 2007

Date of Patent: May 1, 2012

Assignee: Rambus Inc.

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Octree construction on graphics processing units

Patent number: 8169434

Abstract: An octree GPU construction system and method for constructing a complete octree data structure on a graphics processing unit (GPU). Embodiments of the octree GPU construction system and method first define a complete octree data structure as forming a complete partition of the 3-D space and including vertex, edge, face, and node arrays, and neighborhood information. Embodiments of the octree GPU construction system and method input a point cloud and construct a node array. Next, neighboring nodes are computed for each of the nodes in the node array by using at least two pre-computed look-up tables (such as a parent look-up table and a child look-up table). Embodiments of the octree GPU construction system and method then use the neighboring nodes and neighborhood information to compute a vertex array, edge array, and face array by determining owner information and self-ownership information based on the neighboring nodes.

Type: Grant

Filed: September 29, 2008

Date of Patent: May 1, 2012

Assignee: Microsoft Corporation

Inventors: Kun Zhou, Minmin Gong, Baining Guo
STALL PROPAGATION IN A PROCESSING SYSTEM WITH INTERSPERSED PROCESSORS AND COMMUNICATION ELEMENTS

Publication number: 20120102299

Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.

Type: Application

Filed: December 30, 2011

Publication date: April 26, 2012

Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
Interconnection networks and methods of construction thereof for efficiently sharing memory and processing in a multiprocessor wherein connections are made according to adjacency of nodes in a dimension

Patent number: 8156311

Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.

Type: Grant

Filed: November 27, 2010

Date of Patent: April 10, 2012

Inventor: Gerald George Pechanek
Configuring routing in mesh networks

Patent number: 8151088

Abstract: A plurality of processor tiles are provided, each processor tile including a processor core. An interconnection network interconnects the processor cores and enables transfer of data among the processor cores. The interconnection network has a plurality of dimensions and is configurable to transmit data from an initial processor core or an input/output device to an intermediate processor core based on a first dimension ordering policy, and from the intermediate processor core to a destination processor core. The first dimension ordering policy specifies an ordering of the dimensions of the interconnection network when routing data through the interconnection network.

Type: Grant

Filed: July 8, 2008

Date of Patent: April 3, 2012

Assignee: Tilera Corporation

Inventors: Liewei Bao, Ian Rudolf Bratt
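
The dimension ordering policy described in the abstract above (patent 8151088) can be sketched as follows. This is a minimal illustration rather than the patented mechanism: the function names are hypothetical, coordinates are plain integer tuples on a mesh without wraparound, and the use of a second ordering for the leg from the intermediate core to the destination is an assumption, since the abstract only names the first policy.

```python
def dimension_ordered_route(src, dst, order):
    """Return the hops from src to dst, correcting one dimension at a time.

    `order` lists dimension indices in the sequence they are resolved,
    e.g. (0, 1) means "travel along X first, then along Y".
    """
    pos = list(src)
    path = [tuple(pos)]
    for dim in order:
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path

def two_phase_route(src, mid, dst, first_order, second_order):
    """Route src -> mid under one ordering, then mid -> dst under another."""
    first = dimension_ordered_route(src, mid, first_order)
    second = dimension_ordered_route(mid, dst, second_order)
    return first + second[1:]  # drop the duplicated intermediate node
```

For example, `dimension_ordered_route((0, 0), (2, 1), (0, 1))` walks along dimension 0 until it matches the destination and only then moves along dimension 1.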
Method of rotating data in a plurality of processing elements

Patent number: 8135940

Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.

Type: Grant

Filed: March 15, 2011

Date of Patent: March 13, 2012

Assignee: Micron Technologies, Inc.

Inventor: Mark Beaumont
Power adjustment based on completion times in a parallel computing system

Patent number: 8132031

Abstract: A method, apparatus, and program product optimize power consumption in a parallel computing system that includes a plurality of computing nodes by selectively throttling performance of selected nodes to effectively slow down the completion of quicker executing parts of a workload of the computing system when those parts are dependent upon or otherwise associated with the completion of other, slower executing parts of the same workload. Parts of the workload are executed on the computing nodes, including concurrently executing a first part on a first computing node and a second part on a second computing node. The first node is selectively throttled during execution of the first part to decrease power consumption of the first node and conform a completion time for the first node in completing the first part of the workload with a completion time for the second node in completing the second part.

Type: Grant

Filed: March 17, 2009

Date of Patent: March 6, 2012

Assignee: International Business Machines Corporation

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
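
The idea of conforming completion times in the abstract above (patent 8132031) can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that a part's completion time scales inversely with the node's speed; the function name and that scaling model are not taken from the patent.

```python
def throttle_factor(fast_time, slow_time):
    """Fraction of full speed at which the quicker node finishes with the slower one.

    fast_time: estimated completion time of the quicker part at full speed.
    slow_time: estimated completion time of the slower part.
    Assumes completion time is inversely proportional to speed.
    """
    if fast_time <= 0 or slow_time <= 0:
        raise ValueError("completion times must be positive")
    # Never speed a node up beyond full speed; only slow the quicker one down.
    return min(1.0, fast_time / slow_time)
```

With estimated times of 2 and 4 seconds, the quicker node could run at half speed and still finish alongside the slower node.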
Variable time delay control structure for channel matching

Patent number: 8094764

Abstract: A cosite interference cancellation system is provided for improved rejection of a signal coupled from a transmission antenna into a local receive antenna in the presence of local multipath. The cosite interference cancellation system and associated method advantageously provide improved signal rejection by continuously controlling (adjusting) a matching time delay to reduce cosite interference.

Type: Grant

Filed: December 3, 2008

Date of Patent: January 10, 2012

Assignee: BAE Systems Information and Electronic Systems Integration Inc.

Inventor: Raymond J. Lackey
MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER

Publication number: 20110219208

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC).

Type: Application

Filed: January 10, 2011

Publication date: September 8, 2011

Applicant: International Business Machines Corporation

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
Dissimilar processor synchronization in fly-by-wire high integrity computing platforms and displays

Patent number: 8015390

Abstract: A flight control system includes an output device, a first processor, and a second processor. The second processor is dissimilar to the first processor. The flight control system also includes a first arbitration device coupled to the first processor and a second arbitration device coupled to the second processor. The second arbitration device is configured to coordinate transaction synchronization with the first arbitration device and the first arbitration device is configured to coordinate transaction synchronization with the second arbitration device. A comparator processor is coupled to the first arbitration device and the second arbitration device. The comparator processor is configured to compare transaction synchronized outputs of the first and second processors and the comparator processor effectuates a command to the output device if the comparison is valid.

Type: Grant

Filed: March 19, 2008

Date of Patent: September 6, 2011

Assignee: Rockwell Collins, Inc.

Inventors: James J. Corcoran, Eric J. Danielson, Samir S. Hemaidan, John W. Roltgen, James E. Sisson, Mark A. Kovalan, Mark C. Singer
Cross varying dimension support for analysis services engine

Patent number: 7970735

Abstract: A data processing and analysis system is provided. The system includes an analysis engine that queries one or more components of data. A rules component specifies a relationship between at least one dimension of the data with respect to at least one other dimension of the data in order to facilitate an analysis of the data. In one example, the analysis engine is provided as an online analytical processing component.

Type: Grant

Filed: March 20, 2006

Date of Patent: June 28, 2011

Assignee: Microsoft Corporation

Inventors: Thierry D'Hers, Bala Atur, Marius Dumitru
Performing collective operations using software setup and partial software execution at leaf nodes in a multi-tiered full-graph interconnect architecture

Patent number: 7958183

Abstract: A mechanism for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.

Type: Grant

Filed: August 27, 2007

Date of Patent: June 7, 2011

Assignee: International Business Machines Corporation

Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
Integrated Circuit with Stacked Computational Units and Configurable through Vias

Publication number: 20110131391

Abstract: A technique for manufacturing a three-dimensional integrated circuit includes stacking a memory unit on a first die that includes a first computational unit. In this case, the memory unit is included in a second die. A second computational unit that is included in a third die is stacked on the second die. Sets of vertical vias that extend through the first, second, and third dies are connected to connect components of the first and second computational units and the memory unit. Multiplexers of the first and second computational units are configured to selectively couple the components to different ones of the sets of vertical vias responsive to respective control words for each of the first and third dies.

Type: Application

Filed: November 23, 2010

Publication date: June 2, 2011

Applicant: International Business Machines Corporation

Inventors: Harry S. Barowski, Tim Niggemeier
Method of rotating data in a plurality of processing elements

Patent number: 7913062

Abstract: A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.

Type: Grant

Filed: October 20, 2003

Date of Patent: March 22, 2011

Assignee: Micron Technology, Inc.

Inventor: Mark Beaumont
System and method for a distributed crossbar network using a plurality of crossbars

Patent number: 7908422

Abstract: A system and method for single hop, processor-to-processor communication in a multiprocessing system over a plurality of crossbars are disclosed. Briefly described, one embodiment is a multiprocessing system comprising a plurality of processors having a plurality of high-bandwidth point-to-point links; a plurality of processor clusters, each processor cluster having a predefined number of the processors residing therein; and a plurality of crossbars, one of the crossbars coupling each of the processors of one of the plurality of processor clusters to each of the processors of another of the plurality of processor clusters, such that all processors are coupled to each of the other processors, and such that the number of crossbars is equal to [X*(X-1)/2], wherein X equals the number of processor clusters.

Type: Grant

Filed: June 10, 2009

Date of Patent: March 15, 2011

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Gary B. Gostin, Mark E. Shaw
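
The crossbar count in the abstract above (patent 7908422) is the number of unordered pairs of processor clusters, X*(X-1)/2. A minimal sketch, with a hypothetical function name and clusters identified by integer index, enumerates one crossbar per cluster pair:

```python
from itertools import combinations

def crossbar_pairs(num_clusters):
    """One crossbar per unordered pair of clusters, as (cluster_a, cluster_b)."""
    return list(combinations(range(num_clusters), 2))

def crossbar_count(num_clusters):
    """Closed form for the number of cluster pairs: X * (X - 1) / 2."""
    return num_clusters * (num_clusters - 1) // 2
```

For four clusters this gives six crossbars, one for each of the pairs (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), and (2, 3).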
Computer cluster

Patent number: 7889725

Abstract: A computer cluster arranged at a lattice point in a lattice-like interconnection network contains four nodes and an internal communication network. Two nodes can transmit packets to adjacent computer clusters located along the X direction, and the two other nodes can transmit packets to adjacent computer clusters located along the Y direction. Each node directly transmits a packet to an adjacent computer cluster in the direction in which the node can transmit packets, when the destination of the packet is located in the direction. When the destination of a packet to be transmitted from a node is not located in the direction in which the receiving node can transmit packets, the node transfers the packet to one of the other nodes through the internal communication network for transmitting the packet to the destination of the packet through the one of the other nodes.

Type: Grant

Filed: March 27, 2007

Date of Patent: February 15, 2011

Assignee: Fujitsu Limited

Inventor: Yuichiro Ajima
Interconnection network and method of construction thereof for efficiently sharing memory and processing in a multi-processor wherein connections are made according to adjacency of nodes in a dimension

Patent number: 7886128

Abstract: A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations.

Type: Grant

Filed: June 3, 2009

Date of Patent: February 8, 2011

Inventor: Gerald George Pechanek
Line-plane broadcasting in a data communications network of a parallel computer

Patent number: 7840779

Abstract: Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.

Type: Grant

Filed: August 22, 2007

Date of Patent: November 23, 2010

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Jeremy E. Berg, Michael A. Blocksome, Brian E. Smith
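
The three broadcast phases in the abstract above (patent 7840779) can be sketched for a small 3-D grid. This is an illustrative model only: the function name is hypothetical, nodes are (x, y, z) tuples, and each "send to all nodes along an axis" is modeled as one direct (sender, receiver) pair per receiver rather than as actual network traffic.

```python
def line_plane_broadcast(root, dims):
    """Return three lists of (sender, receiver) pairs, one list per phase."""
    size_x, size_y, size_z = dims
    rx, ry, rz = root

    # Phase 1: the root sends along the first dimension, filling a line.
    phase1 = [(root, (x, ry, rz)) for x in range(size_x) if x != rx]
    line = [(x, ry, rz) for x in range(size_x)]

    # Phase 2: every node on that line sends along the second dimension,
    # filling a plane.
    phase2 = [(n, (n[0], y, rz)) for n in line
              for y in range(size_y) if y != ry]
    plane = [(x, y, rz) for x in range(size_x) for y in range(size_y)]

    # Phase 3: every node on that plane sends along the third dimension,
    # filling the whole grid.
    phase3 = [(n, (n[0], n[1], z)) for n in plane
              for z in range(size_z) if z != rz]
    return phase1, phase2, phase3
```

In this model every node other than the root receives the message exactly once, and the work of sending spreads from one node to a line and then to a plane.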
System and method for vector-parallel multiprocessor communication

Patent number: 7673118

Abstract: This present invention brings to the multiprocessor what vectorization brought to the single processor. It provides similar tools to speed communication that have traditionally been used to speed computation; namely, the capability to program optimal communication algorithms on an architecture that can replicate their performance in terms of wall clock time. In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit that orchestrates traffic between the network and registers that communicate directly with comparable registers in neighboring processors. Communication tasks are performed out of these registers like computational tasks on a vector uniprocessor. The architecture is balanced and the hardware/software combination is scalable to any number of processors.

Type: Grant

Filed: June 1, 2006

Date of Patent: March 2, 2010

Inventor: Paul N. Swarztrauber
METHOD AND SYSTEM FOR IN-PLACE MULTI-DIMENSIONAL TRANSPOSE FOR MULTI-CORE PROCESSORS WITH SOFTWARE-MANAGED MEMORY HIERARCHY

Publication number: 20100023728

Abstract: A method and system for transposing a multi-dimensional array for a multi-processor system having a main memory for storing the multi-dimensional array and a local memory is provided. One implementation involves partitioning the multi-dimensional array into a number of equally sized portions in the local memory, in each processor performing a transpose function including a logical transpose on one of said portions and then a physical transpose of said portion, and combining the transposed portions and storing back in their original place in the main memory.

Type: Application

Filed: July 25, 2008

Publication date: January 28, 2010

Applicant: International Business Machines Corporation

Inventors: Ahmed H.M.R. El-Mahdy, Ali A. El-Moursy, Hisham ElShishiny
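
The partition, transpose, and store-back steps in the transpose abstract above resemble a conventional tiled in-place transpose, which can be sketched as follows. This is a generic textbook technique shown under stated assumptions (a square matrix held as a list of rows, a tile size `b` that divides the matrix size), not the method claimed in the application; the function name is hypothetical.

```python
def transpose_in_place_blocked(a, b):
    """Transpose square matrix `a` in place, working on b x b tiles."""
    n = len(a)
    assert all(len(row) == n for row in a) and n % b == 0
    for bi in range(0, n, b):
        for bj in range(bi, n, b):
            # Copy the two mirrored tiles out of "main memory".
            upper = [a[bi + r][bj:bj + b] for r in range(b)]
            lower = [a[bj + r][bi:bi + b] for r in range(b)]
            # Transpose each tile locally.
            upper_t = [list(col) for col in zip(*upper)]
            lower_t = [list(col) for col in zip(*lower)]
            # Store each transposed tile back into the other tile's place.
            for r in range(b):
                a[bj + r][bi:bi + b] = upper_t[r]
                a[bi + r][bj:bj + b] = lower_t[r]
    return a


m = [[r * 4 + c for c in range(4)] for r in range(4)]
transpose_in_place_blocked(m, 2)
print(m[0])
```

Each tile pair is independent of the others, which is what makes this style of transpose easy to distribute across several cores with small local memories.
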
Method and apparatus for debugging a multicore system

Patent number: 7581087

Abstract: Techniques for debugging a multicore system with synchronous stop and resume capabilities are described. In one design, an apparatus (e.g., an ASIC) includes first and second processing cores. During debugging, the first or second processing core receives a software command to stop operation and generates a first hardware signal indicating the stop. The other processing core receives the first hardware signal and stops operation. Both processing cores stop at approximately the same time based on the first hardware signal. Thereafter, the first or second processing core receives another software command to resume operation and generates a second hardware signal indicating resumption of operation. The other processing core receives the second hardware signal and resumes operation. Both processing cores resume at approximately the same time based on the second hardware signal. The first and second hardware signals may come from the same or different processing cores.

Type: Grant

Filed: February 22, 2006

Date of Patent: August 25, 2009

Assignee: QUALCOMM Incorporated

Inventor: Johnny Kallacheril John
Cross-chip communication mechanism in distributed node topology to access free-running scan registers in clock-controlled components

Patent number: 7574581

Abstract: A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing free-running, scan registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit.

Type: Grant

Filed: April 28, 2003

Date of Patent: August 11, 2009

Assignee: International Business Machines Corporation

Inventors: Michael Stephen Floyd, Larry Scott Leitner, Kevin Franklin Reick, Kevin Dennis Woodling
Data processing system with backplane and processor books configurable to support both technical and commercial workloads

Patent number: 7526631

Abstract: A processor book designed to support both commercial workloads and technical workloads based on a dynamic or static mechanism of reconfiguring the external wiring interconnect. The processor book is configured as a building block for commercial workload processing systems with external connector buses (ECBs). The processor book is also provided with routing logic to enable the ECBs to be utilized for either book-to-book routing or routing within the same processor book. A table-specific wiring scheme is provided for coupling the ECBs running off the chips of one MCM to the chips of the second MCM on the processor book so that the chips of the first MCM are connected directly to the chips of a second MCM that is logically furthest away and vice versa. Once the wiring of the ECBs is completed according to the wiring scheme, the operational and functional characteristics reflect those of a processor book configured for technical workloads.

Type: Grant

Filed: April 28, 2003

Date of Patent: April 28, 2009

Assignee: International Business Machines Corporation

Inventors: Ravi Kumar Arimilli, Vicente Enrique Chung, Jody Bern Joyner, Jerry Don Lewis
MIXED TORUS AND HYPERCUBE MULTI-RANK TENSOR EXPANSION METHOD

Publication number: 20090024829

Abstract: The present invention provides a mixed torus and hypercube multi-rank tensor expansion method which can be applied to the communication subsystem of a parallel processing system. The expansion method is based on the conventional torus and hypercube topologies. A mixed torus and hypercube multi-rank tensor expansion interconnection network is built up by means of supernodes equipped with expansion interfaces. This method not only provides more bisection bandwidth to the entire system but also improves long-range communication and global operations. The expansion method can also achieve better scalability and flexibility for the parallel system for a given system size.

Type: Application

Filed: June 19, 2008

Publication date: January 22, 2009

Inventors: YUEFAN DENG, Peng Zhang
ULTRASCALABLE PETAFLOP PARALLEL SUPERCOMPUTER

Publication number: 20090006808

Abstract: A novel massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing: a Torus network, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. Novel use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

Type: Application

Filed: June 26, 2007

Publication date: January 1, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias A. Blumrich, Dong Chen, George Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Shawn Hall, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Todd Takken
Conditional execution of instructions in a computer

Patent number: 7441098

Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results of carrying out the operation for each lane.

Type: Grant

Filed: May 6, 2005

Date of Patent: October 21, 2008

Assignee: Broadcom Corporation

Inventor: Sophie Wilson
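
The per-lane condition setting described in the conditional execution abstract above can be illustrated with a packed add. This is a sketch under assumptions: the lane count, lane width, the `set_cc` flag name, and the choice of N/Z/C flags are all hypothetical and are not taken from the patent.

```python
def packed_add(a, b, set_cc, old_cc, lanes=4, bits=16):
    """Add two packed operands lane by lane.

    If set_cc is true, return fresh per-lane condition codes;
    otherwise return old_cc untouched (hypothetical N/Z/C flags).
    """
    mask = (1 << bits) - 1
    result = 0
    new_cc = []
    for i in range(lanes):
        x = (a >> (i * bits)) & mask
        y = (b >> (i * bits)) & mask
        full = x + y
        r = full & mask  # wrap around within the lane
        result |= r << (i * bits)
        new_cc.append({
            "N": bool(r >> (bits - 1)),  # top bit of the lane result
            "Z": r == 0,                 # lane result is zero
            "C": full > mask,            # carry out of the lane
        })
    return result, (new_cc if set_cc else old_cc)


value, cc = packed_add(0x0001_FFFF_8000_0000, 0x0001_0001_0000_0000,
                       set_cc=True, old_cc=None)
print(hex(value), cc[2])
```

Keeping the indicator separate from the operation means the same add can either update the condition codes or leave earlier codes in place for a later conditional instruction.
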
DATA PROCESSING SYSTEM WITH BACKPLANE AND PROCESSOR BOOKS CONFIGURABLE TO SUPPORT BOTH TECHNICAL AND COMMERCIAL WORKLOADS

Publication number: 20080209163

Abstract: A processor book designed to support both commercial workloads and technical workloads based on a dynamic or static mechanism of reconfiguring the external wiring interconnect. The processor book is configured as a building block for commercial workload processing systems with external connector buses (ECBs). The processor book is also provided with routing logic to enable the ECBs to be utilized for either book-to-book routing or routing within the same processor book. A table-specific wiring scheme is provided for coupling the ECBs running off the chips of one MCM to the chips of the second MCM on the processor book so that the chips of the first MCM are connected directly to the chips of a second MCM that is logically furthest away and vice versa. Once the wiring of the ECBs is completed according to the wiring scheme, the operational and functional characteristics reflect those of a processor book configured for technical workloads.

Type: Application

Filed: May 9, 2008

Publication date: August 28, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ravi Kumar Arimilli, Vicente Enrique Chung, Jody Bern Joyner, Jerry Don Lewis
Fault tolerance in a supercomputer through dynamic repartitioning

Patent number: 7185226

Abstract: A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.

Type: Grant

Filed: February 25, 2002

Date of Patent: February 27, 2007

Assignee: International Business Machines Corporation

Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara, Todd E. Takken
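
The software-controlled swapping of standby processor groups in the fault tolerance abstract above can be pictured as a logical-to-physical remapping. The sketch below is only an assumption about how such a map might look; the class name `PartitionMap`, the group labels, and the first-available spare policy are hypothetical and do not come from the patent.

```python
class PartitionMap:
    """Map logical processor groups to physical ones, with standby spares."""

    def __init__(self, active_groups, spare_groups):
        # Software only ever sees the logical indices 0..n-1.
        self.logical_to_physical = dict(enumerate(active_groups))
        self.spares = list(spare_groups)

    def report_failure(self, physical_group):
        """Swap a failed physical group for a spare; return its logical index."""
        for logical, physical in self.logical_to_physical.items():
            if physical == physical_group:
                if not self.spares:
                    raise RuntimeError("no standby group left")
                self.logical_to_physical[logical] = self.spares.pop(0)
                return logical
        raise KeyError(physical_group)


pmap = PartitionMap(["g0", "g1", "g2"], spare_groups=["s0"])
pmap.report_failure("g1")
print(pmap.logical_to_physical)
```

After the swap the set of logical indices is unchanged, which is the sense in which the machine can still appear fully functional to software.
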
Method and apparatus for processing unit synchronization for scalable parallel processing

Patent number: 7103639

Abstract: The present invention flexibly manages the formation of a partition from a plurality of independently executing cells (discrete hardware entities comprising system resources) in preparation for the instantiation of an operating system instance upon the partition. Specifically, the invention manages configuration activities that occur to transition from having individual cells acting independently, and having cells rendezvous, to having cells become interdependent to continue operations as a partition. The invention manages the partition forming process such that no single point of failure disrupts the process. Instead, the invention is implemented as a distributed application wherein individual cells independently execute instructions based upon respective copies of the complex profile (a “map” of the complex configuration). Also, the invention adapts to a degree of delay associated with certain cells becoming ready to join the formation or rendezvous process.

Type: Grant

Filed: December 5, 2000

Date of Patent: September 5, 2006

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Andrew C. Walton, Guy L. Kuntz
Irregular network

Patent number: 7043562

Abstract: Irregularities are provided in at least one dimension of a torus or mesh network for lower average path length and lower maximum channel load while increasing tolerance for omitted end-around connections. In preferred embodiments, all nodes supported on each backplane are connected in a single cycle which includes nodes on opposite sides of lower dimension tori. The cycles in adjacent backplanes hop different numbers of nodes.

Type: Grant

Filed: June 9, 2003

Date of Patent: May 9, 2006

Assignee: Avici Systems, Inc.

Inventors: William J. Dally, William F. Mann, Philip P. Carvey
Scalable hypercube multiprocessor network for massive parallel processing

Patent number: 6973559

Abstract: A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.

Type: Grant

Filed: September 29, 1999

Date of Patent: December 6, 2005

Assignee: Silicon Graphics, Inc.

Inventors: Martin M. Deneroff, Gregory M. Thorson, Randal S. Passint
Autonomous signal processing resource for selective series processing of data in transit on communications paths in multi-processor arrangements

Patent number: 6898657

Abstract: A multi-processor arrangement having an interprocessor communication path between every possible pair of processors, in addition to I/O paths to and from the arrangement, having signal processing functions configurably embedded in series with the communication paths and/or the I/O paths. Each processor is provided with a local memory which can be accessed by the local processor as well as by the other processors via the communications paths. This allows for efficient data movement from one processor's local memory to another processor's local memory, such as is commonly done during signal processing corner turning operations. Configurable signal processing logic may be configured to host one or more signal processing functions which allow data to be autonomously accessed from the processor local memories, processed, and re-deposited in a local memory.

Type: Grant

Filed: December 16, 2002

Date of Patent: May 24, 2005

Assignee: Tera Force Technology Corp.

Inventor: Winthrop W. Smith
Signal processing arrangement

Patent number: 6873287

Abstract: The present invention relates to a method and an arrangement suitable for embedded signal processing, comprising a number of computational units (100), each computational unit comprising a number of processing elements (20) capable of working independently and transmitting data simultaneously. Said computational units are arranged in clusters, work independently, and transmit data simultaneously. Said processing elements (20) are globally and regularly inter-connected optically in a hypercube topology that is transformed into a planar waveguide.

Type: Grant

Filed: November 1, 2001

Date of Patent: March 29, 2005

Assignee: Telefonaktiebolaget LM Ericsson

Inventor: Håkan Forsberg
Methods and apparatus for manifold array processing

Patent number: 6769056

Abstract: A manifold array topology includes processing elements, nodes, memories or the like arranged in clusters. Clusters are connected by cluster switch arrangements which advantageously allow changes of organization without physical rearrangement of processing elements. A significant reduction in the typical number of interconnections for preexisting arrays is also achieved. Fast, efficient and cost effective processing and communication result with the added benefit of ready scalability.

Type: Grant

Filed: September 24, 2002

Date of Patent: July 27, 2004

Assignee: PTS Corporation

Inventors: Edwin F. Barry, Thomas L. Drabenstott, Gerald G. Pechanek, Nikos P. Pitsianis
Instruction packing for an advanced microprocessor

Patent number: 6754892

Abstract: A process for packing an instruction word, including: providing a word value, representing an instruction word into which an operation is to be fit, that is equal to some initial value having a plurality of portions representing constraints; operating on the initial value of the word value with operation class values, each having a plurality of portions representing constraints of a new operation, as the new operation is attempted to be fit into the instruction, to affect the word value in a manner that indicates when the limit of any constraint for the instruction is reached; and determining a violation of any constraint to determine that the new operation does not fit the format.

Type: Grant

Filed: December 15, 1999

Date of Patent: June 22, 2004

Assignee: Transmeta Corporation

Inventor: Stephen C. Johnson
Single descriptor scatter gather data transfer to or from a host processor

Patent number: 6754735

Abstract: A processing system includes a processing device and a host processor operatively coupled to the processing device via a system bus, and implements a scatter gather data transfer technique. The host processor is configurable to control the transfer of information to or from scattered or non-contiguous memory locations in a memory associated with the processing device, utilizing a data structure comprising a single descriptor. An information transfer bandwidth of the system bus is thereby more efficiently utilized than if a separate descriptor were used for transfer of information involving each of the non-contiguous memory locations.

Type: Grant

Filed: December 21, 2001

Date of Patent: June 22, 2004

Assignee: Agere Systems Inc.

Inventors: Prachi Kale, Stephen H. Miller, Abraham Prasad, Narender R. Vangati
Fault-tolerant, highly-scalable cell switching architecture

Patent number: 6741552

Abstract: Generally speaking, the cell switching architecture of the present invention offers a powerful, simple, and in many ways elegant solution to the problem of providing cost-effective, high-bandwidth, fault-tolerant cell switching. The architecture is based on a network of switching elements connected in a hypercube topology to form a switch fabric. The generalized hypercube is D dimensional, where D≧3 when all radices in the radix set are 2 and D≧2 when at least one of the radices is greater than 2. A fully-populated switch is fully symmetric: each switching element has the same number and kind of connections to both its neighbors and to the outside world as every other switching element. In an exemplary embodiment, each switching element is connected to one data source and one data sink, e.g., a Utopia bus or other broadband connection. In the same exemplary embodiment, links between switching elements are bidirectional and synchronous, operating in accordance with a Cell Exchange Cycle (CEC).

Type: Grant

Filed: February 12, 1998

Date of Patent: May 25, 2004

Assignee: PMC Sierra International, Inc.

Inventors: Carl McCrosky, Jeff S. Roe, Ian G. Barrett, Ken Sailor
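
The generalized hypercube mentioned in the cell switching abstract above can be made concrete with a neighbor function. This sketch uses the usual definition of a generalized hypercube (nodes carry mixed-radix addresses and are linked when their addresses differ in exactly one digit); the function name and the tuple address format are illustrative assumptions, and nothing here reflects the patent's switching element design.

```python
def generalized_hypercube_neighbors(address, radices):
    """List the neighbors of a node in a generalized hypercube.

    address: tuple of digits, one per dimension.
    radices: tuple giving the radix of each dimension.
    """
    neighbors = []
    for dim, (digit, radix) in enumerate(zip(address, radices)):
        for other in range(radix):
            if other != digit:
                # Change exactly one digit to reach a neighbor.
                changed = list(address)
                changed[dim] = other
                neighbors.append(tuple(changed))
    return neighbors


# With all radices equal to 2 this is an ordinary binary hypercube.
print(generalized_hypercube_neighbors((0, 0, 0), (2, 2, 2)))
# A mixed-radix example with D = 2 and one radix greater than 2.
print(generalized_hypercube_neighbors((0, 0), (3, 2)))
```

Every node has the same number of neighbors, the sum of (radix - 1) over all dimensions, which matches the symmetry the abstract attributes to a fully populated switch.
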