Patents by Inventor Alan G. Gara

Alan G. Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Ultrascalable petaflop parallel supercomputer

Patent number: 7761687

Abstract: A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

Type: Grant

Filed: June 26, 2007

Date of Patent: July 20, 2010

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, George Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Shawn Hall, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Todd Takken
DMA shared byte counters in a parallel computer

Patent number: 7694035

Abstract: A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.

Type: Grant

Filed: June 26, 2007

Date of Patent: April 6, 2010

Assignee: International Business Machines Corporation

Inventors: Dong Chen, Alan G. Gara, Philip Heidelberger, Pavlos Vranas
Space and power efficient hybrid counters array

Patent number: 7688931

Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.

Type: Grant

Filed: May 14, 2008

Date of Patent: March 30, 2010

Assignee: International Business Machines Corporation

Inventors: Alan G. Gara, Valentina Salapura
Global tree network for computing structures enabling global processing operations

Patent number: 7650434

Abstract: A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node.

Type: Grant

Filed: February 25, 2002

Date of Patent: January 19, 2010

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
MANAGING COHERENCE VIA PUT/GET WINDOWS

Publication number: 20090313439

Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

Type: Application

Filed: August 19, 2009

Publication date: December 17, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
Method and apparatus for updating wide storage array over a narrow bus

Patent number: 7620756

Abstract: A method and apparatus for transferring wide data (e.g., n bits) from a narrow bus (m bits, where m<n) for updating a wide data storage array. The apparatus includes: a staging latch accommodating m bits, e.g., 32 bits; control circuitry for depositing the m bits of data from a data bus port into the staging latch addressed using a specific register address; and control circuitry adapted to merging the m bit data contained in the staging latch with m bit data from a data bus port, to generate the n bit wide data, for example, 64 bits, that is written atomically to a storage array specified by an address corresponding to a storage array location.

Type: Grant

Filed: August 21, 2006

Date of Patent: November 17, 2009

Assignee: International Business Machines Corporation

Inventors: Alan G. Gara, Michael K. Gschwind, Valentina Salapura
Method and apparatus for filtering snoop requests using mulitiple snoop caches

Patent number: 7617366

Abstract: A method and apparatus for detecting a cache wrap condition in a computing environment having a processor and a cache. A cache wrap condition is detected when the entire contents of a cache have been replaced, relative to a particular starting state. A set-associative cache is considered to have wrapped when all of the sets within the cache have been replaced. The starting point for cache wrap detection is the state of the cache sets at the time of the previous cache wrap. The method and apparatus is preferably implemented in a snoop filter having filter mechanisms that rely upon detecting the cache wrap condition. These snoop filter mechanisms requiring this information are operatively coupled with cache wrap detection logic adapted to detect the cache wrap event, and perform an indication step to the snoop filter mechanisms. In the various embodiments, cache wrap detection logic is implemented using registers and comparators, loadable counters, or a scoreboard data structure.

Type: Grant

Filed: May 1, 2008

Date of Patent: November 10, 2009

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Alan G. Gara, Mark E. Giampapa, Martin Ohmacht, Valentina Salapura
NOVEL MASSIVELY PARALLEL SUPERCOMPUTER

Publication number: 20090259713

Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.

Type: Application

Filed: June 26, 2009

Publication date: October 15, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
Method and apparatus for filtering snoop requests in a point-to-point interconnect architecture

Patent number: 7603523

Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each of the memory writing sources is directly connected to the dedicated input ports of all other snoop filter devices associated with all other processing units in a point-to-point interconnect fashion.

Type: Grant

Filed: February 21, 2008

Date of Patent: October 13, 2009

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Method and apparatus for filtering snoop requests using multiple snoop caches

Patent number: 7603524

Abstract: A method and apparatus for implementing a snoop filter unit associated with a single processor in a multiprocessor system. The snoop filter unit has a plurality of ports, each port receiving snoop requests from exactly one dedicated source. Associated with each port is a snoop cache filter that processes each snoop cache request and records addresses of the most recent snoop requests for exactly one source. The snoop cache filter uses vector encoding to record the occurrence of snoop requests for a sequence of consecutive cache lines. All addresses of snoop requests are added to the snoop cache unless a received snoop cache request matches an entry present in the associated snoop cache, in which case the snoop request is discarded. Otherwise, the associated snoop cache request is enqueued for forwarding to the single processor.

Type: Grant

Filed: March 5, 2008

Date of Patent: October 13, 2009

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Alan G. Gara, Valentina Salapura
Class network routing

Patent number: 7587516

Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.

Type: Grant

Filed: February 25, 2002

Date of Patent: September 8, 2009

Assignee: International Business Machines Corporation

Inventors: Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
Massively parallel supercomputer

Patent number: 7555566

Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.

Type: Grant

Filed: February 25, 2002

Date of Patent: June 30, 2009

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
Space and power efficient hybrid counters array

Patent number: 7532700

Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.

Type: Grant

Filed: August 21, 2006

Date of Patent: May 12, 2009

Assignee: International Business Machines Corporation

Inventors: Alan G. Gara, Valentina Salapura
LOW LATENCY COUNTER EVENT INDICATION

Publication number: 20090116610

Abstract: A hybrid counter array device for counting events with interrupt indication includes a first counter portion comprising N counter devices, each for counting signals representing event occurrences and providing a first count value representing lower order bits. An overflow bit device associated with each respective counter device is additionally set in response to an overflow condition. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits. An operatively coupled control device monitors each associated overflow bit device and initiates incrementing a second count value stored at a corresponding memory location in response to a respective overflow bit being set.

Type: Application

Filed: May 30, 2008

Publication date: May 7, 2009

Applicant: International Business Machines Corporation

Inventors: Alan G. Gara, Valentina Salapura
SPACE AND POWER EFFICIENT HYBRID COUNTERS ARRAY

Publication number: 20090116611

Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.

Type: Application

Filed: May 14, 2008

Publication date: May 7, 2009

Applicant: International Business Machines Corporation

Inventors: Alan G. Gara, Valentina Salapura
Method for prefetching non-contiguous data structures

Patent number: 7529895

Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed.

Type: Grant

Filed: December 28, 2006

Date of Patent: May 5, 2009

Assignee: International Business Machines Corporation

Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
METHOD AND APPARATUS FOR EFFICIENT PERFORMANCE MONITORING OF A LARGE NUMBER OF SIMULTANEOUS EVENTS

Publication number: 20090077571

Abstract: A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location for storing a second count value representing higher order bits. A control device monitors each of the counter devices and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing fast interrupt trigger to a processor device when a count value related to an event equals a threshold value.

Type: Application

Filed: November 26, 2008

Publication date: March 19, 2009

Applicant: International Business Machines Corporation

Inventors: Alan G. Gara, Michael K. Gschwind, Valentina Salapura
Multidimensional switch network

Patent number: 7486619

Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.

Type: Grant

Filed: March 4, 2004

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas, Matthias Augustin Blumrich
DATA EYE MONITOR METHOD AND APPARATUS

Publication number: 20090006730

Abstract: An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.

Type: Application

Filed: June 26, 2007

Publication date: January 1, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alan G. Gara, James A. Marcella, Martin Ohmacht
SYSTEM AND METHOD FOR PROGRAMMABLE BANK SELECTION FOR BANKED MEMORY SUBSYSTEMS

Publication number: 20090006718

Abstract: A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.

Type: Application

Filed: June 26, 2007

Publication date: January 1, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam

prev 1 2 3 4 5 6 7 next