Patents by Inventor Alan G. Gara

Alan G. Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7761687
    Abstract: A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: July 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Shawn Hall, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Todd Takken
  • Patent number: 7694035
    Abstract: A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: April 6, 2010
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Alan G. Gara, Philip Heidelberger, Pavlos Vranas
  • Patent number: 7688931
    Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.
    Type: Grant
    Filed: May 14, 2008
    Date of Patent: March 30, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, Valentina Salapura
  • Patent number: 7650434
    Abstract: A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: January 19, 2010
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20090313439
    Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
    Type: Application
    Filed: August 19, 2009
    Publication date: December 17, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
  • Patent number: 7620756
    Abstract: A method and apparatus for transferring wide data (e.g., n bits) from a narrow bus (m bits, where m<n) for updating a wide data storage array. The apparatus includes: a staging latch accommodating m bits, e.g., 32 bits; control circuitry for depositing the m bits of data from a data bus port into the staging latch addressed using a specific register address; and control circuitry adapted to merging the m bit data contained in the staging latch with m bit data from a data bus port, to generate the n bit wide data, for example, 64 bits, that is written atomically to a storage array specified by an address corresponding to a storage array location.
    Type: Grant
    Filed: August 21, 2006
    Date of Patent: November 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, Michael K. Gschwind, Valentina Salapura
  • Patent number: 7617366
    Abstract: A method and apparatus for detecting a cache wrap condition in a computing environment having a processor and a cache. A cache wrap condition is detected when the entire contents of a cache have been replaced, relative to a particular starting state. A set-associative cache is considered to have wrapped when all of the sets within the cache have been replaced. The starting point for cache wrap detection is the state of the cache sets at the time of the previous cache wrap. The method and apparatus is preferably implemented in a snoop filter having filter mechanisms that rely upon detecting the cache wrap condition. These snoop filter mechanisms requiring this information are operatively coupled with cache wrap detection logic adapted to detect the cache wrap event, and perform an indication step to the snoop filter mechanisms. In the various embodiments, cache wrap detection logic is implemented using registers and comparators, loadable counters, or a scoreboard data structure.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: November 10, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Alan G. Gara, Mark E. Giampapa, Martin Ohmacht, Valentina Salapura
  • Publication number: 20090259713
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Application
    Filed: June 26, 2009
    Publication date: October 15, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Patent number: 7603523
    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each of the memory writing sources is directly connected to the dedicated input ports of all other snoop filter devices associated with all other processing units in a point-to-point interconnect fashion.
    Type: Grant
    Filed: February 21, 2008
    Date of Patent: October 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
  • Patent number: 7603524
    Abstract: A method and apparatus for implementing a snoop filter unit associated with a single processor in a multiprocessor system. The snoop filter unit has a plurality of ports, each port receiving snoop requests from exactly one dedicated source. Associated with each port is a snoop cache filter that processes each snoop cache request and records addresses of the most recent snoop requests for exactly one source. The snoop cache filter uses vector encoding to record the occurrence of snoop requests for a sequence of consecutive cache lines. All addresses of snoop requests are added to the snoop cache unless a received snoop cache request matches an entry present in the associated snoop cache, in which case the snoop request is discarded. Otherwise, the associated snoop cache request is enqueued for forwarding to the single processor.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: October 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Alan G. Gara, Valentina Salapura
  • Patent number: 7587516
    Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: September 8, 2009
    Assignee: International Business Machines Corporation
    Inventors: Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7555566
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: June 30, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, George L. Chiu, Thomas M. Cipolla, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Lawrence S. Mok, Todd E. Takken
  • Patent number: 7532700
    Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.
    Type: Grant
    Filed: August 21, 2006
    Date of Patent: May 12, 2009
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, Valentina Salapura
  • Publication number: 20090116610
    Abstract: A hybrid counter array device for counting events with interrupt indication includes a first counter portion comprising N counter devices, each for counting signals representing event occurrences and providing a first count value representing lower order bits. An overflow bit device associated with each respective counter device is additionally set in response to an overflow condition. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits. An operatively coupled control device monitors each associated overflow bit device and initiates incrementing a second count value stored at a corresponding memory location in response to a respective overflow bit being set.
    Type: Application
    Filed: May 30, 2008
    Publication date: May 7, 2009
    Applicant: International Business Machines Corporation
    Inventors: Alan G. Gara, Valentina Salapura
  • Publication number: 20090116611
    Abstract: A hybrid counter array device for counting events. The hybrid counter array includes a first counter portion comprising N counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. The hybrid counter array includes a second counter portion comprising a memory array device having N addressable memory locations in correspondence with the N counter devices, each addressable memory location for storing a second count value representing higher order bits of the hybrid counter array. A control device monitors each of the N counter devices of the first counter portion and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location in the second counter portion. Thus, a combination of the first and second count values provide an instantaneous measure of number of events received.
    Type: Application
    Filed: May 14, 2008
    Publication date: May 7, 2009
    Applicant: International Business Machines Corporation
    Inventors: Alan G. Gara, Valentina Salapura
  • Patent number: 7529895
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: May 5, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20090077571
    Abstract: A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location for storing a second count value representing higher order bits. A control device monitors each of the counter devices and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing fast interrupt trigger to a processor device when a count value related to an event equals a threshold value.
    Type: Application
    Filed: November 26, 2008
    Publication date: March 19, 2009
    Applicant: International Business Machines Corporation
    Inventors: Alan G. Gara, Michael K. Gschwind, Valentina Salapura
  • Patent number: 7486619
    Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas, Matthias Augustin Blumrich
  • Publication number: 20090006730
    Abstract: An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.
    Type: Application
    Filed: June 26, 2007
    Publication date: January 1, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan G. Gara, James A. Marcella, Martin Ohmacht
  • Publication number: 20090006718
    Abstract: A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
    Type: Application
    Filed: June 26, 2007
    Publication date: January 1, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam