Patents by Inventor Alan G. Gara

Alan G. Gara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7315877
    Abstract: The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates eff
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: January 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7313582
    Abstract: Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided, working in conjunction of software algorithms and hardware implementation of class network routing, to achieve a very significant reduction in the time required for global arithmetic operation on the torus. Therefore, it leads to greater scalability of applications running on large parallel machines. The invention involves three steps in improving the efficiency and accuracy of global operations: (1) Ensuring, when necessary, that all the nodes do the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) Using the topology of the torus to minimize the number of hops and the bidirectional capabilities of the network to reduce the number of time steps in the data transfer operation to an absolute minimum; and (3) Using class function routing to reduce latency in the data transfer.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: December 25, 2007
    Assignee: International Business Machines Corporation
    Inventors: Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7305487
    Abstract: In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: December 4, 2007
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7210088
    Abstract: A fault isolation technique for checking the accuracy of data packets transmitted between nodes of a parallel processor. An independent crc is kept of all data sent from one processor to another, and received from one processor to another. At the end of each checkpoint, the crcs are compared. If they do not match, there was an error. The crcs may be cleared and restarted at each checkpoint. In the preferred embodiment, the basic functionality is to calculate a CRC of all packet data that has been successfully transmitted across a given link. This CRC is done on both ends of the link, thereby allowing an independent check on all data believed to have been correctly transmitted. Preferably, all links have this CRC coverage, and the CRC used in this link level check is different from that used in the packet transfer protocol. This independent check, if successfully passed, virtually eliminates the possibility that any data errors were missed during the previous transfer period.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: April 24, 2007
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara
  • Patent number: 7185226
    Abstract: A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: February 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara, Todd E. Takken
  • Patent number: 7174434
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: February 6, 2007
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7149920
    Abstract: Disclosed are an error recovery method and system for use with a communication system having first and second nodes, each of said nodes having a receiver and a sender, the sender of the first node being connected to the receiver of the second node by a first cable, and the sender of the second node being connected to the receiver of the first node by a second cable. The method comprising the step of after one of the nodes detects an error, both of the nodes entering the same defined state. In particular, the receiver of the first node enters an error state, stays in the error state for a defined period of time T, and, after said defined period of time T, enters a wait state. Also, the sender of the first node sends to the receiver of the second node an error message for a defined period of time Te, and after the defined period of time Te, the sender of the first node enters an idle state.
    Type: Grant
    Filed: September 30, 2003
    Date of Patent: December 12, 2006
    Assignee: International Business Machines Corporation
    Inventors: Matthew A. Blumrich, Dong Chen, Alan G. Gara, Philip Heidelberger, Dirk I. Hoenicke, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 6895416
    Abstract: The present in invention is directed to a checkpointing filesystem of a distributed-memory parallel supercomputer comprising a node that accesses user data on the filesystem, the filesystem comprising an interface that is associated with a disk for storing the user data. The checkpointing filesystem provides for taking and checkpoint of the filesystem and rolling back to a previously taken checkpoint, as well as for writing user data to and deleting user data from the checkpointing filesystem.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: May 17, 2005
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, Mark E. Giampapa, Burkhard D. Steinmacher-Burow
  • Publication number: 20050002387
    Abstract: A one-bounce data network comprises a plurality of nodes interconnected to each other via communication links, the network including a plurality of interconnected switch devices, said switch devices interconnected such that a message is communicated between any two switches passes over a single link from a source switch to a destination switch; and, the source switch concurrently sends a message to an arbitrary bounce switch which then sends the message to the destination switch.
    Type: Application
    Filed: September 30, 2003
    Publication date: January 6, 2005
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas
  • Publication number: 20040153754
    Abstract: A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
    Type: Application
    Filed: August 22, 2003
    Publication date: August 5, 2004
    Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara, Todd E. Takken
  • Publication number: 20040103218
    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency.
    Type: Application
    Filed: August 22, 2003
    Publication date: May 27, 2004
    Inventors: Matthias A Blumrich, Dong Chen, George L Chiu, Thomas M Cipolla, Paul W Coteus, Alan G Gara, Mark E Giampapa, Philip Heidelberg, Gerard V Kopcsay, Lawrence S Mok, Todd E Takken
  • Publication number: 20040098660
    Abstract: A fault isolation technique for checking the accuracy of data packets transmitted between nodes of a parallel processor. An independent crc is kept of all data sent from one processor to another, and received from one processor to another. At the end of each checkpoint, the crcs are compared. If they do not match, there was an error. The crcs may be cleared and restarted at each checkpoint. In the preferred embodiment, the basic functionality is to calculate a CRC of all packet data that has been successfully transmitted across a given link. This CRC is done on both ends of the link, thereby allowing an independent check on all data believed to have been correctly transmitted. Preferably, all links have this CRC coverage, and the CRC used in this link level check is different from that used in the packet transfer protocol. This independent check, if successfully passed, virtually eliminates the possibility that any data errors were missed during the previous transfer period.
    Type: Application
    Filed: August 22, 2003
    Publication date: May 20, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chen Dong, Paul A Coteus, Alan G Gara
  • Publication number: 20040081155
    Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 29, 2004
    Inventors: Gyan V Bhanot, Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20040083293
    Abstract: In a massively parallel system, a method and apparatus for uniquely assigning a MAC address to a device encodes the MAC address with a physical location of the device. The method and apparatus include configuring device interconnections of the parallel system with physical topological information such as a rack number, a midplane number, a card number, and a chip number. A device or node with a physical location encoded MAC address may then be interrogated by location for test, diagnostic, and program loading purposes.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 29, 2004
    Inventors: Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Todd E. Takken
  • Publication number: 20040078482
    Abstract: In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 22, 2004
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20040078405
    Abstract: The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order fac
    Type: Application
    Filed: August 22, 2003
    Publication date: April 22, 2004
    Inventors: Gyan V Bhanot, Dong Chen, Alan G. Gara, Mark E Giampapa, Philip Heidelberger, Burkhard D Steinmacher-Burow, Pavlos M Vranas
  • Publication number: 20040078493
    Abstract: A system and method for enabling high-speed, low-latency global tree communications among processing nodes interconnected according to a tree network structure. The global tree network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations include one or more of: global broadcast operations downstream from a root node to leaf nodes of a virtual tree, global reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node in the virtual tree.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 22, 2004
    Inventors: Matthias A Blumrich, Dong Chen, Paul W Coteus, Alan G Gara, Mark E Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D Steinmacher-Burow, Todd E Takken, Pavlos M Vranas
  • Publication number: 20040073830
    Abstract: A method for maintaining full performance of a file system in the presence of a failure is provided. The file system having N storage devices, where N is an integer greater than zero and N primary file servers where each file server is operatively connected to a corresponding storage device for accessing files therein. The file system further having a secondary file server operatively connected to at least one of the N storage devices. The method including: switching the connection of one of the N storage devices to the secondary file server upon a failure of one of the N primary file servers; and switching the connections of one or more of the remaining storage devices to a primary file server other than the failed file server as necessary so as to prevent a loss in performance and to provide each storage device with an operating file server.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 15, 2004
    Inventors: Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow
  • Publication number: 20040073758
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 15, 2004
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20040073590
    Abstract: Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided, working in conjunction of software algorithms and hardware implementation of class network routing, to achieve a very significant reduction in the time required for global arithmetic operation on the torus. Therefore, it leads to greater scalability of applications running on large parallel machines. The invention involves three steps in improving the efficiency and accuracy of global operations: (1) Ensuring, when necessary, that all the nodes do the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) Using the topology of the torus to minimize the number of hops and the bidirectional capabilities of the network to reduce the number of time steps in the data transfer operation to an absolute minimum; and (3) Using class function routing to reduce latency in the data transfer.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 15, 2004
    Inventors: Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas