Patents by Inventor Burkhard D. Steinmacher-Burow

Burkhard D. Steinmacher-Burow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10069599
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Grant
    Filed: December 17, 2015
    Date of Patent: September 4, 2018
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Publication number: 20160105262
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Application
    Filed: December 17, 2015
    Publication date: April 14, 2016
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 8626957
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Grant
    Filed: May 5, 2011
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 8095585
    Abstract: The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates eff
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: January 10, 2012
    Assignee: International Business Machines Corporation
    Inventors: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Publication number: 20110219280
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Application
    Filed: May 5, 2011
    Publication date: September 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 8001280
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Grant
    Filed: July 18, 2005
    Date of Patent: August 16, 2011
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7818514
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: October 19, 2010
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7765337
    Abstract: Methods, compute nodes, and computer program products are provided for direct memory access (‘DMA’) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (‘FIFO’) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker, Burkhard D. Steinmacher-Burow, Pavlos Vranas
  • Patent number: 7650434
    Abstract: A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: January 19, 2010
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7587516
    Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: September 8, 2009
    Assignee: International Business Machines Corporation
    Inventors: Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7529895
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: May 5, 2009
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Patent number: 7486619
    Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas, Matthias Augustin Blumrich
  • Publication number: 20080313408
    Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed.
    Type: Application
    Filed: August 22, 2008
    Publication date: December 18, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
  • Publication number: 20080307121
    Abstract: Methods, compute nodes, and computer program products are provided for direct memory access (‘DMA’) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (‘FIFO’) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Inventors: Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker, Burkhard D. Steinmacher-Burow, Pavlos Vranas
  • Patent number: 7457303
    Abstract: A one-bounce data network comprises a plurality of nodes interconnected to each other via communication links, the network including a plurality of interconnected switch devices, said switch devices interconnected such that a message is communicated between any two switches passes over a single link from a source switch to a destination switch; and, the source switch concurrently sends a message to an arbitrary bounce switch which then sends the message to the destination switch.
    Type: Grant
    Filed: September 30, 2003
    Date of Patent: November 25, 2008
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7444385
    Abstract: A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: October 28, 2008
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E Giampapa, Philip Heidelberger, Gerard V. Kopcsay, Burkhard D. Steinmacher-Burow, Todd E. Takken
  • Publication number: 20080133633
    Abstract: The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates eff
    Type: Application
    Filed: October 31, 2007
    Publication date: June 5, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7383490
    Abstract: Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values for—example, checksums—to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieve commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
    Type: Grant
    Filed: April 14, 2005
    Date of Patent: June 3, 2008
    Assignee: International Business Machines Corporation
    Inventors: Gheorghe Almasi, Matthias Augustin Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Sarabjeet Singh, Burkhard D. Steinmacher-Burow, Todd Takken, Pavlos Vranas
  • Publication number: 20080104367
    Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
    Type: Application
    Filed: July 18, 2005
    Publication date: May 1, 2008
    Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
  • Patent number: 7330996
    Abstract: A method for maintaining full performance of a file system in the presence of a failure is provided. The file system having N storage devices, where N is an integer greater than zero and N primary file servers where each file server is operatively connected to a corresponding storage device for accessing files therein. The file system further having a secondary file server operatively connected to at least one of the N storage devices. The method including: switching the connection of one of the N storage devices to the secondary file server upon a failure of one of the N primary file servers; and switching the connections of one or more of the remaining storage devices to a primary file server other than the failed file server as necessary so as to prevent a loss in performance and to provide each storage device with an operating file server.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: February 12, 2008
    Assignee: International Business Machines Corporation
    Inventors: Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow