Patents by Inventor Dirk Hoenicke
Dirk Hoenicke has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10069599Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: GrantFiled: December 17, 2015Date of Patent: September 4, 2018Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Publication number: 20160105262Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: ApplicationFiled: December 17, 2015Publication date: April 14, 2016Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Patent number: 9244759Abstract: An error-recovery method to enable error-free message transfer between nodes of a computer network. A first node of the network sends a packet to a second node of the network over a link between the nodes, and the first node keeps a copy of the packet on a sending end of the link until the first node receives acknowledgment from the second node that the packet was received without error. The second node tests the packet to determine if the packet is error free. If the packet is not error free, the second node sets a flag to mark the packet as corrupt. The second node returns acknowledgement to the first node specifying whether the packet was received with or without error. When the packet is received with error, the link is returned to a known state and the packet is sent again to the second node.Type: GrantFiled: January 6, 2014Date of Patent: January 26, 2016Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd Takken, Burkhard Steinmacher-Burow, Pavlos M. Vranas
-
Patent number: 8902892Abstract: Disclosed are a unified method and apparatus to classify, route, and process injected data packets into a network so as to belong to a plurality of logical networks, each implementing a specific flow of data on top of a common physical network. The method allows to locally identify collectives of packets for local processing, such as the computation of the sum, difference, maximum, minimum, or other logical operations among the identified packet collective. Packets are injected together with a class-attribute and an opcode attribute. Network routers, employing the described method, use the packet attributes to look-up the class-specific route information from a local route table, which contains the local incoming and outgoing directions as part of the specifically implemented global data flow of the particular virtual network.Type: GrantFiled: July 15, 2008Date of Patent: December 2, 2014Assignee: International Business Machines CorporationInventor: Dirk Hoenicke
-
Publication number: 20140122980Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: ApplicationFiled: January 6, 2014Publication date: May 1, 2014Applicant: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd Takken, Burkhard Steinmacher-Burow, Pavlos M. Vranas
-
Patent number: 8626957Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: GrantFiled: May 5, 2011Date of Patent: January 7, 2014Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Patent number: 8161248Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.Type: GrantFiled: November 24, 2010Date of Patent: April 17, 2012Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht
-
Patent number: 8122197Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.Type: GrantFiled: August 19, 2009Date of Patent: February 21, 2012Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
-
Patent number: 8103832Abstract: Method and apparatus of prefetching streams of varying prefetch depth dynamically changes the depth of prefetching so that the number of multiple streams as well as the hit rate of a single stream are optimized. The method and apparatus in one aspect monitor a plurality of load requests from a processing unit for data in a prefetch buffer, determine an access pattern associated with the plurality of load requests and adjust a prefetch depth according to the access pattern.Type: GrantFiled: June 26, 2007Date of Patent: January 24, 2012Assignee: International Business Machines CorporationInventors: Alan Gara, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Dirk Hoenicke
-
Publication number: 20110219280Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: ApplicationFiled: May 5, 2011Publication date: September 8, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Patent number: 8001280Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.Type: GrantFiled: July 18, 2005Date of Patent: August 16, 2011Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
-
Publication number: 20110072219Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.Type: ApplicationFiled: November 24, 2010Publication date: March 24, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht
-
Patent number: 7870343Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.Type: GrantFiled: February 25, 2002Date of Patent: January 11, 2011Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
-
Patent number: 7818514Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed.Type: GrantFiled: August 22, 2008Date of Patent: October 19, 2010Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
-
Patent number: 7793038Abstract: A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.Type: GrantFiled: June 26, 2007Date of Patent: September 7, 2010Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam
-
Patent number: 7650434Abstract: A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node.Type: GrantFiled: February 25, 2002Date of Patent: January 19, 2010Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
-
Publication number: 20090313439Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.Type: ApplicationFiled: August 19, 2009Publication date: December 17, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
-
Patent number: 7529895Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed.Type: GrantFiled: December 28, 2006Date of Patent: May 5, 2009Assignee: International Business Machines CorporationInventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
-
Patent number: 7486619Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.Type: GrantFiled: March 4, 2004Date of Patent: February 3, 2009Assignee: International Business Machines CorporationInventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas, Matthias Augustin Blumrich
-
Publication number: 20090006718Abstract: A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.Type: ApplicationFiled: June 26, 2007Publication date: January 1, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam