Patents by Inventor Philip Heidelberger

Philip Heidelberger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11556450
    Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques are used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature heavy or weight heavy. Depending on this characterization, the workload of a neural network (NN) layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger
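    A minimal Python sketch of the layer-assignment rule described in this entry, assuming a 2-D processor array whose X direction has the higher bandwidth. The names (Layer, assign_parallelism) and the weight-vs-feature comparison used to label a layer are illustrative assumptions, not taken from the patent.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Layer:
        name: str
        weight_bytes: int    # size of the layer's weights
        feature_bytes: int   # size of the layer's activations (features)

    def assign_parallelism(layer: Layer) -> dict:
        """Choose a hybrid data/model split for a 2-D processor array.

        Weight-heavy layers place data parallelism on the high-bandwidth (X)
        direction so the weight-gradient reduction traffic sees the larger
        bandwidth; feature-heavy layers do the opposite.
        """
        weight_heavy = layer.weight_bytes > layer.feature_bytes   # assumed heuristic
        if weight_heavy:
            return {"X (high bandwidth)": "data parallelism",
                    "Y (low bandwidth)": "model parallelism"}
        return {"X (high bandwidth)": "model parallelism",
                "Y (low bandwidth)": "data parallelism"}

    # A fully connected layer is typically weight heavy; an early
    # convolutional layer is typically feature heavy.
    print(assign_parallelism(Layer("fc1", weight_bytes=64 << 20, feature_bytes=2 << 20)))
    print(assign_parallelism(Layer("conv1", weight_bytes=1 << 20, feature_bytes=32 << 20)))
    ```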
  • Patent number: 11165719
    Abstract: A computer network architecture includes a plurality N of first nodes, each first node having kC ports to a cluster network, where N and kC are integers greater than 0; and a local network switch connected to each of the plurality of first nodes, but not to the cluster network. Each first node has kL ports to the local network switch, where kL is an integer greater than 0, and any two first nodes in the plurality of first nodes communicate with each other via the local network switch or via the cluster network. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: November 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Philip Heidelberger, Craig Brian Stunkel
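    A minimal sketch of the two-fabric topology described in this entry, assuming hypothetical names (FirstNode, choose_path) and a simple "prefer the local switch" policy; it only illustrates that two first nodes can reach each other either over their shared local switch or over the cluster network.

    ```python
    from dataclasses import dataclass

    @dataclass
    class FirstNode:
        node_id: int
        cluster_ports: int = 2   # kC ports into the cluster network
        local_ports: int = 1     # kL ports into the local network switch

    def choose_path(src: FirstNode, dst: FirstNode, local_members: set,
                    prefer_local: bool = True) -> str:
        """Return which fabric carries traffic between two first nodes.

        The local switch is not connected to the cluster network, so traffic
        between nodes on the same local switch can stay local; the preference
        policy is an assumption of this sketch.
        """
        both_local = src.node_id in local_members and dst.node_id in local_members
        if both_local and prefer_local:
            return "local network switch"
        return "cluster network"

    nodes = [FirstNode(i) for i in range(4)]
    members = {n.node_id for n in nodes}          # all N first nodes share the local switch
    print(choose_path(nodes[0], nodes[3], members))                       # local network switch
    print(choose_path(nodes[0], nodes[3], members, prefer_local=False))   # cluster network
    ```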
  • Patent number: 11165686
    Abstract: A switch-connected dragonfly network and method of operating. A plurality of groups of row switches is organized according to multiple rows and columns, each row including multiple groups of row switches connected to form a two-level dragonfly network. A plurality of column switches interconnect groups of row switches along respective columns, a column switch associated with a corresponding group of row switches in a row. A switch port with a same logical port on a row switch at a same location in each group along the respective column connects to a same column switch. The switch-connected dragonfly network is expandable by adding additional rows, an added row comprising a two-level dragonfly network. A switch group of said added row associated with a column connects to an available port at an existing column switch of said column by a corresponding added S path link, with no re-cabling of the switched network required. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: August 7, 2018
    Date of Patent: November 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Philip Heidelberger, Dong Chen, Yutaka Sugawara, Paul W. Coteus
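    A small, hypothetical sketch of the column wiring rule in this entry: the same (switch position, logical port) pair in every group along a column reaches the same column switch, so an added row needs no re-cabling. The indexing scheme is an assumption.

    ```python
    def column_switch_for(column: int, row: int, switch_pos: int,
                          logical_port: int, ports_per_switch: int) -> tuple:
        """Identify the column switch reached by a given row-switch port.

        The row (group) index does not enter the result: the same position and
        logical port in every group of the column land on the same column switch.
        """
        del row  # deliberately unused
        return (column, switch_pos * ports_per_switch + logical_port)

    # Rows 0 and 5 of column 3: the switch at position 2, logical port 1,
    # connects to the identical column switch, so adding row 5 later only
    # consumes an available port on that existing column switch.
    assert column_switch_for(3, 0, 2, 1, ports_per_switch=4) == \
           column_switch_for(3, 5, 2, 1, ports_per_switch=4)
    print(column_switch_for(3, 0, 2, 1, ports_per_switch=4))   # (3, 9)
    ```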
  • Publication number: 20210110247
    Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques are used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature heavy or weight heavy. Depending on this characterization, the workload of a neural network (NN) layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth.
    Type: Application
    Filed: October 11, 2019
    Publication date: April 15, 2021
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger
  • Patent number: 10979337
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: January 29, 2020
    Date of Patent: April 13, 2021
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
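    A minimal sketch of the toio/ioreturn routing decision in this entry; the packet fields and the deliver helper are assumptions, and hop-by-hop torus routing is collapsed into a single check.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Packet:
        destination: int      # compute-node destination address
        toio: int = 0         # non-zero: forward to an I/O node on arrival
        ioreturn: int = 0     # used to route back through the compute network
        payload: bytes = b""

    def deliver(packet: Packet, current_node: int, attached_io_node: int) -> str:
        """Decide what a compute node does with a packet it has received.

        The packet first travels the torus to its compute-node destination;
        only there is the toio value inspected and, if it is the specified
        value, the packet is forwarded to the attached I/O node.
        """
        if current_node != packet.destination:
            return f"forward on torus toward node {packet.destination}"
        if packet.toio:
            return f"forward to I/O node {attached_io_node}"
        return "consume locally"

    print(deliver(Packet(destination=7, toio=1), current_node=3, attached_io_node=100))
    print(deliver(Packet(destination=7, toio=1), current_node=7, attached_io_node=100))
    print(deliver(Packet(destination=7, toio=0), current_node=7, attached_io_node=100))
    ```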
  • Patent number: 10958588
    Abstract: Methods and systems for monitoring remote transmissions of messages among a plurality of nodes are described. A processing element in a first node may allocate a sequence number to a message requesting to read and/or update data in a second node. The processing element may be different from main processors of the first node. The processing element may send the message and the sequence number to the second node. The processing element may modify a status of the sequence number to an active state, indicating a transmission of the message is pending. The processing element may, in response to a response from the second node, modify the status of the sequence number to an inactive state, indicating a completed transmission of the message. The processing element may, in response to no response from the second node within a time period, resend the message and the sequence number to the second node. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: February 5, 2018
    Date of Patent: March 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Sameer Kumar, Philip Heidelberger, Yutaka Sugawara, Dong Chen, Robert M. Senger
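    A sketch of the sequence-number state machine in this entry: allocate, mark active, mark inactive on response, resend on timeout. A real processing element would do this beside the network interface; the class name, the polling loop, and the timeout value are assumptions.

    ```python
    import time

    class RemoteMessenger:
        def __init__(self, resend_timeout: float = 0.5):
            self.next_seq = 0
            self.active = {}                  # seq -> (dest, message, send_time)
            self.resend_timeout = resend_timeout

        def send(self, message: bytes, dest_node: int) -> int:
            seq = self.next_seq
            self.next_seq += 1
            self._transmit(dest_node, seq, message)
            self.active[seq] = (dest_node, message, time.monotonic())  # active state
            return seq

        def on_response(self, seq: int) -> None:
            self.active.pop(seq, None)                                 # inactive state

        def poll(self) -> None:
            """Resend any message whose response has not arrived in time."""
            now = time.monotonic()
            for seq, (dest, msg, sent) in list(self.active.items()):
                if now - sent > self.resend_timeout:
                    self._transmit(dest, seq, msg)
                    self.active[seq] = (dest, msg, now)

        def _transmit(self, dest_node: int, seq: int, message: bytes) -> None:
            print(f"-> node {dest_node}: seq={seq}, {len(message)} bytes")

    m = RemoteMessenger()
    s = m.send(b"read-and-update 0x1000", dest_node=2)
    m.poll()            # response not yet overdue, nothing resent
    m.on_response(s)    # response arrived: the sequence number goes inactive
    ```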
  • Publication number: 20200396175
    Abstract: A computer network architecture includes a plurality N of first nodes, each first node having kC ports to a cluster network, where N and kC are integers greater than 0; and a local network switch connected to each of the plurality of first nodes, but not to the cluster network. Each first node has kL ports to the local network switch, where kL is an integer greater than 0, and any two first nodes in the plurality of first nodes communicate with each other via the local network switch or via the cluster network.
    Type: Application
    Filed: June 12, 2019
    Publication date: December 17, 2020
    Inventors: Philip Heidelberger, Craig Brian Stunkel
  • Patent number: 10834672
    Abstract: A first method includes determining a total length of pending packets for a network link, determining a currently preferred power mode for the network link based on the total length of pending packets for the network link, and changing a current power mode for the network link to the currently preferred power mode. A corresponding apparatus is also disclosed herein. A second method includes determining a utilization for a network link, determining a currently preferred power mode for the network link based on the utilization for the network link, and changing a current power mode for the network link to the currently preferred power mode. A corresponding apparatus is also disclosed herein. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 23, 2015
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
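    A minimal sketch of the first method in this entry (power mode driven by the total length of pending packets). The mode names and thresholds are assumptions; only the decision structure is taken from the abstract.

    ```python
    def preferred_power_mode(pending_bytes: int,
                             low_threshold: int = 4 * 1024,
                             high_threshold: int = 64 * 1024) -> str:
        """Pick a link power mode from the total length of pending packets."""
        if pending_bytes == 0:
            return "idle / lowest power"
        if pending_bytes < low_threshold:
            return "reduced width"
        if pending_bytes < high_threshold:
            return "normal"
        return "full bandwidth"

    def update_link_power(link_state: dict, pending_bytes: int) -> dict:
        """Change the current power mode only when the preference changes."""
        preferred = preferred_power_mode(pending_bytes)
        if link_state.get("mode") != preferred:
            link_state["mode"] = preferred
        return link_state

    link = {"mode": "idle / lowest power"}
    for backlog in (0, 1024, 16 * 1024, 1 << 20):
        print(backlog, "->", update_link_power(link, backlog)["mode"])
    ```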
  • Patent number: 10812416
    Abstract: A shared memory maintained by sender processes stores a sequence number counter per destination process. A sender process increments the sequence number counter in the shared memory in sending a message to a destination process. The sender process sends a data packet comprising the message and at least a sequence number specified by the sequence number counter. All of the sender processes share a sequence number counter per destination process, each of the sender processes incrementing the sequence number counter in sending a respective message. Receiver processes run on a hardware processor, each of the receiver processes maintaining a local memory counter in memory, the local memory counter associated with a sending node. The local memory counter stores a sequence number of a message received from the sending node. The receiver process delivers incoming data packets ordered by sequence numbers of the data packets. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: October 20, 2020
    Assignee: International Business Machines Corporation
    Inventors: Sameer Kumar, Philip Heidelberger, Dong Chen, Yutaka Sugawara, Robert M. Senger, Burkhard Steinmacher-Burow
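    A sketch of the shared per-destination counters and in-order delivery described in this entry. itertools.count stands in for a counter held in shared memory and incremented atomically, and the class names are assumptions.

    ```python
    import itertools
    from collections import defaultdict

    class SharedSenders:
        """All sender processes on a node share one counter per destination."""
        def __init__(self):
            self.counters = defaultdict(itertools.count)   # destination -> shared counter

        def send(self, destination: int, message: str) -> tuple:
            seq = next(self.counters[destination])
            return (seq, message)                           # packet = (sequence number, message)

    class OrderedReceiver:
        """Deliver packets from one sending node in sequence-number order."""
        def __init__(self):
            self.expected = 0       # local memory counter for this sending node
            self.pending = {}

        def receive(self, packet: tuple) -> list:
            seq, message = packet
            self.pending[seq] = message
            delivered = []
            while self.expected in self.pending:            # drain any in-order run
                delivered.append(self.pending.pop(self.expected))
                self.expected += 1
            return delivered

    senders = SharedSenders()
    p0 = senders.send(9, "first")     # sequence number 0 for destination process 9
    p1 = senders.send(9, "second")    # sequence number 1 for destination process 9
    rx = OrderedReceiver()
    print(rx.receive(p1))             # [] -- held until sequence number 0 arrives
    print(rx.receive(p0))             # ['first', 'second']
    ```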
  • Patent number: 10740097
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: May 20, 2016
    Date of Patent: August 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
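    A minimal sketch of the class-based combining step in this entry, assuming a logical AND per class (the natural reduction for a barrier; an OR would model a global interrupt). The dict-based plumbing and names are assumptions; the real logic is embedded in the torus network hardware.

    ```python
    def combine_barrier_inputs(receiver_inputs: dict, node_classes: dict) -> dict:
        """Combine per-node barrier inputs class by class, then fan results out.

        receiver_inputs maps node -> True once that node has entered the barrier;
        node_classes maps node -> class id.  The combined result for a class is
        sent back to the senders of every node in that class.
        """
        by_class = {}
        for node, arrived in receiver_inputs.items():
            cls = node_classes[node]
            by_class[cls] = by_class.get(cls, True) and arrived
        return {node: by_class[node_classes[node]] for node in receiver_inputs}

    inputs = {0: True, 1: True, 2: False, 3: True}
    classes = {0: "A", 1: "A", 2: "B", 3: "B"}
    print(combine_barrier_inputs(inputs, classes))
    # class A has completed ({0: True, 1: True}); class B still waits on node 2
    ```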
  • Publication number: 20200228435
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Application
    Filed: January 29, 2020
    Publication date: July 16, 2020
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Patent number: 10601697
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: March 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Publication number: 20200053002
    Abstract: A switch-connected dragonfly network and method of operating. A plurality of groups of row switches is organized according to multiple rows and columns, each row including multiple groups of row switches connected to form a two-level dragonfly network. A plurality of column switches interconnect groups of row switches along respective columns, a column switch associated with a corresponding group of row switches in a row. A switch port with a same logical port on a row switch at a same location in each group along the respective column connects to a same column switch. The switch-connected dragonfly network is expandable by adding additional rows, an added row comprising a two-level dragonfly network. A switch group of said added row associated with a column connects to an available port at an existing column switch of said column by a corresponding added S path link, with no re-cabling of the switched network required.
    Type: Application
    Filed: August 7, 2018
    Publication date: February 13, 2020
    Inventors: Philip Heidelberger, Dong Chen, Yutaka Sugawara, Paul W. Coteus
  • Patent number: 10425358
    Abstract: An apparatus includes a collective switch hardware architecture, including an input arrangement circuit including multiple input ports and multiple outputs. The input arrangement circuit routes its multiple input ports to selected ones of its outputs. The collective switch hardware architecture includes collective reduction logic coupled to the multiple outputs of the input arrangement circuit and having multiple outputs. The collective reduction logic includes ALU(s) and arbitration and control circuitry. The ALU(s) and arbitration and control circuitry support multiple simultaneous collective operations from different collective classes, and support arbitrary input port and output port mapping to different collective classes. The collective switch hardware architecture further includes an output arrangement circuit including multiple inputs coupled to the multiple outputs of the collective reduction logic and including multiple output ports. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: September 24, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Philip Heidelberger, Craig Stunkel
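    A software sketch of the reduction path in this entry: each collective class owns its own input ports, output ports, and ALU operation, so several collectives proceed through the switch at once. The class description format and operation names are assumptions; arbitration and pipelining are omitted.

    ```python
    from functools import reduce
    import operator

    ALU_OPS = {"sum": operator.add, "max": max, "min": min}   # assumed ALU operations

    def collective_switch(port_values: dict, classes: dict) -> dict:
        """Apply one reduction per collective class and drive its output ports."""
        results = {}
        for name, spec in classes.items():
            values = [port_values[p] for p in spec["in"]]
            combined = reduce(ALU_OPS[spec["op"]], values)
            for out_port in spec["out"]:
                results[out_port] = combined
        return results

    ports = {0: 3, 1: 5, 2: 7, 3: 2, 4: 9}
    classes = {"classA": {"in": [0, 1, 2], "out": [10], "op": "sum"},
               "classB": {"in": [3, 4], "out": [11, 12], "op": "max"}}
    print(collective_switch(ports, classes))   # {10: 15, 11: 9, 12: 9}
    ```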
  • Publication number: 20190260666
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Application
    Filed: May 6, 2019
    Publication date: August 22, 2019
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Publication number: 20190245799
    Abstract: Methods and systems for monitoring remote transmissions of messages among a plurality of nodes are described. A processing element in a first node may allocate a sequence number to a message requesting to read and/or update data in a second node. The processing element may be different from main processors of the first node. The processing element may send the message and the sequence number to the second node. The processing element may modify a status of the sequence number to an active state, indicating a transmission of the message is pending. The processing element may, in response to a response from the second node, modify the status of the sequence number to an inactive state, indicating a completed transmission of the message. The processing element may, in response to no response from the second node within a time period, resend the message and the sequence number to the second node.
    Type: Application
    Filed: February 5, 2018
    Publication date: August 8, 2019
    Inventors: Sameer Kumar, Philip Heidelberger, Yutaka Sugawara, Dong Chen, Robert M. Senger
  • Patent number: 10348609
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Grant
    Filed: April 18, 2018
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Publication number: 20190199653
    Abstract: A shared memory maintained by sender processes stores a sequence number counter per destination process. A sender process increments the sequence number counter in the shared memory in sending a message to a destination process. The sender process sends a data packet comprising the message and at least a sequence number specified by the sequence number counter. All of the sender processes share a sequence number counter per destination process, each of the sender processes incrementing the sequence number counter in sending a respective message. Receiver processes run on a hardware processor, each of the receiver processes maintaining a local memory counter in memory, the local memory counter associated with a sending node. The local memory counter stores a sequence number of a message received from the sending node. The receiver process delivers incoming data packets ordered by sequence numbers of the data packets.
    Type: Application
    Filed: December 27, 2017
    Publication date: June 27, 2019
    Inventors: Sameer Kumar, Philip Heidelberger, Dong Chen, Yutaka Sugawara, Robert M. Senger, Burkhard Steinmacher-Burow
  • Patent number: 10152450
    Abstract: According to one embodiment of the present invention, a system for operating memory includes a first node coupled to a second node by a network, the system configured to perform a method including receiving a remote transaction message from the second node in a processing element in the first node via the network, wherein the remote transaction message bypasses a main processor in the first node as it is transmitted to the processing element. In addition, the method includes accessing, by the processing element, data from a location in a memory in the first node based on the remote transaction message, and performing, by the processing element, computations based on the data and the remote transaction message. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: December 11, 2018
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger, James A. Kahle, Fabrizio Petrini, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
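    A hypothetical sketch of a processing element handling a remote transaction without involving the node's main processor, as this entry describes. The opcode set, message format, and dict-backed memory are assumptions for illustration.

    ```python
    class ProcessingElement:
        def __init__(self, memory: dict):
            self.memory = memory          # stands in for the first node's memory

        def handle(self, message: dict):
            """Access local memory and compute, driven only by the message."""
            addr = message["address"]
            if message["op"] == "read":
                return self.memory.get(addr, 0)
            if message["op"] == "fetch_and_add":      # read-modify-write example
                old = self.memory.get(addr, 0)
                self.memory[addr] = old + message["value"]
                return old
            raise ValueError(f"unknown op {message['op']!r}")

    pe = ProcessingElement(memory={0x100: 41})
    print(pe.handle({"op": "read", "address": 0x100}))                       # 41
    print(pe.handle({"op": "fetch_and_add", "address": 0x100, "value": 1}))  # 41
    print(pe.handle({"op": "read", "address": 0x100}))                       # 42
    ```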
  • Patent number: 10140179
    Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m-bit-wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P. (An illustrative code sketch follows this entry.)
    Type: Grant
    Filed: December 17, 2015
    Date of Patent: November 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Dong Chen, Philip Heidelberger, Martin Ohmacht
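    A small sketch of the matrix construction in this entry: rows of P are m-bit vectors with an odd number (three or more) of ones, and the m check bits are the GF(2) product of the n data bits with P. The row enumeration order and the example sizes are assumptions.

    ```python
    from itertools import combinations

    def build_P(n: int, m: int) -> list:
        """Build an n x m matrix whose rows have odd weight of at least three."""
        rows = []
        weight = 3
        while len(rows) < n:
            if weight > m:
                raise ValueError("not enough distinct odd-weight vectors; increase m")
            for ones in combinations(range(m), weight):
                rows.append([1 if j in ones else 0 for j in range(m)])
                if len(rows) == n:
                    break
            weight += 2                      # keep the row weight odd
        return rows

    def check_bits(data_bits: list, P: list) -> list:
        """m redundant ECC bits = data (1 x n) times P (n x m) over GF(2).

        Check bit j is the parity of the data bits whose row of P has a one in
        column j, which is how chosen columns of P double as subgroup parity.
        """
        m = len(P[0])
        return [sum(d & P[i][j] for i, d in enumerate(data_bits)) % 2 for j in range(m)]

    P = build_P(n=8, m=5)
    print(check_bits([1, 0, 1, 1, 0, 0, 1, 0], P))
    ```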