Patents by Inventor Larry Robert Dennison

Larry Robert Dennison has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20210089864

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: December 4, 2020

Publication date: March 25, 2021

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
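The index arithmetic described in the abstract of this entry can be illustrated with a short Python sketch. It decodes two index vectors into coordinate sets, sums them, and converts the results to linear indices. The index-vector encoding (row-major linear positions of the non-zero elements), the all-pairs summation, and the names `decode` and `output_linear_indices` are assumptions made for illustration, not details taken from the filing.

```python
from itertools import product


def decode(index_vector, width):
    """Turn row-major linear positions into (row, col) coordinate sets.

    Assumption: the index vector simply lists the linear positions of the
    non-zero elements of an array with the given width.
    """
    return [divmod(i, width) for i in index_vector]


def output_linear_indices(idx_a, width_a, idx_b, width_b, out_width):
    """Sum every first coordinate set with every second coordinate set and
    convert each summed coordinate set to a linear index in the output array."""
    coords_a = decode(idx_a, width_a)
    coords_b = decode(idx_b, width_b)
    linear = []
    for (r1, c1), (r2, c2) in product(coords_a, coords_b):
        row, col = r1 + r2, c1 + c2          # output coordinate set
        linear.append(row * out_width + col)  # row-major linear index
    return linear


indices = output_linear_indices([1, 5], 3, [0, 3], 2, out_width=4)
print(indices)
```

Here the first array has non-zero elements at (0, 1) and (1, 2), the second at (0, 0) and (1, 1), so the output coordinate sets are (0, 1), (1, 2), (1, 2), and (2, 3).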

SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

Publication number: 20210037107

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Application

Filed: July 24, 2020

Publication date: February 4, 2021

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
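The in-network reduction described in the abstract of this entry can be pictured with a toy Python model. The class `ReductionSwitch`, its `push` method, the use of a sum as the reduction, and the rule that a response is sent only after every participating endpoint has contributed are all hypothetical simplifications, not the mechanism claimed in the filing.

```python
class ReductionSwitch:
    """Toy model of a network device that reduces values inside the network."""

    def __init__(self, participants):
        # Endpoints mapped to the (hypothetical) multicast region.
        self.participants = set(participants)
        # address -> {endpoint: contributed value}
        self.pending = {}

    def push(self, endpoint, address, value):
        """Record one contribution; return the responses once all have arrived."""
        if endpoint not in self.participants:
            raise ValueError(f"unknown endpoint: {endpoint}")
        contributions = self.pending.setdefault(address, {})
        contributions[endpoint] = value
        if set(contributions) != self.participants:
            return None  # still waiting for other endpoints
        total = sum(contributions.values())  # the in-network computation
        del self.pending[address]
        # Forward the reduced value back to every participating endpoint.
        return {ep: total for ep in self.participants}


switch = ReductionSwitch(["gpu0", "gpu1", "gpu2"])
switch.push("gpu0", 0x1000, 1.0)
switch.push("gpu1", 0x1000, 2.0)
print(switch.push("gpu2", 0x1000, 3.0))
```

The endpoints never exchange values with each other directly; the last push triggers the reduction in the device, which is the offloading idea the abstract describes.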

SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS

Publication number: 20210036877

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.

Type: Application

Filed: July 24, 2020

Publication date: February 4, 2021

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson

INJECTION LIMITING AND WAVE SYNCHRONIZATION FOR SCALABLE IN-NETWORK COMPUTATION

Publication number: 20210036881

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints.

Type: Application

Filed: July 24, 2020

Publication date: February 4, 2021

Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10891538

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: July 25, 2017

Date of Patent: January 12, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10860922

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: November 18, 2019

Date of Patent: December 8, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
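The data flow in the abstract of this entry (multiply non-zero weights by non-zero activations, combine their positions, and route each product to an accumulator by position) can be sketched in Python as follows. The function name `sparse_multiply_accumulate`, the `(value, position)` tuple layout, and the rule for combining a 3D weight position `(k, r, s)` with a 2D activation position `(x, y)` into `(k, x + r, y + s)` are assumptions chosen for illustration; a dictionary stands in for the accumulator array.

```python
from collections import defaultdict


def sparse_multiply_accumulate(weights, activations):
    """Multiply every non-zero weight with every non-zero activation and
    accumulate each product at the position derived from both operands.

    weights:     list of (value, (k, r, s)) with a position in a 3D space
    activations: list of (value, (x, y)) with a position in a 2D space
    """
    accumulators = defaultdict(float)  # stand-in for the accumulator array
    for w, (k, r, s) in weights:
        for a, (x, y) in activations:
            product = w * a                   # one multiplier-array output
            position = (k, x + r, y + s)      # assumed position combination
            accumulators[position] += product  # routed by position
    return dict(accumulators)


weights = [(2.0, (0, 0, 0)), (3.0, (0, 1, 0))]
activations = [(1.0, (0, 0)), (4.0, (1, 0))]
print(sparse_multiply_accumulate(weights, activations))
```

Only non-zero operands are ever multiplied, so the work scales with the number of non-zero pairs rather than with the dense array sizes.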

SCALABLE LIGHT-WEIGHT PROTOCOLS FOR WIRE-SPEED PACKET ORDERING

Publication number: 20200374594

Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.

Type: Application

Filed: July 21, 2020

Publication date: November 26, 2020

Applicant: NVIDIA Corp.

Inventors: Hans Eberle, Larry Robert Dennison

SCALABLE LIGHT-WEIGHT PROTOCOLS FOR WIRE-SPEED PACKET ORDERING

Publication number: 20200374593

Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.

Type: Application

Filed: July 20, 2020

Publication date: November 26, 2020

Applicant: NVIDIA Corp.

Inventors: Hans Eberle, Larry Robert Dennison

SECURING MEMORY ACCESSES IN A VIRTUALIZED ENVIRONMENT

Publication number: 20200356492

Abstract: Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.

Type: Application

Filed: July 23, 2020

Publication date: November 12, 2020

Inventors: Samuel Hammond Duncan, Sanjeev Jain, Mark Douglas Hummel, Vyas Venkataraman, Olivier Giroux, Larry Robert Dennison, Alexander Toichi Ishii, Hemayet Hossain, Nir Haim Arad

Scalable light-weight protocols for wire-speed packet ordering

Patent number: 10820057

Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.

Type: Grant

Filed: April 5, 2019

Date of Patent: October 27, 2020

Assignee: NVIDIA Corp.

Inventors: Hans Eberle, Larry Robert Dennison

Distributed address translation in a multi-node interconnect fabric

Patent number: 10769076

Abstract: Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.

Type: Grant

Filed: November 21, 2018

Date of Patent: September 8, 2020

Assignee: NVIDIA Corporation

Inventors: Samuel Hammond Duncan, Sanjeev Jain, Mark Douglas Hummel, Vyas Venkataraman, Olivier Giroux, Larry Robert Dennison, Alexander Toichi Ishii, Hemayet Hossain, Nir Haim Arad

USE OF STASHING BUFFERS TO IMPROVE THE EFFICIENCY OF CROSSBAR SWITCHES

Publication number: 20200177521

Abstract: A switch architecture enables ports to stash packets in unused buffers on other ports, exploiting excess internal bandwidth that may exist, for example, in a tiled switch. This architecture leverages unused port buffer memory to improve features such as congestion handling and error recovery.

Type: Application

Filed: December 4, 2019

Publication date: June 4, 2020

Applicant: NVIDIA Corp.

Inventors: Matthias Augustin Blumrich, Nan Jiang, Larry Robert Dennison

DISTRIBUTED BATCH NORMALIZATION USING ESTIMATES AND ROLLBACK

Publication number: 20200160123

Abstract: A technique utilizing speculative execution and rollback for performing data parallel training of a neural network model is disclosed. Activations for a layer of the neural network model are normalized during a speculative normalization operation using estimated normalization parameters associated with a partial population of a set of training data allocated to a particular processor. Normalization parameters associated with the total population of the set of training data are generated by a distributed reduce operation in parallel with the speculative normalization operation. An optional rollback operation can revert the activations to a pre-normalization state if the estimated normalization parameters for the partial population are subsequently determined to be inaccurate compared to the normalization parameters for the population of the set of training data distributed across a plurality of processors.

Type: Application

Filed: October 31, 2019

Publication date: May 21, 2020

Inventors: Larry Robert Dennison, Benjamin Klenk
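The speculate-then-roll-back flow in the abstract of this entry can be sketched in Python under simplifying assumptions. The helper names, the absolute-difference tolerance `tol` used to judge whether the estimates are accurate, and the sequential (rather than parallel) computation are all hypothetical; the abstract does not state the accuracy criterion.

```python
import math


def mean_var(values):
    """Mean and (population) variance of a list of activations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Standard normalization: subtract the mean, divide by the std deviation."""
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]


def speculative_normalize(local, global_params, tol=0.1):
    """Normalize with local estimates; roll back if they miss the global values.

    Returns (activations, rolled_back).
    """
    saved = list(local)  # pre-normalization state kept for rollback
    est_mean, est_var = mean_var(local)        # partial-population estimates
    speculative = normalize(local, est_mean, est_var)
    g_mean, g_var = global_params              # result of the distributed reduce
    accurate = abs(est_mean - g_mean) <= tol and abs(est_var - g_var) <= tol
    if accurate:
        return speculative, False
    return saved, True  # revert to the pre-normalization activations


print(speculative_normalize([1.0, 3.0], (2.0, 1.0)))
print(speculative_normalize([1.0, 3.0], (5.0, 4.0)))
```

After a rollback, a caller would presumably normalize again with the global parameters; that follow-up step is left out because the abstract describes only the revert.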

DISTRIBUTED ADDRESS TRANSLATION IN A MULTI-NODE INTERCONNECT FABRIC

Publication number: 20200159669

Abstract: Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.

Type: Application

Filed: November 21, 2018

Publication date: May 21, 2020

Inventors: Samuel Hammond Duncan, Sanjeev Jain, Mark Douglas Hummel, Vyas Venkataraman, Olivier Giroux, Larry Robert Dennison, Alexander Toichi Ishii, Hemayet Hossain, Nir Haim Arad

DISTRIBUTED BATCH NORMALIZATION USING PARTIAL POPULATIONS

Publication number: 20200160112

Abstract: A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.

Type: Application

Filed: October 31, 2019

Publication date: May 21, 2020

Inventors: Larry Robert Dennison, Benjamin Klenk
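The partial-population scheme in the abstract of this entry can be sketched in Python as follows. Each "processor" is modeled as a plain list of activations, the statistical measures are assumed to be a count, a sum, and a sum of squares, and the names `local_stats`, `reduce_stats`, and `partial_population_batch_norm` are hypothetical.

```python
import math


def local_stats(values):
    """Statistical measures one processor sends for its first portion."""
    return len(values), sum(values), sum(v * v for v in values)


def reduce_stats(stats):
    """Combine measures from all processors into normalization parameters."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]


def partial_population_batch_norm(sub_batches, first_len):
    """Derive parameters from the first portion of every sub-batch, then
    normalize the whole sub-batch (first and second portion) on each processor."""
    stats = [local_stats(sb[:first_len]) for sb in sub_batches]
    mean, var = reduce_stats(stats)  # done by the additional processor
    normalized = [normalize(sb, mean, var) for sb in sub_batches]
    return normalized, (mean, var)


sub_batches = [[1.0, 3.0, 10.0], [5.0, 7.0, 20.0]]
normalized, params = partial_population_batch_norm(sub_batches, first_len=2)
print(params)
```

In this example only the first two activations of each sub-batch contribute to the parameters, yet all three activations on each processor are normalized with them.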

SCALABLE LIGHT-WEIGHT PROTOCOLS FOR WIRE-SPEED PACKET ORDERING

Publication number: 20200145725

Abstract: A communication method between a source device and a target device utilizes speculative connection setup between the source device and the target device, target-device-side packet ordering, and fine-grained ordering to remove packet dependencies.

Type: Application

Filed: April 5, 2019

Publication date: May 7, 2020

Inventors: Hans Eberle, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20200082254

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: November 18, 2019

Publication date: March 12, 2020

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10528864

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: March 14, 2017

Date of Patent: January 7, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Network endpoint congestion management

Patent number: 10063481

Abstract: A congestion management protocol that can be used for small messages is described, in which the last-hop switch determines the congestion of the end point. The last-hop switch drops messages when the end point is congested and schedules a retransmission. A second congestion management protocol transmits small messages in a speculative mode to avoid the overhead caused by reservation handshakes.

Type: Grant

Filed: May 23, 2016

Date of Patent: August 28, 2018

Assignee: U.S. Department of Energy

Inventors: Nan Jiang, Larry Robert Dennison, William James Dally

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046900

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: July 25, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
