Patents by Inventor Paulius Micikevicius

Paulius Micikevicius has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240078433
    Abstract: In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique trains the deep neural network to develop loss, scales the loss, computes gradients at a reduced precision, and reduces the magnitude of the computed gradients to compensate for scaling of the loss. In one example non-limiting arrangement, the training forward pass scales a loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting scaling factor S and adjusting the weight update.
    Type: Application
    Filed: October 31, 2023
    Publication date: March 7, 2024
    Inventors: Jonah Alben, Paulius Micikevicius, Hao Wu
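
    The loss-scaling recipe in this abstract is concrete enough to sketch. Below is a minimal NumPy illustration, assuming a linear model with a mean-squared-error loss and float16 as the reduced precision; the model, the loss, and the fixed value of S are illustrative choices, not details taken from the patent.

    ```python
    import numpy as np

    def train_step(w, x, y, lr=0.01, S=1024.0):
        # Cast to the reduced precision used for the forward and backward passes.
        w16 = w.astype(np.float16)
        x16 = x.astype(np.float16)
        y16 = y.astype(np.float16)

        # Forward pass in float16. The loss is scaled by S so that small
        # gradient values stay representable in reduced precision.
        pred = x16 @ w16
        loss = np.mean((pred - y16) ** 2)
        scaled_loss = np.float16(S) * loss  # backpropagation starts from this value

        # Backward pass, hand-derived for the MSE loss above, still in float16.
        # Because the loss was scaled by S, every gradient carries a factor of S.
        grad_scaled = (np.float16(2.0 / len(y)) * (x16.T @ (pred - y16))) * np.float16(S)

        # Weight update: reduce the gradient contribution by 1/S in float32,
        # compensating for the scaling before the weights are touched.
        w = w - lr * (grad_scaled.astype(np.float32) / S)
        return w, float(loss)
    ```

    The abstract notes that several techniques can be used to select S. One well-known heuristic is to pick the largest S that does not overflow the reduced-precision gradients and to back off when overflow is detected, though the patent is not limited to any particular selection strategy.
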
  • Patent number: 11842280
    Abstract: In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique trains the deep neural network to develop loss, scales the loss, computes gradients at a reduced precision, and reduces the magnitude of the computed gradients to compensate for scaling of the loss. In one example non-limiting arrangement, the training forward pass scales a loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting scaling factor S and adjusting the weight update.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: December 12, 2023
    Assignee: NVIDIA Corporation
    Inventors: Jonah Alben, Paulius Micikevicius, Hao Wu
  • Publication number: 20220327101
    Abstract: Apparatuses, systems, and techniques to transform data sets, such as matrices representing layers of neural networks, to increase sparsity and/or other characteristics of said data sets to improve performance in computations, such as neural network computations. In at least one embodiment, one or more subsets of data in one or more sets of data are rearranged as part of a process to increase sparsity in said one or more sets of data to satisfy one or more structural sparsity constraints.
    Type: Application
    Filed: May 18, 2021
    Publication date: October 13, 2022
    Inventors: Jeffrey Michael Pool, Chong Yu, Paulius Micikevicius
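
    The rearrangement idea can be illustrated with the widely used 2:4 pattern (at most two nonzeros in every group of four consecutive values), which is one example of a structural sparsity constraint; the patent is not limited to it. The sketch below prunes a matrix to 2:4 and brute-forces a column permutation that maximizes the magnitude preserved. The function names and the exhaustive search are illustrative assumptions, not the patented method.

    ```python
    import itertools
    import numpy as np

    def prune_2to4(w):
        # Zero the two smallest-magnitude entries in each group of four
        # consecutive values along every row (the 2:4 structural pattern).
        out = w.copy()
        for r in range(out.shape[0]):
            for c in range(0, out.shape[1], 4):
                group = out[r, c:c + 4]
                drop = np.argsort(np.abs(group))[:2]
                group[drop] = 0.0
        return out

    def best_column_permutation(w):
        # Exhaustive search over column orderings -- tractable only for a
        # handful of columns, but enough to show that rearranging subsets
        # of the data changes how much magnitude a 2:4 prune preserves.
        best_perm, best_score = None, -np.inf
        for perm in itertools.permutations(range(w.shape[1])):
            score = float(np.abs(prune_2to4(w[:, list(perm)])).sum())
            if score > best_score:
                best_perm, best_score = perm, score
        return best_perm, best_score
    ```

    A production system would replace the factorial search with a far cheaper strategy, but the payoff is the same: a better-chosen rearrangement lets the structured prune discard smaller values.
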
  • Patent number: 10684824
    Abstract: A method, computer readable medium, and system are disclosed for rounding numerical values. A set of bits from an input value is identified as a rounding value. A second set of bits representing a second value is extracted from the input value and added with the rounding value to produce a sum. The sum is truncated to produce the rounded output value. Thus, the present invention provides a stochastic rounding technique that rounds up an input value as a function of a second value and a rounding value, both of which were obtained from the input value. When the second value and rounding value are obtained from consistent bit locations of the input value, the resulting output value is deterministic. Such deterministic stochastic rounding is advantageously applicable in deep learning applications.
    Type: Grant
    Filed: June 6, 2018
    Date of Patent: June 16, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jonah M. Alben, Paulius Micikevicius, Hao Wu, Ming Yiu Siu
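
    The abstract's recipe maps directly onto integer bit operations. The sketch below treats x as an unsigned integer, truncates its low k bits, and reuses k bits from elsewhere in x as the rounding value; the specific bit positions are assumptions chosen for illustration, not the patent's choices.

    ```python
    def deterministic_stochastic_round(x, k=4, r_shift=8):
        # Rounding value: k bits lifted from elsewhere in the input itself.
        mask = (1 << k) - 1
        rounding_value = (x >> r_shift) & mask
        # Add the rounding value, then truncate the low k bits. The result
        # rounds up exactly when the discarded fraction plus the rounding
        # value overflows into bit k, mimicking stochastic rounding while
        # remaining a pure function of the input.
        return ((x + rounding_value) >> k) << k
    ```

    Because the rounding value comes from fixed bit locations of the input, the same input always rounds the same way, which is the determinism the abstract highlights.
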
  • Publication number: 20190377549
    Abstract: A method, computer readable medium, and system are disclosed for rounding numerical values. A set of bits from an input value is identified as a rounding value. A second set of bits representing a second value is extracted from the input value and added with the rounding value to produce a sum. The sum is truncated to produce the rounded output value. Thus, the present invention provides a stochastic rounding technique that rounds up an input value as a function of a second value and a rounding value, both of which were obtained from the input value. When the second value and rounding value are obtained from consistent bit locations of the input value, the resulting output value is deterministic. Such deterministic stochastic rounding is advantageously applicable in deep learning applications.
    Type: Application
    Filed: June 6, 2018
    Publication date: December 12, 2019
    Inventors: Jonah M. Alben, Paulius Micikevicius, Hao Wu, Ming Yiu Siu
  • Patent number: 10152310
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from a source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Grant
    Filed: May 27, 2015
    Date of Patent: December 11, 2018
    Assignee: NVIDIA Corporation
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover
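
    The compilation strategy described here, subdividing the final output and computing each subdivision end to end, is easiest to see in hand-written form. The sketch below fuses two elementwise operations over 1-D data by tiles; the operations, the tile size, and the function name are stand-ins, and a real compiler would generate such code automatically rather than require it by hand.

    ```python
    import numpy as np

    def fused_pipeline(x, tile=1024):
        # Subdivide the final output into tiles and run both operations on
        # each tile back to back, so only a tile-sized intermediate exists.
        out = np.empty_like(x)
        for start in range(0, x.size, tile):
            stop = min(start + tile, x.size)
            inter = np.exp(x[start:stop])   # op 1: intermediate for this tile only
            out[start:stop] = inter * 0.5   # op 2: fused into the same pass
        return out
    ```

    The bandwidth saving comes from never writing the full-size intermediate to memory: each tile's intermediate stays in cache (or, in generated GPU code, in registers) between the two operations.
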
  • Publication number: 20180322391
    Abstract: In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique trains the deep neural network to develop loss, scales the loss, computes gradients at a reduced precision, and reduces the magnitude of the computed gradients to compensate for scaling of the loss. In one example non-limiting arrangement, the training forward pass scales a loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting scaling factor S and adjusting the weight update.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 8, 2018
    Inventors: Hao Wu, Jonah Alben, Paulius Micikevicius
  • Publication number: 20160350088
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from a source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Application
    Filed: May 27, 2015
    Publication date: December 1, 2016
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover