Patents by Inventor Rangharajan VENKATESAN
Rangharajan VENKATESAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12141225
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: January 23, 2020
Date of Patent: November 12, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
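The quotient/remainder decomposition in this abstract can be made concrete with a short numerical sketch. This is a minimal Python illustration under assumed conventions (a base-2^(1/K) logarithmic format with K remainder bins; the names K, log_sum, and buckets are all illustrative), not the patented hardware datapath; the asynchronous accumulator is modeled here as ordinary integer addition.

```python
K = 4  # number of remainder bins: each value is 2**(e / K)

def log_sum(exponents):
    """Sum values encoded as log-format exponents e, where value = 2**(e / K).
    Exponents are assumed non-negative in this sketch."""
    # Decompose each exponent into quotient and remainder: e = q * K + r.
    buckets = {r: 0 for r in range(K)}
    for e in exponents:
        q, r = divmod(e, K)
        # Group ("sort") quotients by remainder and accumulate 2**q as an
        # integer partial sum per remainder bin (no multiplier: just 1 << q).
        buckets[r] += 1 << q
    # Multiply each partial sum by its remainder factor 2**(r / K) and add.
    return sum(partial * 2.0 ** (r / K) for r, partial in buckets.items())

# Usage: three log-format values 2**(5/4), 2**(6/4), 2**(9/4).
print(log_sum([5, 6, 9]))                       # decomposed sum
print(sum(2.0 ** (e / K) for e in [5, 6, 9]))   # reference sum, same result
```

The key point the sketch shows is that the per-bin accumulation is pure integer arithmetic; the only real-number multiplies are the K final rescalings by 2^(r/K).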
-
Patent number: 12118454
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: December 12, 2023
Date of Patent: October 15, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20240311626
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: May 24, 2024
Publication date: September 19, 2024
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
-
Patent number: 12045307
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Grant
Filed: October 30, 2020
Date of Patent: July 23, 2024
Assignee: NVIDIA Corporation
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
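A minimal sketch of the fine-grained per-vector scaling idea described above: each short vector of a tensor gets its own scale factor, so quantization error tracks local rather than global magnitudes. The vector_size, bit width, and max-magnitude scale choice are illustrative assumptions, not the claimed method.

```python
import numpy as np

def quantize_per_vector(x, vector_size=16, bits=4):
    """Quantize each length-`vector_size` slice of a 1-D array with its own scale."""
    qmax = 2 ** (bits - 1) - 1
    vectors = x.reshape(-1, vector_size)
    # One fine-grained scale factor per vector, from that vector's max magnitude.
    scales = np.abs(vectors).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0   # guard all-zero vectors against divide-by-zero
    q = np.clip(np.round(vectors / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

x = np.random.randn(64).astype(np.float32)
q, s = quantize_per_vector(x)
print(np.abs(dequantize(q, s) - x).max())  # error bounded by each vector's scale
```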
-
Patent number: 12033060
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: January 23, 2020
Date of Patent: July 9, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
-
Publication number: 20240160406
Abstract: Mechanisms to exploit the inherent resiliency of deep learning inference workloads to improve the energy efficiency of computer processors, such as graphics processing units, running these workloads. The mechanisms provide energy-accuracy tradeoffs in the computation of deep learning inference calculations via energy-efficient floating-point data path micro-architectures with integer accumulation, and enhanced mechanisms for per-vector scaled quantization (VS-Quant) of floating-point arguments.
Type: Application
Filed: October 11, 2023
Publication date: May 16, 2024
Applicant: NVIDIA Corp.
Inventors: Rangharajan Venkatesan, Reena Elangovan, Charbel Sakr, Brucek Kurdo Khailany, Ming Y Siu, Ilyas Elkin, Brent Ralph Boswell
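One way to picture a floating-point data path with integer accumulation is a dot product in which each float is split into an integer mantissa and a shared power-of-two scale, the multiply-accumulates run entirely in integer arithmetic, and the floating-point rescale happens once at the end. A hedged sketch, not the micro-architecture itself; frac_bits, the int64 accumulator width, and all names are assumptions:

```python
import numpy as np

def fp_dot_int_accumulate(a, b, frac_bits=10):
    """Dot product of float arrays where products accumulate as integers."""
    scale = 2.0 ** -frac_bits
    a_int = np.round(a / scale).astype(np.int64)   # fixed-point mantissas
    b_int = np.round(b / scale).astype(np.int64)
    acc = int(np.sum(a_int * b_int))               # integer-only accumulation
    return acc * scale * scale                     # single late FP rescale

a = np.random.randn(128).astype(np.float32)
b = np.random.randn(128).astype(np.float32)
print(fp_dot_int_accumulate(a, b), float(a @ b))   # approximately equal
```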
-
Publication number: 20240112007
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: December 12, 2023
Publication date: April 4, 2024
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Patent number: 11886980
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: August 23, 2019
Date of Patent: January 30, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Patent number: 11769040
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
Type: Grant
Filed: July 19, 2019
Date of Patent: September 26, 2023
Assignee: NVIDIA CORP.
Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
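A behavioral sketch of one processing element from the tile-based organization described above: a weight buffer, a shared activation input, and multiply-accumulate lanes. This is a toy software model under assumed names and buffer shapes, not the packaged hardware; the per-row MACs that run in parallel on silicon are modeled as a Python loop.

```python
class ProcessingElement:
    """Toy model of a tile's processing element with a local weight buffer."""

    def __init__(self, weights):
        self.weight_buffer = weights  # list of weight vectors, one per MAC lane

    def mac(self, activations):
        # Each lane combines one weight vector with the shared activations;
        # on hardware all lanes run in parallel.
        return [sum(w * a for w, a in zip(row, activations))
                for row in self.weight_buffer]

pe = ProcessingElement([[1, 2], [3, 4]])
print(pe.mac([10, 20]))  # [50, 110]: two lanes' partial sums
```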
-
Publication number: 20230237308
Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, where decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars. “Optimal” means that the mean squared error (MSE) of the quantized operation is minimized, and the clipping scalars define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy.
Type: Application
Filed: July 26, 2022
Publication date: July 27, 2023
Inventors: Charbel Sakr, Steve Haihang Dai, Brucek Kurdo Khailany, William James Dally, Rangharajan Venkatesan, Brian Matthew Zimmer
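To make the MSE-minimization objective concrete, here is a small sketch that scores candidate clipping scalars by quantization MSE and picks the best one. Note the hedge: this uses the brute-force search the abstract contrasts against, purely to illustrate what "optimal clipping scalar" means; the filing describes computing the scalars dynamically during training. All names, the 4-bit default, and the candidate grid are assumptions.

```python
import numpy as np

def mse_for_clip(x, clip, bits=4):
    """Quantization MSE of a symmetric uniform quantizer with clipping scalar `clip`."""
    qmax = 2 ** (bits - 1) - 1
    step = clip / qmax
    q = np.clip(np.round(x / step), -qmax, qmax) * step
    return np.mean((x - q) ** 2)

def optimal_clip(x, bits=4, candidates=100):
    """Brute-force stand-in: scan candidate clips and keep the MSE minimizer."""
    clips = np.linspace(0.5 * x.std(), np.abs(x).max(), candidates)
    return min(clips, key=lambda c: mse_for_clip(x, c, bits))

x = np.random.randn(10000)
print(optimal_clip(x))  # at 4 bits, typically well below max|x|
```

The tradeoff the clip controls: a small clip shrinks rounding error but truncates outliers, a large clip preserves outliers but coarsens every step; minimizing MSE balances the two.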
-
Publication number: 20230068941
Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
Type: Application
Filed: February 11, 2022
Publication date: March 2, 2023
Inventors: Thierry TAMBE, Steve DAI, Brucek KHAILANY, Rangharajan VENKATESAN
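A minimal sketch of the pattern this claim describes: quantize a tile (portion) of an input tensor with one scale factor, compute on the quantized tile in integer arithmetic, and carry the scale factor through to produce the corresponding portion of the output. The 8-bit width, tile shape, and function names are illustrative assumptions.

```python
import numpy as np

def quantize_tile(tile, bits=8):
    """Quantize one tile of a tensor with a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(tile).max()) / qmax
    scale = scale if scale > 0 else 1.0
    return np.round(tile / scale).astype(np.int8), scale

def tile_matmul(a_q, a_scale, b_q, b_scale):
    # Integer matrix multiply on the quantized tiles, then one rescale
    # using the carried scale factors.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

a_q, sa = quantize_tile(np.random.randn(8, 8))
b_q, sb = quantize_tile(np.random.randn(8, 8))
out_tile = tile_matmul(a_q, sa, b_q, sb)  # one portion of the output tensor
```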
-
Publication number: 20220261650
Abstract: An end-to-end low-precision training system based on a multi-base logarithmic number system (LNS) and a multiplicative weight update algorithm. The multi-base LNS is applied to update the weights of the neural network, with different bases utilized between calculation of weight updates, calculation of feed-forward signals, and calculation of feedback signals. The LNS offers high dynamic range and computational energy efficiency, making it advantageous for on-board training in energy-constrained edge devices.
Type: Application
Filed: June 11, 2021
Publication date: August 18, 2022
Applicant: NVIDIA Corp.
Inventors: Jiawei Zhao, Steve Haihang Dai, Rangharajan Venkatesan, Ming-Yu Liu, William James Dally, Anima Anandkumar
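The pairing of an LNS with a multiplicative weight update is natural because multiplying a weight becomes an addition in the log domain. The toy sketch below shows that correspondence only; the base, learning rate, update rule, and class name are assumptions, and the multi-base aspect (different bases for updates, feed-forward, and feedback) is not modeled.

```python
import math

class LNSWeight:
    """A weight stored as (sign, log-magnitude); assumes a nonzero value."""

    def __init__(self, value, base=2.0):
        self.base = base
        self.sign = 1.0 if value >= 0 else -1.0
        self.log_mag = math.log(abs(value), base)

    def multiplicative_update(self, grad, lr=0.01):
        # w <- w * base**(-lr * grad * sign(w)): a multiply on the value,
        # but only an add/subtract on the stored log-magnitude.
        self.log_mag -= lr * grad * self.sign

    def value(self):
        return self.sign * self.base ** self.log_mag

w = LNSWeight(0.5)
w.multiplicative_update(grad=1.2)
print(w.value())  # magnitude nudged slightly below 0.5, no multiplier used
```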
-
Publication number: 20220076110
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 19, 2021
Publication date: March 10, 2022
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 11270197
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Grant
Filed: November 4, 2019
Date of Patent: March 8, 2022
Assignee: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20220067530
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Publication number: 20220067512
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Publication number: 20220067513
Abstract: Solutions that improve the efficiency of Softmax computation for deep learning inference in transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
Type: Application
Filed: December 4, 2020
Publication date: March 3, 2022
Applicant: NVIDIA Corp.
Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
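The base-2 substitution rests on the identity e^x = 2^(x · log2 e), so folding log2(e) into the argument lets hardware use a cheap power-of-two unit. A minimal sketch of that identity plus the two-phase decomposition named in the abstract; integer logits, function names, and the list-based data flow are illustrative assumptions, not the claimed implementation:

```python
LOG2_E = 1.4426950408889634  # log2(e): folded in so 2**x stands in for e**x

def unnormalized_softmax(x_int):
    """Phase 1: base-2 exponentials of integer logits, using an integer max."""
    m = max(x_int)  # integer comparison, no floating-point max needed
    return [2.0 ** ((x - m) * LOG2_E) for x in x_int], m

def normalize(u):
    """Phase 2: divide each unnormalized term by their sum."""
    s = sum(u)
    return [v / s for v in u]

logits = [3, 1, -2, 7]            # integer-quantized logits
u, _ = unnormalized_softmax(logits)
print(normalize(u))               # matches softmax(logits) exactly
```

Splitting the two phases is what makes the implementation scalable: phase 1 can run independently per tile, and only the scalar running sums need to meet for the final normalization.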
-
Patent number: 11151040
Abstract: An approximate cache system is disclosed. The system includes a quality-aware cache controller (QACC), a cache, and a quality table configured to receive addresses and an associated quality specification from the processor for each address, and further configured to provide the quality specification for each address to the QACC. The QACC controls approximation based on one or more of: i) approximation through partial read operations; ii) approximation through lower read currents; iii) approximation through skipped write operations; iv) approximation through partial write operations; v) approximation through lower write duration; vi) approximation through lower write currents; and vii) approximation through skipped refreshes.
Type: Grant
Filed: March 24, 2019
Date of Patent: October 19, 2021
Assignee: Purdue Research Foundation
Inventors: Ashish Ranjan, Swagath Venkataramani, Zoha Pajouhi, Rangharajan Venkatesan, Kaushik Roy, Anand Raghunathan
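Most of the listed approximation modes are circuit-level (read currents, write duration, refreshes), but the control structure itself can be sketched in software. Below, a toy quality table maps addresses to a quality specification and the controller applies one mode, skipped writes, when the stored value is already close enough. Every name, the [0, 1] quality scale, and the tolerance rule are assumptions made for illustration, not the patented design.

```python
class QualityAwareCache:
    """Toy QACC model: per-address quality specs gate an approximation mode."""

    def __init__(self):
        self.lines = {}
        self.quality_table = {}  # address -> required quality in [0, 1]

    def set_quality(self, addr, quality):
        self.quality_table[addr] = quality

    def write(self, addr, value):
        old = self.lines.get(addr)
        quality = self.quality_table.get(addr, 1.0)  # default: exact
        # Approximation through skipped writes: if the stored value is
        # already within the tolerance the quality spec allows, skip it.
        if old is not None and quality < 1.0:
            tolerance = (1.0 - quality) * abs(value)
            if abs(old - value) <= tolerance:
                return  # write skipped, saving the write energy
        self.lines[addr] = value

cache = QualityAwareCache()
cache.set_quality(0x10, 0.9)
cache.write(0x10, 100.0)
cache.write(0x10, 101.0)   # within tolerance: skipped
print(cache.lines[0x10])   # still 100.0
```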
-
Publication number: 20210056397
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: August 23, 2019
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20210056446
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: January 23, 2020
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany