Patents by Inventor Rangharajan VENKATESAN

Rangharajan VENKATESAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12141225
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Grant
    Filed: January 23, 2020
    Date of Patent: November 12, 2024
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
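    Illustrative sketch: a minimal Python sketch of the quotient/remainder summation described in the abstract above. The base-2 log format with frac_bits fractional exponent bits, the function name, and the restriction to non-negative integer exponents are assumptions made for illustration, not the patented hardware design (the asynchronous accumulator is not modeled here).

        def log_sum(exponents, frac_bits=2):
            """Sum values encoded as x_i = 2 ** (e_i / 2**frac_bits) for non-negative integer e_i."""
            F = 1 << frac_bits                  # number of distinct remainders
            bins = [0] * F                      # one integer accumulator per remainder
            for e in exponents:
                q, r = divmod(e, F)             # decompose the exponent into quotient and remainder
                bins[r] += 1 << q               # 2**q accumulates as an integer shift-and-add
            # Only F multiplications by the remainder factors 2**(r/F) remain at the end.
            return sum(partial * 2.0 ** (r / F) for r, partial in enumerate(bins))

        # e.g. log_sum([0, 1, 2, 3]) computes 2**0 + 2**0.25 + 2**0.5 + 2**0.75.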
  • Patent number: 12118454
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Grant
    Filed: December 12, 2023
    Date of Patent: October 15, 2024
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Publication number: 20240311626
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: May 24, 2024
    Publication date: September 19, 2024
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
  • Patent number: 12045307
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: July 23, 2024
    Assignee: NVIDIA Corporation
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
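    Illustrative sketch: a short NumPy sketch of per-vector scaled quantization as described in the abstract above; the vector length, bit width, and max-based scale selection are illustrative assumptions rather than the claimed method.

        import numpy as np

        def quantize_per_vector(matrix, vec_len=16, bits=4):
            """Quantize each length-vec_len vector within a row using its own scale factor."""
            qmax = 2 ** (bits - 1) - 1
            rows, cols = matrix.shape
            assert cols % vec_len == 0
            vecs = matrix.reshape(rows, cols // vec_len, vec_len)
            scales = np.abs(vecs).max(axis=-1, keepdims=True) / qmax   # one scale per vector
            scales = np.where(scales == 0.0, 1.0, scales)              # guard all-zero vectors
            q = np.clip(np.rint(vecs / scales), -qmax - 1, qmax).astype(np.int8)
            return q, scales                    # dequantize via (q * scales).reshape(rows, cols)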
  • Patent number: 12033060
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Grant
    Filed: January 23, 2020
    Date of Patent: July 9, 2024
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
  • Publication number: 20240160406
    Abstract: Mechanisms to exploit the inherent resiliency of deep learning inference workloads to improve the energy efficiency of computer processors, such as graphics processing units, that execute these workloads. The mechanisms provide energy-accuracy tradeoffs in the computation of deep learning inference calculations via energy-efficient floating point data path micro-architectures with integer accumulation, and enhanced mechanisms for per-vector scaled quantization (VS-Quant) of floating-point arguments.
    Type: Application
    Filed: October 11, 2023
    Publication date: May 16, 2024
    Applicant: NVIDIA Corp.
    Inventors: Rangharajan Venkatesan, Reena Elangovan, Charbel Sakr, Brucek Kurdo Khailany, Ming Y Siu, Ilyas Elkin, Brent Ralph Boswell
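    Illustrative sketch: a toy Python model of the integer-accumulation idea mentioned in the abstract above, where floating-point products are aligned to a shared fixed-point scale and summed in a wide integer register; the scale, rounding, and function name are assumptions, not the filing's micro-architecture.

        def dot_with_integer_accumulation(a_vals, b_vals, acc_frac_bits=24):
            """Dot product whose per-element products are accumulated as fixed-point integers."""
            acc = 0
            for a, b in zip(a_vals, b_vals):
                acc += int(round(a * b * (1 << acc_frac_bits)))   # align each product to a shared scale
            return acc / (1 << acc_frac_bits)                     # convert back to floating point once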
  • Publication number: 20240112007
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: December 12, 2023
    Publication date: April 4, 2024
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Patent number: 11886980
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: January 30, 2024
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Patent number: 11769040
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
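    Illustrative sketch: a toy software model of one processing element from the tile-based architecture described above, with a stationary weight buffer feeding a vector multiply-accumulate step; the class shape, tiling, and reduction are illustrative only and omit the activation buffer and on-package interconnect.

        import numpy as np

        class ProcessingElement:
            def __init__(self, weight_tile):
                self.weight_buffer = np.asarray(weight_tile)      # weights stay resident in the PE

            def vector_mac(self, activation_tile, partial_in=0.0):
                # Multiply-accumulate the buffered weights against incoming activations.
                return partial_in + self.weight_buffer @ np.asarray(activation_tile)

        # Partial sums from PEs working on slices of the same output are then reduced:
        weights, acts = np.ones((4, 8)), np.arange(8.0)
        pes = [ProcessingElement(w) for w in np.split(weights, 2, axis=1)]
        out = sum(pe.vector_mac(a) for pe, a in zip(pes, np.split(acts, 2)))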
  • Publication number: 20230237308
    Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, where decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars. “Optimal” in that the mean squared error (MSE) of the quantized operation is minimized and the clipping scalars define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute force search to provide high accuracy.
    Type: Application
    Filed: July 26, 2022
    Publication date: July 27, 2023
    Inventors: Charbel Sakr, Steve Haihang Dai, Brucek Kurdo Khailany, William James Dally, Rangharajan Venkatesan, Brian Matthew Zimmer
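    Illustrative sketch: a brute-force NumPy sketch of picking a clipping scalar that minimizes quantization mean squared error, to illustrate the "optimal clipping" idea in the abstract above; the grid search is purely for illustration, whereas the application computes the scalars dynamically during training rather than by offline search.

        import numpy as np

        def mse_optimal_clip(tensor, bits=4, num_candidates=100):
            """Return the clipping scalar with the lowest quantization MSE (assumes a non-zero tensor)."""
            qmax = 2 ** (bits - 1) - 1
            top = np.abs(tensor).max()
            best_clip, best_mse = top, np.inf
            for clip in np.linspace(top / num_candidates, top, num_candidates):
                scale = clip / qmax
                xq = np.clip(np.rint(tensor / scale), -qmax - 1, qmax) * scale
                mse = np.mean((tensor - xq) ** 2)
                if mse < best_mse:
                    best_clip, best_mse = clip, mse
            return best_clip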
  • Publication number: 20230068941
    Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
    Type: Application
    Filed: February 11, 2022
    Publication date: March 2, 2023
    Inventors: Thierry Tambe, Steve Dai, Brucek Khailany, Rangharajan Venkatesan
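    Illustrative sketch: a one-function NumPy sketch of computing directly on a quantized matrix together with its scale factor, as in the abstract above; using a single per-matrix scale (rather than per-vector scales) is an assumption made to keep the example short.

        import numpy as np

        def quantized_matmul(a_q, a_scale, b_q, b_scale):
            # Integer matrix multiply on the quantized operands, rescaled afterward.
            return (a_q.astype(np.int32) @ b_q.astype(np.int32)) * (a_scale * b_scale)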
  • Publication number: 20220261650
    Abstract: An end-to-end low-precision training system based on a multi-base logarithmic number system and a multiplicative weight update algorithm. The multi-base logarithmic number system is applied to update weights of the neural network, with different bases of the multi-base logarithmic number system utilized between calculation of weight updates, calculation of feed-forward signals, and calculation of feedback signals. The LNS provides a high dynamic range and computational energy efficiency, making it advantageous for on-board training in energy-constrained edge devices.
    Type: Application
    Filed: June 11, 2021
    Publication date: August 18, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jiawei Zhao, Steve Haihang Dai, Rangharajan Venkatesan, Ming-Yu Liu, William James Dally, Anima Anandkumar
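    Illustrative sketch: a tiny Python sketch of why a multiplicative weight update pairs naturally with a logarithmic number system: multiplying a weight by an update factor reduces to adding exponents. The single fixed base and the update form below are assumptions; the filing uses different bases for weight updates, feed-forward signals, and feedback signals.

        def lns_multiplicative_step(weight_exp, update_exp):
            """w = base**weight_exp, so w * base**update_exp has exponent weight_exp + update_exp."""
            return weight_exp + update_exp

        # e.g. with base 2: a weight of 2**3 = 8 scaled by 2**-1 = 0.5 needs only the
        # exponent addition 3 + (-1) = 2, i.e. the updated weight is 2**2 = 4.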
  • Publication number: 20220076110
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 19, 2021
    Publication date: March 10, 2022
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 11270197
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: March 8, 2022
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Publication number: 20220067530
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Application
    Filed: October 30, 2020
    Publication date: March 3, 2022
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
  • Publication number: 20220067512
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Application
    Filed: October 30, 2020
    Publication date: March 3, 2022
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
  • Publication number: 20220067513
    Abstract: Solutions improving efficiency of Softmax computation applied for efficient deep learning inference in transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce instruction overhead associated with computing e^x, and replacing floating point max computation with integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 3, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
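    Illustrative sketch: a NumPy sketch of the base-2 Softmax idea from the abstract above, split into an unnormalized pass and a separate normalization pass; folding the base change into a log2(e) input scale is an assumption made for clarity, and the integer max trick is only noted in a comment.

        import numpy as np

        def softmax_base2(x):
            x = np.asarray(x, dtype=np.float32) * np.log2(np.e)   # e**x equals 2**(x * log2(e))
            m = x.max()             # max subtraction for stability; comparable on integer bit patterns
            unnorm = np.exp2(x - m)                               # "unnormalized Softmax"
            return unnorm / unnorm.sum()                          # normalization step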
  • Patent number: 11151040
    Abstract: An approximate cache system is disclosed. The system includes a quality aware cache controller (QACC), a cache, and a quality table configured to receive addresses and a quality specification from the processor associated with each address and further configured to provide the quality specification for each address to the QACC, wherein the QACC controls approximation based on one or more of i) approximation through partial read operations; ii) approximation through lower read currents; iii) approximation through skipped write operations; iv) approximation through partial write operations; v) approximation through lower write duration; vi) approximation through lower write currents; and vii) approximation through skipped refreshes.
    Type: Grant
    Filed: March 24, 2019
    Date of Patent: October 19, 2021
    Assignee: Purdue Research Foundation
    Inventors: Ashish Ranjan, Swagath Venkataramani, Zoha Pajouhi, Rangharajan Venkatesan, Kaushik Roy, Anand Raghunathan
  • Publication number: 20210056397
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 25, 2021
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Publication number: 20210056446
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: January 23, 2020
    Publication date: February 25, 2021
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany