Patents by Inventor Rangharajan VENKATESAN
Rangharajan VENKATESAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12141225
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: January 23, 2020
Date of Patent: November 12, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
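The quotient/remainder decomposition in this abstract can be made concrete with a short numerical sketch. This is a minimal Python illustration under assumed conventions (a base-2^(1/K) logarithmic format with K remainder bins; the names K, log_sum, and buckets are all illustrative), not the patented hardware datapath; the asynchronous accumulator is modeled here as ordinary integer addition.

```python
K = 4  # number of remainder bins: each value is 2**(e / K)

def log_sum(exponents):
    """Sum values encoded as log-format exponents e, where value = 2**(e / K).
    Exponents are assumed non-negative in this sketch."""
    # Decompose each exponent into quotient and remainder: e = q * K + r.
    buckets = {r: 0 for r in range(K)}
    for e in exponents:
        q, r = divmod(e, K)
        # Group ("sort") quotients by remainder and accumulate 2**q as an
        # integer partial sum per remainder bin (no multiplier: just 1 << q).
        buckets[r] += 1 << q
    # Multiply each partial sum by its remainder factor 2**(r / K) and add.
    return sum(partial * 2.0 ** (r / K) for r, partial in buckets.items())

# Usage: three log-format values 2**(5/4), 2**(6/4), 2**(9/4).
print(log_sum([5, 6, 9]))                       # decomposed sum
print(sum(2.0 ** (e / K) for e in [5, 6, 9]))   # reference sum, same result
```

The key point the sketch shows is that the per-bin accumulation is pure integer arithmetic; the only real-number multiplies are the K final rescalings by 2^(r/K).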
-
Patent number: 12118454
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: December 12, 2023
Date of Patent: October 15, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20240311626
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: May 24, 2024
Publication date: September 19, 2024
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
-
Patent number: 12045307
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Grant
Filed: October 30, 2020
Date of Patent: July 23, 2024
Assignee: NVIDIA Corporation
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
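A minimal sketch of the fine-grained per-vector scaling idea described above: each short vector of a tensor gets its own scale factor, so quantization error tracks local rather than global magnitudes. The vector_size, bit width, and max-magnitude scale choice are illustrative assumptions, not the claimed method.

```python
import numpy as np

def quantize_per_vector(x, vector_size=16, bits=4):
    """Quantize each length-`vector_size` slice of a 1-D array with its own scale."""
    qmax = 2 ** (bits - 1) - 1
    vectors = x.reshape(-1, vector_size)
    # One fine-grained scale factor per vector, from that vector's max magnitude.
    scales = np.abs(vectors).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0   # guard all-zero vectors against divide-by-zero
    q = np.clip(np.round(vectors / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

x = np.random.randn(64).astype(np.float32)
q, s = quantize_per_vector(x)
print(np.abs(dequantize(q, s) - x).max())  # error bounded by each vector's scale
```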
-
Patent number: 12033060
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: January 23, 2020
Date of Patent: July 9, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
-
Publication number: 20240160406
Abstract: Mechanisms to exploit the inherent resiliency of deep learning inference workloads to improve the energy efficiency of computer processors, such as graphics processing units, running these workloads. The mechanisms provide energy-accuracy tradeoffs in the computation of deep learning inference calculations via energy-efficient floating-point data path micro-architectures with integer accumulation, and enhanced mechanisms for per-vector scaled quantization (VS-Quant) of floating-point arguments.
Type: Application
Filed: October 11, 2023
Publication date: May 16, 2024
Applicant: NVIDIA Corp.
Inventors: Rangharajan Venkatesan, Reena Elangovan, Charbel Sakr, Brucek Kurdo Khailany, Ming Y Siu, Ilyas Elkin, Brent Ralph Boswell
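One way to picture a floating-point data path with integer accumulation is a dot product in which each float is split into an integer mantissa and a shared power-of-two scale, the multiply-accumulates run entirely in integer arithmetic, and the floating-point rescale happens once at the end. A hedged sketch, not the micro-architecture itself; frac_bits, the int64 accumulator width, and all names are assumptions:

```python
import numpy as np

def fp_dot_int_accumulate(a, b, frac_bits=10):
    """Dot product of float arrays where products accumulate as integers."""
    scale = 2.0 ** -frac_bits
    a_int = np.round(a / scale).astype(np.int64)   # fixed-point mantissas
    b_int = np.round(b / scale).astype(np.int64)
    acc = int(np.sum(a_int * b_int))               # integer-only accumulation
    return acc * scale * scale                     # single late FP rescale

a = np.random.randn(128).astype(np.float32)
b = np.random.randn(128).astype(np.float32)
print(fp_dot_int_accumulate(a, b), float(a @ b))   # approximately equal
```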
-
Publication number: 20240112007
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: December 12, 2023
Publication date: April 4, 2024
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Patent number: 11886980
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: August 23, 2019
Date of Patent: January 30, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Patent number: 11769040
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
Type: Grant
Filed: July 19, 2019
Date of Patent: September 26, 2023
Assignee: NVIDIA CORP.
Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
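A behavioral sketch of one processing element from the tile-based organization described above: a weight buffer, a shared activation input, and multiply-accumulate lanes. This is a toy software model under assumed names and buffer shapes, not the packaged hardware; the per-row MACs that run in parallel on silicon are modeled as a Python loop.

```python
class ProcessingElement:
    """Toy model of a tile's processing element with a local weight buffer."""

    def __init__(self, weights):
        self.weight_buffer = weights  # list of weight vectors, one per MAC lane

    def mac(self, activations):
        # Each lane combines one weight vector with the shared activations;
        # on hardware all lanes run in parallel.
        return [sum(w * a for w, a in zip(row, activations))
                for row in self.weight_buffer]

pe = ProcessingElement([[1, 2], [3, 4]])
print(pe.mac([10, 20]))  # [50, 110]: two lanes' partial sums
```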
-
Publication number: 20230237308
Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, where decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars. “Optimal” means that the mean squared error (MSE) of the quantized operation is minimized, and the clipping scalars define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy.
Type: Application
Filed: July 26, 2022
Publication date: July 27, 2023
Inventors: Charbel Sakr, Steve Haihang Dai, Brucek Kurdo Khailany, William James Dally, Rangharajan Venkatesan, Brian Matthew Zimmer
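To make the MSE-minimization objective concrete, here is a small sketch that scores candidate clipping scalars by quantization MSE and picks the best one. Note the hedge: this uses the brute-force search the abstract contrasts against, purely to illustrate what "optimal clipping scalar" means; the filing describes computing the scalars dynamically during training. All names, the 4-bit default, and the candidate grid are assumptions.

```python
import numpy as np

def mse_for_clip(x, clip, bits=4):
    """Quantization MSE of a symmetric uniform quantizer with clipping scalar `clip`."""
    qmax = 2 ** (bits - 1) - 1
    step = clip / qmax
    q = np.clip(np.round(x / step), -qmax, qmax) * step
    return np.mean((x - q) ** 2)

def optimal_clip(x, bits=4, candidates=100):
    """Brute-force stand-in: scan candidate clips and keep the MSE minimizer."""
    clips = np.linspace(0.5 * x.std(), np.abs(x).max(), candidates)
    return min(clips, key=lambda c: mse_for_clip(x, c, bits))

x = np.random.randn(10000)
print(optimal_clip(x))  # at 4 bits, typically well below max|x|
```

The tradeoff the clip controls: a small clip shrinks rounding error but truncates outliers, a large clip preserves outliers but coarsens every step; minimizing MSE balances the two.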
-
Publication number: 20230068941
Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
Type: Application
Filed: February 11, 2022
Publication date: March 2, 2023
Inventors: Thierry TAMBE, Steve DAI, Brucek KHAILANY, Rangharajan VENKATESAN
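A minimal sketch of the pattern this claim describes: quantize a tile (portion) of an input tensor with one scale factor, compute on the quantized tile in integer arithmetic, and carry the scale factor through to produce the corresponding portion of the output. The 8-bit width, tile shape, and function names are illustrative assumptions.

```python
import numpy as np

def quantize_tile(tile, bits=8):
    """Quantize one tile of a tensor with a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(tile).max()) / qmax
    scale = scale if scale > 0 else 1.0
    return np.round(tile / scale).astype(np.int8), scale

def tile_matmul(a_q, a_scale, b_q, b_scale):
    # Integer matrix multiply on the quantized tiles, then one rescale
    # using the carried scale factors.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

a_q, sa = quantize_tile(np.random.randn(8, 8))
b_q, sb = quantize_tile(np.random.randn(8, 8))
out_tile = tile_matmul(a_q, sa, b_q, sb)  # one portion of the output tensor
```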
-
Publication number: 20220261650
Abstract: An end-to-end low-precision training system based on a multi-base logarithmic number system (LNS) and a multiplicative weight update algorithm. The multi-base LNS is applied to update the weights of the neural network, with different bases utilized between calculation of weight updates, calculation of feed-forward signals, and calculation of feedback signals. The LNS offers high dynamic range and computational energy efficiency, making it advantageous for on-board training in energy-constrained edge devices.
Type: Application
Filed: June 11, 2021
Publication date: August 18, 2022
Applicant: NVIDIA Corp.
Inventors: Jiawei Zhao, Steve Haihang Dai, Rangharajan Venkatesan, Ming-Yu Liu, William James Dally, Anima Anandkumar
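The pairing of an LNS with a multiplicative weight update is natural because multiplying a weight becomes an addition in the log domain. The toy sketch below shows that correspondence only; the base, learning rate, update rule, and class name are assumptions, and the multi-base aspect (different bases for updates, feed-forward, and feedback) is not modeled.

```python
import math

class LNSWeight:
    """A weight stored as (sign, log-magnitude); assumes a nonzero value."""

    def __init__(self, value, base=2.0):
        self.base = base
        self.sign = 1.0 if value >= 0 else -1.0
        self.log_mag = math.log(abs(value), base)

    def multiplicative_update(self, grad, lr=0.01):
        # w <- w * base**(-lr * grad * sign(w)): a multiply on the value,
        # but only an add/subtract on the stored log-magnitude.
        self.log_mag -= lr * grad * self.sign

    def value(self):
        return self.sign * self.base ** self.log_mag

w = LNSWeight(0.5)
w.multiplicative_update(grad=1.2)
print(w.value())  # magnitude nudged slightly below 0.5, no multiplier used
```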
-
Publication number: 20220076110
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 19, 2021
Publication date: March 10, 2022
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 11270197
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Grant
Filed: November 4, 2019
Date of Patent: March 8, 2022
Assignee: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20220067530
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Publication number: 20220067512
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Publication number: 20220067513
Abstract: Solutions that improve the efficiency of Softmax computation for deep learning inference in transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
Type: Application
Filed: December 4, 2020
Publication date: March 3, 2022
Applicant: NVIDIA Corp.
Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
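The base-2 substitution rests on the identity e^x = 2^(x · log2 e), so folding log2(e) into the argument lets hardware use a cheap power-of-two unit. A minimal sketch of that identity plus the two-phase decomposition named in the abstract; integer logits, function names, and the list-based data flow are illustrative assumptions, not the claimed implementation:

```python
LOG2_E = 1.4426950408889634  # log2(e): folded in so 2**x stands in for e**x

def unnormalized_softmax(x_int):
    """Phase 1: base-2 exponentials of integer logits, using an integer max."""
    m = max(x_int)  # integer comparison, no floating-point max needed
    return [2.0 ** ((x - m) * LOG2_E) for x in x_int], m

def normalize(u):
    """Phase 2: divide each unnormalized term by their sum."""
    s = sum(u)
    return [v / s for v in u]

logits = [3, 1, -2, 7]            # integer-quantized logits
u, _ = unnormalized_softmax(logits)
print(normalize(u))               # matches softmax(logits) exactly
```

Splitting the two phases is what makes the implementation scalable: phase 1 can run independently per tile, and only the scalar running sums need to meet for the final normalization.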
-
Patent number: 11151040
Abstract: An approximate cache system is disclosed. The system includes a quality-aware cache controller (QACC), a cache, and a quality table configured to receive addresses and an associated quality specification from the processor for each address, and further configured to provide the quality specification for each address to the QACC. The QACC controls approximation based on one or more of: i) approximation through partial read operations; ii) approximation through lower read currents; iii) approximation through skipped write operations; iv) approximation through partial write operations; v) approximation through lower write duration; vi) approximation through lower write currents; and vii) approximation through skipped refreshes.
Type: Grant
Filed: March 24, 2019
Date of Patent: October 19, 2021
Assignee: Purdue Research Foundation
Inventors: Ashish Ranjan, Swagath Venkataramani, Zoha Pajouhi, Rangharajan Venkatesan, Kaushik Roy, Anand Raghunathan
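Most of the listed approximation modes are circuit-level (read currents, write duration, refreshes), but the control structure itself can be sketched in software. Below, a toy quality table maps addresses to a quality specification and the controller applies one mode, skipped writes, when the stored value is already close enough. Every name, the [0, 1] quality scale, and the tolerance rule are assumptions made for illustration, not the patented design.

```python
class QualityAwareCache:
    """Toy QACC model: per-address quality specs gate an approximation mode."""

    def __init__(self):
        self.lines = {}
        self.quality_table = {}  # address -> required quality in [0, 1]

    def set_quality(self, addr, quality):
        self.quality_table[addr] = quality

    def write(self, addr, value):
        old = self.lines.get(addr)
        quality = self.quality_table.get(addr, 1.0)  # default: exact
        # Approximation through skipped writes: if the stored value is
        # already within the tolerance the quality spec allows, skip it.
        if old is not None and quality < 1.0:
            tolerance = (1.0 - quality) * abs(value)
            if abs(old - value) <= tolerance:
                return  # write skipped, saving the write energy
        self.lines[addr] = value

cache = QualityAwareCache()
cache.set_quality(0x10, 0.9)
cache.write(0x10, 100.0)
cache.write(0x10, 101.0)   # within tolerance: skipped
print(cache.lines[0x10])   # still 100.0
```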
-
Publication number: 20210056397
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: August 23, 2019
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20210056446
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: January 23, 2020
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany