Patents by Inventor Steve Haihang Dai

Steve Haihang Dai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12045307
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: July 23, 2024
    Assignee: NVIDIA Corporation
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
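The per-vector scaling described in the abstract above can be illustrated with a minimal sketch. The function names, the 16-element vector size, and the 4-bit signed format below are illustrative assumptions, not details drawn from the patent.

```python
import numpy as np

def quantize_per_vector(matrix, vector_size=16, bits=4):
    """Quantize each row in chunks ("vectors") of vector_size elements,
    using a separate scale factor per vector (illustrative sketch only)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    rows, cols = matrix.shape
    assert cols % vector_size == 0
    # Split each row into contiguous vectors along the last dimension.
    vectors = matrix.reshape(rows, cols // vector_size, vector_size)
    # One scale factor per vector: map the largest magnitude to qmax.
    scales = np.abs(vectors).max(axis=-1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)     # avoid divide-by-zero
    q = np.clip(np.round(vectors / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def dequantize_per_vector(q, scales, shape):
    """Recover an approximation of the original matrix."""
    return (q * scales).reshape(shape)

if __name__ == "__main__":
    w = np.random.randn(4, 64).astype(np.float32)
    q, s = quantize_per_vector(w)
    w_hat = dequantize_per_vector(q, s, w.shape)
    print("mean abs error:", np.abs(w - w_hat).mean())
```

Because each short vector receives its own scale factor, an outlier in one vector does not coarsen the quantization of the rest of the row.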
  • Publication number: 20230237308
    Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, and decreasing the number of bits can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars: “optimal” in that the mean squared error (MSE) of the quantized operation is minimized, with the clipping scalars defining the degree or amount of quantization for the various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy.
    Type: Application
    Filed: July 26, 2022
    Publication date: July 27, 2023
    Inventors: Charbel Sakr, Steve Haihang Dai, Brucek Kurdo Khailany, William James Dally, Rangharajan Venkatesan, Brian Matthew Zimmer
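To make the notion of an MSE-optimal clipping scalar concrete, here is a minimal sketch that sweeps candidate clipping scalars and keeps the one with the lowest quantization MSE. The publication describes computing such scalars dynamically during training rather than by offline search; the sweep below only illustrates the objective being minimized, and all names and parameter choices are assumptions.

```python
import numpy as np

def clipped_quant_mse(x, clip, bits=4):
    """Mean squared error of uniformly quantizing x after clipping to [-clip, clip]."""
    levels = 2 ** (bits - 1) - 1
    step = clip / levels
    xq = np.clip(x, -clip, clip)
    xq = np.round(xq / step) * step
    return np.mean((x - xq) ** 2)

def mse_optimal_clip(x, bits=4, num_candidates=200):
    """Pick the clipping scalar that minimizes quantization MSE
    (simple sweep shown only to illustrate the objective)."""
    candidates = np.linspace(1e-3, np.abs(x).max(), num_candidates)
    errors = [clipped_quant_mse(x, c, bits) for c in candidates]
    return candidates[int(np.argmin(errors))]

if __name__ == "__main__":
    activations = np.random.randn(10000)
    clip = mse_optimal_clip(activations, bits=4)
    print("MSE-optimal clipping scalar:", clip)
    print("MSE at that clip:", clipped_quant_mse(activations, clip, bits=4))
```

The clipping scalar trades clipping error against rounding error: a larger clip preserves outliers but widens the quantization step for everything else.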
  • Publication number: 20220261650
    Abstract: An end-to-end low-precision training system based on a multi-base logarithmic number system (LNS) and a multiplicative weight update algorithm. The multi-base logarithmic number system is applied to update the weights of the neural network, with different bases of the multi-base LNS used for the calculation of weight updates, the calculation of feed-forward signals, and the calculation of feedback signals. The LNS provides a high dynamic range and computational energy efficiency, making it advantageous for on-board training in energy-constrained edge devices.
    Type: Application
    Filed: June 11, 2021
    Publication date: August 18, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jiawei Zhao, Steve Haihang Dai, Rangharajan Venkatesan, Ming-Yu Liu, William James Dally, Anima Anandkumar
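A minimal sketch of the two ideas named in the abstract above: storing values as integer exponents of a base 2^(1/gamma) (a multi-base logarithmic number system), and updating weights multiplicatively, which becomes an addition in the log domain. The specific update rule, gamma, and exponent width below are illustrative assumptions and are not taken from the publication.

```python
import numpy as np

def lns_quantize(x, gamma=4, exp_bits=6):
    """Quantize values to a logarithmic number system with base 2**(1/gamma).
    Each value is stored as a sign and an integer exponent e, so that
    x ~= sign * 2**(e / gamma). gamma and exp_bits are illustrative choices."""
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 1e-12)               # avoid log(0)
    e = np.round(gamma * np.log2(mag))
    e_min = -(2 ** (exp_bits - 1))                   # clamp exponent range
    e_max = 2 ** (exp_bits - 1) - 1
    e = np.clip(e, e_min, e_max)
    return sign, e

def lns_to_float(sign, e, gamma=4):
    """Convert the (sign, exponent) LNS representation back to floating point."""
    return sign * 2.0 ** (e / gamma)

def multiplicative_update(w, grad, lr=0.01):
    """A generic multiplicative weight update: the weight magnitude is scaled
    by a factor, which is an addition to the exponent in the log domain."""
    return w * np.exp(-lr * np.sign(w) * grad)

if __name__ == "__main__":
    w = np.random.randn(5).astype(np.float32)
    g = np.random.randn(5).astype(np.float32)
    w_new = multiplicative_update(w, g)
    sign, e = lns_quantize(w_new)
    print("updated weights   :", w_new)
    print("LNS reconstruction:", lns_to_float(sign, e))
```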
  • Publication number: 20220067513
    Abstract: Solutions for improving the efficiency of Softmax computation for deep learning inference in transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing floating-point max computation with integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 3, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
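A minimal sketch of the base-2 Softmax decomposition named in the abstract above, assuming ordinary NumPy floating point throughout; the reduced-precision formats and the integer max computation described in the publication are omitted.

```python
import numpy as np

def softmax_base2(logits):
    """Softmax variant using 2**x instead of e**x, split into an
    "unnormalized" pass and a separate normalization pass
    (illustrative sketch only)."""
    # Subtract the max for numerical stability; hardware could perform this
    # comparison on an integer view of the values, but we keep it simple here.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    # Unnormalized Softmax: 2**x is cheap in hardware (a shift for integer x).
    unnormalized = np.exp2(shifted)
    # Separate Normalization step.
    return unnormalized / np.sum(unnormalized, axis=-1, keepdims=True)

if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
    print(softmax_base2(x))
    # Note: replacing e**x with 2**x rescales the logits by ln(2); a network
    # can absorb this constant factor into the preceding layer's weights.
```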
  • Publication number: 20220067512
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Application
    Filed: October 30, 2020
    Publication date: March 3, 2022
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
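This entry shares its abstract with patent 12045307 above. As a complementary sketch, the following shows where per-vector scale factors sit in a dot product: the multiplications happen on narrow integers, and each vector's scale factors are applied once per partial sum rather than once per multiply. Names and parameter choices are illustrative assumptions.

```python
import numpy as np

def quantize_vector(v, bits=4):
    """Quantize one vector with its own scale factor (sketch only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(v).max() / qmax, 1e-12)
    return np.clip(np.round(v / scale), -qmax, qmax).astype(np.int32), scale

def per_vector_dot(a, b, vector_size=16, bits=4):
    """Dot product where each vector_size chunk of a and b is quantized with
    its own scale; integer partial sums are rescaled before accumulation."""
    assert a.size == b.size and a.size % vector_size == 0
    total = 0.0
    for i in range(0, a.size, vector_size):
        qa, sa = quantize_vector(a[i:i + vector_size], bits)
        qb, sb = quantize_vector(b[i:i + vector_size], bits)
        # Narrow integer multiplies dominate the energy cost; the per-vector
        # scale factors are applied once per partial sum.
        total += float(np.dot(qa, qb)) * sa * sb
    return total

if __name__ == "__main__":
    a = np.random.randn(64)
    b = np.random.randn(64)
    print("full precision :", float(np.dot(a, b)))
    print("per-vector 4bit:", per_vector_dot(a, b))
```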
  • Publication number: 20220067530
    Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
    Type: Application
    Filed: October 30, 2020
    Publication date: March 3, 2022
    Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
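This entry also shares its abstract with patent 12045307 above. The sketch below illustrates the abstract's claim that sharing a scale factor across a smaller group of parameters improves accuracy: with a few outliers in the tensor, the measured quantization error is lower for 16-element groups than for a single scale shared by the whole tensor. The group sizes and test data are illustrative assumptions.

```python
import numpy as np

def quant_error(x, group_size, bits=4):
    """Mean squared quantization error when one scale factor is shared by
    each group of group_size consecutive elements (sketch only)."""
    qmax = 2 ** (bits - 1) - 1
    groups = x.reshape(-1, group_size)
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / qmax, 1e-12)
    xq = np.clip(np.round(groups / scales), -qmax, qmax) * scales
    return float(np.mean((groups - xq) ** 2))

if __name__ == "__main__":
    x = np.random.randn(4096)
    # A few large outliers make one shared scale for the whole tensor coarse.
    x[::512] *= 20
    print("per-tensor (group=4096):", quant_error(x, 4096))
    print("per-vector (group=16)  :", quant_error(x, 16))
```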