Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230140185
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Application
    Filed: January 3, 2023
    Publication date: May 4, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
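    Illustrative sketch: As a rough software analogue of the compression the abstract describes (a sketch under assumed names, block size, and mantissa width, not the patented implementation), the following Python converts a block of activations to a shared-exponent format with short, lossy mantissas and decompresses it again for back propagation.
      import numpy as np

      def to_bfp(block, mantissa_bits=4):
          # Shared exponent taken from the largest magnitude in the block.
          shared_exp = int(np.floor(np.log2(np.max(np.abs(block)) + 1e-38)))
          scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
          # Lossy step: each value keeps only a short, rounded mantissa.
          mantissas = np.clip(np.round(block / scale),
                              -(2 ** mantissa_bits), 2 ** mantissa_bits - 1)
          return shared_exp, mantissas.astype(np.int32)

      def from_bfp(shared_exp, mantissas, mantissa_bits=4):
          # Decompression when the stored activations are retrieved for backprop.
          return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

      activations = np.array([0.91, -0.13, 0.44, 0.02])
      exp, mants = to_bfp(activations)
      print(from_bfp(exp, mants))  # approximately the inputs, with rounding error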
  • Publication number: 20230106651
    Abstract: Aspects of embodiments of the present disclosure relate to a field programmable gate array (FPGA) configured to implement an exponential function data path including: an input scaling stage including constant shifters and integer adders to scale a mantissa portion of an input floating-point value by approximately log2 e to compute a scaled mantissa value, where e is Euler's number; and an exponential stage including barrel shifters and an exponential lookup table to: extract an integer portion and a fractional portion from the scaled mantissa value based on the exponent portion of the input floating-point value; apply a bias shift to the integer portion to compute a result exponent portion of a result floating-point value; look up a result mantissa portion of the result floating-point value in the exponential lookup table based on the fractional portion; and combine the result exponent portion and the result mantissa portion to generate the result floating-point value.
    Type: Application
    Filed: September 28, 2021
    Publication date: April 6, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinwen Xi, Ritchie Zhao, Ming Gang Liu, Eric S. Chung
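    Illustrative sketch: The following Python approximates the exponential data path in software (the table size and names are assumptions; the patent describes an FPGA circuit, not this code): scale the input by log2(e), split the result into integer and fractional parts, look up 2**fraction in a table, and fold the integer part into the result's exponent.
      import math

      FRACTION_BITS = 8
      EXP2_TABLE = [2.0 ** (i / 2 ** FRACTION_BITS) for i in range(2 ** FRACTION_BITS)]

      def exp_via_table(x):
          scaled = x * math.log2(math.e)         # input scaling stage
          integer = math.floor(scaled)           # becomes the result exponent
          fraction = scaled - integer            # indexes the lookup table
          index = int(fraction * 2 ** FRACTION_BITS)
          return math.ldexp(EXP2_TABLE[index], integer)  # mantissa * 2**integer

      print(exp_via_table(1.0), math.exp(1.0))   # table-accurate approximation of e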
  • Patent number: 11604960
    Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: March 14, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
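    Illustrative sketch: A minimal Python illustration of applying learned per-layer bit widths at inference time (the bit-width values, layer names, and symmetric quantizer here are assumptions, not the learned result or method from the patent).
      import numpy as np

      def quantize(tensor, bits):
          # Uniform symmetric quantization to a signed grid with `bits` bits.
          levels = 2 ** (bits - 1) - 1
          scale = (np.max(np.abs(tensor)) + 1e-38) / levels
          return np.round(tensor / scale) * scale  # dequantized for clarity

      learned_bits = {"layer0": 8, "layer1": 4}    # hypothetical learned widths
      weights = {name: np.random.randn(16) for name in learned_bits}
      quantized = {name: quantize(w, learned_bits[name])
                   for name, w in weights.items()}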
  • Patent number: 11586883
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
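    Illustrative sketch: A minimal Python sketch of the residual-tensor idea (the threshold and names are assumptions): quantizing to a block floating-point tensor leaves a residual, and when the conversion error is large the residual contributes a second, corrective dot product.
      import numpy as np

      def quantize_bfp(x, mantissa_bits=4):
          exp = np.floor(np.log2(np.max(np.abs(x)) + 1e-38))
          scale = 2.0 ** (exp - (mantissa_bits - 1))
          return np.round(x / scale) * scale

      x, w = np.random.randn(64), np.random.randn(64)
      xq = quantize_bfp(x)
      residual = x - xq                    # residual tensor from the conversion
      error = np.linalg.norm(residual)     # error value from the conversion
      out = xq @ w                         # low-precision dot product
      if error > 0.1:                      # hypothetical input metric / threshold
          out += residual @ w              # residual increases output precision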
  • Patent number: 11574239
    Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: February 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
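    Illustrative sketch: A simplified Python version of the outlier handling above (the exponent choice and storage layout are assumptions): values too large for the shared exponent and fixed-width mantissa keep a second, out-of-vector part, and a dot product recombines both.
      import numpy as np

      MANT_BITS = 4

      def encode(x):
          # Shared exponent chosen from the bulk of the block, not its maximum.
          shared_exp = int(np.floor(np.log2(np.median(np.abs(x)) + 1e-38)))
          scale = 2.0 ** (shared_exp - (MANT_BITS - 1))
          mant = np.round(x / scale)
          limit = 2 ** MANT_BITS - 1
          in_vec = np.clip(mant, -limit, limit)      # fits the fixed bit width
          big = np.flatnonzero(np.abs(mant) > limit)
          # Second mantissa for each outlier, stored outside the vector.
          outliers = {int(i): (mant[i] - in_vec[i]) * scale for i in big}
          return in_vec * scale, outliers

      def dot(xv, xo, w):
          result = xv @ w                  # in-vector contribution
          for i, extra in xo.items():
              result += extra * w[i]       # out-of-vector contribution
          return result

      x = np.array([0.1, -0.2, 0.15, 8.0])  # 8.0 is an outlier for this block
      w = np.ones(4)
      xv, xo = encode(x)
      print(dot(xv, xo, w), x @ w)          # nearly equal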
  • Patent number: 11562247
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: January 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
  • Patent number: 11556762
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: January 17, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
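    Illustrative sketch: A hypothetical configuration record for the three specialization parameters the abstract enumerates (all field names and values are illustrative, not from the patent).
      from dataclasses import dataclass

      @dataclass
      class SynthesisSpecialization:
          native_vector_dim: int  # (1) aligned to model dims to minimize padding
          lane_width: int         # (2) wider lanes raise intra-row parallelism
          matmul_tiles: int       # (3) more tiles exploit sub-matrix parallelism

      # e.g. specializing for a model whose hidden dimension is 1536:
      config = SynthesisSpecialization(native_vector_dim=128, lane_width=4,
                                       matmul_tiles=6)
      assert 1536 % config.native_vector_dim == 0  # no padding waste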
  • Publication number: 20220405571
    Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations is performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values is provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.
    Type: Application
    Filed: June 16, 2021
    Publication date: December 22, 2022
    Inventors: Bita Darvish Rouhani, Venmugil Elango, Eric S. Chung, Douglas C. Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
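    Illustrative sketch: A small Python model of the flow above (the top-k criterion and names are assumptions): sparsify the weights, emit a mask, select ("mux") only the matching activations, and multiply the two reduced operands.
      import numpy as np

      def sparsify(weights, keep):
          # Keep the `keep` largest-magnitude weights; report a position mask.
          idx = np.argsort(np.abs(weights))[-keep:]
          mask = np.zeros(weights.shape, dtype=bool)
          mask[idx] = True
          return weights[mask], mask

      weights = np.random.randn(8)
      activations = np.random.randn(8)
      kept_w, mask = sparsify(weights, keep=4)
      kept_a = activations[mask]            # muxing unit selects matching lanes
      print(kept_w @ kept_a, (weights * mask) @ activations)  # same output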
  • Publication number: 20220391209
    Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Application
    Filed: August 8, 2022
    Publication date: December 8, 2022
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
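    Illustrative sketch: A toy software model of the pipeline above (operations and names assumed; the patent describes hardware): the MVU's result flows through the chained multifunction units directly, with no round trip through a global register file.
      import numpy as np

      def mvu(matrix, vector):
          return matrix @ vector           # first type of instruction (MVU only)

      def mfu(x, op):
          ops = {"relu": np.maximum(x, 0.0), "sigmoid": 1.0 / (1.0 + np.exp(-x))}
          return ops[op]                   # second type of instruction (MFU only)

      m, v = np.random.randn(4, 4), np.random.randn(4)
      first_result = mvu(m, v)                    # produced by the MVU
      second_result = mfu(first_result, "relu")   # first multifunction unit
      passed_on = mfu(second_result, "sigmoid")   # handed straight to the next MFU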
  • Publication number: 20220383092
    Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained at a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.
    Type: Application
    Filed: May 25, 2021
    Publication date: December 1, 2022
    Inventors: Ritchie Zhao, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
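    Illustrative sketch: A tiny Python analogue of the two-phase idea (the quantizer, learning rate, and switching criterion are assumptions standing in for "fidelity"): train at low precision until a criterion is met, then continue at higher precision.
      import numpy as np

      rng = np.random.default_rng(0)
      true_w = np.array([3.0, -1.0])

      def quantize(v, bits):
          scale = (np.max(np.abs(v)) + 1e-38) / (2 ** (bits - 1) - 1)
          return np.round(v / scale) * scale

      w, bits = np.zeros(2), 4                  # first phase: low fidelity
      for step in range(3000):
          x = rng.standard_normal(2)
          err = quantize(w, bits) @ x - true_w @ x
          w -= 0.05 * err * x                   # SGD on squared error
          if bits == 4 and np.linalg.norm(w - true_w) < 0.5:
              bits = 16                         # criterion met: second phase
      print(w)                                  # close to true_w after phase two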
  • Publication number: 20220383123
    Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs are pruned based on the plurality of entropy values. The second set of outputs are generated by the layer of the neural network using a third set of data.
    Type: Application
    Filed: May 28, 2021
    Publication date: December 1, 2022
    Inventors: Venmugil Elango, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
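    Illustrative sketch: A minimal Python version of the validation-phase statistics above (the simulated outputs and the pruning cutoff are assumptions): per-channel means and variances yield Gaussian entropy estimates, and the lowest-entropy channels are pruned.
      import numpy as np

      rng = np.random.default_rng(1)
      # Simulated layer outputs over a validation set: 8 channels, some nearly
      # constant (low information), some widely varying.
      outputs = rng.standard_normal((1000, 8)) * np.array([1.0, 0.01, 0.5, 2.0,
                                                           0.02, 1.5, 0.03, 0.8])
      means = outputs.mean(axis=0)              # the plurality of mean values
      variances = outputs.var(axis=0)           # the plurality of variance values
      # Entropy of a Gaussian fitted from those statistics (the mean drops out).
      entropies = 0.5 * np.log(2 * np.pi * np.e * variances)

      prune = entropies < np.quantile(entropies, 0.3)   # hypothetical cutoff
      print("pruned channels:", np.flatnonzero(prune))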
  • Publication number: 20220366236
    Abstract: Embodiments of the present disclosure include systems and methods for reducing operations for training neural networks. A plurality of training data selected from a training data set is used as a plurality of inputs for training a neural network. The neural network includes a plurality of weights. A plurality of loss values are determined based on outputs generated by the neural network and expected output data of the plurality of training data. A subset of the plurality of loss values are determined. An average loss value is determined based on the subset of the plurality of loss values. A set of gradients is calculated based on the average loss value and the plurality of weights in the neural network. The plurality of weights in the neural network are adjusted based on the set of gradients.
    Type: Application
    Filed: May 17, 2021
    Publication date: November 17, 2022
    Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger
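    Illustrative sketch: A short Python sketch of the loss reduction above (keeping the largest losses is one plausible subset rule; the fraction is an assumption): average only a subset of the per-example losses, so only that subset drives the gradient computation.
      import numpy as np

      def subset_average_loss(losses, keep_fraction=0.5):
          k = max(1, int(len(losses) * keep_fraction))
          subset = np.sort(losses)[-k:]   # subset of the plurality of loss values
          return subset.mean()            # average loss driving the gradients

      losses = np.array([0.9, 0.1, 0.7, 0.05, 0.4, 0.2])
      print(subset_average_loss(losses))  # only these terms need backpropagation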
  • Patent number: 11494614
    Abstract: Perplexity scores are computed for training data samples during ANN training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternately be computed by learning a probability density function (“PDF”) fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: November 8, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Bita Darvish Rouhani
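    Illustrative sketch: A hedged Python version of the first scoring variant above (the threshold is an assumption): score each sample by the divergence between its one-hot label and the model's probability vector, and drop low scorers from later epochs.
      import numpy as np

      def perplexity_score(label_onehot, probs, eps=1e-12):
          # KL divergence from the predicted distribution to the label.
          return float(np.sum(label_onehot *
                              np.log((label_onehot + eps) / (probs + eps))))

      label = np.array([0.0, 1.0, 0.0])
      easy = np.array([0.05, 0.9, 0.05])   # model already confident: low score
      hard = np.array([0.6, 0.2, 0.2])     # model still wrong: high score
      threshold = 0.5                      # hypothetical removal threshold
      keep = [perplexity_score(label, p) > threshold for p in (easy, hard)]
      print(keep)  # [False, True]: the easy sample leaves the training set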
  • Publication number: 20220253281
    Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
    Type: Application
    Filed: June 28, 2021
    Publication date: August 11, 2022
    Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
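    Illustrative sketch: A minimal Python layout of the hierarchical shared-exponent data type (block sizes and the max rule are assumptions): two sub-blocks each get a shared exponent, a third exponent covers both, and only small differences are stored with the signs and mantissas.
      import numpy as np

      def shared_exp(block):
          return int(np.floor(np.log2(np.max(np.abs(block)) + 1e-38)))

      block_a = np.array([0.9, -0.4, 0.22])
      block_b = np.array([0.003, 0.001, -0.002])

      e1, e2 = shared_exp(block_a), shared_exp(block_b)  # first and second
      e3 = max(e1, e2)                                   # third shared exponent
      d1, d2 = e3 - e1, e3 - e2                          # small difference values
      signs = np.sign(np.concatenate([block_a, block_b]))
      mantissas = np.abs(np.concatenate([block_a / 2.0 ** e1,
                                         block_b / 2.0 ** e2]))
      stored = {"exp": e3, "diffs": (d1, d2),            # the stored data type
                "signs": signs, "mantissas": mantissas}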
  • Publication number: 20220012577
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
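    Illustrative sketch: A toy Python model of the residency policy above (class and method names are assumptions; the patent concerns distributed hardware nodes): coefficients load once at service activation and stay in on-chip memory until the service stops or the model changes.
      import numpy as np

      class NodeMemory:
          def __init__(self):
              self.coefficients = None                 # resident on-chip blocks

          def activate_service(self, matrix):
              self.coefficients = np.asarray(matrix)   # load N-by-M matrix once

          def evaluate(self, x):
              # Stays resident between calls regardless of utilization.
              return self.coefficients @ x

          def interrupt_or_replace(self, matrix=None):
              self.coefficients = None if matrix is None else np.asarray(matrix)

      node = NodeMemory()
      node.activate_service(np.eye(3))
      print(node.evaluate(np.array([1.0, 2.0, 3.0])))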
  • Publication number: 20210406657
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Application
    Filed: August 23, 2021
    Publication date: December 30, 2021
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 11157801
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
  • Patent number: 11144820
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding a chain of instructions received via an input queue, where the chain of instructions comprises a first instruction that can only be processed by the matrix vector unit and a sequence of instructions that can only be processed by a multifunction unit. The method includes processing the first instruction using the MVU and processing each of the instructions in the sequence depending upon its position in the sequence.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: October 12, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 11132599
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: September 28, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 10958717
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes a request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than others of the target hardware acceleration devices.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: March 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
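    Illustrative sketch: A minimal Python rendering of the routing rule in the abstract (the load table and names are assumptions): each requesting accelerator consults its load data and forwards the request to the least-loaded target.
      def route_request(load_table):
          # Pick the target hardware acceleration device with the lowest load.
          return min(load_table, key=load_table.get)

      loads = {"accel-a": 0.72, "accel-b": 0.31, "accel-c": 0.55}
      print(route_request(loads))  # "accel-b", the lower-load target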