Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230140185
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Application
    Filed: January 3, 2023
    Publication date: May 4, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
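    Illustrative sketch: As a rough software analogue of the compression the abstract describes (a sketch under assumed names, block size, and mantissa width, not the patented implementation), the following Python converts a block of activations to a shared-exponent format with short, lossy mantissas and decompresses it again for back propagation.
      import numpy as np

      def to_bfp(block, mantissa_bits=4):
          # Shared exponent taken from the largest magnitude in the block.
          shared_exp = int(np.floor(np.log2(np.max(np.abs(block)) + 1e-38)))
          scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
          # Lossy step: each value keeps only a short, rounded mantissa.
          mantissas = np.clip(np.round(block / scale),
                              -(2 ** mantissa_bits), 2 ** mantissa_bits - 1)
          return shared_exp, mantissas.astype(np.int32)

      def from_bfp(shared_exp, mantissas, mantissa_bits=4):
          # Decompression when the stored activations are retrieved for backprop.
          return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

      activations = np.array([0.91, -0.13, 0.44, 0.02])
      exp, mants = to_bfp(activations)
      print(from_bfp(exp, mants))  # approximately the inputs, with rounding error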
  • Publication number: 20230106651
    Abstract: Aspects of embodiments of the present disclosure relate to a field programmable gate array (FPGA) configured to implement an exponential function data path including: an input scaling stage including constant shifters and integer adders to scale a mantissa portion of an input floating-point value by approximately log2 e to compute a scaled mantissa value, where e is Euler's number; and an exponential stage including barrel shifters and an exponential lookup table to: extract an integer portion and a fractional portion from the scaled mantissa value based on the exponent portion of the input floating-point value; apply a bias shift to the integer portion to compute a result exponent portion of a result floating-point value; look up a result mantissa portion of the result floating-point value in the exponential lookup table based on the fractional portion; and combine the result exponent portion and the result mantissa portion to generate the result floating-point value.
    Type: Application
    Filed: September 28, 2021
    Publication date: April 6, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinwen Xi, Ritchie Zhao, Ming Gang Liu, Eric S. Chung
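    Illustrative sketch: The following Python approximates the exponential data path in software (the table size and names are assumptions; the patent describes an FPGA circuit, not this code): scale the input by log2(e), split the result into integer and fractional parts, look up 2**fraction in a table, and fold the integer part into the result's exponent.
      import math

      FRACTION_BITS = 8
      EXP2_TABLE = [2.0 ** (i / 2 ** FRACTION_BITS) for i in range(2 ** FRACTION_BITS)]

      def exp_via_table(x):
          scaled = x * math.log2(math.e)         # input scaling stage
          integer = math.floor(scaled)           # becomes the result exponent
          fraction = scaled - integer            # indexes the lookup table
          index = int(fraction * 2 ** FRACTION_BITS)
          return math.ldexp(EXP2_TABLE[index], integer)  # mantissa * 2**integer

      print(exp_via_table(1.0), math.exp(1.0))   # table-accurate approximation of e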
  • Patent number: 11604960
    Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: March 14, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
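    Illustrative sketch: A minimal Python illustration of applying learned per-layer bit widths at inference time (the bit-width values, layer names, and symmetric quantizer here are assumptions, not the learned result or method from the patent).
      import numpy as np

      def quantize(tensor, bits):
          # Uniform symmetric quantization to a signed grid with `bits` bits.
          levels = 2 ** (bits - 1) - 1
          scale = (np.max(np.abs(tensor)) + 1e-38) / levels
          return np.round(tensor / scale) * scale  # dequantized for clarity

      learned_bits = {"layer0": 8, "layer1": 4}    # hypothetical learned widths
      weights = {name: np.random.randn(16) for name in learned_bits}
      quantized = {name: quantize(w, learned_bits[name])
                   for name, w in weights.items()}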
  • Patent number: 11586883
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
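    Illustrative sketch: A minimal Python sketch of the residual-tensor idea (the threshold and names are assumptions): quantizing to a block floating-point tensor leaves a residual, and when the conversion error is large the residual contributes a second, corrective dot product.
      import numpy as np

      def quantize_bfp(x, mantissa_bits=4):
          exp = np.floor(np.log2(np.max(np.abs(x)) + 1e-38))
          scale = 2.0 ** (exp - (mantissa_bits - 1))
          return np.round(x / scale) * scale

      x, w = np.random.randn(64), np.random.randn(64)
      xq = quantize_bfp(x)
      residual = x - xq                    # residual tensor from the conversion
      error = np.linalg.norm(residual)     # error value from the conversion
      out = xq @ w                         # low-precision dot product
      if error > 0.1:                      # hypothetical input metric / threshold
          out += residual @ w              # residual increases output precision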
  • Patent number: 11574239
    Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: February 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
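    Illustrative sketch: A simplified Python version of the outlier handling above (the exponent choice and storage layout are assumptions): values too large for the shared exponent and fixed-width mantissa keep a second, out-of-vector part, and a dot product recombines both.
      import numpy as np

      MANT_BITS = 4

      def encode(x):
          # Shared exponent chosen from the bulk of the block, not its maximum.
          shared_exp = int(np.floor(np.log2(np.median(np.abs(x)) + 1e-38)))
          scale = 2.0 ** (shared_exp - (MANT_BITS - 1))
          mant = np.round(x / scale)
          limit = 2 ** MANT_BITS - 1
          in_vec = np.clip(mant, -limit, limit)      # fits the fixed bit width
          big = np.flatnonzero(np.abs(mant) > limit)
          # Second mantissa for each outlier, stored outside the vector.
          outliers = {int(i): (mant[i] - in_vec[i]) * scale for i in big}
          return in_vec * scale, outliers

      def dot(xv, xo, w):
          result = xv @ w                  # in-vector contribution
          for i, extra in xo.items():
              result += extra * w[i]       # out-of-vector contribution
          return result

      x = np.array([0.1, -0.2, 0.15, 8.0])  # 8.0 is an outlier for this block
      w = np.ones(4)
      xv, xo = encode(x)
      print(dot(xv, xo, w), x @ w)          # nearly equal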
  • Patent number: 11562247
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: January 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
  • Patent number: 11556762
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: January 17, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
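    Illustrative sketch: A hypothetical configuration record for the three specialization parameters the abstract enumerates (all field names and values are illustrative, not from the patent).
      from dataclasses import dataclass

      @dataclass
      class SynthesisSpecialization:
          native_vector_dim: int  # (1) aligned to model dims to minimize padding
          lane_width: int         # (2) wider lanes raise intra-row parallelism
          matmul_tiles: int       # (3) more tiles exploit sub-matrix parallelism

      # e.g. specializing for a model whose hidden dimension is 1536:
      config = SynthesisSpecialization(native_vector_dim=128, lane_width=4,
                                       matmul_tiles=6)
      assert 1536 % config.native_vector_dim == 0  # no padding waste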
  • Publication number: 20220405571
    Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations is performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values is provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.
    Type: Application
    Filed: June 16, 2021
    Publication date: December 22, 2022
    Inventors: Bita Darvish Rouhani, Venmugil Elango, Eric S. Chung, Douglas C. Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
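    Illustrative sketch: A small Python model of the flow above (the top-k criterion and names are assumptions): sparsify the weights, emit a mask, select ("mux") only the matching activations, and multiply the two reduced operands.
      import numpy as np

      def sparsify(weights, keep):
          # Keep the `keep` largest-magnitude weights; report a position mask.
          idx = np.argsort(np.abs(weights))[-keep:]
          mask = np.zeros(weights.shape, dtype=bool)
          mask[idx] = True
          return weights[mask], mask

      weights = np.random.randn(8)
      activations = np.random.randn(8)
      kept_w, mask = sparsify(weights, keep=4)
      kept_a = activations[mask]            # muxing unit selects matching lanes
      print(kept_w @ kept_a, (weights * mask) @ activations)  # same output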
  • Publication number: 20220391209
    Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Application
    Filed: August 8, 2022
    Publication date: December 8, 2022
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
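    Illustrative sketch: A toy software model of the pipeline above (operations and names assumed; the patent describes hardware): the MVU's result flows through the chained multifunction units directly, with no round trip through a global register file.
      import numpy as np

      def mvu(matrix, vector):
          return matrix @ vector           # first type of instruction (MVU only)

      def mfu(x, op):
          ops = {"relu": np.maximum(x, 0.0), "sigmoid": 1.0 / (1.0 + np.exp(-x))}
          return ops[op]                   # second type of instruction (MFU only)

      m, v = np.random.randn(4, 4), np.random.randn(4)
      first_result = mvu(m, v)                    # produced by the MVU
      second_result = mfu(first_result, "relu")   # first multifunction unit
      passed_on = mfu(second_result, "sigmoid")   # handed straight to the next MFU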
  • Publication number: 20220383092
    Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained at a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.
    Type: Application
    Filed: May 25, 2021
    Publication date: December 1, 2022
    Inventors: Ritchie Zhao, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
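    Illustrative sketch: A tiny Python analogue of the two-phase idea (the quantizer, learning rate, and switching criterion are assumptions standing in for "fidelity"): train at low precision until a criterion is met, then continue at higher precision.
      import numpy as np

      rng = np.random.default_rng(0)
      true_w = np.array([3.0, -1.0])

      def quantize(v, bits):
          scale = (np.max(np.abs(v)) + 1e-38) / (2 ** (bits - 1) - 1)
          return np.round(v / scale) * scale

      w, bits = np.zeros(2), 4                  # first phase: low fidelity
      for step in range(3000):
          x = rng.standard_normal(2)
          err = quantize(w, bits) @ x - true_w @ x
          w -= 0.05 * err * x                   # SGD on squared error
          if bits == 4 and np.linalg.norm(w - true_w) < 0.5:
              bits = 16                         # criterion met: second phase
      print(w)                                  # close to true_w after phase two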
  • Publication number: 20220383123
    Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs are pruned based on the plurality of entropy values. The second set of outputs are generated by the layer of the neural network using a third set of data.
    Type: Application
    Filed: May 28, 2021
    Publication date: December 1, 2022
    Inventors: Venmugil Elango, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
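    Illustrative sketch: A minimal Python version of the validation-phase statistics above (the simulated outputs and the pruning cutoff are assumptions): per-channel means and variances yield Gaussian entropy estimates, and the lowest-entropy channels are pruned.
      import numpy as np

      rng = np.random.default_rng(1)
      # Simulated layer outputs over a validation set: 8 channels, some nearly
      # constant (low information), some widely varying.
      outputs = rng.standard_normal((1000, 8)) * np.array([1.0, 0.01, 0.5, 2.0,
                                                           0.02, 1.5, 0.03, 0.8])
      means = outputs.mean(axis=0)              # the plurality of mean values
      variances = outputs.var(axis=0)           # the plurality of variance values
      # Entropy of a Gaussian fitted from those statistics (the mean drops out).
      entropies = 0.5 * np.log(2 * np.pi * np.e * variances)

      prune = entropies < np.quantile(entropies, 0.3)   # hypothetical cutoff
      print("pruned channels:", np.flatnonzero(prune))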
  • Publication number: 20220366236
    Abstract: Embodiments of the present disclosure include systems and methods for reducing operations for training neural networks. A plurality of training data selected from a training data set is used as a plurality of inputs for training a neural network. The neural network includes a plurality of weights. A plurality of loss values are determined based on outputs generated by the neural network and expected output data of the plurality of training data. A subset of the plurality of loss values are determined. An average loss value is determined based on the subset of the plurality of loss values. A set of gradients is calculated based on the average loss value and the plurality of weights in the neural network. The plurality of weights in the neural network are adjusted based on the set of gradients.
    Type: Application
    Filed: May 17, 2021
    Publication date: November 17, 2022
    Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger
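    Illustrative sketch: A short Python sketch of the loss reduction above (keeping the largest losses is one plausible subset rule; the fraction is an assumption): average only a subset of the per-example losses, so only that subset drives the gradient computation.
      import numpy as np

      def subset_average_loss(losses, keep_fraction=0.5):
          k = max(1, int(len(losses) * keep_fraction))
          subset = np.sort(losses)[-k:]   # subset of the plurality of loss values
          return subset.mean()            # average loss driving the gradients

      losses = np.array([0.9, 0.1, 0.7, 0.05, 0.4, 0.2])
      print(subset_average_loss(losses))  # only these terms need backpropagation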
  • Patent number: 11494614
    Abstract: Perplexity scores are computed for training data samples during ANN training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternately be computed by learning a probability density function (“PDF”) fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: November 8, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Bita Darvish Rouhani
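    Illustrative sketch: A hedged Python version of the first scoring variant above (the threshold is an assumption): score each sample by the divergence between its one-hot label and the model's probability vector, and drop low scorers from later epochs.
      import numpy as np

      def perplexity_score(label_onehot, probs, eps=1e-12):
          # KL divergence from the predicted distribution to the label.
          return float(np.sum(label_onehot *
                              np.log((label_onehot + eps) / (probs + eps))))

      label = np.array([0.0, 1.0, 0.0])
      easy = np.array([0.05, 0.9, 0.05])   # model already confident: low score
      hard = np.array([0.6, 0.2, 0.2])     # model still wrong: high score
      threshold = 0.5                      # hypothetical removal threshold
      keep = [perplexity_score(label, p) > threshold for p in (easy, hard)]
      print(keep)  # [False, True]: the easy sample leaves the training set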
  • Publication number: 20220253281
    Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
    Type: Application
    Filed: June 28, 2021
    Publication date: August 11, 2022
    Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
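    Illustrative sketch: A minimal Python layout of the hierarchical shared-exponent data type (block sizes and the max rule are assumptions): two sub-blocks each get a shared exponent, a third exponent covers both, and only small differences are stored with the signs and mantissas.
      import numpy as np

      def shared_exp(block):
          return int(np.floor(np.log2(np.max(np.abs(block)) + 1e-38)))

      block_a = np.array([0.9, -0.4, 0.22])
      block_b = np.array([0.003, 0.001, -0.002])

      e1, e2 = shared_exp(block_a), shared_exp(block_b)  # first and second
      e3 = max(e1, e2)                                   # third shared exponent
      d1, d2 = e3 - e1, e3 - e2                          # small difference values
      signs = np.sign(np.concatenate([block_a, block_b]))
      mantissas = np.abs(np.concatenate([block_a / 2.0 ** e1,
                                         block_b / 2.0 ** e2]))
      stored = {"exp": e3, "diffs": (d1, d2),            # the stored data type
                "signs": signs, "mantissas": mantissas}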
  • Publication number: 20220012577
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
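    Illustrative sketch: A toy Python model of the residency policy above (class and method names are assumptions; the patent concerns distributed hardware nodes): coefficients load once at service activation and stay in on-chip memory until the service stops or the model changes.
      import numpy as np

      class NodeMemory:
          def __init__(self):
              self.coefficients = None                 # resident on-chip blocks

          def activate_service(self, matrix):
              self.coefficients = np.asarray(matrix)   # load N-by-M matrix once

          def evaluate(self, x):
              # Stays resident between calls regardless of utilization.
              return self.coefficients @ x

          def interrupt_or_replace(self, matrix=None):
              self.coefficients = None if matrix is None else np.asarray(matrix)

      node = NodeMemory()
      node.activate_service(np.eye(3))
      print(node.evaluate(np.array([1.0, 2.0, 3.0])))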
  • Publication number: 20210406657
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Application
    Filed: August 23, 2021
    Publication date: December 30, 2021
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 11157801
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
  • Patent number: 11144820
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding a chain of instructions received via an input queue, where the chain of instructions comprises a first instruction that can only be processed by the matrix vector unit and a sequence of instructions that can only be processed by a multifunction unit. The method includes processing the first instruction using the MVU and processing each of the instructions in the sequence depending upon its position in the sequence.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: October 12, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 11132599
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: September 28, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Patent number: 10958717
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes a request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than others of the target hardware acceleration devices.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: March 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
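    Illustrative sketch: A minimal Python rendering of the routing rule in the abstract (the load table and names are assumptions): each requesting accelerator consults its load data and forwards the request to the least-loaded target.
      def route_request(load_table):
          # Pick the target hardware acceleration device with the lowest load.
          return min(load_table, key=load_table.get)

      loads = {"accel-a": 0.72, "accel-b": 0.31, "accel-c": 0.55}
      print(route_request(loads))  # "accel-b", the lower-load target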