Patents by Inventor Ritchie Zhao

Ritchie Zhao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ADJUSTING ACTIVATION COMPRESSION FOR NEURAL NETWORK TRAINING

Publication number: 20250061320

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.

Type: Application

Filed: November 4, 2024

Publication date: February 20, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
System for training an artificial neural network

Patent number: 12190235

Abstract: Embodiments of the present disclosure include a system for optimizing an artificial neural network by configuring a model, based on a plurality of training parameters, to execute a training process, monitoring a plurality of statistics produced upon execution of the training process, and adjusting one or more of the training parameters, based on one or more of the statistics, to maintain at least one of the statistics within a predetermined range. In some embodiments, artificial intelligence (AI) processors may execute a training process on a model, the training process having an associated set of training parameters. Execution of the training process may produce a plurality of statistics. Control processor(s) coupled to the AI processor(s) may receive the statistics, and in accordance therewith, adjust one or more of the training parameters to maintain at least one of the statistics within a predetermined range during execution of the training process.

Type: Grant

Filed: January 29, 2021

Date of Patent: January 7, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Maximilian Golub, Ritchie Zhao, Eric Chung, Douglas Burger, Bita Darvish Rouhani, Ge Yang, Nicolo Fusi
Adjusting activation compression for neural network training

Patent number: 12165038

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.

Type: Grant

Filed: February 14, 2019

Date of Patent: December 10, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
Neural network activation compression with outlier block floating-point

Patent number: 12045724

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent are stored in ancillary storage for subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.

Type: Grant

Filed: December 31, 2018

Date of Patent: July 23, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
Quantization-aware neural architecture search

Patent number: 11790212

Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.

Type: Grant

Filed: March 18, 2019

Date of Patent: October 17, 2023

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
SPARSITY AND QUANTIZATION FOR DEEP NEURAL NETWORKS

Publication number: 20230316039

Abstract: A computing system is configured to implement a deep neural network comprising an input layer for receiving inputs applied to the deep neural network, an output layer for outputting inferences based on the received inputs, and a plurality of hidden layers interposed between the input layer and the output layer. A plurality of nodes selectively operate on the inputs to generate and cause outputting of the inferences, wherein operation of the nodes is controlled based on parameters of the deep neural network. A sparsity controller is configured to selectively apply a plurality of different sparsity states to control parameter density of the deep neural network. A quantization controller is configured to selectively quantize the parameters of the deep neural network in a manner that is sparsity-dependent, such that quantization applied to each parameter is based on which of the plurality of different sparsity states applies to the parameter.

Type: Application

Filed: May 23, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Douglas Christopher BURGER, Ming Gang LIU, Eric S. CHUNG, Ritchie Zhao
RESIDUAL QUANTIZATION FOR NEURAL NETWORKS

Publication number: 20230196085

Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.

Type: Application

Filed: February 16, 2023

Publication date: June 22, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
Flow for quantized neural networks

Patent number: 11645493

Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test input is applied to a normal-precision flooding point model and the corresponding quantized model and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the widths of data and selection of shared exponents for the block floating-point format can be selected. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.

Type: Grant

Filed: May 4, 2018

Date of Patent: May 9, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
SYSTEMS AND METHODS FOR ACCELERATING THE COMPUTATION OF THE EXPONENTIAL FUNCTION

Publication number: 20230106651

Abstract: Aspects of embodiments of the present disclosure relate to a field programmable gate array (FPGA) configured to implement an exponential function data path including: an input scaling stage including constant shifters and integer adders to scale a mantissa portion of an input floating-point value by approximately log2 e to compute a scaled mantissa value, where e is Euler's number; and an exponential stage including barrel shifters and an exponential lookup table to: extract an integer portion and a fractional portion from the scaled mantissa value based on the exponent portion of the input floating-point value; apply a bias shift to the integer portion to compute a result exponent portion of a result floating-point value; lookup a result mantissa portion of the result floating-point value in the exponential lookup table based on the fractional portion; and combine the result exponent portion and the result mantissa portion to generate the result floating-point value.

Type: Application

Filed: September 28, 2021

Publication date: April 6, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jinwen XI, Ritchie ZHAO, Ming Gang LIU, Eric S. CHUNG
Differential bit width neural architecture search

Patent number: 11604960

Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.

Type: Grant

Filed: March 18, 2019

Date of Patent: March 14, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
Residual quantization for neural networks

Patent number: 11586883

Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.

Type: Grant

Filed: December 14, 2018

Date of Patent: February 21, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
Outlier quantization for training and inference

Patent number: 11574239

Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.

Type: Grant

Filed: March 18, 2019

Date of Patent: February 7, 2023

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
TURBO TRAINING FOR DEEP NEURAL NETWORKS

Publication number: 20220383092

Abstract: Embodiments of the present disclosure includes systems and methods for reducing computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained according to a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.

Type: Application

Filed: May 25, 2021

Publication date: December 1, 2022

Inventors: Ritchie ZHAO, Bita DARVISH ROUHANI, Eric S. CHUNG, Douglas C. BURGER, Maximilian GOLUB
SYSTEM FOR TRAINING AN ARTIFICIAL NEURAL NETWORK

Publication number: 20220245444

Abstract: Embodiments of the present disclosure include a system for optimizing an artificial neural network by configuring a model, based on a plurality of training parameters, to execute a training process, monitoring a plurality of statistics produced upon execution of the training process, and adjusting one or more of the training parameters, based on one or more of the statistics, to maintain at least one of the statistics within a predetermined range. In some embodiments, artificial intelligence (AI) processors may execute a training process on a model, the training process having an associated set of training parameters. Execution of the training process may produce a plurality of statistics. Control processor(s) coupled to the AI processor(s) may receive the statistics, and in accordance therewith, adjust one or more of the training parameters to maintain at least one of the statistics within a predetermined range during execution of the training process.

Type: Application

Filed: January 29, 2021

Publication date: August 4, 2022

Inventors: Maximilian Golub, Ritchie Zhao, Eric Chung, Douglas Burger, Bita Darvish Rouhani, Ge Yang, Nicolo Fusi
OUTLIER QUANTIZATION FOR TRAINING AND INFERENCE

Publication number: 20200302330

Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.

Type: Application

Filed: March 18, 2019

Publication date: September 24, 2020

Inventors: Eric S. CHUNG, Daniel LO, Ritchie ZHAO
DIFFERENTIAL BIT WIDTH NEURAL ARCHITECTURE SEARCH

Publication number: 20200302269

Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.

Type: Application

Filed: March 18, 2019

Publication date: September 24, 2020

Inventors: Kalin OVTCHAROV, Eric S. CHUNG, Vahideh AKHLAGHI, Ritchie ZHAO
QUANTIZATION-AWARE NEURAL ARCHITECTURE SEARCH

Publication number: 20200302271

Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.

Type: Application

Filed: March 18, 2019

Publication date: September 24, 2020

Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
ADJUSTING ACTIVATION COMPRESSION FOR NEURAL NETWORK TRAINING

Publication number: 20200264876

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.

Type: Application

Filed: February 14, 2019

Publication date: August 20, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
NEURAL NETWORK ACTIVATION COMPRESSION WITH NARROW BLOCK FLOATING-POINT

Publication number: 20200210838

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.

Type: Application

Filed: December 31, 2018

Publication date: July 2, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
NEURAL NETWORK ACTIVATION COMPRESSION WITH OUTLIER BLOCK FLOATING-POINT

Publication number: 20200210839

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent are stored in ancillary storage for subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.

Type: Application

Filed: December 31, 2018

Publication date: July 2, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao

1 2 next