Patents by Inventor Ritchie Zhao
Ritchie Zhao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12190235Abstract: Embodiments of the present disclosure include a system for optimizing an artificial neural network by configuring a model, based on a plurality of training parameters, to execute a training process, monitoring a plurality of statistics produced upon execution of the training process, and adjusting one or more of the training parameters, based on one or more of the statistics, to maintain at least one of the statistics within a predetermined range. In some embodiments, artificial intelligence (AI) processors may execute a training process on a model, the training process having an associated set of training parameters. Execution of the training process may produce a plurality of statistics. Control processor(s) coupled to the AI processor(s) may receive the statistics, and in accordance therewith, adjust one or more of the training parameters to maintain at least one of the statistics within a predetermined range during execution of the training process.Type: GrantFiled: January 29, 2021Date of Patent: January 7, 2025Assignee: Microsoft Technology Licensing, LLCInventors: Maximilian Golub, Ritchie Zhao, Eric Chung, Douglas Burger, Bita Darvish Rouhani, Ge Yang, Nicolo Fusi
-
Patent number: 12165038Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.Type: GrantFiled: February 14, 2019Date of Patent: December 10, 2024Assignee: Microsoft Technology Licensing, LLCInventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
-
Patent number: 12045724Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent are stored in ancillary storage for subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.Type: GrantFiled: December 31, 2018Date of Patent: July 23, 2024Assignee: Microsoft Technology Licensing, LLCInventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
-
Patent number: 11790212Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.Type: GrantFiled: March 18, 2019Date of Patent: October 17, 2023Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
-
Publication number: 20230316039Abstract: A computing system is configured to implement a deep neural network comprising an input layer for receiving inputs applied to the deep neural network, an output layer for outputting inferences based on the received inputs, and a plurality of hidden layers interposed between the input layer and the output layer. A plurality of nodes selectively operate on the inputs to generate and cause outputting of the inferences, wherein operation of the nodes is controlled based on parameters of the deep neural network. A sparsity controller is configured to selectively apply a plurality of different sparsity states to control parameter density of the deep neural network. A quantization controller is configured to selectively quantize the parameters of the deep neural network in a manner that is sparsity-dependent, such that quantization applied to each parameter is based on which of the plurality of different sparsity states applies to the parameter.Type: ApplicationFiled: May 23, 2022Publication date: October 5, 2023Applicant: Microsoft Technology Licensing, LLCInventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Douglas Christopher BURGER, Ming Gang LIU, Eric S. CHUNG, Ritchie Zhao
-
Publication number: 20230196085Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.Type: ApplicationFiled: February 16, 2023Publication date: June 22, 2023Applicant: Microsoft Technology Licensing, LLCInventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
-
Patent number: 11645493Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test input is applied to a normal-precision flooding point model and the corresponding quantized model and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the widths of data and selection of shared exponents for the block floating-point format can be selected. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.Type: GrantFiled: May 4, 2018Date of Patent: May 9, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
-
Publication number: 20230106651Abstract: Aspects of embodiments of the present disclosure relate to a field programmable gate array (FPGA) configured to implement an exponential function data path including: an input scaling stage including constant shifters and integer adders to scale a mantissa portion of an input floating-point value by approximately log2 e to compute a scaled mantissa value, where e is Euler's number; and an exponential stage including barrel shifters and an exponential lookup table to: extract an integer portion and a fractional portion from the scaled mantissa value based on the exponent portion of the input floating-point value; apply a bias shift to the integer portion to compute a result exponent portion of a result floating-point value; lookup a result mantissa portion of the result floating-point value in the exponential lookup table based on the fractional portion; and combine the result exponent portion and the result mantissa portion to generate the result floating-point value.Type: ApplicationFiled: September 28, 2021Publication date: April 6, 2023Applicant: Microsoft Technology Licensing, LLCInventors: Jinwen XI, Ritchie ZHAO, Ming Gang LIU, Eric S. CHUNG
-
Patent number: 11604960Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.Type: GrantFiled: March 18, 2019Date of Patent: March 14, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
-
Patent number: 11586883Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.Type: GrantFiled: December 14, 2018Date of Patent: February 21, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
-
Patent number: 11574239Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.Type: GrantFiled: March 18, 2019Date of Patent: February 7, 2023Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
-
Publication number: 20220383092Abstract: Embodiments of the present disclosure includes systems and methods for reducing computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained according to a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.Type: ApplicationFiled: May 25, 2021Publication date: December 1, 2022Inventors: Ritchie ZHAO, Bita DARVISH ROUHANI, Eric S. CHUNG, Douglas C. BURGER, Maximilian GOLUB
-
Publication number: 20220245444Abstract: Embodiments of the present disclosure include a system for optimizing an artificial neural network by configuring a model, based on a plurality of training parameters, to execute a training process, monitoring a plurality of statistics produced upon execution of the training process, and adjusting one or more of the training parameters, based on one or more of the statistics, to maintain at least one of the statistics within a predetermined range. In some embodiments, artificial intelligence (AI) processors may execute a training process on a model, the training process having an associated set of training parameters. Execution of the training process may produce a plurality of statistics. Control processor(s) coupled to the AI processor(s) may receive the statistics, and in accordance therewith, adjust one or more of the training parameters to maintain at least one of the statistics within a predetermined range during execution of the training process.Type: ApplicationFiled: January 29, 2021Publication date: August 4, 2022Inventors: Maximilian Golub, Ritchie Zhao, Eric Chung, Douglas Burger, Bita Darvish Rouhani, Ge Yang, Nicolo Fusi
-
Publication number: 20200302271Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.Type: ApplicationFiled: March 18, 2019Publication date: September 24, 2020Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
-
Publication number: 20200302330Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.Type: ApplicationFiled: March 18, 2019Publication date: September 24, 2020Inventors: Eric S. CHUNG, Daniel LO, Ritchie ZHAO
-
Publication number: 20200302269Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.Type: ApplicationFiled: March 18, 2019Publication date: September 24, 2020Inventors: Kalin OVTCHAROV, Eric S. CHUNG, Vahideh AKHLAGHI, Ritchie ZHAO
-
Publication number: 20200264876Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.Type: ApplicationFiled: February 14, 2019Publication date: August 20, 2020Applicant: Microsoft Technology Licensing, LLCInventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
-
Publication number: 20200210838Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.Type: ApplicationFiled: December 31, 2018Publication date: July 2, 2020Applicant: Microsoft Technology Licensing, LLCInventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
-
Publication number: 20200210839Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent are stored in ancillary storage for subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.Type: ApplicationFiled: December 31, 2018Publication date: July 2, 2020Applicant: Microsoft Technology Licensing, LLCInventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
-
Publication number: 20200193273Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Applicant: Microsoft Technology Licensing, LLCInventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao