Patents by Inventor Eric S. Chung
Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10795678
Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory, and related methods, are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. In response to a write instruction, the VRF may store N data elements in the multi-port memory and, during each of P clock cycles, provide N data elements to each of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each of the P clock cycles, the multi-port memory may be configured to receive N data elements via at least one selected input interface circuit among the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
Type: Grant
Filed: April 21, 2018
Date of Patent: October 6, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
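For intuition, here is a toy software model of the write path. It assumes N = P × L, so an N-element vector is spread across P clock cycles with L elements per lane per cycle; the class and method names are hypothetical, and the real port-selection semantics are more involved than this sketch.

```python
class VectorRegisterFile:
    """Toy model of a multi-port vector register file (names hypothetical).

    Assumes N = P * L: a write of an N-element vector is spread over
    P clock cycles, with L elements carried in parallel per cycle.
    """

    def __init__(self, num_registers, n, p):
        assert n % p == 0, "sketch assumes N is a multiple of P"
        self.n, self.p, self.l = n, p, n // p
        self.mem = {r: [0] * n for r in range(num_registers)}

    def write(self, reg, vector):
        assert len(vector) == self.n
        for cycle in range(self.p):          # one chunk of L elements per cycle
            lo = cycle * self.l
            self.mem[reg][lo:lo + self.l] = vector[lo:lo + self.l]

    def read(self, reg):
        return list(self.mem[reg])


vrf = VectorRegisterFile(num_registers=4, n=8, p=2)
vrf.write(0, list(range(8)))
print(vrf.read(0))  # [0, 1, 2, 3, 4, 5, 6, 7]
```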
-
Patent number: 10791054
Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets of a first message associated with the service using a larger inter-packet gap than the gap used for the message's previous packets. In a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay the transmission of the next packet of the first message.
Type: Grant
Filed: April 26, 2019
Date of Patent: September 29, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
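A minimal sketch of the two sender-side responses, assuming string-valued indications and illustrative gap values; the actual signaling and timing are not specified at this level of detail.

```python
import time


class SendingAccelerator:
    """Sketch of the sender-side congestion responses (values illustrative)."""

    def __init__(self, base_gap_s=0.0001):
        self.gap_s = base_gap_s

    def on_indication(self, indication):
        if indication == "WIDEN_GAP":    # first mode: larger inter-packet gap
            self.gap_s *= 2
        elif indication == "DELAY":      # second mode: hold the next packet
            time.sleep(10 * self.gap_s)

    def send_message(self, packets, transmit):
        for packet in packets:
            transmit(packet)
            time.sleep(self.gap_s)       # pace packets by the current gap


sender = SendingAccelerator()
sender.on_indication("WIDEN_GAP")        # receiver signaled congestion
sender.send_message([b"p0", b"p1"], transmit=lambda p: print("sent", p))
```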
-
Publication number: 20200302271
Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.
Type: Application
Filed: March 18, 2019
Publication date: September 24, 2020
Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
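The block floating-point format mentioned in the abstract can be illustrated in a few lines: one shared exponent per block plus short signed mantissas. The rounding and clipping policy below is a simplifying assumption.

```python
import numpy as np


def to_block_floating_point(values, mantissa_bits):
    """Quantize a block of values to a shared exponent plus short signed
    mantissas (a sketch; real BFP hardware differs in rounding/saturation)."""
    values = np.asarray(values, dtype=np.float64)
    max_abs = float(np.max(np.abs(values)))
    shared_exp = int(np.ceil(np.log2(max_abs))) if max_abs > 0 else 0
    lsb = 2.0 ** shared_exp / 2 ** (mantissa_bits - 1)  # value of one step
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(values / lsb), lo, hi).astype(np.int64)
    return mantissas, shared_exp, mantissas * lsb       # plus reconstruction


mantissas, exp, approx = to_block_floating_point([0.5, 1.7, -3.2], mantissa_bits=4)
print(mantissas, exp, approx)  # [1 3 -6] 2 [0.5 1.5 -3.0]
```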
-
Publication number: 20200302273
Abstract: Perplexity scores are computed for training data samples during artificial neural network (ANN) training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternatively be computed by learning a probability density function (“PDF”) fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
Type: Application
Filed: March 20, 2019
Publication date: September 24, 2020
Inventors: Eric S. Chung, Douglas C. Burger, Bita Darvish Rouhani
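For a one-hot class label, the divergence described above reduces to -log p(true class), which makes the pruning rule easy to sketch; the threshold value below is an assumption.

```python
import numpy as np


def perplexity_score(one_hot_label, predicted_probs, eps=1e-12):
    """KL divergence from a one-hot class label to the model's probability
    vector reduces to -log p(true class); used here as the perplexity score."""
    true_class = int(np.argmax(one_hot_label))
    return float(-np.log(predicted_probs[true_class] + eps))


def prune_easy_samples(samples, labels, probs, threshold=0.05):
    """Drop samples the model already predicts confidently (low perplexity)
    so later epochs skip them; the threshold value is an assumption."""
    keep = [i for i in range(len(samples))
            if perplexity_score(labels[i], probs[i]) >= threshold]
    return [samples[i] for i in keep], [labels[i] for i in keep]


labels = [np.array([1, 0]), np.array([0, 1])]
probs = [np.array([0.99, 0.01]), np.array([0.6, 0.4])]
print(prune_easy_samples(["a", "b"], labels, probs))  # keeps only "b"
```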
-
Publication number: 20200302269
Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.
Type: Application
Filed: March 18, 2019
Publication date: September 24, 2020
Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
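Once per-layer bit widths have been learned, applying them at inference is straightforward. A sketch, assuming uniform symmetric quantization; the layer names and widths are made up for illustration.

```python
import numpy as np


def quantize_symmetric(x, bits):
    """Uniform symmetric quantization to a given bit width (sketch)."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return x
    scale = max_abs / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale


# Per-layer bit widths of the kind the training procedure would produce;
# the specific layer names and values here are made up for illustration.
learned_bits = {"conv1": {"weights": 5, "activations": 7},
                "fc1": {"weights": 3, "activations": 6}}

weights = {"conv1": np.random.randn(4, 4), "fc1": np.random.randn(4, 2)}
q_weights = {name: quantize_symmetric(w, learned_bits[name]["weights"])
             for name, w in weights.items()}
print(q_weights["fc1"])  # weights snapped to a 3-bit grid
```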
-
Publication number: 20200302330
Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
Type: Application
Filed: March 18, 2019
Publication date: September 24, 2020
Inventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
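A sketch of the outlier-splitting idea in plain Python: values that clip the block's mantissa keep a second, out-of-vector component, and the dot product combines both contributions. The split rule and storage layout here are assumptions (the residual is kept as a float rather than a second mantissa-and-exponent pair).

```python
def split_outlier(value, lsb, mantissa_bits):
    """Split a value into an in-vector mantissa (clipped to the block's bit
    width) and a residual 'second mantissa' stored outside the vector."""
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissa = max(lo, min(hi, round(value / lsb)))
    residual = value - mantissa * lsb     # zero unless the value is an outlier
    return mantissa, residual


def dot(values, other, lsb=0.5, mantissa_bits=4):
    """Dot product = in-vector contribution + out-of-vector corrections."""
    mantissas, outliers = [], {}
    for i, v in enumerate(values):
        m, r = split_outlier(v, lsb, mantissa_bits)
        mantissas.append(m)
        if r != 0.0:
            outliers[i] = r               # out-of-vector storage
    base = sum(m * lsb * o for m, o in zip(mantissas, other))
    return base + sum(r * other[i] for i, r in outliers.items())


print(dot([0.5, 1.0, 100.0], [1.0, 1.0, 1.0]))  # 101.5 despite 4-bit mantissas
```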
-
Publication number: 20200302283
Abstract: The use of mixed precision values when training an artificial neural network (ANN) can increase performance while reducing cost. Certain portions and/or steps of an ANN may be selected to use higher or lower precision values when training. Additionally, or alternatively, early phases of training are accurate enough with lower levels of precision to quickly refine an ANN model, while higher levels of precision may be used to increase accuracy for later steps and epochs. Similarly, different gates of a long short-term memory (LSTM) may be supplied with values having different precisions.
Type: Application
Filed: March 18, 2019
Publication date: September 24, 2020
Inventors: Haishan Zhu, Taesik Na, Daniel Lo, Eric S. Chung
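One way to realize the phase-dependent precision described above is a simple schedule; the cutoffs and bit widths below, and the per-gate assignments, are illustrative assumptions.

```python
def mantissa_bits_for_epoch(epoch, total_epochs):
    """Coarser numbers early in training, finer later; the phase cutoffs
    and bit widths here are illustrative assumptions."""
    progress = epoch / total_epochs
    if progress < 0.5:
        return 3       # early refinement tolerates low precision
    if progress < 0.9:
        return 5
    return 8           # final epochs use the most precise format


# Different LSTM gates may likewise be assigned different precisions
# (the per-gate widths below are made up for illustration):
lstm_gate_bits = {"input": 5, "forget": 4, "cell": 6, "output": 5}

print([mantissa_bits_for_epoch(e, 100) for e in (0, 60, 95)])  # [3, 5, 8]
```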
-
Publication number: 20200264876
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
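A sketch of metric-driven format selection: try candidate mantissa widths from narrowest to widest and keep the first whose quality metric stays within tolerance of the uncompressed baseline. Uniform quantization stands in for the adjustable floating-point format, and the candidate widths and tolerance are assumptions.

```python
import numpy as np


def select_mantissa_width(acts, evaluate, candidates=(3, 4, 5, 6, 8),
                          tolerance=0.99):
    """Pick the narrowest mantissa width whose quality metric stays within
    tolerance of the uncompressed baseline (candidates/tolerance assumed)."""
    acts = np.asarray(acts, dtype=np.float64)
    baseline = evaluate(acts)
    for bits in sorted(candidates):              # narrowest first
        max_abs = float(np.max(np.abs(acts)))
        if max_abs == 0.0:
            return bits
        scale = max_abs / (2 ** (bits - 1) - 1)
        compressed = np.round(acts / scale) * scale
        if evaluate(compressed) >= tolerance * baseline:
            return bits
    return max(candidates)


acts = np.random.randn(1000)
# Toy metric: fraction of activation signs preserved after compression.
evaluate = lambda a: float(np.mean(np.sign(a) == np.sign(acts)))
print(select_mantissa_width(acts, evaluate))
```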
-
Publication number: 20200265301
Abstract: Technology related to incremental training of machine learning tools is disclosed. In one example of the disclosed technology, a method can include receiving operational parameters of a machine learning tool based on a primary set of training data. The machine learning tool can be a deep neural network. Input data can be applied to the machine learning tool to generate an output of the machine learning tool. A measure of prediction quality can be generated for the output of the machine learning tool. In response to determining the measure of prediction quality is below a threshold, incremental training of the operational parameters can be initiated using the input data as training data for the machine learning tool. Operational parameters of the machine learning tool can be updated based on the incremental training. The updated operational parameters can be stored.
Type: Application
Filed: February 15, 2019
Publication date: August 20, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani
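The control flow is compact enough to sketch. Here top-class confidence stands in for the measure of prediction quality, and a softmax-regression step stands in for incremental training of a deep network; both are assumptions.

```python
import numpy as np


def predict_proba(params, x):
    """Stand-in model: softmax over a linear scoring of the input
    (purely illustrative; the abstract's tool is a deep neural network)."""
    scores = params @ x
    e = np.exp(scores - np.max(scores))
    return e / e.sum()


def run_with_incremental_training(params, x, label, threshold=0.8, lr=0.1):
    """If prediction quality (here: top-class confidence, an assumed proxy)
    falls below the threshold, fold the input back in as training data."""
    probs = predict_proba(params, x)
    if np.max(probs) < threshold:
        grad = np.outer(probs - label, x)    # softmax-regression gradient
        params = params - lr * grad          # incremental parameter update
    return params, probs


params = np.zeros((2, 3))
params, probs = run_with_incremental_training(
    params, x=np.array([1.0, 0.5, -0.2]), label=np.array([1.0, 0.0]))
print(probs)  # uniform at first, so incremental training fires
```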
-
Publication number: 20200242474
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: January 24, 2019
Publication date: July 30, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
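A non-uniform mantissa can be modeled as a small codebook of magnitude levels that is denser near zero. The 3-bit levels below are assumptions, not the patent's actual levels.

```python
import numpy as np

# Codebook of magnitude levels, denser near zero (values are assumptions).
LEVELS = np.array([0.0, 0.0625, 0.125, 0.25, 0.375, 0.5, 0.75, 1.0])


def compress(values):
    """Store each activation as a sign, a 3-bit code into LEVELS, and one
    shared scale for the block (a lossy, non-uniform sketch)."""
    values = np.asarray(values, dtype=np.float64)
    scale = float(np.max(np.abs(values)))
    scale = scale if scale > 0 else 1.0
    mags = np.abs(values) / scale
    codes = np.argmin(np.abs(mags[:, None] - LEVELS[None, :]), axis=1)
    return codes.astype(np.uint8), np.sign(values), scale


def decompress(codes, signs, scale):
    return signs * LEVELS[codes] * scale


codes, signs, scale = compress([0.02, -0.4, 1.9])
print(decompress(codes, signs, scale))  # lossy reconstruction
```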
-
Publication number: 20200218982
Abstract: A machine learning tool uses dithered quantization of parameters during training of a machine learning model such as a neural network. The machine learning tool receives training data and initializes certain parameters of the machine learning model (e.g., weights for connections between nodes of a neural network, biases for nodes). The machine learning tool trains the parameters in one or more iterations based on the training data. In particular, in a given iteration, the machine learning tool applies the machine learning model to at least some of the training data and, based at least in part on the results, determines parameter updates to the parameters. The machine learning tool updates the parameters using the parameter updates and a dithered quantizer function, which can add random values before a rounding or truncation operation.
Type: Application
Filed: January 4, 2019
Publication date: July 9, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Thomas M. Annau, Haishan Zhu, Daniel Lo, Eric S. Chung
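The dithered quantizer function is simple to sketch: add uniform noise of one quantization step's width before rounding, so the error is unbiased across many updates. The step size and learning rate below are illustrative.

```python
import numpy as np


def dithered_quantize(x, step, rng):
    """Add uniform noise in [-step/2, step/2) before rounding so the
    quantization error is unbiased across many updates (a sketch)."""
    noise = (rng.uniform(size=np.shape(x)) - 0.5) * step
    return np.round((np.asarray(x) + noise) / step) * step


rng = np.random.default_rng(0)
w, grad, lr, step = np.array([0.30, -0.11]), np.array([0.5, -0.2]), 0.01, 2**-6
w = dithered_quantize(w - lr * grad, step, rng)   # dithered parameter update
print(w)
```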
-
Publication number: 20200210838
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: December 31, 2018
Publication date: July 2, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
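The stash pattern looks like this in miniature; uniform quantization stands in for the narrower block floating-point format, which is an assumption of the sketch.

```python
import numpy as np


class ActivationStash:
    """Store forward-pass activations compressed, restore them for backprop.
    Uniform quantization stands in for the narrower block floating-point
    format described in the abstract (an assumption)."""

    def __init__(self, bits=4):
        self.bits, self.saved = bits, {}

    def save(self, layer, acts):
        acts = np.asarray(acts, dtype=np.float64)
        max_abs = float(np.max(np.abs(acts)))
        scale = max_abs / (2 ** (self.bits - 1) - 1) if max_abs > 0 else 1.0
        self.saved[layer] = (np.round(acts / scale).astype(np.int8), scale)

    def load(self, layer):                 # used during back propagation
        q, scale = self.saved[layer]
        return q.astype(np.float64) * scale


stash = ActivationStash(bits=4)
stash.save("layer0", np.array([0.1, -2.3, 0.7]))
print(stash.load("layer0"))               # close to the original activations
```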
-
Publication number: 20200210839
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by a compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
Type: Application
Filed: December 31, 2018
Publication date: July 2, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
-
Publication number: 20200210840
Abstract: Apparatus and methods for training neural networks based on a performance metric, including adjusting numerical precision and topology as training progresses, are disclosed. In some examples, block floating-point formats having relatively lower accuracy are used during early stages of training. Accuracy of the floating-point format can be increased as training progresses based on a determined performance metric. In some examples, values for the neural network are transformed to normal precision floating-point formats. The performance metric can be determined based on entropy of values for the neural network, accuracy of the neural network, or by other suitable techniques. Accelerator hardware can be used to implement certain implementations, including hardware having direct support for block floating-point formats.
Type: Application
Filed: December 31, 2018
Publication date: July 2, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Eric S. Chung, Daniel Lo, Douglas C. Burger
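One plausible reading of the entropy-based metric: estimate the entropy of a tensor of network values with a histogram, and widen the mantissa when it crosses a threshold. The threshold, bin count, and one-bit step are all assumptions.

```python
import numpy as np


def entropy_bits(values, bins=64):
    """Histogram-based entropy estimate of network values (a sketch)."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))


def next_mantissa_width(values, current_bits, threshold=4.0, max_bits=8):
    """Widen the block floating-point mantissa when the entropy of the
    values rises past a threshold; threshold and step are assumptions."""
    if entropy_bits(values) > threshold and current_bits < max_bits:
        return current_bits + 1
    return current_bits


vals = np.random.randn(10_000)
print(next_mantissa_width(vals, current_bits=3))
```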
-
Publication number: 20200202213
Abstract: Methods and apparatus are disclosed for adjusting hyper-parameters of a neural network to compensate for noise, such as noise introduced via quantization of one or more parameters of the neural network. In some examples, the adjustment can include scaling the hyper-parameter based on at least one metric representing noise present in the neural network. The at least one metric can include a noise-to-signal ratio for weights of the neural network, such as edge weights and activation weights. In a quantized neural network, a learning rate hyper-parameter used to compute a gradient update for a layer during back propagation can be scaled based on the at least one metric. In some examples, the same scaled learning rate can be used when computing gradient updates for other layers.
Type: Application
Filed: December 19, 2018
Publication date: June 25, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Eric S. Chung, Daniel Lo, Douglas C. Burger
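A sketch of the scaling idea: measure a noise-to-signal ratio between a layer's weights and their quantized counterparts, then adjust that layer's learning rate by it. The abstract does not pin down the scaling rule, so the linear scale-up below is an assumption.

```python
import numpy as np


def scaled_learning_rate(base_lr, weights, quantized_weights):
    """Scale a layer's learning rate by the noise-to-signal ratio of its
    quantization noise. The linear rule here is an assumption; the exact
    scaling policy is not specified at the abstract level."""
    weights = np.asarray(weights, dtype=np.float64)
    noise = np.asarray(quantized_weights, dtype=np.float64) - weights
    nsr = np.linalg.norm(noise) / max(np.linalg.norm(weights), 1e-12)
    return base_lr * (1.0 + nsr)


w = np.array([0.5, -1.2, 0.05])
q = np.round(w * 8) / 8                    # 1/8-step quantization noise
print(scaled_learning_rate(0.01, w, q))
```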
-
Patent number: 10691413
Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit-width computations allow higher-precision mathematical operations to be performed on lower-precision processors with improved accuracy.
Type: Grant
Filed: May 4, 2018
Date of Patent: June 23, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Eric S. Chung, Douglas C. Burger
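The decomposition is easy to verify numerically: split each mantissa into high and low parts, take two narrower dot products, then shift and sum. The split point and scale below are illustrative.

```python
import numpy as np


def split_mantissas(m, low_bits):
    """Decompose integer mantissas so each part has a smaller bit width:
    m == (m >> low_bits) * 2**low_bits + (m % 2**low_bits)."""
    return m >> low_bits, m % (1 << low_bits)


def dot_by_parts(mantissas, other, low_bits, scale):
    """Two narrower dot products whose results are shifted and summed,
    matching the decompose-then-recombine scheme in the abstract."""
    m = np.asarray(mantissas, dtype=np.int64)
    hi, lo = split_mantissas(m, low_bits)
    partial = (hi @ other) * (1 << low_bits) + (lo @ other)
    return scale * partial


m = np.array([117, -13, 42])                  # 8-bit-wide mantissas
v = np.array([1.0, 2.0, 3.0])
print(dot_by_parts(m, v, low_bits=4, scale=2**-7))
print((m * 2**-7) @ v)                        # same value, computed directly
```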
-
Publication number: 20200193273
Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating-point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.
Type: Application
Filed: December 14, 2018
Publication date: June 18, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
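A sketch of the residual policy using a plain error-value test (the abstract also lists layer number, exponent value, and layer type as possible inputs); the step size and threshold below are assumptions.

```python
import numpy as np


def quantize_with_residual(x, step):
    """Quantize a tensor and keep the residual produced by the conversion."""
    x = np.asarray(x, dtype=np.float64)
    q = np.round(x / step) * step
    return q, x - q


def node_output(x, w, step=2**-3, error_threshold=0.05):
    """Use the residual tensor in the dot product only when the conversion
    error is large; the threshold and error measure are assumptions."""
    qx, rx = quantize_with_residual(x, step)
    error = np.linalg.norm(rx) / max(np.linalg.norm(x), 1e-12)
    out = qx @ w
    if error > error_threshold:
        out = out + rx @ w        # residual correction raises precision
    return out


x, w = np.random.randn(16), np.random.randn(16)
print(node_output(x, w), x @ w)   # close, and exact when the residual is used
```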
-
Publication number: 20200193274
Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
Type: Application
Filed: December 18, 2018
Publication date: June 18, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
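The convert-compute-convert pattern in miniature; uniform symmetric quantization stands in for the block floating-point format, which is an assumption of this sketch.

```python
import numpy as np


def quantize(t, bits=8):
    """Stand-in for the normal-to-quantized conversion (uniform, symmetric);
    the abstract's format is a block floating-point variant."""
    t = np.asarray(t, dtype=np.float64)
    max_abs = float(np.max(np.abs(t)))
    if max_abs == 0.0:
        return t
    scale = max_abs / (2 ** (bits - 1) - 1)
    return np.round(t / scale) * scale


def accelerated_layer(x, weights):
    """Pattern from the abstract: convert the input tensor to the quantized
    format, run the tensor operation there, and hand back a normal-precision
    output tensor."""
    y = quantize(x) @ quantize(weights)   # tensor op in quantized precision
    return y.astype(np.float64)           # result back in normal precision


x, w = np.random.randn(4), np.random.randn(4, 2)
print(accelerated_layer(x, w))
```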
-
Patent number: 10528119
Abstract: Dynamic power routing is utilized to route power from other components, which are transitioned to lower power consuming states, in order to accommodate more efficient processing of computational tasks by hardware accelerators, thereby staying within electrical power thresholds that would otherwise not have accommodated simultaneous full-power operation of the other components and such hardware accelerators. Once a portion of a workflow is being processed by hardware accelerators, the workflow, or the hardware accelerators, can be self-throttling to stay within power thresholds, or they can be throttled by independent coordinators, including device-centric and system-wide coordinators.
Type: Grant
Filed: August 25, 2017
Date of Patent: January 7, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Andrew R. Putnam, Douglas Christopher Burger, Stephen F. Heil, Eric S. Chung, Adrian M. Caulfield
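A sketch of the coordinator role: before granting an accelerator's power request, step other components down to lower-power states until the total fits the threshold. All wattages and the halving policy are illustrative assumptions.

```python
class PowerCoordinator:
    """Sketch of dynamic power routing: before an accelerator ramps up,
    other components are stepped down to lower-power states so the total
    stays within a threshold. All wattages are illustrative assumptions."""

    def __init__(self, budget_watts):
        self.budget = budget_watts
        self.draw = {}                      # component -> current watts

    def request(self, component, watts):
        if watts > self.budget:
            return False                    # cannot fit even running alone
        others = [c for c in self.draw if c != component]
        while sum(self.draw[c] for c in others) + watts > self.budget:
            victim = max(others, key=lambda c: self.draw[c])
            self.draw[victim] *= 0.5        # transition to lower-power state
        self.draw[component] = watts
        return True


coord = PowerCoordinator(budget_watts=100.0)
coord.request("cpu", 60.0)
print(coord.request("accelerator", 70.0), coord.draw)  # cpu throttled to fit
```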
-
Publication number: 20190394260
Abstract: A server system is provided that includes a plurality of servers. Each server includes at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance. The host server instances of the plurality of servers collectively provide a software plane, and the hardware acceleration devices of the plurality of servers collectively provide a hardware acceleration plane that implements a plurality of hardware accelerated services. Each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and a requesting hardware acceleration device routes a request to the target hardware acceleration device that the load data in the data structure indicates has a lower load than the other target hardware acceleration devices.
Type: Application
Filed: August 30, 2019
Publication date: December 26, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
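The routing decision reduces to a minimum over the locally cached load table. A sketch, with made-up device names and loads; treating unknown devices as fully loaded is an added assumption.

```python
def route_request(load_table, candidates):
    """Send the request to the candidate the local load table marks as least
    loaded; unknown devices are treated as fully loaded (an assumption)."""
    return min(candidates, key=lambda dev: load_table.get(dev, float("inf")))


# Each acceleration device keeps its own view of target loads in memory;
# the device names and load values below are made up for illustration.
load_table = {"fpga-a": 0.70, "fpga-b": 0.30, "fpga-c": 0.90}
print(route_request(load_table, ["fpga-a", "fpga-b", "fpga-c"]))  # fpga-b
```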