Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10795678
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory and, during each one of P clock cycles, provide N data elements to each one of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each one of the P clock cycles the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: October 6, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Patent number: 10791054
    Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets corresponding to a first message associated with the service using a larger inter-packet gap than the inter-packet gap used for previous packets corresponding to the first message associated with the service; in a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay a transmission of a next packet corresponding to the first message associated with the service.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: September 29, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
  • Publication number: 20200302271
    Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
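
The block floating-point (BFP) format named in the abstract above stores a tile of values as one shared exponent plus a short mantissa per value. The sketch below is a minimal, illustrative quantizer only, not the patented QNAS method; the function names, the 4-bit mantissa choice, and the use of numpy are assumptions of mine.

```python
import numpy as np

def to_bfp(values, mantissa_bits=4):
    """Illustrative sketch: quantize a tile of floats to block floating-point,
    i.e. one shared exponent for the tile and one short signed mantissa per value."""
    # Shared exponent taken from the largest magnitude in the tile.
    _, shared_exp = np.frexp(np.max(np.abs(values)))
    scale = 2.0 ** (int(shared_exp) - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(values / scale), lo, hi).astype(np.int32)
    return mantissas, scale

def from_bfp(mantissas, scale):
    """Reconstruct approximate floats from the shared-exponent representation."""
    return mantissas * scale

tile = np.array([0.91, -0.13, 0.07, 0.5])
m, s = to_bfp(tile, mantissa_bits=4)   # a 4-bit mantissa is one candidate a QNAS search might score
print(from_bfp(m, s))                  # coarse reconstruction of the tile
```

A quantization-aware search would treat the mantissa bit width here as one of the hyperparameters it scores alongside model-topology and hardware-architecture choices.
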
  • Publication number: 20200302273
    Abstract: Perplexity scores are computed for training data samples during ANN training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternatively be computed by learning a probability density function (“PDF”) fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: Eric S. Chung, Douglas C. Burger, Bita Darvish Rouhani
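
A minimal sketch of the first variant described in the abstract above: with one-hot labels and softmax outputs, the divergence between the label and the model's probability vector reduces to the cross-entropy of the true class, and low-scoring (easy) samples are dropped for later epochs. The threshold value and function names are illustrative assumptions, not taken from the publication.

```python
import numpy as np

def perplexity_score(one_hot_label, predicted_probs, eps=1e-12):
    """Divergence between the label distribution and the model's probability
    vector; for a one-hot label this is the negative log-probability of the true class."""
    return float(-np.sum(one_hot_label * np.log(predicted_probs + eps)))

def prune_easy_samples(samples, labels, probs, threshold=0.05):
    """Drop samples whose perplexity score falls below the threshold so they
    are not used for training in subsequent epochs (illustrative threshold)."""
    keep = [i for i in range(len(samples))
            if perplexity_score(labels[i], probs[i]) >= threshold]
    return [samples[i] for i in keep], [labels[i] for i in keep]

labels = np.eye(3)[[0, 1, 2]]                       # one-hot labels for 3 samples
probs = np.array([[0.99, 0.005, 0.005],             # confidently correct -> low score, pruned
                  [0.30, 0.40, 0.30],               # uncertain -> kept
                  [0.20, 0.30, 0.50]])              # uncertain -> kept
samples = ["a", "b", "c"]
print(prune_easy_samples(samples, labels, probs))   # keeps only 'b' and 'c'
```
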
  • Publication number: 20200302269
    Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
  • Publication number: 20200302330
    Abstract: Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Eric S. Chung, Daniel Lo, Ritchie Zhao
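
The abstract above splits an outlier into an in-vector mantissa and an out-of-vector mantissa and folds both back together during a dot product. The sketch below illustrates that idea under assumptions of mine (a single shared scale, the remainder stored in a plain dictionary, and illustrative names); it is not the claimed encoding.

```python
import numpy as np

MBITS = 4                     # in-vector mantissa width (illustrative)
LO, HI = -(2 ** (MBITS - 1)), 2 ** (MBITS - 1) - 1

def quantize_with_outliers(values, scale):
    """Quantize onto a shared-exponent grid. Values whose mantissa does not fit
    in MBITS bits are outliers: the part that fits stays in the vector, the
    remainder is stored outside it (index -> extra mantissa)."""
    mantissas = np.round(values / scale).astype(np.int64)
    in_vector = np.clip(mantissas, LO, HI)
    outliers = {i: int(mantissas[i] - in_vector[i])
                for i in range(len(values)) if mantissas[i] != in_vector[i]}
    return in_vector, outliers

def dot(in_vector, outliers, scale, other):
    """Dot product that combines in-vector mantissas with out-of-vector
    outlier mantissas, as described in the abstract above (sketch only)."""
    result = float(np.dot(in_vector * scale, other))
    for i, extra in outliers.items():          # fold the outlier remainders back in
        result += extra * scale * other[i]
    return result

x = np.array([0.2, -0.3, 5.0, 0.1])            # 5.0 is too large for a 4-bit mantissa at this scale
scale = 0.125
q, out = quantize_with_outliers(x, scale)
y = np.array([1.0, 2.0, 3.0, 4.0])
print(dot(q, out, scale, y), np.dot(x, y))     # quantized result tracks the exact one
```
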
  • Publication number: 20200302283
    Abstract: The use of mixed precision values when training an artificial neural network (ANN) can increase performance while reducing cost. Certain portions and/or steps of an ANN may be selected to use higher or lower precision values when training. Additionally, or alternatively, early phases of training are accurate enough with lower levels of precision to quickly refine an ANN model, while higher levels of precision may be used to increase accuracy for later steps and epochs. Similarly, different gates of a long short-term memory (LSTM) may be supplied with values having different precisions.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Haishan Zhu, Taesik Na, Daniel Lo, Eric S. Chung
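
The publication above describes using lower precision early in training and higher precision for later steps and epochs. The schedule below is a tiny illustration of that pattern; the specific bit widths, breakpoints, and the hypothetical train_one_epoch call are my own placeholders, not values from the publication.

```python
def mantissa_bits_for_epoch(epoch, total_epochs):
    """Illustrative precision schedule: coarse mantissas while the model is
    still being roughly shaped, finer mantissas for the final refinement."""
    progress = epoch / max(1, total_epochs - 1)
    if progress < 0.5:
        return 3          # early phase: low precision is accurate enough
    elif progress < 0.8:
        return 5
    else:
        return 7          # late phase: higher precision for final accuracy

for epoch in range(10):
    bits = mantissa_bits_for_epoch(epoch, total_epochs=10)
    # train_one_epoch(model, data, mantissa_bits=bits)  # hypothetical training call
    print(epoch, bits)
```
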
  • Publication number: 20200264876
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.
    Type: Application
    Filed: February 14, 2019
    Publication date: August 20, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Bita Darvish Rouhani, Eric S. Chung, Yiren Zhao, Amar Phanishayee, Ritchie Zhao
  • Publication number: 20200265301
    Abstract: Technology related to incremental training of machine learning tools is disclosed. In one example of the disclosed technology, a method can include receiving operational parameters of a machine learning tool trained on a primary set of training data. The machine learning tool can be a deep neural network. Input data can be applied to the machine learning tool to generate an output of the machine learning tool. A measure of prediction quality can be generated for the output of the machine learning tool. In response to determining the measure of prediction quality is below a threshold, incremental training of the operational parameters can be initiated using the input data as training data for the machine learning tool. Operational parameters of the machine learning tool can be updated based on the incremental training. The updated operational parameters can be stored.
    Type: Application
    Filed: February 15, 2019
    Publication date: August 20, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani
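
A minimal sketch of the trigger described above: if a quality measure for the model's output falls below a threshold, the input is fed back as training data for an incremental update. The use of top-class confidence as the quality measure, the threshold value, and the function names are assumptions for illustration only.

```python
import numpy as np

def prediction_quality(probs):
    """Simple proxy for prediction quality: the model's confidence in its top
    class (other quality measures from the publication would slot in here)."""
    return float(np.max(probs))

def maybe_incrementally_train(model_update_fn, input_data, probs,
                              quality_threshold=0.6):
    """If output quality is below the threshold, use the input itself as new
    training data and incrementally update the operational parameters."""
    if prediction_quality(probs) < quality_threshold:
        model_update_fn(input_data)      # incremental training step (hypothetical callback)
        return True                      # parameters were updated and stored
    return False

updates = []
triggered = maybe_incrementally_train(updates.append,
                                      input_data=np.array([0.1, 0.9]),
                                      probs=np.array([0.4, 0.35, 0.25]))
print(triggered, len(updates))           # True 1: low-confidence input queued for training
```
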
  • Publication number: 20200242474
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Application
    Filed: January 24, 2019
    Publication date: July 30, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
  • Publication number: 20200218982
    Abstract: A machine learning tool uses dithered quantization of parameters during training of a machine learning model such as a neural network. The machine learning tool receives training data and initializes certain parameters of the machine learning model (e.g., weights for connections between nodes of a neural network, biases for nodes). The machine learning tool trains the parameters in one or more iterations based on the training data. In particular, in a given iteration, the machine learning tool applies the machine learning model to at least some of the training data and, based at least in part on the results, determines parameter updates to the parameters. The machine learning tool updates the parameters using the parameter updates and a dithered quantizer function, which can add random values before a rounding or truncation operation.
    Type: Application
    Filed: January 4, 2019
    Publication date: July 9, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Thomas M. Annau, Haishan Zhu, Daniel Lo, Eric S. Chung
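
The dithered quantizer function described above adds a random value before rounding so that rounding errors average out across iterations. Below is a minimal sketch of that idea; the uniform dither, the 1/16 grid step, the learning rate, and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(values, step):
    """Dithered quantizer sketch: add uniform random noise spanning one grid
    step before rounding, so rounding errors are unbiased on average across
    training iterations."""
    dither = rng.uniform(-0.5, 0.5, size=np.shape(values)) * step
    return np.round((values + dither) / step) * step

def update_weights(weights, gradients, lr=0.1, step=1 / 16):
    """Apply a parameter update, then re-quantize the weights with dithering."""
    return dithered_quantize(weights - lr * gradients, step)

w = np.array([0.50, -0.20, 0.03])
g = np.array([0.10, -0.40, 0.00])
print(update_weights(w, g))          # updated weights snapped to a 1/16 grid, with dither
```
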
  • Publication number: 20200210838
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Application
    Filed: December 31, 2018
    Publication date: July 2, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
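
The flow described above keeps activations from the forward pass in a narrower block floating-point format and only expands them again when back propagation needs them. The sketch below illustrates that compress/store/retrieve pattern under my own assumptions (a single shared scale per tensor, a 5-bit mantissa, numpy in place of accelerator hardware).

```python
import numpy as np

def compress(acts, mantissa_bits=5):
    """Re-quantize activations into a narrower block floating-point form:
    one shared scale per tensor, short integer mantissas (illustrative)."""
    _, exp = np.frexp(np.max(np.abs(acts)) + 1e-30)
    scale = 2.0 ** (int(exp) - (mantissa_bits - 1))
    hi = 2 ** (mantissa_bits - 1) - 1
    return np.clip(np.round(acts / scale), -hi - 1, hi).astype(np.int8), scale

def decompress(mantissas, scale):
    """Expand the stored mantissas back to floating point for back propagation."""
    return mantissas * scale

# Forward propagation produces activations in a wider format (float32 here);
# they are stored compressed and expanded again only for the backward pass.
activations = np.random.default_rng(1).normal(size=(4, 8)).astype(np.float32)
stored, scale = compress(activations, mantissa_bits=5)       # narrower format held in memory
recovered = decompress(stored, scale)                        # retrieved for back propagation
print(np.max(np.abs(recovered - activations)))               # bounded compression error
```
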
  • Publication number: 20200210839
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Application
    Filed: December 31, 2018
    Publication date: July 2, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao, Ritchie Zhao
  • Publication number: 20200210840
    Abstract: Apparatus and methods for training neural networks based on a performance metric, including adjusting numerical precision and topology as training progresses, are disclosed. In some examples, block floating-point formats having relatively lower accuracy are used during early stages of training. Accuracy of the floating-point format can be increased as training progresses based on a determined performance metric. In some examples, values for the neural network are transformed to normal precision floating-point formats. The performance metric can be determined based on entropy of values for the neural network, accuracy of the neural network, or by other suitable techniques. Certain implementations can use accelerator hardware, including hardware having direct support for block floating-point formats.
    Type: Application
    Filed: December 31, 2018
    Publication date: July 2, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Eric S. Chung, Daniel Lo, Douglas C. Burger
  • Publication number: 20200202213
    Abstract: Methods and apparatus are disclosed for adjusting hyper-parameters of a neural network to compensate for noise, such as noise introduced via quantization of one or more parameters of the neural network. In some examples, the adjustment can include scaling the hyper-parameter based on at least one metric representing noise present in the neural network. The at least one metric can include a noise-to-signal ratio for weights of the neural network, such as edge weights and activation weights. In a quantized neural network, a learning rate hyper-parameter used to compute a gradient update for a layer during back propagation can be scaled based on the at least one metric. In some examples, the same scaled learning rate can be used when computing gradient updates for other layers.
    Type: Application
    Filed: December 19, 2018
    Publication date: June 25, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Eric S. Chung, Daniel Lo, Douglas C. Burger
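
The abstract above scales a learning-rate hyper-parameter based on a noise metric such as a noise-to-signal ratio for the weights. The sketch below shows one plausible way to compute such a ratio from quantization error and damp the learning rate with it; the damping formula, threshold-free form, and names are my own assumptions, not the claimed scaling rule.

```python
import numpy as np

def noise_to_signal_ratio(weights, quantized_weights, eps=1e-12):
    """Quantization-noise power relative to signal power for a layer's weights."""
    noise = weights - quantized_weights
    return float(np.sum(noise ** 2) / (np.sum(weights ** 2) + eps))

def scaled_learning_rate(base_lr, nsr):
    """Illustrative compensation: damp the gradient step as quantization noise grows."""
    return base_lr / (1.0 + nsr)

w = np.array([0.8, -0.5, 0.3, 0.05])
w_q = np.round(w * 8) / 8                         # 3-fractional-bit quantization of the weights
nsr = noise_to_signal_ratio(w, w_q)
print(scaled_learning_rate(0.01, nsr))            # learning rate used when computing gradient updates
```
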
  • Patent number: 10691413
    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: June 23, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Eric S. Chung, Douglas C. Burger
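
The patent above reduces mantissa bit width by decomposing a value into several narrower-mantissa numbers, running a dot product on each, and summing the results. The sketch below shows that decomposition for an 8-bit mantissa split into two 4-bit pieces; the split into exactly two pieces, the scales, and the names are illustrative assumptions.

```python
import numpy as np

def split_mantissas(mantissas, low_bits=4):
    """Split integer mantissas into high and low parts so each piece fits a
    narrower multiplier (here: an 8-bit mantissa into two 4-bit pieces)."""
    low = mantissas % (1 << low_bits)              # low-order bits
    high = mantissas >> low_bits                   # remaining high-order bits
    return high, low

def bfp_dot(mantissas_x, scale_x, mantissas_y, scale_y, low_bits=4):
    """Dot product computed as two narrower dot products that are then summed:
    ((high << low_bits) + low) . y = ((high . y) << low_bits) + (low . y)."""
    high, low = split_mantissas(mantissas_x, low_bits)
    partial_high = np.dot(high, mantissas_y) << low_bits
    partial_low = np.dot(low, mantissas_y)
    return float(partial_high + partial_low) * scale_x * scale_y

x_m = np.array([100, 37, 215, 6])                  # 8-bit mantissas, shared scale 2**-8
y_m = np.array([12, 90, 3, 77])
print(bfp_dot(x_m, 2.0 ** -8, y_m, 2.0 ** -8))     # equals the undecomposed result below
print(np.dot(x_m, y_m) * 2.0 ** -16)
```

Because the split is exact, summing the partial dot products reproduces the full-width result while each multiply only ever sees a narrower mantissa.
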
  • Publication number: 20200193273
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
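
A minimal sketch of the residual-tensor idea above: quantizing a tensor leaves a residual, and the residual's contribution to a dot product is added only when a conversion-error metric exceeds a threshold. The grid-step quantizer, the mean-absolute-error metric, the threshold, and the names are illustrative assumptions; the publication lists several other possible input metrics.

```python
import numpy as np

def quantize(x, step=1 / 8):
    """Coarse quantization standing in for conversion to a quantized tensor."""
    return np.round(x / step) * step

def dot_with_optional_residual(x, y, step=1 / 8, error_threshold=0.01):
    """Compute quantize(x) . y; if the conversion error is large, also add the
    residual tensor's contribution to recover precision."""
    q = quantize(x, step)
    residual = x - q                            # residual tensor from the conversion
    error = float(np.mean(np.abs(residual)))    # one possible input metric
    result = np.dot(q, y)
    if error > error_threshold:
        result += np.dot(residual, y)           # kept exact in this sketch; quantized in practice
    return float(result), error

x = np.array([0.21, -0.47, 0.33, 0.05])
y = np.array([1.0, 0.5, -2.0, 4.0])
print(dot_with_optional_residual(x, y))          # residual included because the error metric is high
```
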
  • Publication number: 20200193274
    Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
    Type: Application
    Filed: December 18, 2018
    Publication date: June 18, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
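
The layer flow described above converts the input tensor from normal precision to a quantized format, performs the tensor operation there, and converts the result back to normal precision. The sketch below mirrors that flow with a simple fixed-grid quantizer standing in for a block floating-point format; the grid step and function names are illustrative assumptions.

```python
import numpy as np

def to_quantized(x, step=1 / 16):
    """Normal-precision -> quantized-precision (a simple fixed grid here)."""
    return np.round(x / step).astype(np.int32)

def to_normal(q, step=1 / 16):
    """Quantized-precision -> normal-precision floating point."""
    return q.astype(np.float32) * step

def quantized_layer(inputs, weights, step=1 / 16):
    """Accelerated-layer sketch: the matrix multiply runs on quantized tensors,
    and the output tensor is handed back in normal precision."""
    q_in = to_quantized(inputs, step)
    q_w = to_quantized(weights, step)
    q_out = q_in @ q_w                           # tensor operation in the quantized domain
    return to_normal(q_out, step) * step         # second * step undoes the weights' scale factor

x = np.array([[0.5, -0.25]], dtype=np.float32)
w = np.array([[1.0, 0.5], [0.25, -0.5]], dtype=np.float32)
print(quantized_layer(x, w), x @ w)              # quantized result tracks the float result
```
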
  • Patent number: 10528119
    Abstract: Dynamic power routing is utilized to route power from other components, which are transitioned to lower power consuming states, in order to accommodate more efficient processing of computational tasks by hardware accelerators, thereby staying within electrical power thresholds that would otherwise not have accommodated simultaneous full-power operation of the other components and such hardware accelerators. Once a portion of a workflow is being processed by hardware accelerators, the workflow, or the hardware accelerators, can be self-throttling to stay within power thresholds, or they can be throttled by independent coordinators, including device-centric and system-wide coordinators.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: January 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andrew R. Putnam, Douglas Christopher Burger, Stephen F. Heil, Eric S. Chung, Adrian M. Caulfield
  • Publication number: 20190394260
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes the request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than other of the target hardware acceleration devices.
    Type: Application
    Filed: August 30, 2019
    Publication date: December 26, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
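
The routing decision described above picks the target hardware acceleration device whose recorded load is lower than the others'. The sketch below shows that selection against a plain in-memory load table; the device names, the optimistic load increment, and the callback-style send are hypothetical details for illustration.

```python
def pick_target(load_table):
    """Choose the target hardware acceleration device whose recorded load is
    lower than the other targets', per the load data structure described above."""
    return min(load_table, key=load_table.get)

def route_request(request, load_table, send_fn):
    """Route the request to the least-loaded target and note the extra work."""
    target = pick_target(load_table)
    send_fn(target, request)
    load_table[target] += 1          # optimistic local update until fresh load data arrives
    return target

loads = {"fpga-03": 5, "fpga-07": 2, "fpga-11": 9}   # hypothetical device names and loads
sent = []
print(route_request("accelerate-job-42", loads, lambda t, r: sent.append((t, r))))
# -> fpga-07, the device the load table shows as least loaded
```
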