Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190340499
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations. In some examples, the quantized precision operations are performed for neural network models. Parameters of the quantized precision operations can be selected to emulate operation of hardware accelerators adapted to perform quantized-format operations. In some examples, the quantized precision operations are performed in a block floating-point format where one or more values of a tensor, matrix, or vector share a common exponent. Techniques for selecting the exponent, reshaping the input tensors, and training neural networks for use with quantized precision models are also disclosed. In some examples, a neural network model is further retrained based on the quantized model. For example, a normal-precision model or a quantized precision model can be retrained by evaluating loss induced by performing operations in the quantized format.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
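
    A minimal Python sketch of the block floating-point idea described above, in which values in a block share a single exponent; the mantissa width, rounding rule, and function names are illustrative assumptions, not the patented method:

      import numpy as np

      def to_block_floating_point(values, mantissa_bits=8):
          """Quantize a 1-D block so all elements share one exponent (a sketch)."""
          values = np.asarray(values, dtype=np.float64)
          max_abs = np.max(np.abs(values))
          if max_abs == 0.0:
              return values.copy(), 0
          # Shared exponent chosen so the largest value fits in the mantissa range.
          shared_exp = int(np.floor(np.log2(max_abs)))
          scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
          limit = 2 ** (mantissa_bits - 1) - 1
          mantissas = np.clip(np.round(values / scale), -limit, limit)
          return mantissas * scale, shared_exp

      # Emulate a quantized dot product and compare it against full precision,
      # as a normal-precision model might be evaluated against its quantized form.
      x, w = np.random.randn(64), np.random.randn(64)
      xq, _ = to_block_floating_point(x)
      wq, _ = to_block_floating_point(w)
      print("fp64:", x @ w, " emulated block floating point:", xq @ wq)
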
  • Publication number: 20190340492
    Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test inputs is applied to the normal-precision floating-point model and to the corresponding quantized model, and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the data widths and the selection of shared exponents for the block floating-point format can be chosen. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
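
    The comparison step of the design flow above can be pictured with a small Python sketch; the error metric, tolerance, and toy "models" are assumptions made only for illustration:

      import numpy as np

      def compare_models(normal_model, quantized_model, test_inputs, tol=1e-2):
          """Apply the same test inputs to both models and report the worst divergence."""
          worst = 0.0
          for x in test_inputs:
              ref, out = normal_model(x), quantized_model(x)
              err = np.max(np.abs(ref - out)) / (np.max(np.abs(ref)) + 1e-12)
              worst = max(worst, err)
          return worst, worst <= tol

      # Toy linear "models" that differ only in weight precision.
      w = np.random.randn(16, 16).astype(np.float32)
      w_q = np.round(w * 64) / 64                     # crude quantized stand-in
      inputs = [np.random.randn(16).astype(np.float32) for _ in range(8)]
      err, ok = compare_models(lambda x: w @ x, lambda x: w_q @ x, inputs)
      print(f"max relative error {err:.4f}, within tolerance: {ok}")
      # If not ok, hyperparameters or quantization parameters would be adjusted
      # and the quantized network retrained, as the abstract describes.
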
  • Publication number: 20190339937
    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Inventors: Daniel Lo, Eric S. Chung, Douglas C. Burger
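
    The mantissa decomposition described above can be illustrated with integer arithmetic in Python; the 8-bit mantissas, the 4-bit split, and the omission of the shared-exponent scaling are simplifying assumptions:

      import numpy as np

      def split_mantissas(ints, low_bits=4):
          """Split signed mantissas so that v == high * 2**low_bits + low."""
          low = ints % (1 << low_bits)        # non-negative low bits
          high = (ints - low) >> low_bits     # signed high bits
          return high, low

      # 8-bit mantissas of two block floating-point vectors.
      a = np.random.randint(-128, 128, size=32)
      b = np.random.randint(-128, 128, size=32)
      a_hi, a_lo = split_mantissas(a)
      b_hi, b_lo = split_mantissas(b)

      # Four narrow dot products, recombined with the appropriate shifts.
      dot = ((a_hi @ b_hi) << 8) + ((a_hi @ b_lo) << 4) \
            + ((a_lo @ b_hi) << 4) + (a_lo @ b_lo)
      print(dot == a @ b)   # True: the decomposition recovers the wide result exactly
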
  • Publication number: 20190324748
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory and, during each one of P clock cycles, provide N data elements to each one of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each one of the P clock cycles, the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Publication number: 20190325296
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
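
    The first specialization parameter above, aligning the native vector dimension to the model, can be made concrete with a small padding-waste calculation; the cost model and the example layer shapes are assumptions for illustration:

      import math

      def padding_waste(layer_dims, native_dim):
          """Fraction of multiply-accumulate work spent on zero padding when every
          matrix dimension is rounded up to a multiple of the native dimension."""
          useful = sum(rows * cols for rows, cols in layer_dims)
          padded = sum(math.ceil(rows / native_dim) * native_dim *
                       math.ceil(cols / native_dim) * native_dim
                       for rows, cols in layer_dims)
          return 1.0 - useful / padded

      # Weight-matrix shapes of a hypothetical model.
      model_layers = [(1000, 200), (200, 200), (200, 50)]
      for native in (64, 100, 128, 256):
          print(f"native dim {native:4d}: {padding_waste(model_layers, native):.1%} wasted")
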
  • Patent number: 10425472
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes the request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than other of the target hardware acceleration devices.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: September 24, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
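
    A minimal sketch of the load-based routing idea in this patent, where each accelerator keeps a table of target loads and routes a request to the least-loaded target; the data structure and device names are assumptions for illustration:

      from dataclasses import dataclass, field
      from typing import Dict

      @dataclass
      class LoadTable:
          """Per-device table of last-known load for candidate target accelerators."""
          loads: Dict[str, float] = field(default_factory=dict)

          def update(self, target: str, load: float) -> None:
              self.loads[target] = load

          def pick_target(self) -> str:
              # Route the request to the target currently shown as least loaded.
              return min(self.loads, key=self.loads.get)

      table = LoadTable()
      table.update("fpga-03", 0.72)
      table.update("fpga-07", 0.31)
      table.update("fpga-11", 0.55)
      print("route request to:", table.pick_target())   # fpga-07
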
  • Publication number: 20190286973
    Abstract: Technology related to hardware accelerated neural network subgraphs is disclosed. In one example of the disclosed technology, a method includes receiving source code specifying a neural network model. The source code includes an application programming interface (API) marking a subgraph of the neural network model as targeted for hardware acceleration. The method includes compiling the subgraph to the neural network accelerator target to generate configuration information for the hardware accelerator. The method includes configuring the hardware accelerator to evaluate the neural network model, where the hardware accelerator is configured using the configuration information.
    Type: Application
    Filed: May 4, 2018
    Publication date: September 19, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ratna Kumar Kovvuri, Ahmad Mahdi El Husseini, Steven K. Reinhardt, Daniel Lo, Eric S. Chung, Sarabjit Singh Seera, Friedel van Megen, Alessandro Forin
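
    The abstract above describes an API that marks a subgraph as targeted for hardware acceleration; the decorator-style marker below is purely hypothetical (the names, signatures, and registry are invented for illustration):

      # Hypothetical marker API, not the interface in the application.
      ACCELERATED_SUBGRAPHS = []

      def accelerate(target="fpga"):
          """Mark a function that builds a model subgraph as an acceleration target."""
          def wrap(builder):
              ACCELERATED_SUBGRAPHS.append((builder.__name__, target))
              return builder
          return wrap

      @accelerate(target="fpga")
      def attention_block(x):
          # Subgraph definition would go here; it would be compiled to accelerator
          # configuration information and loaded onto the hardware accelerator.
          return x

      def full_model(x):
          x = attention_block(x)   # evaluated on the accelerator
          return x                 # remaining operations run on the host

      print(ACCELERATED_SUBGRAPHS)   # [('attention_block', 'fpga')]
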
  • Publication number: 20190253354
    Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets corresponding to a first message associated with the service using a larger inter-packet gap than an inter-packet gap used for previous packets corresponding to the first message associated with the service, and in a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay a transmission of a next packet corresponding to the first message associated with the service.
    Type: Application
    Filed: April 26, 2019
    Publication date: August 15, 2019
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
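
    The two congestion-handling modes described above can be sketched as a toy sender; the timing constants and method names are assumptions, not the protocol in the application:

      import time

      class Sender:
          """Toy sender reacting to two kinds of indications from the receiver."""
          def __init__(self, base_gap_s=0.001):
              self.gap = base_gap_s

          def on_slow_down(self, factor=2.0):
              # First mode: widen the inter-packet gap for subsequent packets.
              self.gap *= factor

          def on_pause(self, hold_s=0.010):
              # Second mode: delay transmission of the next packet outright.
              time.sleep(hold_s)

          def send_message(self, packets, transmit):
              for pkt in packets:
                  transmit(pkt)
                  time.sleep(self.gap)

      s = Sender()
      s.on_slow_down()   # receiver signalled congestion for the first message
      s.send_message([b"p0", b"p1", b"p2"], transmit=lambda p: print("sent", p))
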
  • Publication number: 20190197406
    Abstract: A computer implemented method of optimizing a neural network includes obtaining a deep neural network (DNN) trained with a training dataset, determining a spreading signal between neurons in multiple adjacent layers of the DNN wherein the spreading signal is an element-wise multiplication of input activations between the neurons in a first layer to neurons in a second next layer with a corresponding weight matrix of connections between such neurons, and determining neural entropies of respective connections between neurons by calculating an exponent of a volume of an area covered by the spreading signal. The DNN may be optimized based on the determined neural entropies between the neurons in the multiple adjacent layers.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Inventors: Bita Darvish Rouhani, Douglas C. Burger, Eric S. Chung
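
    The spreading signal above is an element-wise product of input activations and the connecting weight matrix; the sketch below computes it and a simple Shannon-entropy proxy. The proxy only illustrates ranking connections by how spread out their contributions are; it is not the volume-based entropy defined in the application:

      import numpy as np

      def spreading_signal(activations, weights):
          """Per-connection contributions: activations (n_in,) times weights (n_in, n_out)."""
          return activations[:, None] * weights

      def connection_entropy(spread):
          """Illustrative entropy proxy over normalized contribution magnitudes."""
          p = np.abs(spread) / (np.sum(np.abs(spread)) + 1e-12)
          return -np.sum(p * np.log(p + 1e-12))

      acts = np.random.rand(8)
      W = np.random.randn(8, 4)
      print("entropy proxy:", connection_entropy(spreading_signal(acts, W)))
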
  • Patent number: 10326696
    Abstract: Components, methods, and systems allowing acceleration components to transmit messages are provided. An acceleration component for use among a first plurality of acceleration components, associated with a first top-of-rack (TOR) switch, to transmit messages to other acceleration components in an acceleration plane configurable to provide service acceleration for a service is provided. The acceleration component includes a transport component configured to transmit a first point-to-point message to a second acceleration component, associated with a second TOR switch different from the first TOR switch, and to a third acceleration component, associated with a third TOR switch different from the first TOR switch and the second TOR switch. The transport component may be configured to broadcast a second point-to-point message to all of a second plurality of acceleration components associated with the second TOR switch and to all of a third plurality of acceleration components associated with the third TOR switch.
    Type: Grant
    Filed: January 2, 2017
    Date of Patent: June 18, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
  • Patent number: 10320677
    Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets corresponding to a first message associated with the service using a larger inter-packet gap than an inter-packet gap used for previous packets corresponding to the first message associated with the service, and in a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay a transmission of a next packet corresponding to the first message associated with the service.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: June 11, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
  • Patent number: 10296392
    Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.
    Type: Grant
    Filed: May 20, 2015
    Date of Patent: May 21, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
  • Patent number: 10167800
    Abstract: Processors and methods for neural network processing are provided. A method includes receiving vector data corresponding to a layer of a neural network model, where each of the vector data has a value comprising at least one exponent. The method further includes first processing a first subset of the vector data to determine a first shared exponent for representing values in the first subset of the vector data in a block-floating point format and second processing a second subset of the vector data to determine a second shared exponent for representing values in the second subset of the vector data in a block-floating point format in a manner that no vector data from the second subset of the vector data influences a determination of the first shared exponent and no vector data from the first subset of the vector data influences a determination of the second shared exponent.
    Type: Grant
    Filed: August 18, 2017
    Date of Patent: January 1, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Daniel Lo, Kalin Ovtcharov
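
    The key property in this patent is that each subset of the vector data gets its own shared exponent with no influence from the other subset; a minimal sketch follows (the block size and exponent rule are assumptions):

      import numpy as np

      def shared_exponent(block):
          """Shared exponent for one block, taken from its largest magnitude only."""
          max_abs = np.max(np.abs(block))
          return 0 if max_abs == 0 else int(np.floor(np.log2(max_abs)))

      def per_block_exponents(vector, block_size):
          """Pick each block's exponent independently, so values in one block
          cannot influence the exponent chosen for another block."""
          blocks = np.split(vector, len(vector) // block_size)
          return [shared_exponent(b) for b in blocks]

      v = np.concatenate([np.random.randn(16) * 100, np.random.randn(16) * 0.01])
      print(per_block_exponents(v, block_size=16))   # two independent exponents
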
  • Publication number: 20180349196
    Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.
    Type: Application
    Filed: August 9, 2018
    Publication date: December 6, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
  • Patent number: 10140252
    Abstract: Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memory incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: November 27, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Eric S. Chung
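
    A toy model of the row-per-tile layout described above, with each tile holding its rows of the coefficient matrix and a compute unit producing its share of the result; the class structure is an illustration only:

      import numpy as np

      class Tile:
          """Toy tile: 'on-chip memory' holding some matrix rows plus a compute unit."""
          def __init__(self, rows):
              self.rows = rows                      # this tile's rows of the N x M matrix

          def compute(self, input_vector):
              return self.rows @ input_vector       # this tile's share of the product

      def run_on_tiles(coefficients, input_vector, num_tiles):
          # Distribute contiguous row blocks of the coefficient matrix across tiles.
          tiles = [Tile(rows) for rows in np.array_split(coefficients, num_tiles)]
          # Each tile's compute unit works only on its own rows; results are concatenated.
          return np.concatenate([t.compute(input_vector) for t in tiles])

      W, x = np.random.randn(8, 4), np.random.randn(4)
      print(np.allclose(run_on_tiles(W, x, num_tiles=2), W @ x))   # True
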
  • Publication number: 20180246853
    Abstract: Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memory incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Jeremy Fowers, Eric S. Chung
  • Publication number: 20180247190
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
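
    The abstract above pins the model coefficients in on-chip memory for the life of the service; the toy cache below mimics that policy (the class and method names are invented for illustration):

      class OnChipWeightCache:
          """Coefficients loaded at service activation stay resident, regardless of
          utilization, until the service stops or the model is replaced."""
          def __init__(self):
              self.model_id, self.blocks = None, None

          def activate(self, model_id, coefficients):
              self.model_id, self.blocks = model_id, coefficients   # loaded once, then pinned

          def evaluate(self, x):
              # Evaluation always reads the resident coefficients; nothing is evicted.
              return [sum(w * xi for w, xi in zip(row, x)) for row in self.blocks]

          def replace_model(self, model_id, coefficients):
              # Only a model change (or a service interruption) reloads the memory.
              self.activate(model_id, coefficients)

      cache = OnChipWeightCache()
      cache.activate("dnn-v1", [[1.0, 2.0], [3.0, 4.0]])
      print(cache.evaluate([0.5, 0.5]))   # [1.5, 3.5]
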
  • Publication number: 20180247187
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
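
    The instruction-mapping idea above, where one instruction type runs only on the matrix vector unit and the other only on a multifunction unit, can be sketched as a small dispatcher; the opcode names and the unit split are assumptions:

      from dataclasses import dataclass

      @dataclass
      class Instruction:
          op: str               # e.g. "matmul", "relu", "add_bias"

      MVU_OPS = {"matmul"}                          # first type: matrix vector unit only
      MFU_OPS = {"relu", "add_bias", "sigmoid"}     # second type: multifunction units only

      def map_instruction(instr, mfu_index=0):
          """Map an instruction to the MVU or to one of the multifunction units."""
          if instr.op in MVU_OPS:
              return "MVU"
          if instr.op in MFU_OPS:
              return f"MFU{mfu_index}"
          raise ValueError(f"unsupported op: {instr.op}")

      chain = [Instruction("matmul"), Instruction("add_bias"), Instruction("relu")]
      for i, instr in enumerate(chain):
          print(instr.op, "->", map_instruction(instr, mfu_index=max(0, i - 1)))
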
  • Publication number: 20180247185
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding a chain of instructions received via an input queue, where the chain of instructions comprises a first instruction that can only be processed by the matrix vector unit and a sequence of instructions that can only be processed by a multifunction unit. The method includes processing the first instruction using the MVU and processing each of the instructions in the sequence depending upon its position in the sequence of instructions.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Publication number: 20180247186
    Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result, and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger