Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190340499
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations. In some examples, the quantized precision operations are performed for neural network models. Parameters of the quantized precision operations can be selected to emulate operation of hardware accelerators adapted to perform quantized-format operations. In some examples, the quantized precision operations are performed in a block floating-point format where one or more values of a tensor, matrix, or vector share a common exponent. Techniques for selecting the exponent, reshaping the input tensors, and training neural networks for use with quantized precision models are also disclosed. In some examples, a neural network model is further retrained based on the quantized model. For example, a normal-precision model or a quantized precision model can be retrained by evaluating loss induced by performing operations in the quantized format.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
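
    A minimal Python sketch of the block floating-point idea described above, in which values in a block share a single exponent; the mantissa width, rounding rule, and function names are illustrative assumptions, not the patented method:

      import numpy as np

      def to_block_floating_point(values, mantissa_bits=8):
          """Quantize a 1-D block so all elements share one exponent (a sketch)."""
          values = np.asarray(values, dtype=np.float64)
          max_abs = np.max(np.abs(values))
          if max_abs == 0.0:
              return values.copy(), 0
          # Shared exponent chosen so the largest value fits in the mantissa range.
          shared_exp = int(np.floor(np.log2(max_abs)))
          scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
          limit = 2 ** (mantissa_bits - 1) - 1
          mantissas = np.clip(np.round(values / scale), -limit, limit)
          return mantissas * scale, shared_exp

      # Emulate a quantized dot product and compare it against full precision,
      # as a normal-precision model might be evaluated against its quantized form.
      x, w = np.random.randn(64), np.random.randn(64)
      xq, _ = to_block_floating_point(x)
      wq, _ = to_block_floating_point(w)
      print("fp64:", x @ w, " emulated block floating point:", xq @ wq)
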
  • Publication number: 20190340492
    Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test inputs is applied to the normal-precision floating-point model and to the corresponding quantized model, and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the data widths and the selection of shared exponents for the block floating-point format can be chosen. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
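
    The comparison step of the design flow above can be pictured with a small Python sketch; the error metric, tolerance, and toy "models" are assumptions made only for illustration:

      import numpy as np

      def compare_models(normal_model, quantized_model, test_inputs, tol=1e-2):
          """Apply the same test inputs to both models and report the worst divergence."""
          worst = 0.0
          for x in test_inputs:
              ref, out = normal_model(x), quantized_model(x)
              err = np.max(np.abs(ref - out)) / (np.max(np.abs(ref)) + 1e-12)
              worst = max(worst, err)
          return worst, worst <= tol

      # Toy linear "models" that differ only in weight precision.
      w = np.random.randn(16, 16).astype(np.float32)
      w_q = np.round(w * 64) / 64                     # crude quantized stand-in
      inputs = [np.random.randn(16).astype(np.float32) for _ in range(8)]
      err, ok = compare_models(lambda x: w @ x, lambda x: w_q @ x, inputs)
      print(f"max relative error {err:.4f}, within tolerance: {ok}")
      # If not ok, hyperparameters or quantization parameters would be adjusted
      # and the quantized network retrained, as the abstract describes.
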
  • Publication number: 20190339937
    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Inventors: Daniel Lo, Eric S. Chung, Douglas C. Burger
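
    The mantissa decomposition described above can be illustrated with integer arithmetic in Python; the 8-bit mantissas, the 4-bit split, and the omission of the shared-exponent scaling are simplifying assumptions:

      import numpy as np

      def split_mantissas(ints, low_bits=4):
          """Split signed mantissas so that v == high * 2**low_bits + low."""
          low = ints % (1 << low_bits)        # non-negative low bits
          high = (ints - low) >> low_bits     # signed high bits
          return high, low

      # 8-bit mantissas of two block floating-point vectors.
      a = np.random.randint(-128, 128, size=32)
      b = np.random.randint(-128, 128, size=32)
      a_hi, a_lo = split_mantissas(a)
      b_hi, b_lo = split_mantissas(b)

      # Four narrow dot products, recombined with the appropriate shifts.
      dot = ((a_hi @ b_hi) << 8) + ((a_hi @ b_lo) << 4) \
            + ((a_lo @ b_hi) << 4) + (a_lo @ b_lo)
      print(dot == a @ b)   # True: the decomposition recovers the wide result exactly
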
  • Publication number: 20190324748
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory and, during each one of P clock cycles, provide N data elements to each one of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each one of the P clock cycles, the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Publication number: 20190325296
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
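
    The first specialization parameter above, aligning the native vector dimension to the model, can be made concrete with a small padding-waste calculation; the cost model and the example layer shapes are assumptions for illustration:

      import math

      def padding_waste(layer_dims, native_dim):
          """Fraction of multiply-accumulate work spent on zero padding when every
          matrix dimension is rounded up to a multiple of the native dimension."""
          useful = sum(rows * cols for rows, cols in layer_dims)
          padded = sum(math.ceil(rows / native_dim) * native_dim *
                       math.ceil(cols / native_dim) * native_dim
                       for rows, cols in layer_dims)
          return 1.0 - useful / padded

      # Weight-matrix shapes of a hypothetical model.
      model_layers = [(1000, 200), (200, 200), (200, 50)]
      for native in (64, 100, 128, 256):
          print(f"native dim {native:4d}: {padding_waste(model_layers, native):.1%} wasted")
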
  • Patent number: 10425472
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes the request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than other of the target hardware acceleration devices.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: September 24, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
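
    A minimal sketch of the load-based routing idea in this patent, where each accelerator keeps a table of target loads and routes a request to the least-loaded target; the data structure and device names are assumptions for illustration:

      from dataclasses import dataclass, field
      from typing import Dict

      @dataclass
      class LoadTable:
          """Per-device table of last-known load for candidate target accelerators."""
          loads: Dict[str, float] = field(default_factory=dict)

          def update(self, target: str, load: float) -> None:
              self.loads[target] = load

          def pick_target(self) -> str:
              # Route the request to the target currently shown as least loaded.
              return min(self.loads, key=self.loads.get)

      table = LoadTable()
      table.update("fpga-03", 0.72)
      table.update("fpga-07", 0.31)
      table.update("fpga-11", 0.55)
      print("route request to:", table.pick_target())   # fpga-07
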
  • Publication number: 20190286973
    Abstract: Technology related to hardware accelerated neural network subgraphs is disclosed. In one example of the disclosed technology, a method includes receiving source code specifying a neural network model. The source code includes an application programming interface (API) marking a subgraph of the neural network model as targeted for hardware acceleration. The method includes compiling the subgraph to the neural network accelerator target to generate configuration information for the hardware accelerator. The method includes configuring the hardware accelerator to evaluate the neural network model, where the hardware accelerator is configured using the configuration information.
    Type: Application
    Filed: May 4, 2018
    Publication date: September 19, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Ratna Kumar Kovvuri, Ahmad Mahdi El Husseini, Steven K. Reinhardt, Daniel Lo, Eric S. Chung, Sarabjit Singh Seera, Friedel van Megen, Alessandro Forin
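
    The abstract above describes an API that marks a subgraph as targeted for hardware acceleration; the decorator-style marker below is purely hypothetical (the names, signatures, and registry are invented for illustration):

      # Hypothetical marker API, not the interface in the application.
      ACCELERATED_SUBGRAPHS = []

      def accelerate(target="fpga"):
          """Mark a function that builds a model subgraph as an acceleration target."""
          def wrap(builder):
              ACCELERATED_SUBGRAPHS.append((builder.__name__, target))
              return builder
          return wrap

      @accelerate(target="fpga")
      def attention_block(x):
          # Subgraph definition would go here; it would be compiled to accelerator
          # configuration information and loaded onto the hardware accelerator.
          return x

      def full_model(x):
          x = attention_block(x)   # evaluated on the accelerator
          return x                 # remaining operations run on the host

      print(ACCELERATED_SUBGRAPHS)   # [('attention_block', 'fpga')]
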
  • Publication number: 20190253354
    Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets corresponding to a first message associated with the service using a larger inter-packet gap than an inter-packet gap used for previous packets corresponding to the first message associated with the service, and in a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay a transmission of a next packet corresponding to the first message associated with the service.
    Type: Application
    Filed: April 26, 2019
    Publication date: August 15, 2019
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
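
    The two congestion-handling modes described above can be sketched as a toy sender; the timing constants and method names are assumptions, not the protocol in the application:

      import time

      class Sender:
          """Toy sender reacting to two kinds of indications from the receiver."""
          def __init__(self, base_gap_s=0.001):
              self.gap = base_gap_s

          def on_slow_down(self, factor=2.0):
              # First mode: widen the inter-packet gap for subsequent packets.
              self.gap *= factor

          def on_pause(self, hold_s=0.010):
              # Second mode: delay transmission of the next packet outright.
              time.sleep(hold_s)

          def send_message(self, packets, transmit):
              for pkt in packets:
                  transmit(pkt)
                  time.sleep(self.gap)

      s = Sender()
      s.on_slow_down()   # receiver signalled congestion for the first message
      s.send_message([b"p0", b"p1", b"p2"], transmit=lambda p: print("sent", p))
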
  • Publication number: 20190197406
    Abstract: A computer implemented method of optimizing a neural network includes obtaining a deep neural network (DNN) trained with a training dataset, determining a spreading signal between neurons in multiple adjacent layers of the DNN wherein the spreading signal is an element-wise multiplication of input activations between the neurons in a first layer to neurons in a second next layer with a corresponding weight matrix of connections between such neurons, and determining neural entropies of respective connections between neurons by calculating an exponent of a volume of an area covered by the spreading signal. The DNN may be optimized based on the determined neural entropies between the neurons in the multiple adjacent layers.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Inventors: Bita Darvish Rouhani, Douglas C. Burger, Eric S. Chung
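
    The spreading signal above is an element-wise product of input activations and the connecting weight matrix; the sketch below computes it and a simple Shannon-entropy proxy. The proxy only illustrates ranking connections by how spread out their contributions are; it is not the volume-based entropy defined in the application:

      import numpy as np

      def spreading_signal(activations, weights):
          """Per-connection contributions: activations (n_in,) times weights (n_in, n_out)."""
          return activations[:, None] * weights

      def connection_entropy(spread):
          """Illustrative entropy proxy over normalized contribution magnitudes."""
          p = np.abs(spread) / (np.sum(np.abs(spread)) + 1e-12)
          return -np.sum(p * np.log(p + 1e-12))

      acts = np.random.rand(8)
      W = np.random.randn(8, 4)
      print("entropy proxy:", connection_entropy(spreading_signal(acts, W)))
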
  • Patent number: 10326696
    Abstract: Components, methods, and systems allowing acceleration components to transmit messages are provided. An acceleration component for use among a first plurality of acceleration components, associated with a first top-of-rack (TOR) switch, to transmit messages to other acceleration components in an acceleration plane configurable to provide service acceleration for a service is provided. The acceleration component includes a transport component configured to transmit a first point-to-point message to a second acceleration component, associated with a second TOR switch different from the first TOR switch, and to a third acceleration component, associated with a third TOR switch different from the first TOR switch and the second TOR switch. The transport component may be configured to broadcast a second point-to-point message to all of a second plurality of acceleration components associated with the second TOR switch and to all of a third plurality of acceleration components associated with the third TOR switch.
    Type: Grant
    Filed: January 2, 2017
    Date of Patent: June 18, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
  • Patent number: 10320677
    Abstract: Systems and methods for flow control and congestion management of messages among acceleration components (ACs) configurable to accelerate a service are provided. An example system comprises a software plane including host components configured to execute instructions corresponding to a service and an acceleration plane including ACs configurable to accelerate the service. In a first mode, a sending AC is configured to, in response to receiving a first indication from a receiving AC, send subsequent packets corresponding to a first message associated with the service using a larger inter-packet gap than an inter-packet gap used for previous packets corresponding to the first message associated with the service, and in a second mode, the sending AC is configured to, in response to receiving a second indication from the receiving AC, delay a transmission of a next packet corresponding to the first message associated with the service.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: June 11, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Adrian M. Caulfield, Eric S. Chung, Michael Papamichael
  • Patent number: 10296392
    Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.
    Type: Grant
    Filed: May 20, 2015
    Date of Patent: May 21, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
  • Patent number: 10167800
    Abstract: Processors and methods for neural network processing are provided. A method includes receiving vector data corresponding to a layer of a neural network model, where each of the vector data has a value comprising at least one exponent. The method further includes first processing a first subset of the vector data to determine a first shared exponent for representing values in the first subset of the vector data in a block-floating point format and second processing a second subset of the vector data to determine a second shared exponent for representing values in the second subset of the vector data in a block-floating point format in a manner that no vector data from the second subset of the vector data influences a determination of the first shared exponent and no vector data from the first subset of the vector data influences a determination of the second shared exponent.
    Type: Grant
    Filed: August 18, 2017
    Date of Patent: January 1, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Daniel Lo, Kalin Ovtcharov
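
    The key property in this patent is that each subset of the vector data gets its own shared exponent with no influence from the other subset; a minimal sketch follows (the block size and exponent rule are assumptions):

      import numpy as np

      def shared_exponent(block):
          """Shared exponent for one block, taken from its largest magnitude only."""
          max_abs = np.max(np.abs(block))
          return 0 if max_abs == 0 else int(np.floor(np.log2(max_abs)))

      def per_block_exponents(vector, block_size):
          """Pick each block's exponent independently, so values in one block
          cannot influence the exponent chosen for another block."""
          blocks = np.split(vector, len(vector) // block_size)
          return [shared_exponent(b) for b in blocks]

      v = np.concatenate([np.random.randn(16) * 100, np.random.randn(16) * 0.01])
      print(per_block_exponents(v, block_size=16))   # two independent exponents
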
  • Publication number: 20180349196
    Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.
    Type: Application
    Filed: August 9, 2018
    Publication date: December 6, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
  • Patent number: 10140252
    Abstract: Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memory incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: November 27, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Eric S. Chung
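
    A toy model of the row-per-tile layout described above, with each tile holding its rows of the coefficient matrix and a compute unit producing its share of the result; the class structure is an illustration only:

      import numpy as np

      class Tile:
          """Toy tile: 'on-chip memory' holding some matrix rows plus a compute unit."""
          def __init__(self, rows):
              self.rows = rows                      # this tile's rows of the N x M matrix

          def compute(self, input_vector):
              return self.rows @ input_vector       # this tile's share of the product

      def run_on_tiles(coefficients, input_vector, num_tiles):
          # Distribute contiguous row blocks of the coefficient matrix across tiles.
          tiles = [Tile(rows) for rows in np.array_split(coefficients, num_tiles)]
          # Each tile's compute unit works only on its own rows; results are concatenated.
          return np.concatenate([t.compute(input_vector) for t in tiles])

      W, x = np.random.randn(8, 4), np.random.randn(4)
      print(np.allclose(run_on_tiles(W, x, num_tiles=2), W @ x))   # True
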
  • Publication number: 20180246853
    Abstract: Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memory incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Jeremy Fowers, Eric S. Chung
  • Publication number: 20180247190
    Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
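
    The abstract above pins the model coefficients in on-chip memory for the life of the service; the toy cache below mimics that policy (the class and method names are invented for illustration):

      class OnChipWeightCache:
          """Coefficients loaded at service activation stay resident, regardless of
          utilization, until the service stops or the model is replaced."""
          def __init__(self):
              self.model_id, self.blocks = None, None

          def activate(self, model_id, coefficients):
              self.model_id, self.blocks = model_id, coefficients   # loaded once, then pinned

          def evaluate(self, x):
              # Evaluation always reads the resident coefficients; nothing is evicted.
              return [sum(w * xi for w, xi in zip(row, x)) for row in self.blocks]

          def replace_model(self, model_id, coefficients):
              # Only a model change (or a service interruption) reloads the memory.
              self.activate(model_id, coefficients)

      cache = OnChipWeightCache()
      cache.activate("dnn-v1", [[1.0, 2.0], [3.0, 4.0]])
      print(cache.evaluate([0.5, 0.5]))   # [1.5, 3.5]
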
  • Publication number: 20180247187
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction either to the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit, depending on whether the first instruction is the first type of instruction or the second type of instruction.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
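
    The instruction-mapping idea above, where one instruction type runs only on the matrix vector unit and the other only on a multifunction unit, can be sketched as a small dispatcher; the opcode names and the unit split are assumptions:

      from dataclasses import dataclass

      @dataclass
      class Instruction:
          op: str               # e.g. "matmul", "relu", "add_bias"

      MVU_OPS = {"matmul"}                          # first type: matrix vector unit only
      MFU_OPS = {"relu", "add_bias", "sigmoid"}     # second type: multifunction units only

      def map_instruction(instr, mfu_index=0):
          """Map an instruction to the MVU or to one of the multifunction units."""
          if instr.op in MVU_OPS:
              return "MVU"
          if instr.op in MFU_OPS:
              return f"MFU{mfu_index}"
          raise ValueError(f"unsupported op: {instr.op}")

      chain = [Instruction("matmul"), Instruction("add_bias"), Instruction("relu")]
      for i, instr in enumerate(chain):
          print(instr.op, "->", map_instruction(instr, mfu_index=max(0, i - 1)))
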
  • Publication number: 20180247185
    Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding a chain of instructions received via an input queue, where the chain of instructions comprises a first instruction that can only be processed by the matrix vector unit and a sequence of instructions that can only be processed by a multifunction unit. The method includes processing the first instruction using the MVU and processing each of the instructions in the sequence depending upon its position in the sequence of instructions.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
  • Publication number: 20180247186
    Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result, and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Application
    Filed: June 29, 2017
    Publication date: August 30, 2018
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger