Patents by Inventor Eric S. Chung

Eric S. Chung has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP FUSION OF KERNEL EXECUTION

Publication number: 20240126617

Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure includes configuring functional modules on a machine learning processor to execute a plurality of machine learning (ML) operations during a plurality of time segments. During the time segments, a first portion of the ML operations execute serially and at least one other ML operation executes during at least a majority of the time of each of the time segments. Serial ML operations may be processed simultaneously with the at least one other ML operation.

Type: Application

Filed: October 14, 2022

Publication date: April 18, 2024

Inventors: Haishan ZHU, Preyas Janak SHAH, Tiyasa MITRA, Eric S. CHUNG
PROGRAM ACCELERATORS WITH MULTIDIMENSIONAL NESTED COMMAND STRUCTURES

Publication number: 20240127107

Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure include commands with data structures comprising fields describing multi-dimensional data and fields describing synchronization. Large volumes of data may be processed and automatically synchronized by execution of a single command.

Type: Application

Filed: October 14, 2022

Publication date: April 18, 2024

Inventors: Haishan ZHU, Eric S. CHUNG
Systems and methods for hardware acceleration of data masking using a field programmable gate array

Patent number: 11934327

Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.

Type: Grant

Filed: December 22, 2021

Date of Patent: March 19, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jinwen Xi, Ming Gang Liu, Eric S. Chung
HIERARCHICAL PROGRAMMING MODEL FOR ARTIFICIAL INTELLIGENCE HARDWARE

Publication number: 20240086233

Abstract: Embodiments of the present disclosure include systems and methods for providing a hierarchical programming model for AI hardware. A system includes a set of lower-level control threads. The system also includes a higher-level control thread configured to receive a command from a device, generate a set of commands based on the command, and provide the set of commands to a subset of the set of lower-level control threads. A lower-level control thread in the subset of the set of lower-level control threads is configured to instruct, based on a particular command in the set of commands, a subset of a plurality of processing threads to perform a set of operations.

Type: Application

Filed: September 9, 2022

Publication date: March 14, 2024

Inventors: Haishan ZHU, Eric S. CHUNG
Hierarchical and shared exponent floating point data types

Patent number: 11886833

Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.

Type: Grant

Filed: June 28, 2021

Date of Patent: January 30, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
SYSTEMS AND METHODS FOR SPARSE MATRIX MULTIPLICATION

Publication number: 20230385374

Abstract: A method for sparse matrix multiplication comprises receiving a first block having M elements in a first dimension, and parsing the first block of M elements into a first set of B sub-blocks including MB elements in the first dimension. A first sparsity mask having S % sparsity is applied to the first block of elements, such that each of the first set of B sub-blocks has S % sparsity. A second block is received having M elements in a second dimension, and is parsed into a second set of B sub-blocks that include MB elements in the second dimension. A second sparsity mask having S?% sparsity is applied to the second block of elements, such that S?% of the second set of B sub-blocks have 100% sparsity and (100?S?)% of the second set of B sub-blocks have 0% sparsity. The first and second blocks are then matrix multiplied.

Type: Application

Filed: April 4, 2022

Publication date: November 30, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Venmugil ELANGO, Bita DARVISH ROUHANI, Eric S CHUNG, Douglas Christopher BURGER
MODEL CUSTOMIZATION OF TRANSFORMERS FOR IMPROVED EFFICIENCY

Publication number: 20230376725

Abstract: Embodiments of the present disclosure include systems and methods for providing model customizations of transformers for improved efficiency. A first set of settings for a transformer model is received. Based on the first set of settings, a second set of settings for the transformer model is determined. The first set of settings and the second set of settings are used to configure and train the transformer model.

Type: Application

Filed: May 19, 2022

Publication date: November 23, 2023

Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Taylor Golub
Quantization-aware neural architecture search

Patent number: 11790212

Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.

Type: Grant

Filed: March 18, 2019

Date of Patent: October 17, 2023

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
SPARSITY AND QUANTIZATION FOR DEEP NEURAL NETWORKS

Publication number: 20230316039

Abstract: A computing system is configured to implement a deep neural network comprising an input layer for receiving inputs applied to the deep neural network, an output layer for outputting inferences based on the received inputs, and a plurality of hidden layers interposed between the input layer and the output layer. A plurality of nodes selectively operate on the inputs to generate and cause outputting of the inferences, wherein operation of the nodes is controlled based on parameters of the deep neural network. A sparsity controller is configured to selectively apply a plurality of different sparsity states to control parameter density of the deep neural network. A quantization controller is configured to selectively quantize the parameters of the deep neural network in a manner that is sparsity-dependent, such that quantization applied to each parameter is based on which of the plurality of different sparsity states applies to the parameter.

Type: Application

Filed: May 23, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Rasoul SHAFIPOUR, Bita DARVISH ROUHANI, Douglas Christopher BURGER, Ming Gang LIU, Eric S. CHUNG, Ritchie Zhao
MACHINE LEARNING MODEL PROCESSING BASED ON PERPLEXITY

Publication number: 20230316043

Abstract: A method for operating a machine learning model is presented. The machine learning model includes a plurality of sequential transformer blocks. The method comprises receiving input data at a transformer block and processing the input data via a mixture of experts layer. At an auxiliary classifier, a measure of perplexity of the processed input data is determined. Based on the determined measure of perplexity, one or more experts in a downstream transformer block that will subsequently process the input data are indicated. Weight matrices are then fetched for the indicated one or more experts.

Type: Application

Filed: March 31, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita DARVISH ROUHANI, Douglas Christopher BURGER, Eric S. CHUNG
SPARSITY MASKING METHODS FOR NEURAL NETWORK TRAINING

Publication number: 20230316080

Abstract: A method is presented for training a neural network. For a weight matrix having integer dimensions M1 in a first dimension and an integer dimension M2 in a second dimension, a first balanced sparsity mask is generated that is an N1 of M1 mask in the first dimension. The first balanced sparsity mask is applied to the weight matrix during inference. A second balanced sparsity mask is generated for a transpose of the weight matrix. The second balanced sparsity mask is an N2 of M2 mask in the second dimension. The second balanced sparsity mask is applied to the transpose of the weight matrix during backpropagation.

Type: Application

Filed: March 29, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Maximilian Taylor GOLUB, Bita DARVISH ROUHANI, Eric S CHUNG, Douglas Christopher BURGER
MIXTURE OF EXPERTS MODELS WITH SPARSIFIED WEIGHTS

Publication number: 20230316042

Abstract: A method is presented for operating a machine learning model including one or more mixture of experts layers. The method comprises receiving one or more input data shards at a routing gate network for a mixture of experts layer comprising a plurality of neural network experts. One or more neural network experts in the mixture of experts layer is designated layer to evaluate each input data shard. For each designated neural network expert, a weight matrix is retrieved having a predetermined sparsity to generate a sparsified designated neural network expert. Each input data shard is evaluated with a respective sparsified designated neural network expert.

Type: Application

Filed: March 31, 2022

Publication date: October 5, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita DARVISH ROUHANI, Douglas Christopher BURGER, Eric S CHUNG
TRAINING NEURAL NETWORK ACCELERATORS USING MIXED PRECISION DATA FORMATS

Publication number: 20230267319

Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.

Type: Application

Filed: April 28, 2023

Publication date: August 24, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
FUSING OPERATORS FOR NEURAL NETWORK HARDWARE ACCELERATORS

Publication number: 20230195833

Abstract: Embodiments of the present disclosure include systems and methods for fusing operators for neural network hardware accelerators. A plurality of vector multiplication operations in a data path of a mapping function included in a neural network are identified. The plurality of vector multiplication operations are combined into a single vector multiplication operation in the data path of the mapping function.

Type: Application

Filed: December 22, 2021

Publication date: June 22, 2023

Inventors: Jinwen XI, Eric S. CHUNG
RESIDUAL QUANTIZATION FOR NEURAL NETWORKS

Publication number: 20230196085

Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations in a neural network. In some examples, the quantized precision operations are performed in a block floating-point format where values of a tensor share a common exponent. Techniques for selecting higher precision or lower precision can be used based on a variety of input metrics. When converting to a quantized tensor, a residual tensor is produced. In one embodiment, an error value associated with converting from a normal-precision floating point number to the quantized tensor is used to determine whether to use the residual tensor in a dot product calculation. Using the residual tensor increases the precision of an output from a node. Selection of whether to use the residual tensor can depend on various input metrics including the error value, the layer number, the exponent value, the layer type, etc.

Type: Application

Filed: February 16, 2023

Publication date: June 22, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Eric S. Chung, Daniel Lo, Jialiang Zhang, Ritchie Zhao
SYSTEMS AND METHODS FOR HARDWARE ACCELERATION OF DATA MASKING

Publication number: 20230195665

Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.

Type: Application

Filed: December 22, 2021

Publication date: June 22, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jinwen XI, Ming Gang LIU, Eric S. CHUNG
Training neural network accelerators using mixed precision data formats

Patent number: 11676003

Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the block floating-point format to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.

Type: Grant

Filed: December 18, 2018

Date of Patent: June 13, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
Neural network processing with chained instructions

Patent number: 11663450

Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing using the MVU a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units and generating a second result and without storing the any of the two results in a global register, passing the second result to the second multifunction and the third multifunction unit.

Type: Grant

Filed: June 29, 2017

Date of Patent: May 30, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
Flow for quantized neural networks

Patent number: 11645493

Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test input is applied to a normal-precision flooding point model and the corresponding quantized model and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the widths of data and selection of shared exponents for the block floating-point format can be selected. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.

Type: Grant

Filed: May 4, 2018

Date of Patent: May 9, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
NEURAL NETWORK ACTIVATION COMPRESSION WITH NON-UNIFORM MANTISSAS

Publication number: 20230140185

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.

Type: Application

Filed: January 3, 2023

Publication date: May 4, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao

1 2 3 4 5 next