Patents by Inventor Douglas C. Burger
Douglas C. Burger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11977891
Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding an instruction block encoding a plurality of memory access instructions, generating data indicating a relative order for executing the memory access instructions in the instruction block, and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.
Type: Grant
Filed: February 1, 2016
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith
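To make the ordering mechanism concrete, here is a minimal Python sketch (not the patented hardware): it records each memory access's position in the block at decode time and consults a store vector to decide which loads may issue. The class names and the readiness rule are illustrative assumptions.

```python
# Minimal sketch: model an instruction block's memory accesses as an ordered
# list, derive a relative ordering at "decode" time, and use a store vector
# to decide when a load may issue. Names and policy are illustrative.

class MemoryAccess:
    def __init__(self, op, addr):
        self.op = op      # "load" or "store"
        self.addr = addr

def decode_relative_order(block):
    """Assign each memory access its position in program order."""
    return {i: acc for i, acc in enumerate(block)}

def ready_loads(order, store_vector):
    """A load is ready once every earlier store in the block has executed."""
    ready = []
    for seq, acc in order.items():
        if acc.op != "load":
            continue
        earlier_stores = [s for s in range(seq) if order[s].op == "store"]
        if all(store_vector[s] for s in earlier_stores):
            ready.append(seq)
    return ready

block = [MemoryAccess("store", 0x10), MemoryAccess("load", 0x10),
         MemoryAccess("store", 0x20), MemoryAccess("load", 0x20)]
order = decode_relative_order(block)
store_vector = {0: True, 2: False}       # store 0 done, store 2 pending
print(ready_loads(order, store_vector))  # -> [1]: only the first load may issue
```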
-
Publication number: 20240134439
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Application
Filed: December 29, 2023
Publication date: April 25, 2024
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
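As a rough software analogy for the dynamic precision control described here, the sketch below assumes a per-node precision table produced during training or quantization and truncates each node's ADC read-out to the requested number of bits; the table, function names, and clamping policy are assumptions, not the disclosed circuit.

```python
# Illustrative sketch only: choose how many ADC output bits to keep for each
# node of a model's computation graph, based on a per-node dynamic-precision
# value determined during training or post-training quantization.

precision_table = {          # node name -> required output precision in bits
    "conv1": 8,
    "conv2": 6,
    "fc":    4,
}

ADC_MAX_BITS = 8

def adc_output_bits(node):
    """Return the number of ADC output bits to enable for a node's MAC array."""
    needed = precision_table.get(node, ADC_MAX_BITS)
    return min(needed, ADC_MAX_BITS)

def quantize_adc_sample(value, bits, full_scale=1.0):
    """Model an ADC read-out truncated to `bits` of precision."""
    levels = 2 ** bits
    step = full_scale / levels
    return round(value / step) * step

for node in precision_table:
    bits = adc_output_bits(node)
    print(node, bits, quantize_adc_sample(0.3712, bits))
```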
-
Patent number: 11899518
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Grant
Filed: December 15, 2021
Date of Patent: February 13, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
-
Patent number: 11886833
Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
Type: Grant
Filed: June 28, 2021
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
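The hierarchical shared-exponent layout can be illustrated numerically. The sketch below is a simplified assumption, not the patented data type: two sub-groups get their own shared exponents, a single top-level exponent plus the two difference values are stored, and each value keeps only a sign and a short mantissa.

```python
# Two-level shared-exponent layout for a group of floats (field widths assumed).

import math

def shared_exponent(values):
    """Largest binary exponent needed by any value in the group."""
    return max(math.frexp(v)[1] for v in values if v != 0.0)

def encode_hierarchical(group_a, group_b, mantissa_bits=4):
    e_a = shared_exponent(group_a)          # first shared exponent
    e_b = shared_exponent(group_b)          # second shared exponent
    e_top = max(e_a, e_b)                   # third (top-level) shared exponent
    diff_a, diff_b = e_top - e_a, e_top - e_b
    def pack(values, e_local):
        scale = 2 ** (e_local - mantissa_bits)
        return [(v < 0, round(abs(v) / scale)) for v in values]  # (sign, mantissa)
    return {"exp_top": e_top, "diff_a": diff_a, "diff_b": diff_b,
            "a": pack(group_a, e_a), "b": pack(group_b, e_b)}

def decode(entry, diff, exp_top, mantissa_bits=4):
    sign, mant = entry
    scale = 2 ** (exp_top - diff - mantissa_bits)
    return -mant * scale if sign else mant * scale

enc = encode_hierarchical([0.5, -0.25, 0.75], [0.031, 0.06, -0.02])
print(enc)
print([decode(e, enc["diff_a"], enc["exp_top"]) for e in enc["a"]])
```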
-
Publication number: 20230376725
Abstract: Embodiments of the present disclosure include systems and methods for providing model customizations of transformers for improved efficiency. A first set of settings for a transformer model is received. Based on the first set of settings, a second set of settings for the transformer model is determined. The first set of settings and the second set of settings are used to configure and train the transformer model.
Type: Application
Filed: May 19, 2022
Publication date: November 23, 2023
Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Taylor Golub
-
Patent number: 11803389
Abstract: A reach matrix scheduler circuit for scheduling instructions to be executed in a processor is disclosed. The scheduler circuit includes an N×R matrix wake-up circuit, where 'N' is the instruction window size of the scheduler circuit, and 'R' is the "reach" within the instruction window of the matrix wake-up circuit, with 'R' being less than 'N'. A grant line associated with each instruction request entry in the N×R matrix wake-up circuit is coupled to 'R' other instruction entries among the 'N' instruction entries. When a producer instruction in an instruction request entry is ready for issuance, the grant line associated with the instruction request entry is activated so that any other instruction entries coupled to the grant line (i.e., within the "reach" of the instruction request entry) that consume the produced value generated by the producer instruction are "woken up" and subsequently indicated as ready to be issued.
Type: Grant
Filed: January 9, 2020
Date of Patent: October 31, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Douglas C. Burger, Gagan Gupta, Kiran Ravi Seth
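A toy model of the reach-limited wake-up matrix follows, under illustrative assumptions about N, R, and how dependencies are encoded; it is not the scheduler circuit itself.

```python
# An N-entry scheduler window where each producer's grant line only reaches
# the next R entries. Waking a consumer requires it to sit within the
# producer's reach and to depend on its result.

N, R = 8, 3   # window size and reach, with R < N

# deps[i] = set of entry indices whose results entry i consumes
deps = {4: {2}, 6: {2}, 7: {5}}
ready = set()

def grant(producer):
    """Producer issues: wake consumers connected to its grant line (within reach R)."""
    reachable = range(producer + 1, min(producer + 1 + R, N))
    for entry in reachable:
        if producer in deps.get(entry, set()):
            ready.add(entry)

grant(2)
print(sorted(ready))   # -> [4]; entry 6 also consumes entry 2 but is out of reach
```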
-
Patent number: 11755484
Abstract: Apparatus and methods are disclosed for throttling processor operation in block-based processor architectures. In one example of the disclosed technology, a block-based instruction set architecture processor includes a plurality of processing cores configured to fetch and execute a sequence of instruction blocks. Each of the processing cores includes functional resources for performing operations specified by the instruction blocks. The processor further includes a core scheduler configured to allocate functional resources for performing the operations. The functional resources are allocated for executing the instruction blocks based, at least in part, on a performance metric. The performance metric can be generated dynamically or statically based on branch prediction accuracy, energy usage tolerance, and other suitable metrics.
Type: Grant
Filed: June 26, 2015
Date of Patent: September 12, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jan S. Gray, Douglas C. Burger, Aaron L. Smith
-
Publication number: 20230267319
Abstract: Technology related to training a neural network accelerator using mixed-precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
Type: Application
Filed: April 28, 2023
Publication date: August 24, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
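The convert-compute-convert pattern can be sketched with a simple block floating-point scheme in which each tensor shares one exponent; the bit widths and rounding below are assumptions, not the accelerator's actual formats.

```python
# Minimal mixed-precision sketch: quantize to a block floating-point (BFP)
# representation, run the tensor op on integer mantissas, convert back.

import numpy as np

def to_bfp(x, mantissa_bits=8):
    """Quantize a float32 tensor to (shared_exponent, integer mantissas)."""
    exp = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-30)))
    scale = 2.0 ** (exp - mantissa_bits)
    return exp, np.round(x / scale).astype(np.int32)

x = np.random.randn(4, 8).astype(np.float32)   # layer input
w = np.random.randn(8, 3).astype(np.float32)   # layer weights

ex, mx = to_bfp(x)
ew, mw = to_bfp(w)
acc = mx @ mw                                   # tensor op in quantized format
y = acc.astype(np.float32) * 2.0 ** (ex + ew - 16)  # back to normal precision
print(np.max(np.abs(y - x @ w)))                # small quantization error
```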
-
Patent number: 11726912
Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow for concurrent execution of plural execution lanes of the processor.
Type: Grant
Filed: March 29, 2021
Date of Patent: August 15, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith, Gagan Gupta, David T. Harper
-
Patent number: 11681531
Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
Type: Grant
Filed: October 23, 2015
Date of Patent: June 20, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith
-
Publication number: 20230185352
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Application
Filed: December 15, 2021
Publication date: June 15, 2023
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
-
Patent number: 11676003
Abstract: Technology related to training a neural network accelerator using mixed-precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
Type: Grant
Filed: December 18, 2018
Date of Patent: June 13, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
-
Patent number: 11663450
Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
Type: Grant
Filed: June 29, 2017
Date of Patent: May 30, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
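A minimal sketch of the pipelined dataflow described above, assuming simple elementwise operations for the multifunction units; results are handed directly from stage to stage rather than through a global register file.

```python
# Illustrative pipeline: the matrix-vector unit (MVU) feeds a chain of
# multifunction units; intermediate results flow stage to stage and are
# never written back to a shared register in between.

import numpy as np

def mvu(weights, x):
    """Matrix-vector multiply: the instruction type only the MVU performs."""
    return weights @ x

def mfu_bias(x, bias):        # first multifunction unit: elementwise add
    return x + bias

def mfu_activation(x):        # second multifunction unit: elementwise nonlinearity
    return np.tanh(x)

def mfu_scale(x, gamma):      # third multifunction unit: elementwise scale
    return gamma * x

W = np.random.randn(4, 8)
x = np.random.randn(8)
b = np.zeros(4)
gamma = 0.5

out = mfu_scale(mfu_activation(mfu_bias(mvu(W, x), b)), gamma)
print(out)
```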
-
Patent number: 11645493
Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test inputs is applied to the normal-precision floating-point model and the corresponding quantized model, and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters that determine data widths and the selection of shared exponents for the block floating-point format can be chosen. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.
Type: Grant
Filed: May 4, 2018
Date of Patent: May 9, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
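The comparison step of this design flow might look like the following sketch, which quantizes a toy layer to a block floating-point format, compares outputs on a set of test inputs, and widens the mantissas if the error is too large; the thresholds and adjustment policy are placeholders, not the patented flow.

```python
# Hedged sketch of the compare-and-adjust step in a quantized-NN design flow.

import numpy as np

def quantize_bfp(w, mantissa_bits=6):
    """Quantize a tensor so all elements share one exponent."""
    exp = int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-30)))
    scale = 2.0 ** (exp - mantissa_bits)
    return np.round(w / scale) * scale

def model(w, x):                 # stand-in for a normal-precision layer
    return np.maximum(w @ x, 0)

w = np.random.randn(16, 32).astype(np.float32)
w_q = quantize_bfp(w)

test_inputs = [np.random.randn(32) for _ in range(8)]
errors = [np.max(np.abs(model(w, x) - model(w_q, x))) for x in test_inputs]

# Compare output tensors; widen mantissas (or adjust other attributes) if the
# quantized model diverges too much, then retrain before targeting the accelerator.
if max(errors) > 0.1:
    w_q = quantize_bfp(w, mantissa_bits=8)
print(max(errors))
```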
-
Publication number: 20230106990
Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
Type: Application
Filed: December 9, 2022
Publication date: April 6, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Douglas C. Burger
-
Publication number: 20220405571
Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations are performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values are provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.
Type: Application
Filed: June 16, 2021
Publication date: December 22, 2022
Inventors: Bita Darvish Rouhani, Venmugil Elango, Eric S. Chung, Douglas C. Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
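One way to picture the weight-sparsification and muxing steps is the sketch below, which assumes simple per-row top-k magnitude pruning; the group size and selection rule are illustrative, not the disclosed method.

```python
# Sparsify weights into (subset, mask), mux the matching activations, and
# multiply only the kept pairs; the result equals the masked dense product.

import numpy as np

def sparsify_weights(w, keep=2):
    """Keep the `keep` largest-magnitude weights per row; return values + mask."""
    mask = np.zeros_like(w, dtype=bool)
    for i, row in enumerate(w):
        mask[i, np.argsort(np.abs(row))[-keep:]] = True
    return w[mask].reshape(len(w), keep), mask

def mux_activations(a, mask):
    """Select, per output row, only the activations the kept weights need."""
    return np.stack([a[mask[i]] for i in range(mask.shape[0])])

w = np.random.randn(4, 8)          # dense weights
a = np.random.randn(8)             # activations
w_sub, mask = sparsify_weights(w)  # subset of weights + mask values
a_sub = mux_activations(a, mask)   # muxing-unit output

y = np.sum(w_sub * a_sub, axis=1)           # multiply only the kept pairs
print(np.max(np.abs(y - (w * mask) @ a)))   # matches the masked dense product
```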
-
Patent number: 11531552
Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
Type: Grant
Filed: February 6, 2017
Date of Patent: December 20, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Douglas C. Burger
-
Publication number: 20220391209
Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
Type: Application
Filed: August 8, 2022
Publication date: December 8, 2022
Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
-
Publication number: 20220383123
Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs are pruned based on the plurality of entropy values. The second set of outputs are generated by the layer of the neural network using a third set of data.
Type: Application
Filed: May 28, 2021
Publication date: December 1, 2022
Inventors: Venmugil Elango, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
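Under the simplifying assumption that each output channel of a layer is roughly Gaussian, the entropy-based pruning described here could be sketched as follows; the Gaussian model and the pruning fraction are illustrative choices, not the patented procedure.

```python
# Estimate per-channel mean/variance from validation outputs, score channels
# by Gaussian differential entropy, and prune the lowest-entropy channels
# from later outputs.

import numpy as np

val_outputs = np.random.randn(1000, 16) * np.linspace(0.1, 2.0, 16)  # [samples, channels]

means = val_outputs.mean(axis=0)                 # plurality of mean values
variances = val_outputs.var(axis=0)              # plurality of variance values
entropies = 0.5 * np.log(2 * np.pi * np.e * variances)   # per-channel entropy

keep = entropies > np.quantile(entropies, 0.25)  # prune the bottom 25%

new_outputs = np.random.randn(8, 16)             # outputs on a third set of data
pruned = new_outputs[:, keep]                    # pruned set of outputs
print(keep.sum(), pruned.shape)
```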
-
Publication number: 20220383092
Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained at a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.
Type: Application
Filed: May 25, 2021
Publication date: December 1, 2022
Inventors: Ritchie Zhao, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
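A toy illustration of the two-phase idea, assuming "fidelity" means the numeric precision of the weights and using a loss-plateau test as the switching criterion; both are assumptions, not the disclosed method.

```python
# Train a linear model at low precision until the loss plateaus, then switch
# to a higher precision for the second training phase.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 10))
true_w = rng.standard_normal(10)
y = X @ true_w

def quantize(w, bits):                       # lower fidelity = fewer bits
    scale = 2.0 ** (bits - 1)
    return np.round(w * scale) / scale

w, lr, bits = np.zeros(10), 0.01, 4          # phase 1: low fidelity
prev_loss = np.inf
for step in range(500):
    grad = 2 * X.T @ (X @ quantize(w, bits) - y) / len(X)
    w -= lr * grad
    loss = np.mean((X @ quantize(w, bits) - y) ** 2)
    if bits == 4 and prev_loss - loss < 1e-4:   # criterion satisfied:
        bits = 16                                # switch to higher fidelity
    prev_loss = loss
print(bits, loss)
```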