Patents by Inventor Douglas C. Burger

Douglas C. Burger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134439
    Abstract: Methods, systems and computer program products are provided for improving the performance (e.g., reducing the power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). Selectively varying the precision of an array of MAC PEs may reduce the power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of the analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model. (See the sketch following this entry.)
    Type: Application
    Filed: December 29, 2023
    Publication date: April 25, 2024
    Inventors: Gilad KIRSHENBOIM, Ran SAHAR, Douglas C. BURGER, Yehonathan REFAEL KALIM
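
A minimal sketch, in Python, of the precision-switching idea described in the entry above. The `precision_per_node` mapping, the bit widths, and the quantization rule are illustrative assumptions, not the patented circuit:

```python
import numpy as np

ADC_FULL_BITS = 8  # assumed full ADC resolution

def adc_quantize(analog_vals, out_bits):
    """Model an ADC that emits only `out_bits` of output precision."""
    levels = 2 ** out_bits
    lo, hi = analog_vals.min(), analog_vals.max()
    if hi == lo:
        return analog_vals
    step = (hi - lo) / (levels - 1)
    return np.round((analog_vals - lo) / step) * step + lo

def run_node(activations, weights, node_id, precision_per_node):
    """Analog-style MAC followed by an ADC whose precision is set per node."""
    analog_out = activations @ weights                  # the MAC array's result
    bits = precision_per_node.get(node_id, ADC_FULL_BITS)
    return adc_quantize(analog_out, bits)               # fewer bits, less power

# A node known (from training/quantization) to tolerate 4-bit ADC output:
x, w = np.random.randn(1, 16), np.random.randn(16, 8)
y = run_node(x, w, node_id="conv1", precision_per_node={"conv1": 4})
```
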
  • Patent number: 11899518
    Abstract: Methods, systems and computer program products are provided for improving the performance (e.g., reducing the power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). Selectively varying the precision of an array of MAC PEs may reduce the power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of the analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
    Type: Grant
    Filed: December 15, 2021
    Date of Patent: February 13, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
  • Patent number: 11886833
    Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type. (See the sketch following this entry.)
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: January 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
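
A minimal sketch of the two-level encoding described above, assuming two groups of floats that each get a local shared exponent, with one top-level exponent and small per-group difference values stored; the grouping and layout are illustrative:

```python
import math

def shared_exponent(vals):
    """Largest binary exponent present in the group."""
    return max((math.frexp(v)[1] for v in vals if v != 0.0), default=0)

def encode(group_a, group_b):
    e_a = shared_exponent(group_a)                   # first shared exponent
    e_b = shared_exponent(group_b)                   # second shared exponent
    e_top = max(e_a, e_b)                            # third (top-level) exponent
    diff_a, diff_b = e_top - e_a, e_top - e_b        # small difference values
    def split(vals, e):
        # sign bit plus a fixed-point mantissa relative to the group exponent
        return [(v < 0.0, abs(v) / 2.0 ** e) for v in vals]
    return {"exponent": e_top, "diffs": (diff_a, diff_b),
            "groups": (split(group_a, e_a), split(group_b, e_b))}

enc = encode([1.5, -0.25, 3.0], [0.125, 0.0625])
```
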
  • Publication number: 20230376725
    Abstract: Embodiments of the present disclosure include systems and methods for providing model customizations of transformers for improved efficiency. A first set of settings for a transformer model is received. Based on the first set of settings, a second set of settings for the transformer model is determined. The first set of settings and the second set of settings are used to configure and train the transformer model. (See the sketch following this entry.)
    Type: Application
    Filed: May 19, 2022
    Publication date: November 23, 2023
    Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Taylor Golub
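
A hypothetical sketch of that flow: a first, user-supplied set of settings is used to derive a second set, and the combination configures training. The derivation rules here (the feed-forward width ratio and heads-per-64-dims convention) are common defaults assumed for illustration, not the patented method:

```python
def derive_settings(user_settings):
    """Derive a second set of transformer settings from a first set."""
    hidden = user_settings["hidden_size"]
    return {
        "ffn_size": 4 * hidden,             # common feed-forward width ratio
        "num_heads": max(1, hidden // 64),  # one attention head per 64 dims
        "head_dim": 64,
    }

user = {"hidden_size": 768, "num_layers": 12}   # first set of settings
config = {**user, **derive_settings(user)}      # both sets configure training
print(config)
```
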
  • Patent number: 11803389
    Abstract: A reach matrix scheduler circuit for scheduling instructions to be executed in a processor is disclosed. The scheduler circuit includes an N×R matrix wake-up circuit, where ‘N’ is the instruction window size of the scheduler circuit, and ‘R’ is the “reach” within the instruction window of the matrix wake-up circuit, with ‘R’ being less than ‘N’. A grant line associated with each instruction request entry in the N×R matrix wake-up circuit is coupled to ‘R’ other instruction entries among the ‘N’ instruction entries. When a producer instruction in an instruction request entry is ready for issuance, the grant line associated with the instruction request entry is activated so that any other instruction entries coupled to the grant line (i.e., within the “reach” of the instruction request entry) that consume the produced value generated by the producer instruction are “woken up” and subsequently indicated as ready to be issued. (See the sketch following this entry.)
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: October 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Douglas C. Burger, Gagan Gupta, Kiran Ravi Seth
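
A small software model of the reach-limited wake-up matrix, assuming each producer entry's grant line reaches only the next R entries in the window; the dependency encoding is illustrative:

```python
N, R = 8, 3        # instruction window size and reach, with R < N

# deps[i] = producer entries whose results entry i consumes
deps = {3: {1}, 4: {3}, 6: {4}}
ready = {1}        # entries with no outstanding dependencies

def grant(producer):
    """Activate the producer's grant line: wake consumers within reach R."""
    for consumer in range(producer + 1, min(producer + 1 + R, N)):
        if producer in deps.get(consumer, set()):
            deps[consumer].discard(producer)
            if not deps[consumer]:
                ready.add(consumer)          # woken up: all inputs available

while ready:
    entry = min(ready)                       # issue the oldest ready entry
    ready.discard(entry)
    print(f"issue entry {entry}")
    grant(entry)
```
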
  • Patent number: 11755484
    Abstract: Apparatus and methods are disclosed for throttling processor operation in block-based processor architectures. In one example of the disclosed technology, a block-based instruction set architecture processor includes a plurality of processing cores configured to fetch and execute a sequence of instruction blocks. Each of the processing cores includes functional resources for performing operations specified by the instruction blocks. The processor further includes a core scheduler configured to allocate the functional resources for performing the operations. The functional resources are allocated for executing the instruction blocks based, at least in part, on a performance metric. The performance metric can be generated dynamically or statically based on branch prediction accuracy, energy usage tolerance, and other suitable metrics. (See the sketch following this entry.)
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: September 12, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jan S. Gray, Douglas C. Burger, Aaron L. Smith
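
An illustrative sketch of metric-driven throttling: the core scheduler grants fewer functional units when branch prediction accuracy is low, so less work is wasted on misspeculation. The thresholds and the `energy_budget` cap are assumptions:

```python
def allocate_units(branch_accuracy, energy_budget, max_units=4):
    """Return how many functional units to grant an instruction block."""
    if branch_accuracy < 0.70:        # heavy misspeculation: throttle hard
        units = 1
    elif branch_accuracy < 0.90:
        units = 2
    else:
        units = max_units
    return min(units, energy_budget)  # never exceed the energy-derived cap

print(allocate_units(branch_accuracy=0.95, energy_budget=4))  # -> 4
print(allocate_units(branch_accuracy=0.65, energy_budget=4))  # -> 1
```
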
  • Publication number: 20230267319
    Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format, such as a block floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (e.g., block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in the normal-precision floating-point format. (See the sketch following this entry.)
    Type: Application
    Filed: April 28, 2023
    Publication date: August 24, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
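
A minimal sketch of the mixed-precision pattern, assuming a block floating-point (BFP) quantizer with one shared exponent per tensor; the bit width and per-tensor grouping are illustrative:

```python
import numpy as np

def to_bfp(t, mantissa_bits=4):
    """Quantize a tensor to one shared exponent plus short mantissas."""
    exp = np.floor(np.log2(np.abs(t).max() + 1e-30))   # shared exponent
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return np.round(t / scale) * scale                 # still held as floats

def layer_forward(x, w):
    xq, wq = to_bfp(x), to_bfp(w)         # normal precision -> BFP
    y = xq @ wq                           # tensor op on quantized operands
    return y.astype(np.float32)           # result back in normal precision

x = np.random.randn(2, 8).astype(np.float32)
w = np.random.randn(8, 4).astype(np.float32)
print(layer_forward(x, w).shape)          # (2, 4)
```
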
  • Patent number: 11726912
    Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow for concurrent execution across plural execution lanes of the processor. (See the sketch following this entry.)
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: August 15, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith, Gagan Gupta, David T. Harper
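
A toy model of the wide load, assuming an 8-word cache line whose words are fanned out one per execution lane, optionally bypassing the operand buffers; the structure is illustrative, not the disclosed circuit:

```python
CACHE_LINE = list(range(100, 108))       # one 8-word cache line

def wide_load(line, lanes, bypass=False):
    """Concurrently send word i of the cache line to execution lane i."""
    for lane_id, word in zip(range(lanes), line):
        target = "bypass -> ALU" if bypass else "operand buffer"
        print(f"lane {lane_id}: word {word} -> {target}")

wide_load(CACHE_LINE, lanes=4, bypass=True)
```
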
  • Patent number: 11681531
    Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting the next memory load or memory store instruction to execute based on dependencies encoded within the block and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available. (See the sketch following this entry.)
    Type: Grant
    Filed: October 23, 2015
    Date of Patent: June 20, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
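
A minimal sketch of the store-vector gating, assuming one bit per store instruction: the store mask comes from decode (or the block header), and an instruction issues only once the earlier stores it depends on have executed. The encodings are assumptions:

```python
store_mask = 0b0110      # stores actually present in the block (decode/header)
store_vector = 0b0000    # a bit is set as each store executes

def can_issue(required_prior_stores):
    """Issue when every required earlier store (after masking) has executed."""
    needed = required_prior_stores & store_mask
    return (store_vector & needed) == needed

print(can_issue(0b0010))     # False: store 1 has not executed yet
store_vector |= 0b0010       # store 1 executes
print(can_issue(0b0010))     # True: the dependency is satisfied
```
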
  • Publication number: 20230185352
    Abstract: Methods, systems and computer program products are provided for improving the performance (e.g., reducing the power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). Selectively varying the precision of an array of MAC PEs may reduce the power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of the analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
    Type: Application
    Filed: December 15, 2021
    Publication date: June 15, 2023
    Inventors: Gilad KIRSHENBOIM, Ran SAHAR, Douglas C. BURGER, Yehonathan REFAEL KALIM
  • Patent number: 11676003
    Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format, such as a block floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (e.g., block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in the normal-precision floating-point format.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: June 13, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
  • Patent number: 11663450
    Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either result in a global register, passing the second result to the second multifunction unit and the third multifunction unit. (See the sketch following this entry.)
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: May 30, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
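
An illustrative software model of the pipeline: the matrix vector unit (MVU) feeds three chained multifunction units (MFUs), and intermediate results flow down the pipeline rather than round-tripping through a global register file. The operation set is an assumption:

```python
import numpy as np

def mvu(matrix, vec):
    """Only the MVU performs matrix-vector instructions."""
    return matrix @ vec

def mfu(x, op):
    """Only the MFUs perform these element-wise instructions."""
    ops = {"relu": lambda v: np.maximum(v, 0.0),
           "scale": lambda v: v * 0.5,
           "bias": lambda v: v + 1.0}
    return ops[op](x)

m, v = np.random.randn(4, 4), np.random.randn(4)
r1 = mvu(m, v)                       # first result, from the MVU
r2 = mfu(r1, "relu")                 # second result, from MFU #1
out = mfu(mfu(r2, "scale"), "bias")  # r2 passes straight to MFUs #2 and #3
```
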
  • Patent number: 11645493
    Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test inputs is applied to the normal-precision floating-point model and to the corresponding quantized model, and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the bit widths of data and the selection of shared exponents for the block floating-point format can be selected. The adjusted, quantized neural network is retrained and programmed into a hardware accelerator. (See the sketch following this entry.)
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: May 9, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
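
A sketch of the comparison step in this design flow, assuming a block floating-point quantizer and using quantized inputs as a stand-in for a fully quantized model; the tolerance and bit width are illustrative knobs one might adjust:

```python
import numpy as np

def quantize_bfp(t, mantissa_bits):
    """Block floating-point proxy: shared exponent, short mantissas."""
    exp = np.floor(np.log2(np.abs(t).max() + 1e-30))
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return np.round(t / scale) * scale

def outputs_match(model, test_inputs, mantissa_bits, tol=1e-2):
    for x in test_inputs:
        y_fp = model(x)                               # normal-precision output
        y_q = model(quantize_bfp(x, mantissa_bits))   # quantized-path output
        if np.abs(y_fp - y_q).max() > tol:
            return False        # too much error: widen mantissas or retune
    return True

model = lambda x: np.tanh(x @ np.ones((8, 4)))
inputs = [np.random.randn(2, 8) for _ in range(4)]
print(outputs_match(model, inputs, mantissa_bits=4))
```
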
  • Publication number: 20230106990
    Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block. (See the sketch following this entry.)
    Type: Application
    Filed: December 9, 2022
    Publication date: April 6, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, Douglas C. Burger
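
An illustrative model of the spatial allocation: each context receives a fixed, guaranteed share of caches, functional units, and registers, so no context can starve another. The resource pools and shares are assumptions:

```python
RESOURCES = {"cache_ways": 8, "functional_units": 4, "registers": 256}

def allocate(shares):
    """Split each resource pool among contexts in fixed, guaranteed shares."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9
    return {ctx: {res: int(total * frac) for res, total in RESOURCES.items()}
            for ctx, frac in shares.items()}

print(allocate({"ctx0": 0.75, "ctx1": 0.25}))
```
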
  • Publication number: 20220405571
    Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations are performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values is provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs. (See the sketch following this entry.)
    Type: Application
    Filed: June 16, 2021
    Publication date: December 22, 2022
    Inventors: Bita DARVISH ROUHANI, Venmugil Elango, Eric S. Chung, Douglas C. Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
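
A minimal sketch of the sparsify-then-mux flow: prune the weights to the k largest magnitudes, record a mask, and let the mask select only the matching activations before a dense multiply. The top-k rule and shapes are assumptions:

```python
import numpy as np

def sparsify(weights, k):
    """Keep the k largest-magnitude weights; return them plus their mask."""
    idx = np.argsort(np.abs(weights))[-k:]
    mask = np.zeros_like(weights, dtype=bool)
    mask[idx] = True
    return weights[mask], mask

w, acts = np.random.randn(8), np.random.randn(8)
w_sub, mask = sparsify(w, k=4)
acts_sub = acts[mask]           # the muxing unit selects matching activations
out = w_sub @ acts_sub          # dense multiply on the reduced operands
```
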
  • Patent number: 11531552
    Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: December 20, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, Douglas C. Burger
  • Publication number: 20220391209
    Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either result in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Application
    Filed: August 8, 2022
    Publication date: December 8, 2022
    Inventors: Jeremy FOWERS, Eric S. CHUNG, Douglas C. BURGER
  • Publication number: 20220383123
    Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs is pruned based on the plurality of entropy values. The second set of outputs is generated by the layer of the neural network using a third set of data. (See the sketch following this entry.)
    Type: Application
    Filed: May 28, 2021
    Publication date: December 1, 2022
    Inventors: Venmugil ELANGO, Bita DARVISH ROUHANI, Eric S. CHUNG, Douglas C. BURGER, Maximilian GOLUB
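
A sketch of the entropy signal, assuming per-channel statistics and a Gaussian-entropy proxy (a Gaussian's entropy grows with its variance); the Gaussian assumption and the percentile threshold are illustrative:

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy of a Gaussian with the given variance."""
    return 0.5 * np.log(2.0 * np.pi * np.e * (var + 1e-12))

# Validation-phase outputs of one layer: 1000 samples x 16 channels
val_outputs = np.random.randn(1000, 16) * np.linspace(0.1, 2.0, 16)
mean = val_outputs.mean(axis=0)   # computed alongside variance per the abstract
var = val_outputs.var(axis=0)
entropy = gaussian_entropy(var)                  # one value per channel

keep = entropy >= np.percentile(entropy, 25)     # prune the lowest-entropy 25%
new_outputs = np.random.randn(4, 16)             # outputs on a third data set
pruned = new_outputs[:, keep]
```
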
  • Publication number: 20220383092
    Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained at a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity. (See the sketch following this entry.)
    Type: Application
    Filed: May 25, 2021
    Publication date: December 1, 2022
    Inventors: Ritchie ZHAO, Bita DARVISH ROUHANI, Eric S. CHUNG, Douglas C. BURGER, Maximilian GOLUB
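
An illustrative two-phase loop: train at a low fidelity until a criterion is met (a loss plateau here, which is an assumed criterion), then continue at the higher fidelity. The `train_epoch` stub merely simulates a decaying loss:

```python
import random

def train_epoch(fidelity, epoch):
    """Stand-in training step: loss decays faster at high fidelity."""
    rate = 0.90 if fidelity == "high" else 0.97
    return rate ** epoch + random.uniform(0.0, 0.01)

def train(max_epochs=50, plateau_eps=1e-3):
    fidelity, prev_loss = "low", float("inf")
    for epoch in range(max_epochs):
        loss = train_epoch(fidelity, epoch)
        if fidelity == "low" and prev_loss - loss < plateau_eps:
            fidelity = "high"       # criterion satisfied: raise the fidelity
            print(f"epoch {epoch}: switching to high fidelity")
        prev_loss = loss

train()
```
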
  • Publication number: 20220366236
    Abstract: Embodiments of the present disclosure include systems and methods for reducing operations for training neural networks. A plurality of training data selected from a training data set is used as a plurality of inputs for training a neural network. The neural network includes a plurality of weights. A plurality of loss values are determined based on outputs generated by the neural network and expected output data of the plurality of training data. A subset of the plurality of loss values is determined. An average loss value is determined based on the subset of the plurality of loss values. A set of gradients is calculated based on the average loss value and the plurality of weights in the neural network. The plurality of weights in the neural network are adjusted based on the set of gradients. (See the sketch following this entry.)
    Type: Application
    Filed: May 17, 2021
    Publication date: November 17, 2022
    Inventors: Maral MESMAKHOSROSHAHI, Bita Darvish ROUHANI, Eric S. CHUNG, Douglas C. BURGER
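
A minimal sketch of the reduced-loss update, assuming squared loss on a linear model and a keep-the-largest-losses subset rule (the selection rule is an assumption): per-example losses are computed, a subset is kept and averaged, and the gradient follows from the averaged loss alone:

```python
import numpy as np

def subset_loss_grad(w, X, y, keep_frac=0.5):
    preds = X @ w
    losses = (preds - y) ** 2                        # per-example squared loss
    k = max(1, int(keep_frac * len(losses)))
    idx = np.argsort(losses)[-k:]                    # subset: largest losses
    avg_loss = losses[idx].mean()                    # average of the subset
    grad = 2.0 * (preds[idx] - y[idx]) @ X[idx] / k  # gradient of that average
    return avg_loss, grad

X, y = np.random.randn(32, 8), np.random.randn(32)
w = np.zeros(8)
loss, g = subset_loss_grad(w, X, y)
w -= 0.1 * g                                         # adjust weights via gradients
```
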