Patents by Inventor Douglas C. Burger
Douglas C. Burger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11977891
Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding an instruction block encoding a plurality of memory access instructions, generating data indicating a relative order for executing the memory access instructions in the instruction block, and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.
Type: Grant
Filed: February 1, 2016
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith
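To make the ordering mechanism concrete, here is a minimal Python sketch (not the patented hardware): it records each memory access's position in the block at decode time and consults a store vector to decide which loads may issue. The class names and the readiness rule are illustrative assumptions.

```python
# Minimal sketch: model an instruction block's memory accesses as an ordered
# list, derive a relative ordering at "decode" time, and use a store vector
# to decide when a load may issue. Names and policy are illustrative.

class MemoryAccess:
    def __init__(self, op, addr):
        self.op = op      # "load" or "store"
        self.addr = addr

def decode_relative_order(block):
    """Assign each memory access its position in program order."""
    return {i: acc for i, acc in enumerate(block)}

def ready_loads(order, store_vector):
    """A load is ready once every earlier store in the block has executed."""
    ready = []
    for seq, acc in order.items():
        if acc.op != "load":
            continue
        earlier_stores = [s for s in range(seq) if order[s].op == "store"]
        if all(store_vector[s] for s in earlier_stores):
            ready.append(seq)
    return ready

block = [MemoryAccess("store", 0x10), MemoryAccess("load", 0x10),
         MemoryAccess("store", 0x20), MemoryAccess("load", 0x20)]
order = decode_relative_order(block)
store_vector = {0: True, 2: False}       # store 0 done, store 2 pending
print(ready_loads(order, store_vector))  # -> [1]: only the first load may issue
```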
-
Publication number: 20240134439
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Application
Filed: December 29, 2023
Publication date: April 25, 2024
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
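As a rough software analogy for the dynamic precision control described here, the sketch below assumes a per-node precision table produced during training or quantization and truncates each node's ADC read-out to the requested number of bits; the table, function names, and clamping policy are assumptions, not the disclosed circuit.

```python
# Illustrative sketch only: choose how many ADC output bits to keep for each
# node of a model's computation graph, based on a per-node dynamic-precision
# value determined during training or post-training quantization.

precision_table = {          # node name -> required output precision in bits
    "conv1": 8,
    "conv2": 6,
    "fc":    4,
}

ADC_MAX_BITS = 8

def adc_output_bits(node):
    """Return the number of ADC output bits to enable for a node's MAC array."""
    needed = precision_table.get(node, ADC_MAX_BITS)
    return min(needed, ADC_MAX_BITS)

def quantize_adc_sample(value, bits, full_scale=1.0):
    """Model an ADC read-out truncated to `bits` of precision."""
    levels = 2 ** bits
    step = full_scale / levels
    return round(value / step) * step

for node in precision_table:
    bits = adc_output_bits(node)
    print(node, bits, quantize_adc_sample(0.3712, bits))
```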
-
Patent number: 11899518
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Grant
Filed: December 15, 2021
Date of Patent: February 13, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
-
Patent number: 11886833
Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
Type: Grant
Filed: June 28, 2021
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
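The hierarchical shared-exponent layout can be illustrated numerically. The sketch below is a simplified assumption, not the patented data type: two sub-groups get their own shared exponents, a single top-level exponent plus the two difference values are stored, and each value keeps only a sign and a short mantissa.

```python
# Two-level shared-exponent layout for a group of floats (field widths assumed).

import math

def shared_exponent(values):
    """Largest binary exponent needed by any value in the group."""
    return max(math.frexp(v)[1] for v in values if v != 0.0)

def encode_hierarchical(group_a, group_b, mantissa_bits=4):
    e_a = shared_exponent(group_a)          # first shared exponent
    e_b = shared_exponent(group_b)          # second shared exponent
    e_top = max(e_a, e_b)                   # third (top-level) shared exponent
    diff_a, diff_b = e_top - e_a, e_top - e_b
    def pack(values, e_local):
        scale = 2 ** (e_local - mantissa_bits)
        return [(v < 0, round(abs(v) / scale)) for v in values]  # (sign, mantissa)
    return {"exp_top": e_top, "diff_a": diff_a, "diff_b": diff_b,
            "a": pack(group_a, e_a), "b": pack(group_b, e_b)}

def decode(entry, diff, exp_top, mantissa_bits=4):
    sign, mant = entry
    scale = 2 ** (exp_top - diff - mantissa_bits)
    return -mant * scale if sign else mant * scale

enc = encode_hierarchical([0.5, -0.25, 0.75], [0.031, 0.06, -0.02])
print(enc)
print([decode(e, enc["diff_a"], enc["exp_top"]) for e in enc["a"]])
```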
-
Publication number: 20230376725
Abstract: Embodiments of the present disclosure include systems and methods for providing model customizations of transformers for improved efficiency. A first set of settings for a transformer model is received. Based on the first set of settings, a second set of settings for the transformer model is determined. The first set of settings and the second set of settings are used to configure and train the transformer model.
Type: Application
Filed: May 19, 2022
Publication date: November 23, 2023
Inventors: Maral Mesmakhosroshahi, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Taylor Golub
-
Patent number: 11803389
Abstract: A reach matrix scheduler circuit for scheduling instructions to be executed in a processor is disclosed. The scheduler circuit includes an N×R matrix wake-up circuit, where 'N' is the instruction window size of the scheduler circuit, and 'R' is the "reach" within the instruction window of the matrix wake-up circuit, with 'R' being less than 'N'. A grant line associated with each instruction request entry in the N×R matrix wake-up circuit is coupled to 'R' other instruction entries among the 'N' instruction entries. When a producer instruction in an instruction request entry is ready for issuance, the grant line associated with the instruction request entry is activated so that any other instruction entries coupled to the grant line (i.e., within the "reach" of the instruction request entry) that consume the produced value generated by the producer instruction are "woken up" and subsequently indicated as ready to be issued.
Type: Grant
Filed: January 9, 2020
Date of Patent: October 31, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Douglas C. Burger, Gagan Gupta, Kiran Ravi Seth
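A toy model of the reach-limited wake-up matrix follows, under illustrative assumptions about N, R, and how dependencies are encoded; it is not the scheduler circuit itself.

```python
# An N-entry scheduler window where each producer's grant line only reaches
# the next R entries. Waking a consumer requires it to sit within the
# producer's reach and to depend on its result.

N, R = 8, 3   # window size and reach, with R < N

# deps[i] = set of entry indices whose results entry i consumes
deps = {4: {2}, 6: {2}, 7: {5}}
ready = set()

def grant(producer):
    """Producer issues: wake consumers connected to its grant line (within reach R)."""
    reachable = range(producer + 1, min(producer + 1 + R, N))
    for entry in reachable:
        if producer in deps.get(entry, set()):
            ready.add(entry)

grant(2)
print(sorted(ready))   # -> [4]; entry 6 also consumes entry 2 but is out of reach
```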
-
Patent number: 11755484
Abstract: Apparatus and methods are disclosed for throttling processor operation in block-based processor architectures. In one example of the disclosed technology, a block-based instruction set architecture processor includes a plurality of processing cores configured to fetch and execute a sequence of instruction blocks. Each of the processing cores includes functional resources for performing operations specified by the instruction blocks. The processor further includes a core scheduler configured to allocate functional resources for performing the operations. The functional resources are allocated for executing the instruction blocks based, at least in part, on a performance metric. The performance metric can be generated dynamically or statically based on branch prediction accuracy, energy usage tolerance, and other suitable metrics.
Type: Grant
Filed: June 26, 2015
Date of Patent: September 12, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jan S. Gray, Douglas C. Burger, Aaron L. Smith
-
Publication number: 20230267319
Abstract: Technology related to training a neural network accelerator using mixed-precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
Type: Application
Filed: April 28, 2023
Publication date: August 24, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
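The convert-compute-convert pattern can be sketched with a simple block floating-point scheme in which each tensor shares one exponent; the bit widths and rounding below are assumptions, not the accelerator's actual formats.

```python
# Minimal mixed-precision sketch: quantize to a block floating-point (BFP)
# representation, run the tensor op on integer mantissas, convert back.

import numpy as np

def to_bfp(x, mantissa_bits=8):
    """Quantize a float32 tensor to (shared_exponent, integer mantissas)."""
    exp = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-30)))
    scale = 2.0 ** (exp - mantissa_bits)
    return exp, np.round(x / scale).astype(np.int32)

x = np.random.randn(4, 8).astype(np.float32)   # layer input
w = np.random.randn(8, 3).astype(np.float32)   # layer weights

ex, mx = to_bfp(x)
ew, mw = to_bfp(w)
acc = mx @ mw                                   # tensor op in quantized format
y = acc.astype(np.float32) * 2.0 ** (ex + ew - 16)  # back to normal precision
print(np.max(np.abs(y - x @ w)))                # small quantization error
```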
-
Patent number: 11726912
Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow for concurrent execution of plural execution lanes of the processor.
Type: Grant
Filed: March 29, 2021
Date of Patent: August 15, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith, Gagan Gupta, David T. Harper
-
Patent number: 11681531
Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
Type: Grant
Filed: October 23, 2015
Date of Patent: June 20, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith
-
Publication number: 20230185352
Abstract: Methods, systems, and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply and accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
Type: Application
Filed: December 15, 2021
Publication date: June 15, 2023
Inventors: Gilad Kirshenboim, Ran Sahar, Douglas C. Burger, Yehonathan Refael Kalim
-
Patent number: 11676003
Abstract: Technology related to training a neural network accelerator using mixed-precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision (block floating-point) format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in normal-precision floating-point format.
Type: Grant
Filed: December 18, 2018
Date of Patent: June 13, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bita Darvish Rouhani, Taesik Na, Eric S. Chung, Daniel Lo, Douglas C. Burger
-
Patent number: 11663450
Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
Type: Grant
Filed: June 29, 2017
Date of Patent: May 30, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
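A minimal sketch of the pipelined dataflow described above, assuming simple elementwise operations for the multifunction units; results are handed directly from stage to stage rather than through a global register file.

```python
# Illustrative pipeline: the matrix-vector unit (MVU) feeds a chain of
# multifunction units; intermediate results flow stage to stage and are
# never written back to a shared register in between.

import numpy as np

def mvu(weights, x):
    """Matrix-vector multiply: the instruction type only the MVU performs."""
    return weights @ x

def mfu_bias(x, bias):        # first multifunction unit: elementwise add
    return x + bias

def mfu_activation(x):        # second multifunction unit: elementwise nonlinearity
    return np.tanh(x)

def mfu_scale(x, gamma):      # third multifunction unit: elementwise scale
    return gamma * x

W = np.random.randn(4, 8)
x = np.random.randn(8)
b = np.zeros(4)
gamma = 0.5

out = mfu_scale(mfu_activation(mfu_bias(mvu(W, x), b)), gamma)
print(out)
```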
-
Patent number: 11645493
Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test inputs is applied to the normal-precision floating-point model and the corresponding quantized model, and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters that determine data widths and the selection of shared exponents for the block floating-point format can be chosen. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.
Type: Grant
Filed: May 4, 2018
Date of Patent: May 9, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
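The comparison step of this design flow might look like the following sketch, which quantizes a toy layer to a block floating-point format, compares outputs on a set of test inputs, and widens the mantissas if the error is too large; the thresholds and adjustment policy are placeholders, not the patented flow.

```python
# Hedged sketch of the compare-and-adjust step in a quantized-NN design flow.

import numpy as np

def quantize_bfp(w, mantissa_bits=6):
    """Quantize a tensor so all elements share one exponent."""
    exp = int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-30)))
    scale = 2.0 ** (exp - mantissa_bits)
    return np.round(w / scale) * scale

def model(w, x):                 # stand-in for a normal-precision layer
    return np.maximum(w @ x, 0)

w = np.random.randn(16, 32).astype(np.float32)
w_q = quantize_bfp(w)

test_inputs = [np.random.randn(32) for _ in range(8)]
errors = [np.max(np.abs(model(w, x) - model(w_q, x))) for x in test_inputs]

# Compare output tensors; widen mantissas (or adjust other attributes) if the
# quantized model diverges too much, then retrain before targeting the accelerator.
if max(errors) > 0.1:
    w_q = quantize_bfp(w, mantissa_bits=8)
print(max(errors))
```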
-
Publication number: 20230106990
Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
Type: Application
Filed: December 9, 2022
Publication date: April 6, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Douglas C. Burger
-
Publication number: 20220405571
Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations are performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values are provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.
Type: Application
Filed: June 16, 2021
Publication date: December 22, 2022
Inventors: Bita Darvish Rouhani, Venmugil Elango, Eric S. Chung, Douglas C. Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
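One way to picture the weight-sparsification and muxing steps is the sketch below, which assumes simple per-row top-k magnitude pruning; the group size and selection rule are illustrative, not the disclosed method.

```python
# Sparsify weights into (subset, mask), mux the matching activations, and
# multiply only the kept pairs; the result equals the masked dense product.

import numpy as np

def sparsify_weights(w, keep=2):
    """Keep the `keep` largest-magnitude weights per row; return values + mask."""
    mask = np.zeros_like(w, dtype=bool)
    for i, row in enumerate(w):
        mask[i, np.argsort(np.abs(row))[-keep:]] = True
    return w[mask].reshape(len(w), keep), mask

def mux_activations(a, mask):
    """Select, per output row, only the activations the kept weights need."""
    return np.stack([a[mask[i]] for i in range(mask.shape[0])])

w = np.random.randn(4, 8)          # dense weights
a = np.random.randn(8)             # activations
w_sub, mask = sparsify_weights(w)  # subset of weights + mask values
a_sub = mux_activations(a, mask)   # muxing-unit output

y = np.sum(w_sub * a_sub, axis=1)           # multiply only the kept pairs
print(np.max(np.abs(y - (w * mask) @ a)))   # matches the masked dense product
```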
-
Patent number: 11531552
Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
Type: Grant
Filed: February 6, 2017
Date of Patent: December 20, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Douglas C. Burger
-
Publication number: 20220391209
Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
Type: Application
Filed: August 8, 2022
Publication date: December 8, 2022
Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
-
Publication number: 20220383123
Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs are pruned based on the plurality of entropy values. The second set of outputs are generated by the layer of the neural network using a third set of data.
Type: Application
Filed: May 28, 2021
Publication date: December 1, 2022
Inventors: Venmugil Elango, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
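Under the simplifying assumption that each output channel of a layer is roughly Gaussian, the entropy-based pruning described here could be sketched as follows; the Gaussian model and the pruning fraction are illustrative choices, not the patented procedure.

```python
# Estimate per-channel mean/variance from validation outputs, score channels
# by Gaussian differential entropy, and prune the lowest-entropy channels
# from later outputs.

import numpy as np

val_outputs = np.random.randn(1000, 16) * np.linspace(0.1, 2.0, 16)  # [samples, channels]

means = val_outputs.mean(axis=0)                 # plurality of mean values
variances = val_outputs.var(axis=0)              # plurality of variance values
entropies = 0.5 * np.log(2 * np.pi * np.e * variances)   # per-channel entropy

keep = entropies > np.quantile(entropies, 0.25)  # prune the bottom 25%

new_outputs = np.random.randn(8, 16)             # outputs on a third set of data
pruned = new_outputs[:, keep]                    # pruned set of outputs
print(keep.sum(), pruned.shape)
```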
-
Publication number: 20220383092
Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained at a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.
Type: Application
Filed: May 25, 2021
Publication date: December 1, 2022
Inventors: Ritchie Zhao, Bita Darvish Rouhani, Eric S. Chung, Douglas C. Burger, Maximilian Golub
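A toy illustration of the two-phase idea, assuming "fidelity" means the numeric precision of the weights and using a loss-plateau test as the switching criterion; both are assumptions, not the disclosed method.

```python
# Train a linear model at low precision until the loss plateaus, then switch
# to a higher precision for the second training phase.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 10))
true_w = rng.standard_normal(10)
y = X @ true_w

def quantize(w, bits):                       # lower fidelity = fewer bits
    scale = 2.0 ** (bits - 1)
    return np.round(w * scale) / scale

w, lr, bits = np.zeros(10), 0.01, 4          # phase 1: low fidelity
prev_loss = np.inf
for step in range(500):
    grad = 2 * X.T @ (X @ quantize(w, bits) - y) / len(X)
    w -= lr * grad
    loss = np.mean((X @ quantize(w, bits) - y) ** 2)
    if bits == 4 and prev_loss - loss < 1e-4:   # criterion satisfied:
        bits = 16                                # switch to higher fidelity
    prev_loss = loss
print(bits, loss)
```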