Patents by Inventor Douglas C. Burger
Douglas C. Burger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190340499Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations. In some examples, the quantized precision operations are performed for neural network models. Parameters of the quantized precision operations can be selected to emulate operation of hardware accelerators adapted to perform quantized format operations. In some examples, the quantized precision operations are performed in a block floating-point format where one or more values of a tensor, matrix, or vectors share a common exponent. Techniques for selecting the exponent, reshaping the input tensors, and training neural networks for use with quantized precision models are also disclosed. In some examples, a neural network model is further retrained based on the quantized model. For example, a normal precision model or a quantized precision model can be retrained by evaluating loss induced by performing operations in the quantized format.Type: ApplicationFiled: May 4, 2018Publication date: November 7, 2019Applicant: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
-
Publication number: 20190340492Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test input is applied to a normal-precision flooding point model and the corresponding quantized model and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the widths of data and selection of shared exponents for the block floating-point format can be selected. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.Type: ApplicationFiled: May 4, 2018Publication date: November 7, 2019Applicant: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
-
Publication number: 20190339937Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.Type: ApplicationFiled: May 4, 2018Publication date: November 7, 2019Inventors: Daniel LO, Eric S. CHUNG, Douglas C. BURGER
-
Patent number: 10452995Abstract: A method is provided for processing on an acceleration component a machine learning classification model. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW.Type: GrantFiled: June 29, 2015Date of Patent: October 22, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Derek Chiou, Eric Chung, Andrew R. Putnam
-
Patent number: 10452399Abstract: Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.Type: GrantFiled: March 18, 2016Date of Patent: October 22, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron L. Smith
-
Patent number: 10445097Abstract: Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.Type: GrantFiled: March 17, 2016Date of Patent: October 15, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron L. Smith
-
Publication number: 20190310852Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: ApplicationFiled: June 24, 2019Publication date: October 10, 2019Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 10425472Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes the request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than other of the target hardware acceleration devices.Type: GrantFiled: January 17, 2017Date of Patent: September 24, 2019Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
-
Publication number: 20190274537Abstract: Methods and systems for determining a physiological parameter of a subject through interrogation of an eye of the subject with an optical signal are described. Interrogation is performed unobtrusively. The physiological parameter is determined from a signal sensed from the eye of a subject according to a schedule, under the control of a scheduling controller.Type: ApplicationFiled: March 15, 2019Publication date: September 12, 2019Inventors: Allen L. Brown, JR., Douglas C. Burger, Eric Horvitz, Roderick A. Hyde, Edward K.Y. Jung, Jordin T. Kare, Chris Demetrios Karkanias, Eric C. Leuthardt, John L. Manferdelli, Craig J. Mundie, Nathan P. Myhrvold, Barney Pell, Clarence T. Tegreene, Willard H. Wattenburg, Charles Whitmer, Lowell L. Wood, JR., Richard N. Zare
-
Patent number: 10409606Abstract: Apparatus and methods are disclosed for implementing bad jump detection in block-based processor architectures. In one example of the disclosed technology, a block-based processor includes one or more block-based processing cores configured to fetch and execute atomic blocks of instructions and a control unit configured to, based at least in part on receiving a branch signal indicating a target location is received from one of the instruction blocks, verify that the target location is a valid branch target.Type: GrantFiled: June 26, 2015Date of Patent: September 10, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron L. Smith, Jan S. Gray
-
Publication number: 20190236009Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allows for concurrent execution of plural execution lanes of the processor.Type: ApplicationFiled: February 2, 2018Publication date: August 1, 2019Applicant: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron L. Smith, Gagan Gupta, David T. Harper
-
Patent number: 10346168Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: GrantFiled: June 26, 2015Date of Patent: July 9, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Publication number: 20190197406Abstract: A computer implemented method of optimizing a neural network includes obtaining a deep neural network (DNN) trained with a training dataset, determining a spreading signal between neurons in multiple adjacent layers of the DNN wherein the spreading signal is an element-wise multiplication of input activations between the neurons in a first layer to neurons in a second next layer with a corresponding weight matrix of connections between such neurons, and determining neural entropies of respective connections between neurons by calculating an exponent of a volume of an area covered by the spreading signal. The DNN may be optimized based on the determined neural entropies between the neurons in the multiple adjacent layers.Type: ApplicationFiled: December 22, 2017Publication date: June 27, 2019Inventors: Bita Darvish Rouhani, Douglas C. Burger, Eric S. Chung
-
Patent number: 10332008Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality off decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.Type: GrantFiled: March 17, 2014Date of Patent: June 25, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
-
Patent number: 10333781Abstract: Aspects extend to methods, systems, and computer program products for changing between different roles at acceleration components. Changing roles at an acceleration component can be facilitated without loading an image file to configure or partially reconfigure the acceleration component. At configuration time, an acceleration component can be configured with a framework and a plurality of selectable roles. The framework also provides a mechanism for loading different selectable roles for execution at the acceleration component (e.g., the framework can include a superset of instructions for providing any of a plurality of different roles). The framework can receive requests for specified roles from other components and switch to a subset of instructions for the specified roles. Switching between subsets of instructions at an acceleration component is a lower overhead operation relative to reconfiguring or partially reconfiguring an acceleration component by loading an image file.Type: GrantFiled: June 26, 2015Date of Patent: June 25, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Andrew R. Putnam, Douglas C. Burger, Michael David Haselman, Stephen F. Heil, Yi Xiao, Sitaram V. Lanka
-
Publication number: 20190190847Abstract: Aspects extend to methods, systems, and computer program products for allocating acceleration component functionality for supporting services. A service manager uses a finite number of acceleration components to accelerate services. Acceleration components can be allocated in a manner that balances load in a hardware acceleration plane, minimizes role switching, and adapts to demand changes. When role switching is appropriate, less extensive mechanisms (e.g., based on configuration data versus image files) can be used to switch roles to the extent possible.Type: ApplicationFiled: February 25, 2019Publication date: June 20, 2019Inventors: Douglas C. Burger, Andrew R. Putnam, Stephen F. Heil, Michael David Haselman, Sitaram V. Lanka, Yi Xiao
-
Publication number: 20190155669Abstract: Aspects extend to methods, systems, and computer program products for partially reconfiguring acceleration components. Partial reconfiguration can be implemented for any of a variety of reasons, including to address an error in functionality at the acceleration component or to update functionality at the acceleration component. During partial reconfiguration, connectivity can be maintained for any other functionality at the acceleration component untouched by the partial reconfiguration. Partial reconfiguration is more efficient to deploy than full reconfiguration of an acceleration component.Type: ApplicationFiled: January 25, 2019Publication date: May 23, 2019Inventors: Derek T. Chiou, Sitaram V. Lanka, Adrian M. Caulfield, Andrew R. Putnam, Douglas C. Burger
-
Patent number: 10296392Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.Type: GrantFiled: May 20, 2015Date of Patent: May 21, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
-
Patent number: 10270709Abstract: Aspects extend to methods, systems, and computer program products for allocating acceleration component functionality for supporting services. A service manager uses a finite number of acceleration components to accelerate services. Acceleration components can be allocated in a manner that balances load in a hardware acceleration plane, minimizes role switching, and adapts to demand changes. When role switching is appropriate, less extensive mechanisms (e.g., based on configuration data versus image files) can be used to switch roles to the extent possible.Type: GrantFiled: June 26, 2015Date of Patent: April 23, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Andrew R. Putnam, Stephen F. Heil, Michael David Haselman, Sitaram V. Lanka, Yi Xiao
-
Patent number: 10251541Abstract: Methods and systems for determining a physiological parameter of a subject through interrogation of an eye of the subject with an optical signal are described. Interrogation is performed unobtrusively. The physiological parameter is determined from a signal sensed from the eye of a subject according to a schedule, under the control of a scheduling controller.Type: GrantFiled: April 1, 2016Date of Patent: April 9, 2019Assignee: Elwha LLCInventors: Allen L. Brown, Jr., Douglas C. Burger, Eric Horvitz, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Chris Demetrios Karkanias, Eric C. Leuthardt, John L. Manferdelli, Craig J. Mundie, Nathan P. Myhrvold, Barney Pell, Clarence T. Tegreene, Willard H. Wattenburg, Charles Whitmer, Lowell L. Wood, Jr., Richard N. Zare