Patents by Inventor Douglas C. Burger

Douglas C. Burger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190340499
    Abstract: Methods and apparatus are disclosed for providing emulation of quantized precision operations. In some examples, the quantized precision operations are performed for neural network models. Parameters of the quantized precision operations can be selected to emulate operation of hardware accelerators adapted to perform quantized format operations. In some examples, the quantized precision operations are performed in a block floating-point format where one or more values of a tensor, matrix, or vectors share a common exponent. Techniques for selecting the exponent, reshaping the input tensors, and training neural networks for use with quantized precision models are also disclosed. In some examples, a neural network model is further retrained based on the quantized model. For example, a normal precision model or a quantized precision model can be retrained by evaluating loss induced by performing operations in the quantized format.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
  • Publication number: 20190340492
    Abstract: Methods and apparatus are disclosed supporting a design flow for developing quantized neural networks. In one example of the disclosed technology, a method includes quantizing a normal-precision floating-point neural network model into a quantized format. For example, the quantized format can be a block floating-point format, where two or more elements of tensors in the neural network share a common exponent. A set of test input is applied to a normal-precision flooding point model and the corresponding quantized model and the respective output tensors are compared. Based on this comparison, hyperparameters or other attributes of the neural networks can be adjusted. Further, quantization parameters determining the widths of data and selection of shared exponents for the block floating-point format can be selected. An adjusted, quantized neural network is retrained and programmed into a hardware accelerator.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Eric S. Chung, Bita Darvish Rouhani, Daniel Lo, Ritchie Zhao
  • Publication number: 20190339937
    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.
    Type: Application
    Filed: May 4, 2018
    Publication date: November 7, 2019
    Inventors: Daniel LO, Eric S. CHUNG, Douglas C. BURGER
  • Patent number: 10452995
    Abstract: A method is provided for processing on an acceleration component a machine learning classification model. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: October 22, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Derek Chiou, Eric Chung, Andrew R. Putnam
  • Patent number: 10452399
    Abstract: Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.
    Type: Grant
    Filed: March 18, 2016
    Date of Patent: October 22, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10445097
    Abstract: Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: October 15, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Publication number: 20190310852
    Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.
    Type: Application
    Filed: June 24, 2019
    Publication date: October 10, 2019
    Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
  • Patent number: 10425472
    Abstract: A server system is provided that includes a plurality of servers, each server including at least one hardware acceleration device and at least one processor communicatively coupled to the hardware acceleration device by an internal data bus and executing a host server instance, the host server instances of the plurality of servers collectively providing a software plane, and the hardware acceleration devices of the plurality of servers collectively providing a hardware acceleration plane that implements a plurality of hardware accelerated services, wherein each hardware acceleration device maintains in memory a data structure that contains load data indicating a load of each of a plurality of target hardware acceleration devices, and wherein a requesting hardware acceleration device routes the request to a target hardware acceleration device that is indicated by the load data in the data structure to have a lower load than other of the target hardware acceleration devices.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: September 24, 2019
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Adrian Michael Caulfield, Eric S. Chung, Michael Konstantinos Papamichael, Douglas C. Burger, Shlomi Alkalay
  • Publication number: 20190274537
    Abstract: Methods and systems for determining a physiological parameter of a subject through interrogation of an eye of the subject with an optical signal are described. Interrogation is performed unobtrusively. The physiological parameter is determined from a signal sensed from the eye of a subject according to a schedule, under the control of a scheduling controller.
    Type: Application
    Filed: March 15, 2019
    Publication date: September 12, 2019
    Inventors: Allen L. Brown, JR., Douglas C. Burger, Eric Horvitz, Roderick A. Hyde, Edward K.Y. Jung, Jordin T. Kare, Chris Demetrios Karkanias, Eric C. Leuthardt, John L. Manferdelli, Craig J. Mundie, Nathan P. Myhrvold, Barney Pell, Clarence T. Tegreene, Willard H. Wattenburg, Charles Whitmer, Lowell L. Wood, JR., Richard N. Zare
  • Patent number: 10409606
    Abstract: Apparatus and methods are disclosed for implementing bad jump detection in block-based processor architectures. In one example of the disclosed technology, a block-based processor includes one or more block-based processing cores configured to fetch and execute atomic blocks of instructions and a control unit configured to, based at least in part on receiving a branch signal indicating a target location is received from one of the instruction blocks, verify that the target location is a valid branch target.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: September 10, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith, Jan S. Gray
  • Publication number: 20190236009
    Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allows for concurrent execution of plural execution lanes of the processor.
    Type: Application
    Filed: February 2, 2018
    Publication date: August 1, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith, Gagan Gupta, David T. Harper
  • Patent number: 10346168
    Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: July 9, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
  • Publication number: 20190197406
    Abstract: A computer implemented method of optimizing a neural network includes obtaining a deep neural network (DNN) trained with a training dataset, determining a spreading signal between neurons in multiple adjacent layers of the DNN wherein the spreading signal is an element-wise multiplication of input activations between the neurons in a first layer to neurons in a second next layer with a corresponding weight matrix of connections between such neurons, and determining neural entropies of respective connections between neurons by calculating an exponent of a volume of an area covered by the spreading signal. The DNN may be optimized based on the determined neural entropies between the neurons in the multiple adjacent layers.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Inventors: Bita Darvish Rouhani, Douglas C. Burger, Eric S. Chung
  • Patent number: 10332008
    Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality off decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
  • Patent number: 10333781
    Abstract: Aspects extend to methods, systems, and computer program products for changing between different roles at acceleration components. Changing roles at an acceleration component can be facilitated without loading an image file to configure or partially reconfigure the acceleration component. At configuration time, an acceleration component can be configured with a framework and a plurality of selectable roles. The framework also provides a mechanism for loading different selectable roles for execution at the acceleration component (e.g., the framework can include a superset of instructions for providing any of a plurality of different roles). The framework can receive requests for specified roles from other components and switch to a subset of instructions for the specified roles. Switching between subsets of instructions at an acceleration component is a lower overhead operation relative to reconfiguring or partially reconfiguring an acceleration component by loading an image file.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Andrew R. Putnam, Douglas C. Burger, Michael David Haselman, Stephen F. Heil, Yi Xiao, Sitaram V. Lanka
  • Publication number: 20190190847
    Abstract: Aspects extend to methods, systems, and computer program products for allocating acceleration component functionality for supporting services. A service manager uses a finite number of acceleration components to accelerate services. Acceleration components can be allocated in a manner that balances load in a hardware acceleration plane, minimizes role switching, and adapts to demand changes. When role switching is appropriate, less extensive mechanisms (e.g., based on configuration data versus image files) can be used to switch roles to the extent possible.
    Type: Application
    Filed: February 25, 2019
    Publication date: June 20, 2019
    Inventors: Douglas C. Burger, Andrew R. Putnam, Stephen F. Heil, Michael David Haselman, Sitaram V. Lanka, Yi Xiao
  • Publication number: 20190155669
    Abstract: Aspects extend to methods, systems, and computer program products for partially reconfiguring acceleration components. Partial reconfiguration can be implemented for any of a variety of reasons, including to address an error in functionality at the acceleration component or to update functionality at the acceleration component. During partial reconfiguration, connectivity can be maintained for any other functionality at the acceleration component untouched by the partial reconfiguration. Partial reconfiguration is more efficient to deploy than full reconfiguration of an acceleration component.
    Type: Application
    Filed: January 25, 2019
    Publication date: May 23, 2019
    Inventors: Derek T. Chiou, Sitaram V. Lanka, Adrian M. Caulfield, Andrew R. Putnam, Douglas C. Burger
  • Patent number: 10296392
    Abstract: A data processing system is described herein that includes two or more software-driven host components that collectively provide a software plane. The data processing system further includes two or more hardware acceleration components that collectively provide a hardware acceleration plane. The hardware acceleration plane implements one or more services, including at least one multi-component service. The multi-component service has plural parts, and is implemented on a collection of two or more hardware acceleration components, where each hardware acceleration component in the collection implements a corresponding part of the multi-component service. Each hardware acceleration component in the collection is configured to interact with other hardware acceleration components in the collection without involvement from any host component. A function parsing component is also described herein that determines a manner of parsing a function into the plural parts of the multi-component service.
    Type: Grant
    Filed: May 20, 2015
    Date of Patent: May 21, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Stephen F. Heil, Adrian M. Caulfield, Douglas C. Burger, Andrew R. Putnam, Eric S. Chung
  • Patent number: 10270709
    Abstract: Aspects extend to methods, systems, and computer program products for allocating acceleration component functionality for supporting services. A service manager uses a finite number of acceleration components to accelerate services. Acceleration components can be allocated in a manner that balances load in a hardware acceleration plane, minimizes role switching, and adapts to demand changes. When role switching is appropriate, less extensive mechanisms (e.g., based on configuration data versus image files) can be used to switch roles to the extent possible.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: April 23, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Andrew R. Putnam, Stephen F. Heil, Michael David Haselman, Sitaram V. Lanka, Yi Xiao
  • Patent number: 10251541
    Abstract: Methods and systems for determining a physiological parameter of a subject through interrogation of an eye of the subject with an optical signal are described. Interrogation is performed unobtrusively. The physiological parameter is determined from a signal sensed from the eye of a subject according to a schedule, under the control of a scheduling controller.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: April 9, 2019
    Assignee: Elwha LLC
    Inventors: Allen L. Brown, Jr., Douglas C. Burger, Eric Horvitz, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Chris Demetrios Karkanias, Eric C. Leuthardt, John L. Manferdelli, Craig J. Mundie, Nathan P. Myhrvold, Barney Pell, Clarence T. Tegreene, Willard H. Wattenburg, Charles Whitmer, Lowell L. Wood, Jr., Richard N. Zare