Patents by Inventor Kalin Ovtcharov

Kalin Ovtcharov has filed for patents covering the inventions listed below. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230342320
    Abstract: The present disclosure relates to devices for using a configurable stacked architecture for a fixed function datapath with an accelerator for accelerating an operation or a layer of a deep neural network (DNN). The stacked architecture may have a fixed function datapath that includes one or more configurable micro-execution units that execute a series of vector, scalar, reduction, broadcasting, and normalization operations for a DNN layer operation. The fixed function datapath may be customizable based on the DNN or the operation. (An illustrative sketch of such a stacked datapath appears after this listing.)
    Type: Application
    Filed: July 5, 2023
    Publication date: October 26, 2023
    Inventors: Stephen Sangho Youn, Steven Karl Reinhardt, Jeremy Halden Fowers, Lok Chand Koppaka, Kalin Ovtcharov
  • Patent number: 11790212
    Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent. (A sketch of block floating-point quantization appears after this listing.)
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: October 17, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
  • Patent number: 11734214
    Abstract: The present disclosure relates to devices for using a configurable stacked architecture for a fixed function datapath with an accelerator for accelerating an operation or a layer of a deep neural network (DNN). The stacked architecture may have a fixed function datapath that includes one or more configurable micro-execution units that execute a series of vector, scalar, reduction, broadcasting, and normalization operations for a DNN layer operation. The fixed function datapath may be customizable based on the DNN or the operation.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: August 22, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Stephen Sangho Youn, Steven Karl Reinhardt, Jeremy Halden Fowers, Lok Chand Koppaka, Kalin Ovtcharov
  • Patent number: 11604960
    Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN. (A sketch of per-layer bit-width selection appears after this listing.)
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: March 14, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
  • Patent number: 11556762
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models, including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models. (A sketch of native-dimension alignment appears after this listing.)
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: January 17, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Publication number: 20220245083
    Abstract: The present disclosure relates to devices for using a configurable stacked architecture for a fixed function datapath with an accelerator for accelerating an operation or a layer of a deep neural network (DNN). The stacked architecture may have a fixed function datapath that includes one or more configurable micro-execution units that execute a series of vector, scalar, reduction, broadcasting, and normalization operations for a DNN layer operation. The fixed function datapath may be customizable based on the DNN or the operation.
    Type: Application
    Filed: March 25, 2021
    Publication date: August 4, 2022
    Inventors: Stephen Sangho Youn, Steven Karl Reinhardt, Jeremy Halden Fowers, Lok Chand Koppaka, Kalin Ovtcharov
  • Publication number: 20220012577
    Abstract: Systems and methods for neural network processing are provided. A method is provided in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of the utilization of the on-chip memory blocks during evaluation of the neural network model, maintaining the coefficients in the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced. (A sketch of this coefficient-pinning policy appears after this listing.)
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
  • Patent number: 11200486
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data. (A sketch of this functional-unit array appears after this listing.)
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: December 14, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Patent number: 11157801
    Abstract: Systems and methods for neural network processing are provided. A method is provided in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units. The method includes, upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units. The method includes, regardless of the utilization of the on-chip memory blocks during evaluation of the neural network model, maintaining the coefficients in the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
  • Patent number: 10795678
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory, and during each one of P clock cycles provide N data elements to each one of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each of the P clock cycles, the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction. (A sketch of lane-wise vector register file access appears after this listing.)
    Type: Grant
    Filed: April 21, 2018
    Date of Patent: October 6, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Publication number: 20200302271
    Abstract: Quantization-aware neural architecture search (“QNAS”) can be utilized to learn optimal hyperparameters for configuring an artificial neural network (“ANN”) that quantizes activation values and/or weights. The hyperparameters can include model topology parameters, quantization parameters, and hardware architecture parameters. Model topology parameters specify the structure and connectivity of an ANN. Quantization parameters can define a quantization configuration for an ANN such as, for example, a bit width for a mantissa for storing activation values or weights generated by the layers of an ANN. The activation values and weights can be represented using a quantized-precision floating-point format, such as a block floating-point format (“BFP”) having a mantissa that has fewer bits than a mantissa in a normal-precision floating-point representation and a shared exponent.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
  • Publication number: 20200302269
    Abstract: Machine learning is utilized to learn an optimized quantization configuration for an artificial neural network (ANN). For example, an ANN can be utilized to learn an optimal bit width for quantizing weights for layers of the ANN. The ANN can also be utilized to learn an optimal bit width for quantizing activation values for the layers of the ANN. Once the bit widths have been learned, they can be utilized at inference time to improve the performance of the ANN by quantizing the weights and activation values of the layers of the ANN.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Kalin Ovtcharov, Eric S. Chung, Vahideh Akhlaghi, Ritchie Zhao
  • Patent number: 10566076
    Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith-Waterman analysis in series, as opposed to the parallel implementations known in the art. Serial execution enables such customized integrated circuitry to take advantage of optimizations, including enveloping thresholds that demarcate between cells of a two-dimensional matrix for which nucleotide comparisons are to be performed and cells of the two-dimensional matrix for which no such comparison need be performed and, instead, a value of zero can simply be entered. Additionally, such customized integrated circuitry facilitates the combination of multiple control units, each directing the comparison of a unique pair of nucleotides, with a single calculation engine that can generate values for individual cells of the two-dimensional matrices by which such pairs of nucleotides are compared. (A sketch of envelope-limited Smith-Waterman scoring appears after this listing.)
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: February 18, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Eric Chung, Kalin Ovtcharov, Ravindra Pandya, David Heckerman
  • Publication number: 20190325296
    Abstract: Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Publication number: 20190324748
    Abstract: Neural network processors including a vector register file (VRF) having a multi-port memory and related methods are provided. The processor may include tiles to process an N by N matrix of data elements and an N by 1 vector of data elements. The VRF may, in response to a write instruction, store N data elements in a multi-port memory, and during each one of P clock cycles provide N data elements to each one of P input interface circuits of the multi-port memory, each comprising an input lane configured to carry L data elements in parallel. During each of the P clock cycles, the multi-port memory may be configured to receive N data elements via a selected at least one of the P input interface circuits. The VRF may include output interface circuits for providing N data elements in response to a read instruction.
    Type: Application
    Filed: April 21, 2018
    Publication date: October 24, 2019
    Inventors: Jeremy Fowers, Kalin Ovtcharov, Eric S. Chung, Todd Michael Massengill, Ming Gang Liu, Gabriel Leonard Weisz
  • Patent number: 10452971
    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers and the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment. (A sketch of this layer partitioning appears after this listing.)
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: October 22, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20190311253
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Application
    Filed: June 13, 2019
    Publication date: October 10, 2019
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Patent number: 10372456
    Abstract: A hardware accelerator having an efficient instruction set is disclosed. An apparatus may comprise logic configured to access a first and a second machine instruction, where the second machine instruction is missing a tensor operand needed for its execution. The logic may be further configured to execute the first machine instruction, resulting in a tensor, and then to execute the second machine instruction using the resultant tensor as the missing tensor operand. (A sketch of this implicit operand chaining appears after this listing.)
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: August 6, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Halden Fowers, Kalin Ovtcharov, Steven Karl Reinhardt, Eric Sen Chung, Ming Gang Liu
  • Patent number: 10338925
    Abstract: Tensor register files in a hardware accelerator are disclosed. An apparatus may comprise tensor operation calculators, each configured to perform a type of tensor operation. The apparatus may also comprise tensor register files, each of which is associated with one of the tensor operation calculators. The apparatus may also comprise logic configured to store respective ones of the tensors in the plurality of tensor register files in accordance with the type of tensor operation to be performed on the respective tensors. The apparatus may also control read access to the tensor register files based on the type of tensor operation that a machine instruction is to perform. (A sketch of per-operation register files appears after this listing.)
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: July 2, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Halden Fowers, Steven Karl Reinhardt, Kalin Ovtcharov, Eric Sen Chung
  • Patent number: 10331445
    Abstract: A processor circuit is provided that includes an input terminal and an output terminal, a plurality of vector processor operation circuits, a selector circuit coupled to the input terminal, the output terminal, and each of the vector processor operation circuits, and a scheduler circuit adapted to control the selector circuit to configure a vector processing pipeline comprising zero or more of the vector processor operation circuits, in any order, between the input terminal and the output terminal. (A sketch of such a configurable pipeline appears after this listing.)
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Halden Fowers, Ming Gang Liu, Kalin Ovtcharov, Steven Karl Reinhardt, Eric Sen Chung
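
Illustrative sketches

The short Python sketches below illustrate, in software, the ideas summarized in the abstracts above. They are informal approximations written for this listing: every function name, parameter, and constant is invented for illustration, and none of the code is taken from the patents themselves.

Stacked fixed-function datapath (publication 20230342320 / patent 11734214). A minimal sketch of chaining configurable micro-execution units, modeled here as small functions for vector, reduction, broadcast, and normalization steps, composed once into a fixed datapath for a layer-normalization operation. The micro-unit names and the choice of layer operation are assumptions.

```python
# Hedged sketch of a "stacked" fixed-function datapath: a chain of small,
# configurable micro-execution units, each applying one vector/scalar/
# reduction/broadcast/normalization step. Names are illustrative.
import numpy as np

def reduce_mean(x):            # reduction unit: vector -> scalar
    return x.mean()

def broadcast_sub(x, s):       # broadcast unit: subtract scalar from vector
    return x - s

def square(x):                 # vector unit: elementwise multiply
    return x * x

def rsqrt_shift(x, eps=1e-5):  # scalar unit: 1 / sqrt(var + eps)
    return 1.0 / np.sqrt(x + eps)

def build_layernorm_datapath():
    """Configure the stack once for a layer-normalization DNN operation."""
    def datapath(x):
        mu = reduce_mean(x)                  # stage 1: reduction
        centered = broadcast_sub(x, mu)      # stage 2: broadcast
        var = reduce_mean(square(centered))  # stages 3-4: vector + reduction
        return centered * rsqrt_shift(var)   # stage 5: normalization
    return datapath

layernorm = build_layernorm_datapath()
print(layernorm(np.array([1.0, 2.0, 3.0, 4.0])))
```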
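
Block floating-point quantization (patent 11790212 / publication 20200302271). A sketch of the BFP format the abstract describes: a block of values shares one exponent and each value keeps only a narrow mantissa. The 4-bit mantissa and the rounding and clipping choices are assumptions.

```python
# Hedged sketch of block floating-point (BFP): a block of values shares
# one exponent; each value keeps a narrow integer mantissa.
import numpy as np

def to_bfp(block, mantissa_bits=4):
    """Quantize a 1-D block to a shared-exponent, narrow-mantissa format."""
    shared_exp = int(np.ceil(np.log2(np.max(np.abs(block)) + 1e-30)))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    max_mant = 2 ** (mantissa_bits - 1) - 1
    # Round each value to an integer mantissa; clip to the mantissa range.
    mantissas = np.clip(np.round(block / scale), -max_mant - 1, max_mant)
    return mantissas.astype(np.int8), shared_exp

def from_bfp(mantissas, shared_exp, mantissa_bits=4):
    return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

weights = np.array([0.91, -0.42, 0.07, 0.55])
m, e = to_bfp(weights)
print(from_bfp(m, e))   # coarse approximation of the original block
```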
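
Learned bit widths (patent 11604960 / publication 20200302269). The patent learns bit widths with machine learning; as a simple stand-in, this sketch picks, per layer, the smallest uniform-quantization bit width whose mean squared error stays under a tolerance. The search loop and tolerance are assumptions, not the patented method.

```python
# Hedged stand-in for learned bit widths: search candidate widths and keep
# the smallest one whose quantization error is acceptable.
import numpy as np

def quantize_uniform(x, bits):
    """Symmetric uniform quantization of x to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale

def pick_bit_width(weights, tol=1e-2, candidates=range(2, 9)):
    for bits in candidates:
        err = np.mean((weights - quantize_uniform(weights, bits)) ** 2)
        if err < tol:
            return bits
    return max(candidates)

layer_weights = np.random.default_rng(0).normal(size=256)
print(pick_bit_width(layer_weights))   # e.g. 5 for this tolerance
```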
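
Native-dimension alignment (patent 11556762 / publication 20190325296). A sketch of specialization parameter (1) from the abstract: compare candidate native vector dimensions by the padding waste they incur over a model's layer sizes. The candidate dimensions and layer sizes are hypothetical.

```python
# Hedged sketch: pick the native vector dimension that minimizes padding
# waste over the layer shapes of a target model.
import math

def padding_waste(native_dim, layer_dims):
    """Elements of padding needed when each layer dim is rounded up."""
    return sum(math.ceil(d / native_dim) * native_dim - d for d in layer_dims)

model_layer_dims = [200, 400, 600, 1000]        # hypothetical model
for native in (64, 100, 128, 256):
    print(native, padding_waste(native, model_layer_dims))
# A synthesis flow would pick the candidate with the least waste (100 here).
```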
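
Coefficient pinning (patent 11157801 / publication 20220012577). A sketch of the residency policy in the abstract: the N by M coefficient matrix is partitioned across on-chip memory blocks at service activation and stays resident across evaluations. The class shape and block count are assumptions.

```python
# Hedged sketch of coefficient pinning: load weights into on-chip memory
# blocks once at service activation; serve requests without reloading.
import numpy as np

class NodeService:
    def __init__(self, num_memory_blocks=4):
        self.blocks = [None] * num_memory_blocks   # on-chip block stand-ins

    def activate(self, coefficients):
        """Partition the N x M coefficient matrix across memory blocks once."""
        parts = np.array_split(coefficients, len(self.blocks))
        for i, part in enumerate(parts):
            self.blocks[i] = part                  # pinned until deactivation

    def evaluate(self, x):
        """Serve a request using only the already-resident coefficients."""
        w = np.vstack(self.blocks)
        return w @ x

node = NodeService()
node.activate(np.ones((8, 3)))                     # load once at activation
print(node.evaluate(np.array([1.0, 2.0, 3.0])))    # many requests, no reload
```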
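
Functional-unit array (patent 11200486 / publication 20190311253). A sketch of the N rows by M columns array: units in a row share one input buffer, units in a column share one weights buffer, and each column yields one output plane. The per-unit convolution is reduced to a dot product to keep the sketch small.

```python
# Hedged sketch of the N x M functional-unit array: row r broadcasts its
# input, column c broadcasts its weights, column c yields output plane c.
import numpy as np

def functional_unit(inputs, weights):
    return float(inputs @ weights)        # stand-in for a convolution

def run_array(input_buffers, weight_buffers):
    N, M = len(input_buffers), len(weight_buffers)
    out = np.zeros((N, M))
    for r in range(N):                    # same input across row r
        for c in range(M):                # same weights down column c
            out[r, c] = functional_unit(input_buffers[r], weight_buffers[c])
    return out                            # column c is output plane c

rng = np.random.default_rng(0)
print(run_array(rng.normal(size=(3, 5)), rng.normal(size=(2, 5))))
```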
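
Multi-port vector register file (patent 10795678 / publication 20190324748). A sketch of moving a native N-element vector through lanes that carry L elements in parallel, so a write or read takes P = N / L clock cycles. The banking scheme and register count are assumptions.

```python
# Hedged sketch of a vector register file accessed in L-wide beats over
# P = N / L cycles. The lane/bank details are illustrative.
import numpy as np

N, L = 16, 4
P = N // L                                    # cycles per vector

class VectorRegisterFile:
    def __init__(self, num_registers=8):
        self.mem = np.zeros((num_registers, N))

    def write(self, reg, vector):
        for cycle in range(P):                # one L-wide beat per cycle
            lo = cycle * L
            self.mem[reg, lo:lo + L] = vector[lo:lo + L]

    def read(self, reg):
        beats = [self.mem[reg, c * L:(c + 1) * L] for c in range(P)]
        return np.concatenate(beats)          # reassembled native vector

vrf = VectorRegisterFile()
vrf.write(0, np.arange(16.0))
print(vrf.read(0))
```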
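
Envelope-limited Smith-Waterman (patent 10566076). A sketch of the enveloping idea: matrix cells outside a band around the diagonal are never compared and simply keep the value zero. The band width and scoring constants are illustrative.

```python
# Hedged sketch: fill Smith-Waterman cells only inside a band around the
# diagonal; everything outside the envelope stays zero.
MATCH, MISMATCH, GAP = 2, -1, -2

def banded_smith_waterman(a, b, band=3):
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if abs(i - j) > band:       # outside the envelope: keep zero
                continue
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            best = max(best, H[i][j])
    return best

print(banded_smith_waterman("GATTACA", "GCATGCA"))
```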
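
Host/accelerator partitioning (patent 10452971). A sketch of splitting a DNN's layer list at an index so the host component runs the first segment and the hardware acceleration component runs the second. The toy layers and split point are assumptions.

```python
# Hedged sketch of layer partitioning: first segment on the host CPU,
# second segment on the hardware acceleration component.
import numpy as np

layers = [lambda x: x @ np.eye(4),          # layer 1 (toy linear)
          lambda x: np.maximum(x, 0),       # layer 2 (ReLU)
          lambda x: x * 2.0,                # layer 3 (scale)
          lambda x: x - 1.0]                # layer 4 (bias)

def run_segment(segment, x):
    for layer in segment:
        x = layer(x)
    return x

def run_partitioned(x, split=2):
    host_out = run_segment(layers[:split], x)        # host component
    return run_segment(layers[split:], host_out)     # acceleration component

print(run_partitioned(np.array([-1.0, 0.5, 2.0, -3.0])))
```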
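
Implicit tensor operand (patent 10372456). A sketch of the missing-operand convention: when an instruction omits a tensor operand, the executor supplies the previous instruction's result. The instruction encoding is invented for illustration.

```python
# Hedged sketch of implicit operand chaining: a missing tensor operand is
# filled with the prior instruction's result.
import numpy as np

def execute(program, registers):
    last_result = None
    for op, *operands in program:
        # Fill in a missing (None) tensor operand with the prior result.
        args = [registers[r] if r is not None else last_result
                for r in operands]
        if op == "matmul":
            last_result = args[0] @ args[1]
        elif op == "relu":
            last_result = np.maximum(args[0], 0)
    return last_result

regs = {"w": np.array([[1.0, -2.0], [0.5, 1.0]]), "x": np.array([1.0, 1.0])}
program = [("matmul", "w", "x"),   # produces a tensor
           ("relu", None)]         # operand omitted: uses matmul's result
print(execute(program, regs))
```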
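
Per-operation tensor register files (patent 10338925). A sketch of register files tied to tensor operation calculators: tensors are stored in the file matching the operation that will consume them, and read access is checked against the operation type.

```python
# Hedged sketch: one register file per tensor operation calculator, with
# read access gated by the requesting operation's type.
class TensorRegisterFiles:
    def __init__(self, op_types=("matmul", "add")):
        self.files = {op: {} for op in op_types}

    def store(self, op_type, reg, tensor):
        self.files[op_type][reg] = tensor     # placed per consuming op

    def read(self, op_type, reg, requesting_op):
        if requesting_op != op_type:          # access control by op type
            raise PermissionError(f"{requesting_op} cannot read {op_type} file")
        return self.files[op_type][reg]

trf = TensorRegisterFiles()
trf.store("matmul", 0, [[1, 2], [3, 4]])
print(trf.read("matmul", 0, requesting_op="matmul"))
```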
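
Configurable vector pipeline (patent 10331445). A sketch of the scheduler/selector idea: a pipeline comprising zero or more vector operation circuits, in any order, is configured between the input and output terminals. The operation set is illustrative.

```python
# Hedged sketch: a scheduler composes zero or more vector operation
# circuits, in any order, between the input and output terminals.
import numpy as np

OP_CIRCUITS = {"add_bias": lambda v: v + 1.0,
               "relu":     lambda v: np.maximum(v, 0),
               "scale":    lambda v: v * 0.5}

def configure_pipeline(op_names):
    """Scheduler: select circuits and their order for this pipeline."""
    stages = [OP_CIRCUITS[name] for name in op_names]
    def pipeline(v):                      # input terminal -> output terminal
        for stage in stages:
            v = stage(v)
        return v
    return pipeline

p = configure_pipeline(["add_bias", "relu", "scale"])  # any order, any count
print(p(np.array([-2.0, 0.0, 3.0])))
# An empty list gives a pass-through pipeline: configure_pipeline([]).
```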