Patents by Inventor Kalin Ovtcharov
Kalin Ovtcharov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10241970
Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of a two-dimensional matrix that is representative of the comparison between the two nucleotide sequences. As the backtracking proceeds, backtracking metadata corresponding to a cell from a subsection that is not currently retained in memory can be required. Such a subsection can be regenerated from previously generated scores associated with checkpoint cells of the two-dimensional matrix that comprise two edges of the subsection being regenerated.
Type: Grant
Filed: November 14, 2016
Date of Patent: March 26, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Eric Chung, Kalin Ovtcharov, Ravindra Pandya, David Heckerman, Roman Snytsar
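The checkpoint-and-regenerate idea in this abstract can be illustrated in software. The following is a simplified sketch, not the patented circuit: it assumes a linear gap penalty, arbitrary scoring constants, and checkpoints made of whole rows (the abstract describes subsections bounded by two edges of checkpoint cells). All function names are hypothetical.

```python
def sw_row(prev, a_char, b, match=2, mismatch=-1, gap=-1):
    """Compute one Smith-Waterman score row from the row above it.
    Scoring constants are illustrative assumptions."""
    row = [0] * (len(b) + 1)
    for j in range(1, len(b) + 1):
        diag = prev[j - 1] + (match if a_char == b[j - 1] else mismatch)
        row[j] = max(0, diag, prev[j] + gap, row[j - 1] + gap)
    return row


def checkpointed_scores(a, b, stride):
    """Score the whole matrix but retain only every stride-th row.
    Returns the retained rows and the best cell as (score, i, j)."""
    checkpoints = {0: [0] * (len(b) + 1)}
    row = checkpoints[0]
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        row = sw_row(row, a[i - 1], b)
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] > best[0]:
            best = (row[j], i, j)
        if i % stride == 0:
            checkpoints[i] = row
    return checkpoints, best


def regenerate_band(a, b, checkpoints, stride, i):
    """Rebuild the discarded rows between the nearest checkpoint at or
    above row i and row i itself, as backtracking would need them."""
    start = (i // stride) * stride
    rows = {start: checkpoints[start]}
    for r in range(start + 1, i + 1):
        rows[r] = sw_row(rows[r - 1], a[r - 1], b)
    return rows
```

With a stride of 3 and a 7-character first sequence, only rows 0, 3, and 6 stay in memory; a backtracking step that reaches row 5 would call `regenerate_band(..., i=5)` to recover rows 3 through 5.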
-
Patent number: 10167800
Abstract: Processors and methods for neural network processing are provided. A method includes receiving vector data corresponding to a layer of a neural network model, where each of the vector data has a value comprising at least one exponent. The method further includes first processing a first subset of the vector data to determine a first shared exponent for representing values in the first subset of the vector data in a block-floating point format and second processing a second subset of the vector data to determine a second shared exponent for representing values in the second subset of the vector data in a block-floating point format in a manner that no vector data from the second subset of the vector data influences a determination of the first shared exponent and no vector data from the first subset of the vector data influences a determination of the second shared exponent.
Type: Grant
Filed: August 18, 2017
Date of Patent: January 1, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Eric S. Chung, Douglas C. Burger, Daniel Lo, Kalin Ovtcharov
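As a rough illustration of per-subset shared exponents, the sketch below quantizes each block of a vector independently. It is a minimal software model under stated assumptions, not the claimed hardware method: the shared exponent is taken to be the largest binary exponent in the block, mantissas are signed integers of an assumed width, and the function names are hypothetical.

```python
import math


def shared_exponent(values):
    """Largest binary exponent in the block, as reported by math.frexp.
    Choosing the maximum is an assumption of this sketch."""
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return max(exps) if exps else 0


def to_block_fp(values, mantissa_bits=8):
    """Quantize one block to integer mantissas sharing one exponent."""
    exp = shared_exponent(values)
    scale = 2.0 ** (mantissa_bits - 1 - exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return exp, mantissas


def from_block_fp(exp, mantissas, mantissa_bits=8):
    """Decode a block back to floats."""
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


def encode_in_subsets(vector, block_size, mantissa_bits=8):
    """Encode each subset on its own, so values in one subset never
    influence the shared exponent chosen for another."""
    return [
        to_block_fp(vector[i:i + block_size], mantissa_bits)
        for i in range(0, len(vector), block_size)
    ]
```

Encoding `[0.5, 0.25, 100.0, 3.0]` in blocks of two gives the small values their own exponent, so they keep their precision instead of being flattened by the scale needed for `100.0`.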
-
Publication number: 20180341484
Abstract: A hardware accelerator having an efficient instruction set is disclosed. An apparatus may comprise logic configured to access a first and a second machine instruction. The second machine instruction may be missing a tensor operand needed to execute the second machine instruction. The logic may be further configured to execute the first machine instruction, resulting in a tensor. The logic may be further configured to execute the second machine instruction using the resultant tensor as the missing tensor operand.
Type: Application
Filed: May 24, 2017
Publication date: November 29, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jeremy Halden FOWERS, Kalin OVTCHAROV, Steven Karl REINHARDT, Eric Sen CHUNG, Ming Gang LIU
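The operand-forwarding behavior described here can be mimicked with a toy interpreter. This is only an illustrative sketch: the `Instr` class, the `run_chain` function, and the use of Python lists as tensors are assumptions made for the example, not the accelerator's instruction set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instr:
    """Hypothetical instruction: a unary tensor operation plus an
    optional explicit operand. operand=None models a missing operand."""
    op: Callable[[list], list]
    operand: Optional[list] = None


def run_chain(instrs):
    """Execute instructions in order. An instruction without an operand
    consumes the tensor produced by the instruction before it."""
    result = None
    for ins in instrs:
        if ins.operand is not None:
            source = ins.operand
        elif result is not None:
            source = result
        else:
            raise ValueError("first instruction needs an explicit operand")
        result = ins.op(source)
    return result
```

For example, a scaling instruction with an explicit operand followed by an operand-free activation instruction runs as a two-step chain, with the second step reading the first step's output.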
-
Publication number: 20180341486
Abstract: A processor circuit is provided that includes an input terminal and an output terminal, a plurality of vector processor operation circuits, a selector circuit coupled to the input terminal, the output terminal, and each of the vector processor operation circuits, and a scheduler circuit adapted to control the selector circuit to configure a vector processing pipeline comprising zero, one or more of the vector processor operation circuits in any order between the input terminal and the output terminal.
Type: Application
Filed: May 24, 2017
Publication date: November 29, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jeremy Halden FOWERS, Ming Gang LIU, Kalin OVTCHAROV, Steven Karl REINHARDT, Eric Sen CHUNG
-
Publication number: 20180341483
Abstract: Tensor register files in a hardware accelerator are disclosed. An apparatus may comprise tensor operation calculators each configured to perform a type of tensor operation. The apparatus may also comprise tensor register files, each of which is associated with one of the tensor operation calculators. The apparatus may also comprise logic configured to store respective ones of the tensors in the plurality of tensor register files in accordance with the type of tensor operation to be performed on the respective tensors. The apparatus may also control read access to tensor register files based on a type of tensor operation that a machine instruction is to perform.
Type: Application
Filed: May 24, 2017
Publication date: November 29, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jeremy Halden FOWERS, Steven Karl REINHARDT, Kalin OVTCHAROV, Eric Sen CHUNG
-
Publication number: 20180247190
Abstract: Systems and methods for neural network processing are provided. A method in a system comprising a plurality of nodes interconnected via a network, where each node includes a plurality of on-chip memory blocks and a plurality of compute units, is provided. The method includes upon service activation receiving an N by M matrix of coefficients corresponding to the neural network model. The method includes loading the coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units. The method includes regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintaining the coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced.
Type: Application
Filed: June 29, 2017
Publication date: August 30, 2018
Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers, Kalin Ovtcharov
-
Publication number: 20180137085
Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of a two-dimensional matrix that is representative of the comparison between the two nucleotide sequences. As the backtracking proceeds, backtracking metadata corresponding to a cell from a subsection that is not currently retained in memory can be required. Such a subsection can be regenerated from previously generated scores associated with checkpoint cells of the two-dimensional matrix that comprise two edges of the subsection being regenerated.
Type: Application
Filed: November 14, 2016
Publication date: May 17, 2018
Inventors: Daniel Lo, Eric Chung, Kalin Ovtcharov, Ravindra Pandya, David Heckerman, Roman Snytsar
-
Publication number: 20180137237
Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith Waterman analysis in series, as opposed to the parallel implementations known in the art. Series performance enables such customized integrated circuitry to take advantage of optimizations, including enveloping thresholds that demarcate between cells of a two-dimensional matrix for which nucleotide comparisons are to be performed, and cells of the two-dimensional matrix for which no such comparison need be performed, and, instead, a value of zero can simply be entered. Additionally, such customized integrated circuitry facilitates the combination of multiple control units, each directing the comparison of a unique pair of nucleotides, with a single calculation engine that can generate values for individual cells of the two-dimensional matrices by which such pairs of nucleotides are compared.
Type: Application
Filed: November 11, 2016
Publication date: May 17, 2018
Inventors: Daniel Lo, Eric Chung, Kalin Ovtcharov, Ravindra Pandya, David Heckerman
-
Publication number: 20160379108
Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
Type: Application
Filed: June 29, 2015
Publication date: December 29, 2016
Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
-
Publication number: 20160379109
Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the row. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
Type: Application
Filed: June 29, 2015
Publication date: December 29, 2016
Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
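The row/column data sharing in this abstract can be modeled in a few lines. The sketch below is a loose software analogy under several assumptions: each functional unit performs a 1-D "valid" cross-correlation (the form of convolution commonly used in CNNs), each input buffer holds one 1-D signal, and the results of the N units in a column are stacked to form that column's output plane. The function names are hypothetical.

```python
def convolve_valid(signal, kernel):
    """1-D 'valid' cross-correlation, standing in for one functional unit."""
    k = len(kernel)
    return [
        sum(signal[i + t] * kernel[t] for t in range(k))
        for i in range(len(signal) - k + 1)
    ]


def run_array(input_buffers, weight_buffers):
    """Model an N x M grid: the unit at (n, m) combines input buffer n
    with weights buffer m. Every unit in row n sees the same input and
    every unit in column m sees the same weights.
    Returns M planes, each holding the N results of one column."""
    return [
        [convolve_valid(x, w) for x in input_buffers]
        for w in weight_buffers
    ]
```

With two input buffers and three weight buffers, `run_array` returns three planes of two rows each, one plane per set of weights.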
-
Patent number: 9367519
Abstract: Various embodiments relating to encoding a sparse matrix into a data structure format that may be efficiently processed via parallel processing of a computing system are provided. In one embodiment, a sparse matrix may be received. A set of designated rows of the sparse matrix may be traversed until all non-zero elements in the sparse matrix have been placed in a first array. Each time a row in the set is traversed, a next non-zero element in that row may be placed in the first array. If all non-zero elements for a given row of the set of designated rows have been placed in the first array, the given row may be replaced in the set of designated rows with a next unprocessed row of the sparse matrix. The data structure in which the sparse matrix is encoded may be outputted. The data structure may include the first array.
Type: Grant
Filed: August 30, 2013
Date of Patent: June 14, 2016
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Karin Strauss, Jeremy Fowers, Kalin Ovtcharov
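The traversal described in this abstract can be sketched as a round-robin over a small set of active rows. This is an illustrative reading, not the patented encoding: the `lanes` parameter (size of the designated-row set), the skipping of all-zero rows, and the companion `columns` and `rows` arrays are assumptions added so the example is self-contained; the abstract itself only names the first array.

```python
def encode_interleaved(matrix, lanes=2):
    """Interleave the non-zero elements of `lanes` rows at a time.
    Returns (values, columns, rows); `values` plays the role of the
    abstract's first array, the other two are assumed bookkeeping."""
    nonzeros = [
        [(c, v) for c, v in enumerate(row) if v != 0] for row in matrix
    ]
    # Rows still waiting for a slot; all-zero rows are skipped (assumption).
    pending = iter([r for r in range(len(matrix)) if nonzeros[r]])

    active = []  # each entry: [row index, position of next non-zero]
    for _ in range(lanes):
        r = next(pending, None)
        if r is not None:
            active.append([r, 0])

    values, columns, rows = [], [], []
    while active:
        still_active = []
        for r, pos in active:
            c, v = nonzeros[r][pos]
            values.append(v)
            columns.append(c)
            rows.append(r)
            if pos + 1 < len(nonzeros[r]):
                still_active.append([r, pos + 1])
            else:
                # Row exhausted: hand its slot to the next unprocessed row.
                nxt = next(pending, None)
                if nxt is not None:
                    still_active.append([nxt, 0])
        active = still_active
    return values, columns, rows
```

Interleaving rows this way means consecutive entries of `values` belong to different rows, which is what lets parallel lanes each consume one element per step.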
-
Publication number: 20150067009
Abstract: Various embodiments relating to encoding a sparse matrix into a data structure format that may be efficiently processed via parallel processing of a computing system are provided. In one embodiment, a sparse matrix may be received. A set of designated rows of the sparse matrix may be traversed until all non-zero elements in the sparse matrix have been placed in a first array. Each time a row in the set is traversed, a next non-zero element in that row may be placed in the first array. If all non-zero elements for a given row of the set of designated rows have been placed in the first array, the given row may be replaced in the set of designated rows with a next unprocessed row of the sparse matrix. The data structure in which the sparse matrix is encoded may be outputted. The data structure may include the first array.
Type: Application
Filed: August 30, 2013
Publication date: March 5, 2015
Applicant: Microsoft Corporation
Inventors: Karin Strauss, Jeremy Fowers, Kalin Ovtcharov