Patents by Inventor Nitin Garegrat

Nitin Garegrat has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Storing tensors in memory based on depth

Patent number: 11748251

Abstract: Embodiments of the present disclosure include systems and methods for storing tensors in memory based on depth. In some embodiments, for each of a plurality of sets of elements in a three-dimensional (3D) matrix, a position is determined along a height axis and width axis of the 3D matrix. At the determined position, a set of elements are identified along a depth axis of the 3D matrix. The set of elements are stored in a contiguous block of memory.

Type: Grant

Filed: January 8, 2021

Date of Patent: September 5, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Nitin Garegrat, Shankar Narayan, Derek Gladding
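
The storage scheme in the abstract above can be illustrated with a minimal sketch. It assumes the 3D matrix arrives as nested Python lists indexed `tensor[d][h][w]`; the function names `store_depth_first` and `element_at` are hypothetical and do not come from the patent. For each (height, width) position, the elements along the depth axis are written as one contiguous run of a flat buffer.

```python
def store_depth_first(tensor):
    """Flatten tensor[d][h][w] so each (h, w) position's depth elements
    are adjacent in the returned flat buffer (illustrative layout only)."""
    depth = len(tensor)
    height = len(tensor[0])
    width = len(tensor[0][0])
    memory = []
    for h in range(height):
        for w in range(width):
            # Gather the elements along the depth axis at position (h, w)
            # and append them as one contiguous block.
            memory.extend(tensor[d][h][w] for d in range(depth))
    return memory, (height, width, depth)


def element_at(memory, shape, h, w, d):
    """Look up one element in the depth-contiguous buffer."""
    _, width, depth = shape
    return memory[(h * width + w) * depth + d]
```

With this layout, reading every depth element at a given position touches one contiguous range of `depth` entries starting at offset `(h * width + w) * depth`.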

STORING TENSORS IN MEMORY BASED ON DEPTH

Publication number: 20220222174

Abstract: Embodiments of the present disclosure include systems and methods for storing tensors in memory based on depth. In some embodiments, for each of a plurality of sets of elements in a three-dimensional (3D) matrix, a position is determined along a height axis and width axis of the 3D matrix. At the determined position, a set of elements are identified along a depth axis of the 3D matrix. The set of elements are stored in a contiguous block of memory.

Type: Application

Filed: January 8, 2021

Publication date: July 14, 2022

Inventors: Nitin Garegrat, Shankar Narayan, Derek Gladding

PERFORMING TENSOR OPERATIONS USING A PROGRAMMABLE CONTROL ENGINE

Publication number: 20220222318

Abstract: Embodiments of the present disclosure include systems and methods for performing tensor operations using a programmable control engine. A command queue is configured to receive a command from a software application. A configuration storage is configured to store a plurality of configurations. A matrix multiplication unit is configured to perform matrix multiplication operations. Memory is configured to store matrices. A control engine is configured to retrieve the command from the command queue; retrieve a configuration from the configuration storage based on the command; generate, based on the command and the configuration, instructions for the matrix multiplication unit to perform a set of matrix multiplication operations on first and second matrices stored in the memory; send the instructions to the matrix multiplication unit to configure the matrix multiplication unit to output results of the set of matrix multiplication operations; and store the results in a third matrix in the memory.

Type: Application

Filed: January 8, 2021

Publication date: July 14, 2022

Inventors: Nitin Garegrat, Derek Gladding, Shankar Narayan, Sujatha Santhanaraman, Jayadev Velagandula
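
The flow in the abstract above (command queue, configuration storage, control engine, matrix multiplication unit, memory) can be modeled in software as a rough sketch. Everything here is an assumption made for illustration: the class name `ControlEngine`, the dictionary-based command and configuration formats, and the single `transpose_b` configuration option are hypothetical and are not described in the application.

```python
from collections import deque


def matmul_unit(a, b):
    """Stand-in for the matrix multiplication unit: plain row-by-column product."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


class ControlEngine:
    def __init__(self, configs, memory):
        self.command_queue = deque()   # commands received from software
        self.configs = configs         # configuration storage (hypothetical format)
        self.memory = memory           # named matrices

    def submit(self, command):
        self.command_queue.append(command)

    def step(self):
        # Retrieve the command, then the configuration it refers to.
        command = self.command_queue.popleft()
        config = self.configs[command["config"]]
        # Build an "instruction" from the command and the configuration.
        instruction = {"a": command["a"], "b": command["b"],
                       "transpose_b": config["transpose_b"]}
        a = self.memory[instruction["a"]]
        b = self.memory[instruction["b"]]
        if instruction["transpose_b"]:
            b = [list(col) for col in zip(*b)]
        # Run the multiplication and store the result as a third matrix.
        self.memory[command["out"]] = matmul_unit(a, b)
```

A caller would place matrices in `memory`, submit a command naming the two operands, the output, and a configuration, then call `step()` to process it.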

Apparatus and method for a masked multiply instruction to support neural network pruning operations

Patent number: 10929503

Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with one or more of the A-matrix elements to generate a second partial result.

Type: Grant

Filed: December 21, 2018

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Omid Azizi, Chen Koren, Nitin Garegrat
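
The masked multiply in the abstract above can be approximated in software as follows. This is a sketch under stated assumptions, not the patented instruction: the mask uses 1 as the "first value", and the "alternate B-matrix element" is taken to be a caller-supplied constant defaulting to 0 (which models a pruned weight). The abstract does not specify either choice, and the function name `masked_matmul` is hypothetical.

```python
def masked_matmul(a, b, mask, alternate=0):
    """Multiply a (rows x inner) by b (inner x cols), substituting `alternate`
    for every B element whose mask entry is not 1 (assumed encoding)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # Mask set to the first value: use the stored B element.
                # Otherwise: use the alternate element instead.
                b_elem = b[k][j] if mask[k][j] == 1 else alternate
                result[i][j] += a[i][k] * b_elem
    return result
```

With an all-ones mask this reduces to an ordinary matrix product; zeroing mask entries removes the corresponding weights from every partial result.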

Apparatus and method for coherent, accelerated conversion between data representations

Patent number: 10761757

Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.

Type: Grant

Filed: June 30, 2018

Date of Patent: September 1, 2020

Assignee: Intel Corporation

Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
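
The block-wise conversion in the abstract above can be sketched as follows. The choice of 32-bit floats as the first numeric representation and 16-bit floats as the second is an assumption for illustration, as are the function names; the patent's "specified order" is not modeled, and blocks are simply processed in index order. Each destination block is stored at the same index as its source block so the block arrangement is preserved.

```python
import struct


def convert_block(block, count):
    """Convert one block of `count` little-endian float32 values to float16."""
    values = struct.unpack(f"<{count}f", block)
    return struct.pack(f"<{count}e", *values)


def convert_tensor(source_blocks, elems_per_block):
    """Convert every source block, keeping each result at its source index."""
    destination = [None] * len(source_blocks)
    for index, block in enumerate(source_blocks):
        # The destination slot mirrors the source slot, so the structural
        # arrangement of blocks is unchanged by the conversion.
        destination[index] = convert_block(block, elems_per_block)
    return destination
```

Each converted block is half the size of its source block, since every element shrinks from four bytes to two.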

APPARATUS AND METHOD FOR A MASKED MULTIPLY INSTRUCTION TO SUPPORT NEURAL NETWORK PRUNING OPERATIONS

Publication number: 20190121837

Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with one or more of the A-matrix elements to generate a second partial result.

Type: Application

Filed: December 21, 2018

Publication date: April 25, 2019

Inventors: Omid Azizi, Chen Koren, Nitin Garegrat

APPARATUS AND METHOD FOR COHERENT, ACCELERATED CONVERSION BETWEEN DATA REPRESENTATIONS

Publication number: 20190042094

Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.

Type: Application

Filed: June 30, 2018

Publication date: February 7, 2019

Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
