Patents by Inventor Dheevatsa Mudigere

Dheevatsa Mudigere has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240126544
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
    Type: Application
    Filed: December 28, 2023
    Publication date: April 18, 2024
    Inventors: Dipankar DAS, Naveen K. MELLEMPUDI, Mrinmay DUTTA, Arun KUMAR, Dheevatsa MUDIGERE, Abhisek KUNDU
  • Patent number: 11900107
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
    Type: Grant
    Filed: March 25, 2022
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
  • Patent number: 11893490
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Publication number: 20230351542
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to convert elements of a floating-point tensor to convert the floating-point tensor into a fixed-point tensor.
    Type: Application
    Filed: April 24, 2023
    Publication date: November 2, 2023
    Applicant: Intel Corporation
    Inventors: Naveen K. MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
  • Publication number: 20230252263
    Abstract: A system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
    Type: Application
    Filed: April 13, 2023
    Publication date: August 10, 2023
    Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
  • Patent number: 11688032
    Abstract: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for a three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from a memory. The obtained three-dimensional data is rearranged in an optimized linear order and the rearranged data in the optimized linear order is provided to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.
    Type: Grant
    Filed: August 16, 2019
    Date of Patent: June 27, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Dheevatsa Mudigere, Krishnakumar Nair, Abdulkadir Utku Diril
  • Patent number: 11669933
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.
    Type: Grant
    Filed: April 27, 2022
    Date of Patent: June 6, 2023
    Assignee: Intel Corporation
    Inventors: Naveen K. Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
  • Patent number: 11657252
    Abstract: A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: May 23, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
  • Publication number: 20230087364
    Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
    Type: Application
    Filed: November 30, 2022
    Publication date: March 23, 2023
    Applicant: Intel Corporation
    Inventors: Abhisek KUNDU, NAVEEN MELLEMPUDI, DHEEVATSA MUDIGERE, Dipankar DAS
  • Publication number: 20230053289
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Application
    Filed: June 21, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11556772
    Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: January 17, 2023
    Assignee: Intel Corporation
    Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
  • Publication number: 20220343174
    Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
    Type: Application
    Filed: May 12, 2022
    Publication date: October 27, 2022
    Applicant: Intel Corporation
    Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
  • Patent number: 11481471
    Abstract: A system comprises a matrix processor unit that includes a first type of register, a group of a second type of registers, and a plurality of calculation units. The first type of register is configured to concurrently store values from different rows of a first matrix. At least a portion of the first type of register is logically divided into groups of elements, and each of the groups corresponds to a different row of the first matrix. Each of the second type of registers is configured to concurrently store values from a plurality of different rows of a second matrix. Each of the calculation units corresponds to one of the second type of registers and is configured to at least in part determine a corresponding element in a result matrix of convoluting the second matrix with the first matrix.
    Type: Grant
    Filed: August 16, 2019
    Date of Patent: October 25, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Olivia Wu, Ehsan Khish Ardestani Zadeh, Yuchen Hao
  • Publication number: 20220327656
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.
    Type: Application
    Filed: April 27, 2022
    Publication date: October 13, 2022
    Applicant: Intel Corporation
    Inventors: Naveen K. MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
  • Publication number: 20220214877
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
    Type: Application
    Filed: March 25, 2022
    Publication date: July 7, 2022
    Inventors: Dipankar DAS, Naveen K. MELLEMPUDI, Mrinmay DUTTA, Arun KUMAR, Dheevatsa MUDIGERE, Abhisek KUNDU
  • Patent number: 11373088
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Grant
    Filed: December 30, 2017
    Date of Patent: June 28, 2022
    Assignee: INTEL CORPORATION
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11334796
    Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: May 17, 2022
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
  • Patent number: 11321086
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
    Type: Grant
    Filed: January 6, 2020
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
  • Patent number: 11321805
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
  • Publication number: 20210342692
    Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantization message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.
    Type: Application
    Filed: May 14, 2021
    Publication date: November 4, 2021
    Inventors: Naveen K. Mellempudi, Srinivas Sridharan, Dheevatsa Mudigere, Dipankar Das