Patents by Inventor Dheevatsa Mudigere

Dheevatsa Mudigere has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instructions for fused multiply-add operations with variable precision input operands

Patent number: 12288062

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Grant

Filed: December 28, 2023

Date of Patent: April 29, 2025

Assignee: Intel Corporation

Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
Incremental precision networks using residual inference and fine-grain quantization

Patent number: 12198055

Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.

Type: Grant

Filed: December 7, 2023

Date of Patent: January 14, 2025

Assignee: Intel Corporation

Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
MACHINE LEARNING ACCELERATOR MECHANISM

Publication number: 20240403620

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Application

Filed: May 31, 2024

Publication date: December 5, 2024

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
Fine-grain compute communication execution for deep learning frameworks via hardware accelerated point-to-point primitives

Patent number: 12154028

Abstract: One embodiment provides for a system to configure distributed training of a neural network. The system includes memory to store a library to facilitate transmission of data during distributed training of the neural network; a network interface to transmit and receive gradient data associated with the trainable parameters; a general-purpose processor to execute instructions provided by the library, the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and a graphics processor to perform compute operations associated with machine learning framework workflow to generate the gradient data associated with the trainable parameters, wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface.

Type: Grant

Filed: January 12, 2018

Date of Patent: November 26, 2024

Assignee: Intel Corporation

Inventors: Srinivas Sridharan, Dheevatsa Mudigere
Machine learning accelerator mechanism

Patent number: 12039435

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Grant

Filed: June 21, 2022

Date of Patent: July 16, 2024

Assignee: INTEL CORPORATION

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
Dynamic precision management for integer deep learning primitives

Patent number: 12033237

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to convert elements of a floating-point tensor to convert the floating-point tensor into a fixed-point tensor.

Type: Grant

Filed: April 24, 2023

Date of Patent: July 9, 2024

Assignee: Intel Corporation

Inventors: Naveen K. Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
Instructions for fused multiply-add operations with variable precision input operands

Patent number: 11900107

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Grant

Filed: March 25, 2022

Date of Patent: February 13, 2024

Assignee: Intel Corporation

Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
Incremental precision networks using residual inference and fine-grain quantization

Patent number: 11893490

Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.

Type: Grant

Filed: November 30, 2022

Date of Patent: February 6, 2024

Assignee: Intel Corporation

Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
POINT TO POINT CONNECTED PROCESSING ELEMENTS WITH DATA JOINER COMPONENTS

Publication number: 20230252263

Abstract: A system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.

Type: Application

Filed: April 13, 2023

Publication date: August 10, 2023

Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
Three-dimensional convolution pipeline with memory organizer unit

Patent number: 11688032

Abstract: A processor system comprises a memory organizer unit and a matrix computing unit. The memory organizer unit is configured to receive a request for a three-dimensional data of a convolutional neural network layer. The requested three-dimensional data is obtained from a memory. The obtained three-dimensional data is rearranged in an optimized linear order and the rearranged data in the optimized linear order is provided to the matrix computing unit. The matrix computing unit is configured to perform at least a portion of a three-dimensional convolution using at least a portion of the provided rearranged data in the optimized linear order.

Type: Grant

Filed: August 16, 2019

Date of Patent: June 27, 2023

Assignee: Meta Platforms, Inc.

Inventors: Dheevatsa Mudigere, Krishnakumar Nair, Abdulkadir Utku Diril
Dynamic precision management for integer deep learning primitives

Patent number: 11669933

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.

Type: Grant

Filed: April 27, 2022

Date of Patent: June 6, 2023

Assignee: Intel Corporation

Inventors: Naveen K. Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
Point to point connected processing elements with data joiner components

Patent number: 11657252

Abstract: A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.

Type: Grant

Filed: June 7, 2019

Date of Patent: May 23, 2023

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Nair, Dheevatsa Mudigere, Abdulkadir Utku Diril
MACHINE LEARNING ACCELERATOR MECHANISM

Publication number: 20230053289

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Application

Filed: June 21, 2022

Publication date: February 16, 2023

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
Incremental precision networks using residual inference and fine-grain quantization

Patent number: 11556772

Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.

Type: Grant

Filed: January 12, 2018

Date of Patent: January 17, 2023

Assignee: Intel Corporation

Inventors: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

Publication number: 20220343174

Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.

Type: Application

Filed: May 12, 2022

Publication date: October 27, 2022

Applicant: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
Mapping convolution to a matrix processor unit

Patent number: 11481471

Abstract: A system comprises a matrix processor unit that includes a first type of register, a group of a second type of registers, and a plurality of calculation units. The first type of register is configured to concurrently store values from different rows of a first matrix. At least a portion of the first type of register is logically divided into groups of elements, and each of the groups corresponds to a different row of the first matrix. Each of the second type of registers is configured to concurrently store values from a plurality of different rows of a second matrix. Each of the calculation units corresponds to one of the second type of registers and is configured to at least in part determine a corresponding element in a result matrix of convoluting the second matrix with the first matrix.

Type: Grant

Filed: August 16, 2019

Date of Patent: October 25, 2022

Assignee: Meta Platforms, Inc.

Inventors: Krishnakumar Nair, Abdulkadir Utku Diril, Dheevatsa Mudigere, Olivia Wu, Ehsan Khish Ardestani Zadeh, Yuchen Hao
Machine learning accelerator mechanism

Patent number: 11373088

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Grant

Filed: December 30, 2017

Date of Patent: June 28, 2022

Assignee: INTEL CORPORATION

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
Optimized compute hardware for machine learning operations

Patent number: 11334796

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.

Type: Grant

Filed: August 3, 2020

Date of Patent: May 17, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
Instructions for fused multiply-add operations with variable precision input operands

Patent number: 11321086

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Grant

Filed: January 6, 2020

Date of Patent: May 3, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
Dynamic precision management for integer deep learning primitives

Patent number: 11321805

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

Type: Grant

Filed: October 29, 2020

Date of Patent: May 3, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan

1 2 next