Patents by Inventor Dipankar Das

Dipankar Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Scaling half-precision floating point tensors for training deep neural networks

Patent number: 11507815

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.

Type: Grant

Filed: May 11, 2022

Date of Patent: November 22, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das
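
The abstract above does not spell out how the statistics map to a target format. The sketch below is a minimal Python illustration that assumes a simple range-based rule. It gathers the smallest and largest nonzero magnitudes of the outputs. It keeps 32-bit floats if the next layer runs at 32-bit precision. Otherwise it picks the first 16-bit format whose normal range covers the observed magnitudes. The candidate list and the names `output_stats` and `choose_target_format` are hypothetical, and this is not the patented circuitry.

```python
# Smallest positive normal value and largest finite value of each candidate
# 16-bit format (IEEE binary16 and bfloat16). Subnormals are ignored here.
FORMAT_RANGES = {
    "fp16": (6.103515625e-05, 65504.0),
    "bf16": (1.1754943508222875e-38, 3.3895313892515355e38),
}


def output_stats(values):
    """Return (smallest nonzero magnitude, largest magnitude) of the outputs."""
    mags = [abs(v) for v in values if v != 0.0]
    if not mags:
        return 0.0, 0.0
    return min(mags), max(mags)


def choose_target_format(values, next_layer_bits):
    """Hypothetical rule: honour the next layer's precision first, then pick
    the first 16-bit format whose normal range covers the observed values."""
    if next_layer_bits >= 32:
        return "fp32"
    lo, hi = output_stats(values)
    for name in ("fp16", "bf16"):  # fp16 first: it has more mantissa bits
        fmt_lo, fmt_hi = FORMAT_RANGES[name]
        if hi <= fmt_hi and (lo == 0.0 or lo >= fmt_lo):
            return name
    return "fp32"  # no 16-bit format covers the observed range


print(choose_target_format([0.5, -3.0, 100.0], 16))  # fp16
print(choose_target_format([1e-6, 2.0], 16))          # bf16 (1e-6 is below fp16's normal range)
print(choose_target_format([0.5], 32))                 # fp32
```

Trying fp16 before bf16 reflects the usual trade-off: fp16 keeps more mantissa bits when the range allows it, while bf16 trades precision for fp32-like range.
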
DATA PARALLELISM AND HALO EXCHANGE FOR DISTRIBUTED MACHINE LEARNING

Publication number: 20220366526

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.

Type: Application

Filed: June 27, 2022

Publication date: November 17, 2022

Applicant: Intel Corporation

Inventors: Dipankar Das, KARTHIKEYAN VAIDYANATHAN, Srinivas Sridharan
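
The halo idea can be made concrete with a minimal single-process Python sketch. It reduces the multi-dimensional partitioning to one dimension and simulates the inter-node exchange with list slicing, so no real network is involved. The names `conv1d_valid`, `partition` and `halo_exchange_conv` are hypothetical. Each simulated node owns a contiguous slice of the input and "receives" the first k-1 elements owned by its right neighbour(s) as its halo. It then runs a valid (unpadded) convolution locally. Concatenating the per-node outputs reproduces the single-node result.

```python
def conv1d_valid(x, w):
    """Valid (no padding) 1-D correlation of signal x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def partition(x, n):
    """Split x into n contiguous, nearly equal chunks, one per node."""
    size, rem = divmod(len(x), n)
    chunks, start = [], 0
    for r in range(n):
        end = start + size + (1 if r < rem else 0)
        chunks.append(x[start:end])
        start = end
    return chunks


def halo_exchange_conv(x, w, n_nodes):
    halo = len(w) - 1  # elements each node needs from the node(s) to its right
    chunks = partition(x, n_nodes)
    outputs = []
    for r, local in enumerate(chunks):
        # Simulated receive: collect `halo` elements owned by later nodes.
        received, nxt = [], r + 1
        while len(received) < halo and nxt < n_nodes:
            received.extend(chunks[nxt][: halo - len(received)])
            nxt += 1
        outputs.append(conv1d_valid(local + received, w))
    # Concatenating the per-node outputs gives the full convolution.
    return [y for part in outputs for y in part]


x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
w = [1, 2, 3]
print(halo_exchange_conv(x, w, 3) == conv1d_valid(x, w))  # True
```

In a real distributed setting, each node would send its boundary elements to its neighbours before the local convolution. In two or more dimensions, every face (and corner) of a partition would have its own halo.
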
Scaling half-precision floating point tensors for training deep neural networks

Patent number: 11501139

Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.

Type: Grant

Filed: January 12, 2018

Date of Patent: November 15, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das
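
The following is a minimal Python sketch of why scaling helps, in the spirit of the loss-scaling abstracts in this listing. Python's `struct` module can round a float to IEEE binary16 with the `"e"` format character. A gradient of 1e-8 underflows to zero in fp16, but it survives when it is first multiplied by a power-of-two scale and divided back out in higher precision. The helper names and the "largest magnitude at about half of fp16's maximum" headroom rule are assumptions made for illustration, not the patented scale-selection method.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE binary16 value


def to_fp16(x):
    """Round a Python float to binary16 and back (struct 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def choose_scale(values, headroom=2.0):
    """Hypothetical rule: largest power of two that keeps max|v| * scale
    at or below FP16_MAX / headroom. Powers of two scale exactly."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0
    limit = FP16_MAX / headroom
    scale = 1.0
    while amax * scale * 2.0 <= limit:
        scale *= 2.0
    while amax * scale > limit:
        scale /= 2.0
    return scale


def scaled_fp16_roundtrip(values):
    """Store scaled values as fp16, then unscale in double precision."""
    scale = choose_scale(values)
    stored = [to_fp16(v * scale) for v in values]
    return [s / scale for s in stored], scale


print(to_fp16(1e-8))                            # 0.0: underflows unscaled
restored, scale = scaled_fp16_roundtrip([1e-8, 3e-4])
print(scale == 2.0 ** 26, restored[0] != 0.0)   # True True
```

Keeping headroom below the fp16 maximum leaves room for values that grow in later computations without overflowing.
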
Conversion hardware mechanism

Patent number: 11494163

Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert it from the first precision data format to a second precision data format based on the data format information.

Type: Grant

Filed: September 6, 2019

Date of Patent: November 8, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George
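
The abstract does not name the two precision formats. The following minimal Python sketch assumes they are fp32 and bfloat16 and uses the common round-to-nearest-even bit trick. A small lookup keyed by (source, destination) stands in for the control unit's data format information. `CONVERTERS`, `convert` and the other helper names are hypothetical.

```python
import struct


def f32_bits(x):
    """Raw 32-bit pattern of x rounded to fp32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """fp32 value of a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fp32_to_bf16_bits(x):
    """Convert fp32 to a 16-bit bfloat16 pattern, rounding to nearest even."""
    b = f32_bits(x)
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        return ((b >> 16) | 0x0040) & 0xFFFF  # NaN: force a quiet NaN
    rounding_bias = 0x7FFF + ((b >> 16) & 1)  # ties go to the even result
    return ((b + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(h):
    """Widening bfloat16 -> fp32 is exact: append 16 zero bits."""
    return bits_f32(h << 16)


# Hypothetical stand-in for the data format information.
CONVERTERS = {("fp32", "bf16"): fp32_to_bf16_bits}


def convert(value, format_info):
    return CONVERTERS[format_info](value)


print(hex(convert(1.0, ("fp32", "bf16"))))                 # 0x3f80
print(bf16_bits_to_fp32(fp32_to_bf16_bits(1.01171875)))    # 1.015625
```

bfloat16 keeps fp32's 8-bit exponent, so this conversion only drops mantissa bits. That is why a shift plus a rounding bias suffices.
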
Hardware implemented point to point communication primitives for machine learning

Patent number: 11488008

Abstract: One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system including a first memory to store a first set of instructions including a machine learning framework; a fabric interface to enable transmission and receipt of data associated with a set of trainable machine learning parameters; a first set of general-purpose processor cores to execute the first set of instructions, the first set of instructions to provide a training workflow for computation of gradients for the trainable machine learning parameters and to communicate with a second set of instructions, the second set of instructions to facilitate transmission and receipt of the gradients via the fabric interface; and a graphics processor to perform compute operations associated with the training workflow to generate the gradients for the trainable machine learning parameters.

Type: Grant

Filed: January 12, 2018

Date of Patent: November 1, 2022

Assignee: Intel Corporation

Inventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das
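
The abstract leaves the gradient-exchange pattern open. A common way to combine gradients using only point-to-point sends and receives is a ring all-reduce. The following minimal Python sketch simulates one in a single process, with workers as lists and each "send" as a copied payload applied after all sends in a step are collected. Ring all-reduce is a swapped-in, well-known technique chosen for illustration, not necessarily what the patent describes, and `ring_allreduce` is a hypothetical name.

```python
def ring_allreduce(grads):
    """Simulated ring all-reduce: every worker ends with the element-wise
    sum. grads is a list of per-worker gradient lists of equal length."""
    n = len(grads)
    length = len(grads[0])
    bufs = [list(g) for g in grads]
    bounds = [length * c // n for c in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Reduce-scatter: at step s, worker r sends chunk (r - s) % n to its
    # right neighbour, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            for i, v in zip(chunk(c), payload):
                bufs[(r + 1) % n][i] += v
    # Now worker r holds the fully reduced chunk (r + 1) % n.
    # All-gather: forward reduced chunks around the ring, overwriting.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            for i, v in zip(chunk(c), payload):
                bufs[(r + 1) % n][i] = v
    return bufs


grads = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(ring_allreduce(grads)[0])  # [111, 222, 333, 444, 555]
```

Each worker sends about 2(n-1)/n of its gradient in total regardless of n. This bandwidth property is why ring-style exchanges are popular for data-parallel training. Dividing the sums by n gives averaged gradients.
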
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

Publication number: 20220343174

Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with an instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.

Type: Application

Filed: May 12, 2022

Publication date: October 27, 2022

Applicant: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
TEST STRIP ASSEMBLY FOR ANALYSING BODILY FLUIDS AND DEVICES THEREOF

Publication number: 20220341919

Abstract: A test strip assembly (101) for dipping in a bodily fluid sample to analyse the presence or absence of one or more analytes is provided. The test strip assembly (101) includes a basal layer (1); a first adhesive layer (2) that extends over the entire basal layer (1); a porous membrane (3) that is present over the first adhesive layer (2) and in which the bodily fluid flows in a lateral direction; a second adhesive layer (4); and a number of detection labels (5) placed on the porous membrane through the second adhesive layer (4), which receive the bodily fluid flowing in the lateral direction in the porous membrane so that the fluid then flows in a vertical direction in the detection labels (5). A device including the test strip assembly (101) is also provided.

Type: Application

Filed: June 20, 2020

Publication date: October 27, 2022

Inventors: Varun AKUR VENKATESAN, Siddharth PATTNAIK, Dipankar DAS
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING

Publication number: 20220326953

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply-add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Application

Filed: April 18, 2022

Publication date: October 13, 2022

Applicant: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
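
The following minimal Python model shows the behaviour this abstract describes at a high level. A macro-style operation applies a masked multiply-add over `repeat` rows, and it skips lanes where either source is zero because their product adds nothing to the accumulator. The names `vmad_zero_skip` and `macro_mad`, and the choice to count only the multiplies actually issued, are assumptions for illustration. Real hardware would skip work at the instruction or lane level rather than in a loop.

```python
def vmad_zero_skip(dst, src_a, src_b, mask=None):
    """dst[i] += src_a[i] * src_b[i] for lanes enabled in `mask`, skipping
    lanes where either source is zero. Returns the number of issued MADs."""
    issued = 0
    for i in range(len(dst)):
        if mask is not None and not mask[i]:
            continue  # lane disabled by the predicate mask
        if src_a[i] == 0 or src_b[i] == 0:
            continue  # zero product: nothing to accumulate, skip the multiply
        dst[i] += src_a[i] * src_b[i]
        issued += 1
    return issued


def macro_mad(acc_rows, a_rows, b_rows, mask, repeat):
    """Hypothetical macro: apply the masked multiply-add `repeat` times,
    once per row of the input matrices. Updates acc_rows in place."""
    return sum(
        vmad_zero_skip(acc_rows[r], a_rows[r], b_rows[r], mask)
        for r in range(repeat)
    )


acc = [[0, 0, 0, 0], [1, 1, 1, 1]]
a = [[1, 0, 2, 0], [0, 0, 0, 3]]   # sparse input
b = [[5, 5, 5, 5], [2, 2, 2, 2]]
print(macro_mad(acc, a, b, [1, 1, 1, 1], repeat=2), acc)  # 3 [[5, 0, 10, 0], [1, 1, 1, 7]]
```

Only 3 of the 8 enabled lanes issue a multiply here. That ratio is the saving that zero skipping targets for sparse inputs.
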
DYNAMIC PRECISION MANAGEMENT FOR INTEGER DEEP LEARNING PRIMITIVES

Publication number: 20220327656

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.

Type: Application

Filed: April 27, 2022

Publication date: October 13, 2022

Applicant: Intel Corporation

Inventors: Naveen K. MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
Scaling half-precision floating point tensors for training deep neural networks

Patent number: 11468303

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a 16-bit floating point data type.

Type: Grant

Filed: July 30, 2019

Date of Patent: October 11, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das
SCALING HALF-PRECISION FLOATING POINT TENSORS FOR TRAINING DEEP NEURAL NETWORKS

Publication number: 20220269931

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.

Type: Application

Filed: May 11, 2022

Publication date: August 25, 2022

Applicant: Intel Corporation

Inventors: NAVEEN MELLEMPUDI, DIPANKAR DAS
Apparatus and method for vector multiply and accumulate of packed words

Patent number: 11409525

Abstract: An apparatus and method for performing multiply-accumulate operations.

Type: Grant

Filed: January 24, 2018

Date of Patent: August 9, 2022

Assignee: Intel Corporation

Inventors: Alexander Heinecke, Dipankar Das, Robert Valentine, Mark Charney
COMMUNICATION OPTIMIZATIONS FOR DISTRIBUTED MACHINE LEARNING

Publication number: 20220245454

Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library. The instructions cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network. The processor can transparently adjust communication paths between worker nodes based on the communication pattern.

Type: Application

Filed: March 3, 2022

Publication date: August 4, 2022

Applicant: Intel Corporation

Inventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov
INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS

Publication number: 20220214877

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Application

Filed: March 25, 2022

Publication date: July 7, 2022

Inventors: Dipankar DAS, Naveen K. MELLEMPUDI, Mrinmay DUTTA, Arun KUMAR, Dheevatsa MUDIGERE, Abhisek KUNDU
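
The following minimal Python model illustrates the asymmetric FMA described above. Within each destination lane, it processes as many second-source elements as fit into the lane width (lane width divided by the second element width). It multiplies each by the corresponding first-source element and accumulates into the destination. The abstract does not say whether elements are signed. This sketch assumes signed first-source elements and unsigned second-source elements, and `asymmetric_fma` is a hypothetical name that operates on unpacked Python ints.

```python
def asymmetric_fma(dst, src1, src2, lane_bits, w1, w2):
    """dst[lane] += sum of src1[i] * src2[i] over the src2 elements that
    fit into one lane (lane_bits // w2 of them). Updates dst in place."""
    assert lane_bits in (16, 32, 64) and w1 in (4, 8) and w2 in (1, 2, 4)
    per_lane = lane_bits // w2
    assert len(src1) == len(src2) == len(dst) * per_lane
    # Assumed encodings: src1 signed w1-bit, src2 unsigned w2-bit.
    assert all(-(1 << (w1 - 1)) <= v < (1 << (w1 - 1)) for v in src1)
    assert all(0 <= v < (1 << w2) for v in src2)
    for lane in range(len(dst)):
        for k in range(per_lane):
            idx = lane * per_lane + k
            dst[lane] += src1[idx] * src2[idx]
    return dst


# 16-bit lanes with 4-bit second-source elements: 4 products per lane.
print(asymmetric_fma([0, 100], [1, -2, 3, 4, 5, 6, 7, 8],
                     [1, 2, 3, 0, 1, 1, 1, 1], lane_bits=16, w1=8, w2=4))
# [6, 126]
```

Narrower second-source elements pack more products into each lane. For example, 1-bit weights in a 16-bit lane give 16 multiply-accumulates per lane, which is the appeal for low-precision inference.
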
Data parallelism and halo exchange for distributed machine learning

Patent number: 11373266

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.

Type: Grant

Filed: January 12, 2018

Date of Patent: June 28, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Karthikeyan Vaidyanathan, Srinivas Sridharan
Optimized compute hardware for machine learning operations

Patent number: 11334796

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.

Type: Grant

Filed: August 3, 2020

Date of Patent: May 17, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
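
The abstract above ties the number of multiplies per 32-bit lane to the width of the first operand's elements: four for 8-bit values, eight for 4-bit values, sixteen for 2-bit values. The following minimal Python model assumes signed two's-complement fields packed lowest-first, with a 32-bit wrapping accumulator. The abstract specifies neither detail. `pack`, `unpack`, `wrap32` and `lane_dot` are hypothetical names.

```python
def pack(values, bits):
    """Pack signed ints of width `bits` into one 32-bit word, lowest field first."""
    assert len(values) * bits == 32
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << bits) - 1)) << (i * bits)
    return word


def unpack(word, bits):
    """Inverse of pack: extract and sign-extend 32 // bits fields."""
    out = []
    for i in range(32 // bits):
        f = (word >> (i * bits)) & ((1 << bits) - 1)
        if f >= 1 << (bits - 1):
            f -= 1 << bits
        out.append(f)
    return out


def wrap32(x):
    """Wrap an integer to the signed 32-bit range, like a 32-bit accumulator."""
    return ((x + (1 << 31)) % (1 << 32)) - (1 << 31)


def lane_dot(acc, a_word, a_bits, b_values):
    """One 32-bit lane: 32 // a_bits parallel multiplies, then one accumulate."""
    ops = 32 // a_bits  # 8-bit -> 4 products, 4-bit -> 8, 2-bit -> 16
    a = unpack(a_word, a_bits)
    assert len(b_values) == ops
    return wrap32(acc + sum(x * y for x, y in zip(a, b_values)))


print(lane_dot(10, pack([1, -2, 3, -4], 8), 8, [5, 6, 7, 8]))  # -8
```

The same 32-bit lane does twice as many multiplies each time the element width halves. That scaling is what makes the narrow formats attractive for inference throughput.
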
Dynamic precision management for integer deep learning primitives

Patent number: 11321805

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

Type: Grant

Filed: October 29, 2020

Date of Patent: May 3, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
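
The following minimal Python sketch illustrates the dynamic fixed-point steps in this abstract. A tensor is stored as integers that share one exponent (value ≈ q · 2^exp). When a computation produces integers too large for the target width, the sketch derives a right-shift from the absolute maximum and the available range. It then shifts every element and raises the shared exponent by the same amount. The function names, the 8-bit default, and using element-wise multiply as the "compute operation" are assumptions for illustration.

```python
def quantize_dfp(values, bits=8):
    """Quantize floats to `bits`-bit ints sharing one exponent:
    value ~= q * 2**exp, with exp chosen so max|q| uses the range."""
    qmax = (1 << (bits - 1)) - 1
    amax = max(abs(v) for v in values)
    if amax == 0:
        return [0] * len(values), 0
    exp = 0
    while amax / 2.0 ** exp > qmax:         # too big: coarser exponent
        exp += 1
    while amax / 2.0 ** (exp - 1) <= qmax:  # room to spare: finer exponent
        exp -= 1
    return [int(round(v / 2.0 ** exp)) for v in values], exp


def renormalize(q, exp, bits):
    """Right-shift so the absolute maximum fits in `bits` and raise the
    shared exponent by the shift amount. Returns (q, exp, shift)."""
    limit = (1 << (bits - 1)) - 1
    amax = max((abs(v) for v in q), default=0)
    shift = 0
    while (amax >> shift) > limit:
        shift += 1
    return [v >> shift for v in q], exp + shift, shift


def dfp_mul(a, b, bits=8):
    """Element-wise product: multiply ints, add exponents, renormalize."""
    (qa, ea), (qb, eb) = a, b
    prod = [x * y for x, y in zip(qa, qb)]  # wider intermediate integers
    q, e, _ = renormalize(prod, ea + eb, bits)
    return q, e


t = quantize_dfp([0.5, -0.25, 1.0])
print(t)              # ([32, -16, 64], -6)
print(dfp_mul(t, t))  # ([16, 4, 64], -6) -> 0.25, 0.0625, 1.0
```

Sharing one exponent per tensor keeps the inner loop in pure integer arithmetic. Only the cheap shift-and-adjust step tracks the dynamic range.
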
Instructions for fused multiply-add operations with variable precision input operands

Patent number: 11321086

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

Type: Grant

Filed: January 6, 2020

Date of Patent: May 3, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
Instructions and logic for vector multiply add with zero skipping

Patent number: 11314515

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply-add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Grant

Filed: December 23, 2019

Date of Patent: April 26, 2022

Assignee: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
SCALABLE HIGH-PERFORMANCE PACKAGE ARCHITECTURE USING PROCESSOR-MEMORY-PHOTONICS MODULES

Publication number: 20220115362

Abstract: A processor package module comprises a processor-memory stack including one or more compute dies stacked and interconnected with a memory stack on a substrate. One or more photonic dies are on the substrate to transmit and receive optical I/O, the photonic dies being connected to the processor-memory stack and connected to external components through a fiber array. The substrate is mounted into a socket housing, such as a land grid array (LGA) socket. An array of processor package modules are interconnected on a processor substrate via fiber arrays and optical connectors to form a processor chip complex.

Type: Application

Filed: October 9, 2020

Publication date: April 14, 2022

Inventors: Debendra MALLIK, Ravindranath MAHAJAN, Dipankar DAS
