Patents by Inventor Dipankar Das

Dipankar Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instructions and logic for vector multiply add with zero skipping

Patent number: 11314515

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Grant

Filed: December 23, 2019

Date of Patent: April 26, 2022

Assignee: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
SCALABLE HIGH-PERFORMANCE PACKAGE ARCHITECTURE USING PROCESSOR-MEMORY-PHOTONICS MODULES

Publication number: 20220115362

Abstract: A processor package module comprises a processor-memory stack including one or more compute die stacked and interconnected with a memory stack on a substrate. One or more photonic die is on the substrate to transmit and receive optical I/O, the one or more photonic die connected to the processor-memory stack and connected to external components through a fiber array. The substrate is mounted into a socket housing, such as a land grid array (LGA) socket. An array of processor package modules are interconnected on a processor substrate via fiber arrays and optical connectors to form a processor chip complex.

Type: Application

Filed: October 9, 2020

Publication date: April 14, 2022

Inventors: Debendra MALLIK, Ravindranath MAHAJAN, Dipankar DAS
ABSTRACTION LAYERS FOR SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20220101480

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Application

Filed: August 10, 2021

Publication date: March 31, 2022

Applicant: Intel Corporation

Inventors: DHIRAJ D. KALAMKAR, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
Circuitry for low-precision deep learning

Patent number: 11275998

Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.

Type: Grant

Filed: May 31, 2018

Date of Patent: March 15, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das
Communication optimizations for distributed machine learning

Patent number: 11270201

Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library, the instructions to cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network.

Type: Grant

Filed: December 29, 2017

Date of Patent: March 8, 2022

Assignee: Intel Corporation

Inventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov
APPARATUSES, METHODS, AND SYSTEMS FOR NEURAL NETWORKS

Publication number: 20220050683

Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.

Type: Application

Filed: October 26, 2021

Publication date: February 17, 2022

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
APPARATUSES, METHODS, AND SYSTEMS FOR ACCESS SYNCHRONIZATION IN A SHARED MEMORY

Publication number: 20210382719

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Application

Filed: August 24, 2021

Publication date: December 9, 2021

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Sasikanth AVANCHA, Ashish RANJAN, Subarno BANERJEE, Bharat KAUL, Anand RAGHUNATHAN
ABSTRACTION LIBRARY TO ENABLE SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20210350212

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

Type: Application

Filed: May 24, 2021

Publication date: November 11, 2021

Applicant: Intel Corporation

Inventors: DHIRAJ D. KALAMKAR, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
TECHNOLOGIES FOR SCALING DEEP LEARNING TRAINING

Publication number: 20210342692

Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantization message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.

Type: Application

Filed: May 14, 2021

Publication date: November 4, 2021

Inventors: Naveen K. Mellempudi, Srinivas Sridharan, Dheevatsa Mudigere, Dipankar Das
Apparatuses, methods, and systems for access synchronization in a shared memory

Patent number: 11106464

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Grant

Filed: September 27, 2016

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: Swagath Venkataramani, Dipankar Das, Sasikanth Avancha, Ashish Ranjan, Subarno Banerjee, Bharat Kaul, Anand Raghunathan
Abstraction layers for scalable distributed machine learning

Patent number: 11094029

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Grant

Filed: April 10, 2017

Date of Patent: August 17, 2021

Assignee: INTEL CORPORATION

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
Technologies for scaling deep learning training

Patent number: 11068780

Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantization message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.

Type: Grant

Filed: April 1, 2017

Date of Patent: July 20, 2021

Assignee: Intel Corporation

Inventors: Naveen K. Mellempudi, Srinivas Sridharan, Dheevatsa Mudigere, Dipankar Das
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING

Publication number: 20210191724

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Application

Filed: December 23, 2019

Publication date: June 24, 2021

Applicant: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
Abstraction library to enable scalable distributed machine learning

Patent number: 11023803

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

Type: Grant

Filed: April 10, 2017

Date of Patent: June 1, 2021

Assignee: INTEL CORPORATION

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
DYNAMIC PRECISION MANAGEMENT FOR INTEGER DEEP LEARNING PRIMITIVES

Publication number: 20210110508

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

Type: Application

Filed: October 29, 2020

Publication date: April 15, 2021

Applicant: Intel Corporation

Inventors: Naveen MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
PARALLEL PROCESSING BASED ON INJECTION NODE BANDWIDTH

Publication number: 20210109888

Abstract: A technique includes performing a collective operation among multiple nodes of a parallel processing computer system using multiple parallel processing stages. The technique includes regulating an ordering of the parallel processing stages so that an initial stage of the plurality of parallel processing stages is associated with a higher node injection bandwidth than a subsequent stage of the plurality of parallel processing stages.

Type: Application

Filed: September 30, 2017

Publication date: April 15, 2021

Inventors: Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS

Publication number: 20210081201

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.

Type: Application

Filed: November 30, 2020

Publication date: March 18, 2021

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
PROGRAMMABLE CONVERSION HARDWARE

Publication number: 20210072955

Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive to receive data format information indicating a first precision data format that input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.

Type: Application

Filed: September 6, 2019

Publication date: March 11, 2021

Applicant: Intel Corporation

Inventors: Naveen MELLEMPUDI, Dipankar DAS, Chunhui MEI, Kristopher WONG, Dhiraj D. KALAMKAR, Hong H. JIANG, Subramaniam Maiyuran, Varghese George
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

Publication number: 20210019631

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.

Type: Application

Filed: August 3, 2020

Publication date: January 21, 2021

Applicant: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
Dynamic precision management for integer deep learning primitives

Patent number: 10825127

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

Type: Grant

Filed: April 20, 2020

Date of Patent: November 3, 2020

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan

prev 1 2 3 4 5 6 next