Patents by Inventor DHIRAJ D. KALAMKAR

DHIRAJ D. KALAMKAR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MATRIX OPERATION OPTIMIZATION MECHANISM

Publication number: 20240427842

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.

Type: Application

Filed: May 24, 2024

Publication date: December 26, 2024

Applicant: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha

Matrix operation optimization mechanism

Patent number: 12039000

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.

Type: Grant

Filed: February 2, 2023

Date of Patent: July 16, 2024

Assignee: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha

ABSTRACTION LAYERS FOR SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20240070799

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Application

Filed: September 5, 2023

Publication date: February 29, 2024

Applicant: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

Abstraction layers for scalable distributed machine learning

Patent number: 11798120

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Grant

Filed: August 10, 2021

Date of Patent: October 24, 2023

Assignee: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

MATRIX OPERATION OPTIMIZATION MECHANISM

Publication number: 20230289399

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.

Type: Application

Filed: February 2, 2023

Publication date: September 14, 2023

Applicant: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha

Matrix operation optimization mechanism

Patent number: 11593454

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.

Type: Grant

Filed: June 2, 2020

Date of Patent: February 28, 2023

Assignee: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha

Conversion hardware mechanism

Patent number: 11494163

Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.

Type: Grant

Filed: September 6, 2019

Date of Patent: November 8, 2022

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George

ABSTRACTION LAYERS FOR SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20220101480

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Application

Filed: August 10, 2021

Publication date: March 31, 2022

Applicant: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

MATRIX OPERATION OPTIMIZATION MECHANISM

Publication number: 20210374209

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.

Type: Application

Filed: June 2, 2020

Publication date: December 2, 2021

Applicant: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha

ABSTRACTION LIBRARY TO ENABLE SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20210350212

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

Type: Application

Filed: May 24, 2021

Publication date: November 11, 2021

Applicant: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

Abstraction layers for scalable distributed machine learning

Patent number: 11094029

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Grant

Filed: April 10, 2017

Date of Patent: August 17, 2021

Assignee: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

Abstraction library to enable scalable distributed machine learning

Patent number: 11023803

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

Type: Grant

Filed: April 10, 2017

Date of Patent: June 1, 2021

Assignee: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

PROGRAMMABLE CONVERSION HARDWARE

Publication number: 20210072955

Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.

Type: Application

Filed: September 6, 2019

Publication date: March 11, 2021

Applicant: Intel Corporation

Inventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George

Performing power management in a multicore processor

Patent number: 10775873

Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.

Type: Grant

Filed: February 28, 2019

Date of Patent: September 15, 2020

Assignee: Intel Corporation

Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar

Performing Power Management In A Multicore Processor

Publication number: 20190265777

Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.

Type: Application

Filed: February 28, 2019

Publication date: August 29, 2019

Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar

Performing power management in a multicore processor

Patent number: 10234930

Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.

Type: Grant

Filed: February 13, 2015

Date of Patent: March 19, 2019

Assignee: Intel Corporation

Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar

ABSTRACTION LAYERS FOR SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20180293493

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

Type: Application

Filed: April 10, 2017

Publication date: October 11, 2018

Applicant: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

ABSTRACTION LIBRARY TO ENABLE SCALABLE DISTRIBUTED MACHINE LEARNING

Publication number: 20180293492

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

Type: Application

Filed: April 10, 2017

Publication date: October 11, 2018

Applicant: Intel Corporation

Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das

Performing power management in a multicore processor

Patent number: 9910481

Abstract: In an embodiment, a processor includes a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.

Type: Grant

Filed: February 13, 2015

Date of Patent: March 6, 2018

Assignee: Intel Corporation

Inventors: Victor W. Lee, Daehyun Kim, Yuxin Bai, Shihao Ji, Sheng Li, Dhiraj D. Kalamkar, Naveen K. Mellempudi

PERFORMING POWER MANAGEMENT IN A MULTICORE PROCESSOR

Publication number: 20160239065

Abstract: In an embodiment, a processor includes a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.

Type: Application

Filed: February 13, 2015

Publication date: August 18, 2016

Inventors: Victor W. Lee, Daehyun Kim, Yuxin Bai, Shihao Ji, Sheng Li, Dhiraj D. Kalamkar, Naveen K. Mellempudi
