Patents by Inventor DHIRAJ D. KALAMKAR
DHIRAJ D. KALAMKAR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240427842Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.Type: ApplicationFiled: May 24, 2024Publication date: December 26, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
-
Patent number: 12039000Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.Type: GrantFiled: February 2, 2023Date of Patent: July 16, 2024Assignee: Intel CorporationInventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
-
Publication number: 20240070799Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: ApplicationFiled: September 5, 2023Publication date: February 29, 2024Applicant: Intel CorporationInventors: Dhiraj D. KALAMKAR, Karthikeyan VAIDYANATHAN, Srinivas SRIDHARAN, Dipankar DAS
-
Patent number: 11798120Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: GrantFiled: August 10, 2021Date of Patent: October 24, 2023Assignee: INTEL CORPORATIONInventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Publication number: 20230289399Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.Type: ApplicationFiled: February 2, 2023Publication date: September 14, 2023Applicant: Intel CorporationInventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
-
Patent number: 11593454Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.Type: GrantFiled: June 2, 2020Date of Patent: February 28, 2023Assignee: Intel CorporationInventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
-
Patent number: 11494163Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive to receive data format information indicating a first precision data format that input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.Type: GrantFiled: September 6, 2019Date of Patent: November 8, 2022Assignee: Intel CorporationInventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George
-
Publication number: 20220101480Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: ApplicationFiled: August 10, 2021Publication date: March 31, 2022Applicant: Intel CorporationInventors: DHIRAJ D. KALAMKAR, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
-
Publication number: 20210374209Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.Type: ApplicationFiled: June 2, 2020Publication date: December 2, 2021Applicant: Intel CorporationInventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
-
Publication number: 20210350212Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.Type: ApplicationFiled: May 24, 2021Publication date: November 11, 2021Applicant: Intel CorporationInventors: DHIRAJ D. KALAMKAR, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
-
Patent number: 11094029Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: GrantFiled: April 10, 2017Date of Patent: August 17, 2021Assignee: INTEL CORPORATIONInventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Patent number: 11023803Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.Type: GrantFiled: April 10, 2017Date of Patent: June 1, 2021Assignee: INTEL CORPORATIONInventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Publication number: 20210072955Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive to receive data format information indicating a first precision data format that input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.Type: ApplicationFiled: September 6, 2019Publication date: March 11, 2021Applicant: Intel CorporationInventors: Naveen MELLEMPUDI, Dipankar DAS, Chunhui MEI, Kristopher WONG, Dhiraj D. KALAMKAR, Hong H. JIANG, Subramaniam Maiyuran, Varghese George
-
Patent number: 10775873Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.Type: GrantFiled: February 28, 2019Date of Patent: September 15, 2020Assignee: Intel CorporationInventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
-
Publication number: 20190265777Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.Type: ApplicationFiled: February 28, 2019Publication date: August 29, 2019Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
-
Patent number: 10234930Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.Type: GrantFiled: February 13, 2015Date of Patent: March 19, 2019Assignee: Intel CorporationInventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
-
Publication number: 20180293493Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: ApplicationFiled: April 10, 2017Publication date: October 11, 2018Applicant: Intel CorporationInventors: Dhiraj D. Kalamkar, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
-
Publication number: 20180293492Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.Type: ApplicationFiled: April 10, 2017Publication date: October 11, 2018Applicant: Intel CorporationInventors: Dhiraj D. Kalamkar, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
-
Patent number: 9910481Abstract: In an embodiment, a processor a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.Type: GrantFiled: February 13, 2015Date of Patent: March 6, 2018Assignee: Intel CorporationInventors: Victor W. Lee, Daehyun Kim, Yuxin Bai, Shihao Ji, Sheng Li, Dhiraj D. Kalamkar, Naveen K. Mellempudi
-
Publication number: 20160239065Abstract: In an embodiment, a processor a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.Type: ApplicationFiled: February 13, 2015Publication date: August 18, 2016Inventors: VICTOR W. LEE, DAEHYUN KIM, YUXIN BAI, SHIHAO JI, SHENG LI, DHIRAJ D. KALAMKAR, NAVEEN K. MELLEMPUDI