Patents by Inventor Kamal Sinha

Kamal Sinha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

COORDINATION AND INCREASED UTILIZATION OF GRAPHICS PROCESSORS DURING INFERENCE

Publication number: 20210201438

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

Type: Application

Filed: January 7, 2021

Publication date: July 1, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, DUKHWAN Kim
Compute mechanism for sparse matrix data

Patent number: 11049284

Abstract: An apparatus to facilitate compute compression is disclosed. The apparatus includes a graphics processing unit including mapping logic to map a first block of integer pixel data to a compression block and compression logic to compress the compression block.

Type: Grant

Filed: July 15, 2019

Date of Patent: June 29, 2021

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Nadathur Rajagoplan Satish, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Farshad Akhbari
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20210182058

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

Type: Application

Filed: February 5, 2021

Publication date: June 17, 2021

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
Hierarchical general register file (GRF) for execution block

Patent number: 11010163

Abstract: Disclosed herein is an apparatus which comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units.

Type: Grant

Filed: July 30, 2019

Date of Patent: May 18, 2021

Assignee: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
Dynamic precision for neural network compute operations

Patent number: 11010659

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: April 24, 2017

Date of Patent: May 18, 2021

Assignee: INTEL CORPORATION

Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev S. Jahagirdar
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20210124579

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

Type: Application

Filed: December 9, 2020

Publication date: April 29, 2021

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
Sensory enhanced augmented reality and virtual reality device

Patent number: 10983594

Abstract: Systems, apparatuses and methods may provide away to enhance an augmented reality (AR) and/or virtual reality (VR) user experience with environmental information captured from sensors located in one or more physical environments. More particularly, systems, apparatuses and methods may provide a way to track, by an eye tracker sensor, a gaze of a user, and capture, by the sensors, environmental information. The systems, apparatuses and methods may render feedback, by one or more feedback devices or display device, for a portion of the environment information based on the gaze of the user.

Type: Grant

Filed: July 18, 2019

Date of Patent: April 20, 2021

Assignee: Intel Corporation

Inventors: Altug Koker, Michael Apodaca, Kai Xiao, Chandrasekaran Sakthivel, Jeffery S. Boles, Adam T. Lake, James M. Holland, Pattabhiraman K, Sayan Lahiri, Radhakrishnan Venkataraman, Kamal Sinha, Ankur N. Shah, Deepak S. Vembar, Abhishek R. Appu, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall
Dynamic brightness and resolution control in virtual environments

Patent number: 10970538

Abstract: Systems, apparatuses, and methods may provide for technology to dynamically control a display in response to ocular characteristic measurements of at least one eye of a user.

Type: Grant

Filed: November 6, 2018

Date of Patent: April 6, 2021

Assignee: Intel Corporation

Inventors: Radhakrishnan Venkataraman, James M. Holland, Sayan Lahiri, Pattabhiraman K, Kamal Sinha, Chandrasekaran Sakthivel, Daniel Pohl, Vivek Tiwari, Philip R. Laws, Subramaniam Maiyuran, Abhishek R. Appu, Eimoustapha Ould-Ahmed-Vall, Peter L. Doyle, Devan Burke
Power consumption management for communication bus

Patent number: 10955896

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive data for a current write operation to a memory, determine a number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory and in response to a determination that the number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory exceeds a threshold, to toggle a plurality of bits in the data for the current write operation to create an encoded data set and set an indicator bit to a value which indicates that the plurality of bits have been toggled. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: February 5, 2020

Date of Patent: March 23, 2021

Assignee: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Eric J. Hoekstra, Kiran C. Veernapu, Prasoonkumar Surti, Vasanth Ranganathan, Kamal Sinha, Balaji Vembu, Eric J. Asperheim, Sanjeev S. Jahagirdar, Joydeep Ray
Extend GPU/CPU coherency to multi-GPU cores

Patent number: 10956330

Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: December 26, 2019

Date of Patent: March 23, 2021

Assignee: INTEL CORPORATION

Inventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
Machine learning sparse computation mechanism

Patent number: 10943325

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

Type: Grant

Filed: July 16, 2020

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
DYNAMIC POWER BUDGET ALLOCATION IN MULTI-PROCESSOR SYSTEM

Publication number: 20210064111

Abstract: Dynamic power budget allocation in a multi-processor system is described. In an example, an apparatus includes a plurality of processor units; and a power control component, the power control component to monitor power utilization of each of the plurality of processor units, wherein power consumed by the plurality of processor units is limited by a global power budget. The apparatus is to assign a workload to each of the processor units and is to establish an initial power budget for operation of each of the processor units, and, upon the apparatus determining that one or more processor units require an increased power budget based on one or more criteria, the apparatus is to dynamically reallocate an amount of the global power budget to the one or more processor units.

Type: Application

Filed: August 14, 2020

Publication date: March 4, 2021

Applicant: Intel Corporation

Inventors: Nikos Kaburlasos, Iqbal Rajwani, Bhushan Borole, Kamal Sinha, Sanjeev Jahagirdar
Frequent Data Value Compression for Graphics Processing Units

Publication number: 20210035257

Abstract: A control surface tracks an individual cacheline in the original surface for frequent data values. If so, control surface bits are set. When reading a cacheline from memory, first the control surface bits are read. If they happen to be set, then the original memory read is skipped altogether and instead the bits from the control surface provide the value for the entire cacheline.

Type: Application

Filed: August 5, 2020

Publication date: February 4, 2021

Inventors: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Prasoonkumar Surti, Altug Koker, Aravindh V. Anantaraman, Pattabhiraman P. K., Abhishek R. Appu, Joydeep Ray, Kamal Sinha, Vasanth Ranganathan, Bhushan M. Borole, Wenyin Fu, Eric J. Hoekstra, Linda L. Hurd
PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING

Publication number: 20210035255

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

Type: Application

Filed: July 14, 2020

Publication date: February 4, 2021

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
AVOID CACHE LOOKUP FOR COLD CACHE

Publication number: 20210034540

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive, in a read/modify/write (RMW) pipeline, a cache access request from a requestor, wherein the cache request comprises a cache set identifier associated with requested data in the cache set, determine whether the cache set associated with the cache set identifier is in an inaccessible invalid state, and in response to a determination that the cache set is in an inaccessible state or an invalid state, to terminate the cache access request. Other embodiments are also disclosed and claimed.

Type: Application

Filed: September 4, 2020

Publication date: February 4, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Prasoonkumar Surti, Kamal Sinha, Kiran C. Veernapu, Balaji Vembu
Data prefetching for graphics data processing

Patent number: 10909039

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

Type: Grant

Filed: March 15, 2019

Date of Patent: February 2, 2021

Assignee: INTEL CORPORATION

Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
Efficient multi-context thread distribution

Patent number: 10908905

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system, allocate a second number of streaming multiprocessors (SMs) to the respective plurality of contexts, and dispatch threads from the plurality of contexts only to the streaming multiprocessor(s) allocated to the respective plurality of contexts. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: October 11, 2019

Date of Patent: February 2, 2021

Assignee: Intel Corporation

Inventors: Joydeep Ray, Altug Koker, Balaji Vembu, Abhishek R. Appu, Kamal Sinha, Prasoonkumar Surti, Kiran C. Veernapu
Compute optimization mechanism for deep neural networks

Patent number: 10902547

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.

Type: Grant

Filed: November 21, 2017

Date of Patent: January 26, 2021

Assignee: Intel Corporation

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Optimizing read only memory surface accesses

Patent number: 10901909

Abstract: In accordance with some embodiments, a separate pipe is used in graphics processor for handling accesses, namely reads, to read only (RO) surfaces within caches. Moreover, the caches may have defined read only section and defined read write (RW) sections. The read only section may be accessed through a dedicated read only pipe and the read write section may be accessed through a read write pipe for those surfaces that can also be written. Thus, the read only sections are handled in a read only fashion without the need to accommodate writes.

Type: Grant

Filed: April 17, 2017

Date of Patent: January 26, 2021

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Joydeep Ray, Altug Koker, Balaji Vembu, Kamal Sinha, Prasoonkumar Surti, Wenyin Fu, Bhushan M. Borole, Vasanth Ranganathan
Coordination and increased utilization of graphics processors during inference

Patent number: 10891707

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

Type: Grant

Filed: April 8, 2019

Date of Patent: January 12, 2021

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim

prev … 2 3 4 5 6 7 8 9 10 … next