Patents by Inventor Kamal Sinha

Kamal Sinha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Intelligent thread dispatch and vectorization of atomic operations

Patent number: 10871966

Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.

Type: Grant

Filed: June 18, 2019

Date of Patent: December 22, 2020

Assignee: INTEL CORPORATION

Inventors: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
MACHINE LEARNING SPARSE COMPUTATION MECHANISM

Publication number: 20200349670

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

Type: Application

Filed: July 16, 2020

Publication date: November 5, 2020

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
Replacement Policies for a Hybrid Hierarchical Cache

Publication number: 20200349091

Abstract: A hybrid hierarchical cache is implemented at the same level in the access pipeline, to get the faster access behavior of a smaller cache and, at the same time, a higher hit rate at lower power for a larger cache, in some embodiments. A split cache at the same level in the access pipeline includes two caches that work together. In the hybrid, split, low level cache (e.g., L1) evictions are coordinated locally between the two L1 portions, and on a miss to both L1 portions, a line is allocated from a larger L2 cache to the smallest L1 cache.

Type: Application

Filed: May 22, 2020

Publication date: November 5, 2020

Inventors: Abhishek R. Appu, Joydeep Ray, James A. Valerio, Altug Koker, Prasoonkumar Surti, Balaji Vembu, Wenyin Fu, Bhushan M. Borole, Kamal Sinha
Regional Adjustment of Render Rate

Publication number: 20200348897

Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.

Type: Application

Filed: May 22, 2020

Publication date: November 5, 2020

Inventors: Eric J. Asperheim, Subramaniam M. Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
System, apparatus and method for providing a local clock signal for a memory array

Patent number: 10817012

Abstract: In an embodiment, a processor includes at least one processor core and at least one graphics processor. The at least one graphics processor may include a register file having a plurality of entries, where at least a portion of the at least one graphics processor is to operate at a first operating frequency and the register file is to operate at a second operating frequency greater than the first operating frequency, to enable the at least one graphics processor to issue a plurality of write requests to the register file in a single clock cycle at the first operating frequency and receive a plurality of data elements of a plurality of read requests from the register file in the single clock cycle at the first operating frequency. Other embodiments are described and claimed.

Type: Grant

Filed: July 31, 2019

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Iqbal R. Rajwani, Altug Koker, Bhushan M. Borole, Kamal Sinha, Abhishek R. Appu, Anupama A. Thaploo, Sunil Nekkanti, Wenyin Fu
DATA PREFETCHING FOR GRAPHICS DATA PROCESSING

Publication number: 20200293450

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

Type: Application

Filed: March 15, 2019

Publication date: September 17, 2020

Applicant: Intel Corporation

Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, JR., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
MULTI-TILE GRAPHICS PROCESSOR RENDERING

Publication number: 20200294301

Abstract: Embodiments are generally directed to multi-tile graphics processor rendering. An embodiment of an apparatus includes a memory for storage of data; and one or more processors including a graphics processing unit (GPU) to process data, wherein the GPU includes a plurality of GPU tiles, wherein, upon geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the geometric data to the plurality of GPU tiles.

Type: Application

Filed: March 15, 2019

Publication date: September 17, 2020

Applicant: Intel Corporation

Inventors: Prasoonkumar Surti, Arthur Hunter, JR., Kamal Sinha, Scott Janus, Brent Insko, Vasanth Ranganathan, Lakshminarayanan Striramassarma
Avoid cache lookup for cold cache

Patent number: 10769072

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive, in a read/modify/write (RMW) pipeline, a cache access request from a requestor, wherein the cache request comprises a cache set identifier associated with requested data in the cache set, determine whether the cache set associated with the cache set identifier is in an inaccessible invalid state, and in response to a determination that the cache set is in an inaccessible state or an invalid state, to terminate the cache access request. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: February 15, 2019

Date of Patent: September 8, 2020

Assignee: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Prasoonkumar Surti, Kamal Sinha, Kiran C. Veernapu, Balaji Vembu
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

Patent number: 10769748

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

Type: Grant

Filed: November 21, 2018

Date of Patent: September 8, 2020

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
MACHINE LEARNING SPARSE COMPUTATION MECHANISM

Publication number: 20200279349

Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.

Type: Application

Filed: May 21, 2020

Publication date: September 3, 2020

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajikshore Barik, Nicolas C. Galoppo Von Borries
System, apparatus and method for reducing voltage swing on an interconnect

Patent number: 10762877

Abstract: In an embodiment, an apparatus includes: a repeater to receive an input signal at an input node and output an output signal at an output node; a dynamic header device coupled between the repeater and a supply voltage node; and a feedback device coupled between the output node and the dynamic header device to dynamically control the dynamic header device based at least in part on the output signal. Other embodiments are described and claimed.

Type: Grant

Filed: April 17, 2017

Date of Patent: September 1, 2020

Assignee: Intel Corporation

Inventors: Anupama A. Thaploo, Jaydeep P. Kulkarni, Bhushan M. Borole, Abhishek R. Appu, Altug Koker, Kamal Sinha, Wenyin Fu
PROCESSOR POWER MANAGEMENT

Publication number: 20200272215

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.

Type: Application

Filed: February 28, 2020

Publication date: August 27, 2020

Applicant: INTEL CORPORATION

Inventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Eric J. Hoekstra, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Travis T. Schluessler, Ankur N. Shah, Jonathan Kennedy
Reducing aging of register file keeper circuits

Patent number: 10754809

Abstract: By shutting off keeper transistors during pre-charge, the aging on these devices may be reduced. This means that a relatively weaker keeper may be used for noise compared to an overdesigned stronger keeper. Using a relatively weaker keeper circuit results in a faster evaluation stage and improved minimum read voltage in some embodiments.

Type: Grant

Filed: March 15, 2019

Date of Patent: August 25, 2020

Assignee: Intel Corporation

Inventors: Anupama A. Thaploo, Bhushan M. Borole, Bee Ngo, Iqbal R. Rajwani, Altug Koker, Abhishek R. Appu, Kamal Sinha, Wenyin Fu
Frequent data value compression for graphics processing units

Patent number: 10748238

Abstract: A control surface tracks an individual cacheline in the original surface for frequent data values. If so, control surface bits are set. When reading a cacheline from memory, first the control surface bits are read. If they happen to be set, then the original memory read is skipped altogether and instead the bits from the control surface provide the value for the entire cacheline.

Type: Grant

Filed: February 19, 2019

Date of Patent: August 18, 2020

Assignee: Intel Corporation

Inventors: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Prasoonkumar Surti, Altug Koker, Aravindh V. Anantaraman, Pattabhiraman P. K., Abhishek R. Appu, Joydeep Ray, Kamal Sinha, Vasanth Ranganathan, Bhushan M. Borole, Wenyin Fu, Eric J. Hoekstra, Linda L. Hurd
Dynamic power budget allocation in multi-processor system

Patent number: 10747286

Abstract: Dynamic power budget allocation in a multi-processor system is described. In an example, an apparatus includes a plurality of processor units; and a power control component, the power control component to monitor power utilization of each of the plurality of processor units, wherein power consumed by the plurality of processor units is limited by a global power budget. The apparatus is to assign a workload to each of the processor units and is to establish an initial power budget for operation of each of the processor units, and, upon the apparatus determining that one or more processor units require an increased power budget based on one or more criteria, the apparatus is to dynamically reallocate an amount of the global power budget to the one or more processor units.

Type: Grant

Filed: June 11, 2018

Date of Patent: August 18, 2020

Assignee: INTEL CORPORATION

Inventors: Nikos Kaburlasos, Iqbal Rajwani, Bhushan Borole, Kamal Sinha, Sanjeev Jahagirdar
POWER CONSUMPTION MANAGEMENT FOR COMMUNICATION BUS

Publication number: 20200241622

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive data for a current write operation to a memory, determine a number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory and in response to a determination that the number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory exceeds a threshold, to toggle a plurality of bits in the data for the current write operation to create an encoded data set and set an indicator bit to a value which indicates that the plurality of bits have been toggled. Other embodiments are also disclosed and claimed.

Type: Application

Filed: February 5, 2020

Publication date: July 30, 2020

Applicant: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Eric J. Hoekstra, Kiran C. Veernapu, Prasoonkumar Surti, Vasanth Ranganathan, Kamal Sinha, Balaji Vembu, Eric J. Asperheim, Sanjeev S. Jahagirdar, Joydeep Ray
ADAPTIVE MULTI-RESOLUTION FOR GRAPHICS

Publication number: 20200218330

Abstract: An embodiment may include an application processor, persistent storage media coupled to the application processor, and a graphics subsystem coupled to the application processor. The system may further include any of a performance analyzer to analyze a performance of the graphics subsystem to provide performance analysis information, a content-based depth analyzer to analyze content to provide content-based depth analysis information, a focus analyzer to analyze a focus area to provide focus analysis information, an edge analyzer to provide edge analysis information, a frame analyzer to provide frame analysis information, and/or a variance analyzer to analyze respective amounts of variance for the frame. The system may further include a workload adjuster to adjust a workload of the graphics subsystem based on the analysis information. Other embodiments are disclosed and claimed.

Type: Application

Filed: March 16, 2020

Publication date: July 9, 2020

Inventors: Travis T. Schluessler, Joydeep Ray, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Kamal Sinha, James M. Holland, Pattabhiraman K., Sayan Lahiri, Radhakrishnan Venkataraman, Carson Brownlee, Vivek Tiwari, Kai Xiao, Jefferson Amstutz, Deepak S. Vembar, Ankur N. Shah, ElMoustapha Ould-Ahmed-Vall
Machine learning sparse computation mechanism

Patent number: 10706498

Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.

Type: Grant

Filed: May 20, 2019

Date of Patent: July 7, 2020

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajikshore Barik, Nicolas C. Galoppo Von Borries
HYBRID LOW POWER HOMOGENOUS GRAPICS PROCESSING UNITS

Publication number: 20200210238

Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.

Type: Application

Filed: December 24, 2019

Publication date: July 2, 2020

Applicant: Intel Corporation

Inventors: Abhishek R Appu, Altug Koker, Balaji Vembu, Joydeep Ray, Kamal Sinha, Prasoonkumar Surti, Kiran C. Veernapu, Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Eric J. Asperheim, Guei-Yuan Lueh, David Puffer, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Josh B. Mastronarde, Linda L. Hurd, Travis T. Schluessler, Tomasz Janczak, Abhishek Venkatesh, Kai Xiao, Slawomir Grajewski
EXTEND GPU/CPU COHERENCY TO MULTI-GPU CORES

Publication number: 20200210338

Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.

Type: Application

Filed: December 26, 2019

Publication date: July 2, 2020

Applicant: Intel Corporation

Inventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker

prev … 3 4 5 6 7 8 9 10 11 … next