Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Replacement policies for a hybrid hierarchical cache

Patent number: 11263152

Abstract: A hybrid hierarchical cache is implemented at the same level in the access pipeline, to get the faster access behavior of a smaller cache and, at the same time, a higher hit rate at lower power for a larger cache, in some embodiments. A split cache at the same level in the access pipeline includes two caches that work together. In the hybrid, split, low level cache (e.g., L1) evictions are coordinated locally between the two L1 portions, and on a miss to both L1 portions, a line is allocated from a larger L2 cache to the smallest L1 cache.

Type: Grant

Filed: May 22, 2020

Date of Patent: March 1, 2022

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Joydeep Ray, James A. Valerio, Altug Koker, Prasoonkumar Surti, Balaji Vembu, Wenyin Fu, Bhushan M. Borole, Kamal Sinha
Order independent asynchronous compute and streaming for graphics

Patent number: 11257274

Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.

Type: Grant

Filed: May 29, 2020

Date of Patent: February 22, 2022

Assignee: Intel Corporation

Inventors: Devan Burke, Adam T. Lake, Jeffery S. Boles, John H. Feit, Karthik Vaidyanathan, Abhishek R. Appu, Joydeep Ray, Subramaniam Maiyuran, Altug Koker, Balaji Vembu, Murali Ramadoss, Prasoonkumar Surti, Eric J. Hoekstra, Gabor Liktor, Jonathan Kennedy, Slawomir Grajewski, Elmoustapha Ould-Ahmed-Vall
ADJUSTING GRAPHICS RENDERING BASED ON FACIAL EXPRESSION

Publication number: 20220050520

Abstract: An embodiment of a graphics apparatus may include a facial expression detector to detect a facial expression of a user, and a parameter adjuster communicatively coupled to the facial expression detector to adjust a graphics parameter based on the detected facial expression of the user. The detected facial expression may include one or more of a squinting, blinking, winking, and facial muscle tension of the user. The graphics parameter may include one or more of a frame resolution, a screen contrast, a screen brightness, and a shading rate. Other embodiments are disclosed and claimed.

Type: Application

Filed: August 30, 2021

Publication date: February 17, 2022

Applicant: Intel Corporation

Inventors: Travis T. Schluessler, Joydeep Ray, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Jefferson Amstutz, Carson Brownlee, Vivek Tiwari, Sayan Lahiri, Kai Xiao, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Deepak S. Vembar, Ankur N. Shah, Balaji Vembu, Josh B. Mastronarde
Handling pipeline submissions across many compute units

Patent number: 11244420

Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (IO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.

Type: Grant

Filed: March 10, 2021

Date of Patent: February 8, 2022

Assignee: Intel Corporation

Inventors: Balaji Vembu, Altug Koker, Joydeep Ray
Thread prefetch mechanism

Patent number: 11232536

Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads and prefetch logic to prefetch pages of data from the memory to assist in the execution of the plurality of processing threads.

Type: Grant

Filed: February 14, 2020

Date of Patent: January 25, 2022

Assignee: INTEL CORPORATION

Inventors: Adam T. Lake, Guei-Yuan Lueh, Balaji Vembu, Murali Ramadoss, Prasoonkumar Surti, Abhishek R. Appu, Altug Koker, Subramaniam M. Maiyuran, Eric C. Samson, David J. Cowperthwaite, Zhi Wang, Kun Tian, David Puffer, Brian T. Lewis
Method and apparatus for efficient loop processing in a graphics hardware front end

Patent number: 11232531

Abstract: Various embodiments enable loop processing in a command processing block of the graphics hardware. Such hardware may include a processor including a command buffer, and a graphics command parser. The graphics command parser to load graphics commands from the command buffer, parse a first graphics command, store a loop count value associated with the first graphics command, parse a second graphics command and store a loop wrap address based on the second graphics command. The graphics command parser may execute a command sequence identified by the second graphics command, parse a third graphics command, the third graphics command identifying an end of the command sequence, set a new loop count value, and iteratively execute the command sequence using the loop wrap address based on the new loop count value.

Type: Grant

Filed: August 29, 2017

Date of Patent: January 25, 2022

Assignee: INTEL CORPORATION

Inventors: Hema Chand Nalluri, Balaji Vembu, Peter Doyle, Michael Apodaca
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20220019431

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

Type: Application

Filed: July 6, 2021

Publication date: January 20, 2022

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
Multiple Key Management

Publication number: 20220019700

Abstract: A system and method for encrypting and decrypting data exchanged between a multi-tile processing unit and a storage, where a plurality of keys are used for the encryption. Each of the plurality of keys is associated with a different one or more sets of the processors. Encryption hardware is configured to select a key to use for encryption/decryption operations in dependence upon the set of tiles associated with the data being exchanged. Each write request from a tile contains identifier bits associated with that tile's set of tiles, enabling the encryption hardware to select the key to use for encrypting the data in the write request. Each read completion for a tile contains identifier bits associated with that tile's set of tiles, enabling the encryption hardware to select the key to use for decrypting the data in the read completion.

Type: Application

Filed: July 13, 2021

Publication date: January 20, 2022

Inventors: Daniel John Pelham WILKINSON, Graham Bernard CUNNINGHAM, Stavros VOLOS, Kapil VASWANI, Cedric Alain Marie FOURNET, Balaji VEMBU
Thread scheduling over compute blocks for power optimization

Patent number: 11227360

Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a processing array including multiple compute blocks, each compute block including multiple processing clusters and a thread dispatch unit to dispatch threads of a workload to the multiple compute blocks based on a parallelism metric, wherein the thread dispatch unit, based on the parallelism metric, is to perform one of a first operation and a second operation, the first operation to distribute threads across the multiple compute blocks and the second operation is to concentrate threads within one of the multiple compute blocks.

Type: Grant

Filed: December 16, 2019

Date of Patent: January 18, 2022

Assignee: Intel Corporation

Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, James A. Valerio, Abhishek R. Appu
Compute optimization mechanism for deep neural networks

Patent number: 11222392

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.

Type: Grant

Filed: August 5, 2019

Date of Patent: January 11, 2022

Assignee: Intel Corporation

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Autonomous vehicle advanced sensing and response

Patent number: 11217040

Abstract: One embodiment provides for a computing device within an autonomous vehicle, the compute device comprising a wireless network device to enable a wireless data connection with an autonomous vehicle network, a set of multiple processors including a general-purpose processor and a general-purpose graphics processor, the set of multiple processors to execute a compute manager to manage execution of compute workloads associated with the autonomous vehicle, the compute workload associated with autonomous operations of the autonomous vehicle, and offload logic configured to execute on the set of multiple processors, the offload logic to determine to offload one or more of the compute workloads to one or more autonomous vehicles within range of the wireless network device.

Type: Grant

Filed: April 15, 2019

Date of Patent: January 4, 2022

Assignee: Intel Corporation

Inventors: Barath Lakshamanan, Linda L. Hurd, Ben J. Ashbaugh, Elmoustapha Ould-Ahmed-Vall, Liwei Ma, Jingyi Jin, Justin E. Gottschlich, Chandrasekaran Sakthivel, Michael S. Strickland, Brian T. Lewis, Lindsey Kuper, Altug Koker, Abhishek R. Appu, Prasoonkumar Surti, Joydeep Ray, Balaji Vembu, Javier S. Turek, Naila Farooqui
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

Patent number: 11210760

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

Type: Grant

Filed: July 14, 2020

Date of Patent: December 28, 2021

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
Engine to enable high speed context switching via on-die storage

Patent number: 11210265

Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively couple to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: May 7, 2020

Date of Patent: December 28, 2021

Assignee: INTEL CORPORATION

Inventors: Altug Koker, Prasoonkumar Surti, David Puffer, Subramaniam Maiyuran, Guei-Yuan Lueh, Abhishek R. Appu, Joydeep Ray, Balaji Vembu, Tomer Bar-On, Andrew T. Lauritzen, Hugues Labbe, John G. Gierach, Gabor Liktor
DISPLAY ENGINE SURFACE BLENDING AND ADAPTIVE TEXEL TO PIXEL RATIO SAMPLE RATE SYSTEM, APPARATUS AND METHOD

Publication number: 20210398250

Abstract: Systems, apparatuses and methods may provide away to blend two or more of the scene surfaces based on the focus area and an offload threshold. More particularly, systems, apparatuses and methods may provide a way to blend, by a display engine, two or more of the focus area scene surfaces and blended non-focus area scene surfaces. The systems, apparatuses and methods may include a graphics engine to render the focus area surfaces at a higher sample rate than the non-focus area scene surfaces.

Type: Application

Filed: April 26, 2021

Publication date: December 23, 2021

Inventors: Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Prasoonkumar Surti
SECTOR CACHE FOR COMPRESSION

Publication number: 20210374062

Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.

Type: Application

Filed: August 12, 2021

Publication date: December 2, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
SCHEDULING OF THREADS FOR EXECUTION UTILIZING LOAD BALANCING OF THREAD GROUPS

Publication number: 20210373899

Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.

Type: Application

Filed: February 11, 2021

Publication date: December 2, 2021

Applicant: Intel Corporation

Inventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
COMPUTE OPTIMIZATIONS FOR NEURAL NETWORKS

Publication number: 20210373886

Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a ternary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the ternary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.

Type: Application

Filed: July 26, 2021

Publication date: December 2, 2021

Applicant: Intel Corporation

Inventors: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
Memory-based dependency tracking and cache pre-fetch hardware for multi-resolution shading

Patent number: 11182296

Abstract: Systems, apparatuses and methods may provide a way to track graphics pipeline operations. More particularly, the systems, apparatuses and methods may provide a way to track operation dependencies between graphics pipeline operations for blocks of pixel samples and stall one or more of the pipeline operations based on the operation dependencies. The systems, apparatuses and methods may further provide cache pre-fetch hardware to monitor processing of blocks of pixel samples and fetch a next block of the pixel samples from the memory into a cache before completion of processing a current block of pixel samples based on one or more of the pipeline operations or a surface state of one or more regions of a screen space.

Type: Grant

Filed: September 10, 2019

Date of Patent: November 23, 2021

Assignee: Intel Corporation

Inventors: Andrew T. Lauritzen, Gabor Liktor, Tomer Bar-On, Hugues Labbe, John G. Gierach, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Altug Koker
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS

Publication number: 20210350499

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of processing cores of a first type and a second type. A first set of processing cores of a first type perform multi-dimensional matrix operations and a second set of processing cores of a second type perform general purpose graphics processing unit (GPGPU) operations.

Type: Application

Filed: July 26, 2021

Publication date: November 11, 2021

Applicant: Intel Corporation

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Instructions and logic to perform floating-point and integer operations for machine learning

Patent number: 11169799

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

Type: Grant

Filed: June 5, 2019

Date of Patent: November 9, 2021

Assignee: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

prev … 4 5 6 7 8 9 10 11 12 … next