Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

BARRIERS AND SYNCHRONIZATION FOR MACHINE LEARNING AT AUTONOMOUS MACHINES

Publication number: 20220357742

Abstract: A mechanism is described for facilitating barriers and synchronization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.

Type: Application

Filed: May 23, 2022

Publication date: November 10, 2022

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Sanjeev Jahagirdar, Vasanth Ranganathan
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20220357945

Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.

Type: Application

Filed: June 7, 2022

Publication date: November 10, 2022

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
Memory-based software barriers

Patent number: 11494232

Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.

Type: Grant

Filed: November 24, 2020

Date of Patent: November 8, 2022

Assignee: Intel Corporation

Inventors: Altug Koker, Joydeep Ray, Balaji Vembu, James A. Valerio, Abhishek R. Appu
Scatter gather engine

Patent number: 11494967

Abstract: In an example, an apparatus comprises a plurality of execution units, and logic, at least partially including hardware logic, to create a scatter gather list in memory and collect a plurality of operating statistics for the plurality of execution units using the scatter gather list. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: February 11, 2021

Date of Patent: November 8, 2022

Assignee: INTEL CORPORATION

Inventors: Balaji Vembu, Murali Ramadoss, David I. Standring, Shruti A. Sethi, Jeffrey S. Frizzell, Alan M. Curtis, Abhishek R. Appu, Joydeep Ray, Altug Koker
PAGE FAULTING AND SELECTIVE PREEMPTION

Publication number: 20220351325

Abstract: One embodiment provides a parallel processor comprising a memory interface and a processing array coupled with the memory interface. The processing array is configured to address memory accessed via the memory interface via a virtual address mapping and includes circuitry to resolve a page fault for the virtual address mapping, wherein each of the multiple compute blocks is separately preemptable.

Type: Application

Filed: May 20, 2022

Publication date: November 3, 2022

Applicant: Intel Corporation

Inventors: Altug Koker, Ingo Wald, David Puffer, Subramaniam M. Maiyuran, Prasoonkumar Surti, Balaji Vembu, Guei-Yuan Lueh, Murali Ramadoss, Abhishek R. Appu, Joydeep Ray
EFFICIENT THREAD GROUP SCHEDULING

Publication number: 20220350651

Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated a similar dependency to avoid dependency conflicts.

Type: Application

Filed: May 17, 2022

Publication date: November 3, 2022

Applicant: Intel Corporation

Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
Workload scheduling and distribution on a distributed graphics device

Patent number: 11481864

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.

Type: Grant

Filed: April 19, 2021

Date of Patent: October 25, 2022

Assignee: Intel Corporation

Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
APPARATUS AND METHOD FOR MEMORY MANAGEMENT IN A GRAPHICS PROCESSING ENVIRONMENT

Publication number: 20220334982

Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.

Type: Application

Filed: May 27, 2022

Publication date: October 20, 2022

Inventors: NIRANJAN L. COORAY, ABHISHEK R. APPU, ALTUG KOKER, JOYDEEP RAY, BALAJI VEMBU, PATTABHIRAMAN K, DAVID PUFFER, DAVID J. COWPERTHWAITE, RAJESH M. SANKARAN, SATYESHWAR SINGH, SAMEER KP, ANKUR N. SHAH, KUN TIAN
CACHE OPTIMIZATION FOR GRAPHICS SYSTEMS

Publication number: 20220334977

Abstract: A mechanism is described for facilitating optimization of cache associated with graphics processors at computing devices. A method of embodiments, as described herein, includes introducing coloring bits to contents of a cache associated with a processor including a graphics processor, wherein the coloring bits to represent a signal identifying one or more caches available for use, while avoiding explicit invalidations and flushes.

Type: Application

Filed: April 7, 2022

Publication date: October 20, 2022

Applicant: Intel Corporation

Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS

Publication number: 20220335562

Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.

Type: Application

Filed: May 11, 2022

Publication date: October 20, 2022

Applicant: Intel Corporation

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Apparatus and method for efficient graphics virtualization

Patent number: 11475623

Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.

Type: Grant

Filed: January 5, 2021

Date of Patent: October 18, 2022

Assignee: INTEL CORPORATION

Inventors: Joydeep Ray, Abhishek R. Appu, Pattabhiraman K, Balaji Vembu, Altug Koker, Niranjan L. Cooray, Josh B. Mastronarde
Specialized fixed function hardware for efficient convolution

Patent number: 11475286

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.

Type: Grant

Filed: December 21, 2021

Date of Patent: October 18, 2022

Assignee: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Positional only shading pipeline (POSH) geometry data processing with coarse Z buffer

Patent number: 11461959

Abstract: The systems, apparatuses and methods may provide a way to adaptively process and aggressively cull geometry data. Systems, apparatuses and methods may provide for processing, by a positional only shading pipeline (POSH), geometry data including surface triangles for a digital representation of a scene. More particularly, systems, apparatuses and methods may provide a way to identify surface triangles in one or more exclusion zones and non-exclusion zones, and cull surface triangles in one or more exclusion zones.

Type: Grant

Filed: May 4, 2020

Date of Patent: October 4, 2022

Assignee: Intel Corporation

Inventors: Prasoonkumar Surti, Karthik Vaidyanathan, Atsuo Kuwahara, Hugues Labbe, Sameer Kp, Jonathan Kennedy, Abhishek R. Appu, Jeffery S. Boles, Balaji Vembu, Michael Apodaca, Slawomir Grajewski, Gabor Liktor, David M. Cimini, Andrew T. Lauritzen, Travis T. Schluessler, Murali Ramadoss, Abhishek Venkatesh, Joydeep Ray, Kai Xiao, Ankur N. Shah, Altug Koker
APPARATUS AND METHOD FOR EFFICIENT GRAPHICS VIRTUALIZATION

Publication number: 20220309731

Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.

Type: Application

Filed: June 13, 2022

Publication date: September 29, 2022

Inventors: Joydeep RAY, Abhishek R. APPU, Pattabhiraman K, Balaji VEMBU, Altug KOKER, Niranjan L. COORAY, Josh B. MASTRONARDE
METHOD AND APPARATUS FOR EFFICIENT LOOP PROCESSING IN A GRAPHICS HARDWARE FRONT END

Publication number: 20220284539

Abstract: Various embodiments enable loop processing in a command processing block of the graphics hardware. Such hardware may include a processor including a command buffer, and a graphics command parser. The graphics command parser to load graphics commands from the command buffer, parse a first graphics command, store a loop count value associated with the first graphics command, parse a second graphics command and store a loop wrap address based on the second graphics command. The graphics command parser may execute a command sequence identified by the second graphics command, parse a third graphics command, the third graphics command identifying an end of the command sequence, set a new loop count value, and iteratively execute the command sequence using the loop wrap address based on the new loop count value.

Type: Application

Filed: January 18, 2022

Publication date: September 8, 2022

Inventors: Hema Chand NALLURI, Balaji VEMBU, Peter DOYLE, Michael APODACA
APPARATUS AND METHOD FOR MANAGING DATA BIAS IN A GRAPHICS PROCESSING ARCHITECTURE

Publication number: 20220277412

Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, an apparatus comprises a processor comprising one or more cores, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication for whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, the data is to be accessed by the GPU without necessarily accessing the processor's cache coherence controllers.

Type: Application

Filed: March 15, 2022

Publication date: September 1, 2022

Inventors: Joydeep RAY, Abhishek R. APPU, Altug KOKER, Balaji VEMBU
PAGE FAULTING AND SELECTIVE PREEMPTION

Publication number: 20220277413

Abstract: One embodiment provides a graphics processor comprising a system interface and circuitry coupled with the system interface. The circuitry includes an execution resource and a preemption status register. The execution resource is configured to execute an instruction. During execution of the instruction, the execution resource is to receive a request to preempt execution of a thread associated with the instruction and, based on a value stored in the preemption status register, execute at least one additional instruction after receipt of the request to preempt execution of the thread.

Type: Application

Filed: May 20, 2022

Publication date: September 1, 2022

Applicant: Intel Corporation

Inventors: Altug Koker, Ingo Wald, David Puffer, Subramaniam M. Maiyuran, Prasoonkumar Surti, Balaji Vembu, Guei-Yuan Lueh, Murali Ramadoss, Abhishek R. Appu, Joydeep Ray
Machine learning sparse computation mechanism

Patent number: 11430083

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

Type: Grant

Filed: March 5, 2021

Date of Patent: August 30, 2022

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
Coordination and increased utilization of graphics processors during inference

Patent number: 11430082

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

Type: Grant

Filed: January 7, 2021

Date of Patent: August 30, 2022

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim
ORDER INDEPENDENT ASYNCHRONOUS COMPUTE AND STREAMING FOR GRAPHICS

Publication number: 20220262059

Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.

Type: Application

Filed: February 2, 2022

Publication date: August 18, 2022

Inventors: Devan Burke, Adam T. Lake, Jeffery S. Boles, John H. Feit, Karthik Vaidyanathan, Abhishek R. Appu, Joydeep Ray, Subramaniam Maiyuran, Altug Koker, Balaji Vembu, Murali Ramadoss, Prasoonkumar Surti, Eric J. Hoekstra, Gabor Liktor, Jonathan Kennedy, Slawomir Grajewski, Elmoustapha Ould-Ahmed-Vall

prev 1 2 3 4 5 6 7 8 9 … next