Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220357742
    Abstract: A mechanism is described for facilitating barriers and synchronization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
    Type: Application
    Filed: May 23, 2022
    Publication date: November 10, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Publication number: 20220357945
    Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
    Type: Application
    Filed: June 7, 2022
    Publication date: November 10, 2022
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11494232
    Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Balaji Vembu, James A. Valerio, Abhishek R. Appu
  • Patent number: 11494967
    Abstract: In an example, an apparatus comprises a plurality of execution units, and logic, at least partially including hardware logic, to create a scatter gather list in memory and collect a plurality of operating statistics for the plurality of execution units using the scatter gather list. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: February 11, 2021
    Date of Patent: November 8, 2022
    Assignee: INTEL CORPORATION
    Inventors: Balaji Vembu, Murali Ramadoss, David I. Standring, Shruti A. Sethi, Jeffrey S. Frizzell, Alan M. Curtis, Abhishek R. Appu, Joydeep Ray, Altug Koker
  • Publication number: 20220351325
    Abstract: One embodiment provides a parallel processor comprising a memory interface and a processing array coupled with the memory interface. The processing array is configured to address memory accessed via the memory interface via a virtual address mapping and includes circuitry to resolve a page fault for the virtual address mapping, wherein each of the multiple compute blocks is separately preemptable.
    Type: Application
    Filed: May 20, 2022
    Publication date: November 3, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Ingo Wald, David Puffer, Subramaniam M. Maiyuran, Prasoonkumar Surti, Balaji Vembu, Guei-Yuan Lueh, Murali Ramadoss, Abhishek R. Appu, Joydeep Ray
  • Publication number: 20220350651
    Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated a similar dependency to avoid dependency conflicts.
    Type: Application
    Filed: May 17, 2022
    Publication date: November 3, 2022
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Patent number: 11481864
    Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
    Type: Grant
    Filed: April 19, 2021
    Date of Patent: October 25, 2022
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
  • Publication number: 20220334982
    Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
    Type: Application
    Filed: May 27, 2022
    Publication date: October 20, 2022
    Inventors: NIRANJAN L. COORAY, ABHISHEK R. APPU, ALTUG KOKER, JOYDEEP RAY, BALAJI VEMBU, PATTABHIRAMAN K, DAVID PUFFER, DAVID J. COWPERTHWAITE, RAJESH M. SANKARAN, SATYESHWAR SINGH, SAMEER KP, ANKUR N. SHAH, KUN TIAN
  • Publication number: 20220334977
    Abstract: A mechanism is described for facilitating optimization of cache associated with graphics processors at computing devices. A method of embodiments, as described herein, includes introducing coloring bits to contents of a cache associated with a processor including a graphics processor, wherein the coloring bits to represent a signal identifying one or more caches available for use, while avoiding explicit invalidations and flushes.
    Type: Application
    Filed: April 7, 2022
    Publication date: October 20, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
  • Publication number: 20220335562
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Application
    Filed: May 11, 2022
    Publication date: October 20, 2022
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11475623
    Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.
    Type: Grant
    Filed: January 5, 2021
    Date of Patent: October 18, 2022
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Abhishek R. Appu, Pattabhiraman K, Balaji Vembu, Altug Koker, Niranjan L. Cooray, Josh B. Mastronarde
  • Patent number: 11475286
    Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: October 18, 2022
    Assignee: Intel Corporation
    Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
  • Patent number: 11461959
    Abstract: The systems, apparatuses and methods may provide a way to adaptively process and aggressively cull geometry data. Systems, apparatuses and methods may provide for processing, by a positional only shading pipeline (POSH), geometry data including surface triangles for a digital representation of a scene. More particularly, systems, apparatuses and methods may provide a way to identify surface triangles in one or more exclusion zones and non-exclusion zones, and cull surface triangles in one or more exclusion zones.
    Type: Grant
    Filed: May 4, 2020
    Date of Patent: October 4, 2022
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Karthik Vaidyanathan, Atsuo Kuwahara, Hugues Labbe, Sameer Kp, Jonathan Kennedy, Abhishek R. Appu, Jeffery S. Boles, Balaji Vembu, Michael Apodaca, Slawomir Grajewski, Gabor Liktor, David M. Cimini, Andrew T. Lauritzen, Travis T. Schluessler, Murali Ramadoss, Abhishek Venkatesh, Joydeep Ray, Kai Xiao, Ankur N. Shah, Altug Koker
  • Publication number: 20220309731
    Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.
    Type: Application
    Filed: June 13, 2022
    Publication date: September 29, 2022
    Inventors: Joydeep RAY, Abhishek R. APPU, Pattabhiraman K, Balaji VEMBU, Altug KOKER, Niranjan L. COORAY, Josh B. MASTRONARDE
  • Publication number: 20220284539
    Abstract: Various embodiments enable loop processing in a command processing block of the graphics hardware. Such hardware may include a processor including a command buffer, and a graphics command parser. The graphics command parser to load graphics commands from the command buffer, parse a first graphics command, store a loop count value associated with the first graphics command, parse a second graphics command and store a loop wrap address based on the second graphics command. The graphics command parser may execute a command sequence identified by the second graphics command, parse a third graphics command, the third graphics command identifying an end of the command sequence, set a new loop count value, and iteratively execute the command sequence using the loop wrap address based on the new loop count value.
    Type: Application
    Filed: January 18, 2022
    Publication date: September 8, 2022
    Inventors: Hema Chand NALLURI, Balaji VEMBU, Peter DOYLE, Michael APODACA
  • Publication number: 20220277412
    Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, an apparatus comprises a processor comprising one or more cores, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication for whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, the data is to be accessed by the GPU without necessarily accessing the processor's cache coherence controllers.
    Type: Application
    Filed: March 15, 2022
    Publication date: September 1, 2022
    Inventors: Joydeep RAY, Abhishek R. APPU, Altug KOKER, Balaji VEMBU
  • Publication number: 20220277413
    Abstract: One embodiment provides a graphics processor comprising a system interface and circuitry coupled with the system interface. The circuitry includes an execution resource and a preemption status register. The execution resource is configured to execute an instruction. During execution of the instruction, the execution resource is to receive a request to preempt execution of a thread associated with the instruction and, based on a value stored in the preemption status register, execute at least one additional instruction after receipt of the request to preempt execution of the thread.
    Type: Application
    Filed: May 20, 2022
    Publication date: September 1, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Ingo Wald, David Puffer, Subramaniam M. Maiyuran, Prasoonkumar Surti, Balaji Vembu, Guei-Yuan Lueh, Murali Ramadoss, Abhishek R. Appu, Joydeep Ray
  • Patent number: 11430083
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: August 30, 2022
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Patent number: 11430082
    Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.
    Type: Grant
    Filed: January 7, 2021
    Date of Patent: August 30, 2022
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim
  • Publication number: 20220262059
    Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: February 2, 2022
    Publication date: August 18, 2022
    Inventors: Devan Burke, Adam T. Lake, Jeffery S. Boles, John H. Feit, Karthik Vaidyanathan, Abhishek R. Appu, Joydeep Ray, Subramaniam Maiyuran, Altug Koker, Balaji Vembu, Murali Ramadoss, Prasoonkumar Surti, Eric J. Hoekstra, Gabor Liktor, Jonathan Kennedy, Slawomir Grajewski, Elmoustapha Ould-Ahmed-Vall