Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095201
    Abstract: Embodiments described herein provide techniques to facilitate scalable interrupts and workload submission for a virtualized graphics processor. The techniques include memory-based interrupt reporting and shared work queue submission for multiple software domains.
    Type: Application
    Filed: August 31, 2023
    Publication date: March 21, 2024
    Applicant: Intel Corporation
    Inventors: David Puffer, Ankur Shah, Niranjan Cooray, Bryan White, Balaji Vembu, Hema Chand Nalluri, Kritika Bala
  • Publication number: 20240086199
    Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.
    Type: Application
    Filed: August 4, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
  • Publication number: 20240086138
    Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
    Type: Application
    Filed: September 26, 2023
    Publication date: March 14, 2024
    Inventors: Eric J. Asperheim, Subramaniam Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
  • Publication number: 20240086542
    Abstract: In various examples there is a computing device comprising: a first microcontroller comprising a first immutable bootloader and first mutable firmware. The first immutable bootloader uses a unique device secret burnt into hardware of the computing device in order to generate an attestation of the first mutable firmware. The computing device has a second microcontroller. There is second mutable firmware at the second microcontroller. There is a second immutable bootloader at the second microcontroller which sends a measurement of the second mutable firmware to the first immutable bootloader whenever the second microcontroller restarts, such that the first microcontroller is able to include the measurement in the attestation.
    Type: Application
    Filed: November 13, 2023
    Publication date: March 14, 2024
    Inventors: Stavros VOLOS, Colin DOAK, Simon Douglas CHAMBERS, David RUGGLES, Richard NEAL, Cedric Alain Marie FOURNET, Kapil VASWANI, Balaji VEMBU
  • Publication number: 20240078629
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Application
    Filed: September 14, 2023
    Publication date: March 7, 2024
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Patent number: 11922535
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Publication number: 20240013337
    Abstract: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim
  • Patent number: 11868264
    Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: January 9, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20240004833
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively couple to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 10, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Altug Koker, Prasoonkumar Surti, David Puffer, Subramaniam Maiyuran, Guei-Yuan Lueh, Abhishek R. Appu, Joydeep Ray, Balaji Vembu, Tomer Bar-On, Andrew T. Lauritzen, Hugues Labbe, John G. Gierach, Gabor Liktor
  • Publication number: 20240005136
    Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 12, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20240004713
    Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 1, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. APPU, Altug KOKER, Balaji VEMBU, Joydeep RAY, Kamal SINHA, Prasoonkumar SURTI, Kiran C. VEERNAPU, Subramaniam MAIYURAN, Sanjeev S. Jahagirdar, Eric J. Asperheim, Guei-Yuan Lueh, David Puffer, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Josh B. Mastronarde, Linda L. Hurd, Travis T. Schluessler, Tomasz Janczak, Abhishek Venkatesh, Kai Xiao, Slawomir Grajewski
  • Publication number: 20230418355
    Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: June 22, 2023
    Publication date: December 28, 2023
    Applicant: INTEL CORPORATION
    Inventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Eric J. Hoekstra, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Travis T. Schluessler, Ankur N. Shah, Jonathan Kennedy
  • Patent number: 11853429
    Abstract: In various examples there is a computing device comprising: a first microcontroller comprising a first immutable bootloader and first mutable firmware. The first immutable bootloader uses a unique device secret burnt into hardware of the computing device in order to generate an attestation of the first mutable firmware. The computing device has a second microcontroller. There is second mutable firmware at the second microcontroller. There is a second immutable bootloader at the second microcontroller which sends a measurement of the second mutable firmware to the first immutable bootloader whenever the second microcontroller restarts, such that the first microcontroller is able to include the measurement in the attestation.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Stavros Volos, Colin Doak, Simon Douglas Chambers, David Ruggles, Richard Neal, Cédric Alain Marie Fournet, Kapil Vaswani, Balaji Vembu
  • Patent number: 11847719
    Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, an apparatus comprises a processor comprising one or more cores, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication for whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, the data is to be accessed by the GPU without necessarily accessing the processor's cache coherence controllers.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Balaji Vembu
  • Publication number: 20230394616
    Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
    Type: Application
    Filed: June 14, 2023
    Publication date: December 7, 2023
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
  • Patent number: 11816384
    Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
    Type: Grant
    Filed: October 4, 2022
    Date of Patent: November 14, 2023
    Assignee: Intel Corporation
    Inventors: Eric J. Asperheim, Subramaniam Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
  • Publication number: 20230359461
    Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a one-bit weight associated with a neural network, as well as an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a fused operation including an exclusive not OR (XNOR) operation and a population count operation. The adder is configured to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
    Type: Application
    Filed: May 11, 2023
    Publication date: November 9, 2023
    Applicant: Intel Corporation
    Inventors: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
  • Patent number: 11810405
    Abstract: An autonomous vehicle is provided that includes one or more processors configured to provide a local compute manager to manage execution of compute workloads associated with the autonomous vehicle. The local compute manager can perform various compute operations, including receiving offload of compute operations from to other compute nodes and offloading compute operations to other compute notes, where the other compute nodes can be other autonomous vehicles. The local compute manager can also facilitate autonomous navigation functionality.
    Type: Grant
    Filed: November 30, 2021
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Barath Lakshamanan, Linda L. Hurd, Ben J. Ashbaugh, Elmoustapha Ould-Ahmed-Vall, Liwei Ma, Jingyi Jin, Justin E. Gottschlich, Chandrasekaran Sakthivel, Michael S. Strickland, Brian T. Lewis, Lindsey Kuper, Altug Koker, Abhishek R. Appu, Prasoonkumar Surti, Joydeep Ray, Balaji Vembu, Javier S. Turek, Naila Farooqui
  • Patent number: 11803935
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: October 31, 2023
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Patent number: 11803934
    Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (IO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.
    Type: Grant
    Filed: February 2, 2022
    Date of Patent: October 31, 2023
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Altug Koker, Joydeep Ray