Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220164916
    Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
    Type: Application
    Filed: December 3, 2021
    Publication date: May 26, 2022
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
  • Patent number: 11341600
    Abstract: An apparatus to facilitate partitioning of a graphics device is disclosed. The apparatus includes a plurality of engines and logic to partition the plurality of engines to facilitate independent access to each engine within the plurality of engines.
    Type: Grant
    Filed: November 13, 2019
    Date of Patent: May 24, 2022
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Balaji Vembu, Altug Koker, Bryan R. White, David J. Cowperthwaite, Joydeep Ray, Murali Ramadoss
  • Patent number: 11341212
    Abstract: An apparatus and method for protecting content in a graphics processor. For example, one embodiment of an apparatus comprises: encode/decode circuitry to decode protected audio and/or video content to generate decoded audio and/or video content; a graphics cache of a graphics processing unit (GPU) to store the decoded audio and/or video content; first protection circuitry to set a protection attribute for each cache line containing the decoded audio and/or video data in the graphics cache; a cache coherency controller to generate a coherent read request to the graphics cache; second protection circuitry to read the protection attribute to determine whether the cache line identified in the read request is protected, wherein if it is protected, the second protection circuitry to refrain from including at least some of the data from the cache line in a response.
    Type: Grant
    Filed: February 17, 2020
    Date of Patent: May 24, 2022
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Abhishek R. Appu, Pattabhiraman K, Balaji Vembu, Altug Koker
  • Publication number: 20220156876
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes one or more processing units to provide a first set of shader operations associated with a shader stage of a graphics pipeline, a scheduler to schedule shader threads for processing, and a field-programmable gate array (FPGA) dynamically configured to provide a second set of shader operations associated with the shader stage of the graphics pipeline.
    Type: Application
    Filed: November 18, 2021
    Publication date: May 19, 2022
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Publication number: 20220156875
    Abstract: Thread dispatch circuitry is configured to dispatch threads of a two-dimensional (2D) thread group based on data access locality associated with the threads. The thread dispatch circuitry can dispatch a first 2D sub-group of the 2D thread group to a compute block of the multiple compute blocks, the first 2D sub-group associated with a first 2D tile of memory and dispatch a second 2D sub-group of the 2D thread group to the compute block of the multiple compute blocks, the second 2D sub-group associated with a second 2D tile of memory.
    Type: Application
    Filed: November 16, 2021
    Publication date: May 19, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, James A. Valerio, Abhishek R. Appu
  • Patent number: 11334962
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of processing cores of a first type and a second type. A first set of processing cores of a first type perform multi-dimensional matrix operations and a second set of processing cores of a second type perform general purpose graphics processing unit (GPGPU) operations.
    Type: Grant
    Filed: July 26, 2021
    Date of Patent: May 17, 2022
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11320886
    Abstract: Methods and apparatus relating to techniques for a dual path sequential element to reduce toggles in data path are described. In an embodiment, switching logic causes signals for a single data path of a processor to be directed to at least two separate data paths. At least one of the two separate data paths is power gated to reduce signal toggles in the at least one data path. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 1, 2020
    Date of Patent: May 3, 2022
    Assignee: INTEL CORPORATION
    Inventors: Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Kiran C. Veernapu, Eric J. Asperheim, Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
  • Patent number: 11314654
    Abstract: A mechanism is described for facilitating optimization of cache associated with graphics processors at computing devices. A method of embodiments, as described herein, includes introducing coloring bits to contents of a cache associated with a processor including a graphics processor, wherein the coloring bits to represent a signal identifying one or more caches available for use, while avoiding explicit invalidations and flushes.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: April 26, 2022
    Assignee: Intel Corporation
    Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
  • Publication number: 20220114430
    Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
    Type: Application
    Filed: December 21, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
  • Publication number: 20220113783
    Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 25, 2021
    Publication date: April 14, 2022
    Applicant: INTEL CORPORATION
    Inventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Eric J. Hoekstra, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Travis T. Schluessler, Ankur N. Shah, Jonathan Kennedy
  • Patent number: 11281837
    Abstract: In accordance with embodiments disclosed herein, there is provided systems and methods for router-based transaction routing for toggle reduction. An integrated circuit includes a transmitter circuit, receiver circuits, and a multicast bus coupled between the transmitter circuit and the receiver circuits. The multicast bus includes a first flow router circuit to route a multicast signal to a first receiver circuit of the plurality of receiver circuits and not route the multicast signal to a second receiver circuit of the plurality of receiver circuits.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: March 22, 2022
    Assignee: Intel Corporation
    Inventors: Hema Chand Nalluri, Balaji Vembu, Santosh Tripathy, Altug Koker, Pattabhiraman K
  • Patent number: 11282161
    Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, an apparatus comprises a processor comprising one or more cores, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication for whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, the data is to be accessed by the GPU without necessarily accessing the processor's cache coherence controllers.
    Type: Grant
    Filed: May 5, 2020
    Date of Patent: March 22, 2022
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Balaji Vembu
  • Publication number: 20220084329
    Abstract: An autonomous vehicle is provided that includes one or more processors configured to provide a local compute manager to manage execution of compute workloads associated with the autonomous vehicle. The local compute manager can perform various compute operations, including receiving offload of compute operations from to other compute nodes and offloading compute operations to other compute notes, where the other compute nodes can be other autonomous vehicles. The local compute manager can also facilitate autonomous navigation functionality.
    Type: Application
    Filed: November 30, 2021
    Publication date: March 17, 2022
    Applicant: Intel Corporation
    Inventors: Barath LAKSHAMANAN, Linda L. HURD, Ben J. ASHBAUGH, Elmoustapha OULD-AHMED-VALL, Liwei MA, Jingyi JIN, Justin E. GOTTSCHLICH, Chandrasekaran SAKTHIVEL, Michael S. STRICKLAND, Brian T. LEWIS, Lindsey KUPER, Altug KOKER, Abhishek R. APPU, Prasoonkumar SURTI, Joydeep RAY, Balaji VEMBU, Javier S. TUREK, Naila FAROOQUI
  • Publication number: 20220084252
    Abstract: An apparatus to facilitate compute compression is disclosed. The apparatus includes a graphics processing unit including mapping logic to map a first block of integer pixel data to a compression block and compression logic to compress the compression block.
    Type: Application
    Filed: June 23, 2021
    Publication date: March 17, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek Appu, Altug Koker, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Nadathur Rajagoplan Satish, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Farshad Akhbari
  • Publication number: 20220076480
    Abstract: Systems, apparatuses, and methods may provide for technology to process graphics data in a virtual gaming environment. The technology may identify, from graphics data in a graphics application, redundant graphics calculations relating to common frame characteristics of one or more graphical scenes to be shared between client game devices of a plurality of users and calculate, in response to the identified redundant graphics calculations, frame characteristics relating to the one or more graphical scenes. Additionally, the technology may send, over a computer network, the calculation of the frame characteristics to the client game devices.
    Type: Application
    Filed: September 17, 2021
    Publication date: March 10, 2022
    Inventors: Jonathan Kennedy, Gabor Liktor, Jeffery S. Boles, Slawomir Grajewski, Balaji Vembu, Travis T. Schluessler, Abhishek R. Appu, Ankur N. Shah, Joydeep Ray, Altug Koker, Jacek Kwiatkowski
  • Patent number: 11270406
    Abstract: Embodiments described herein provide techniques enable a compute unit to continue processing operations when all dispatched threads are blocked. One embodiment provides for a method comprising executing multiple concurrent threads on a processing resource of a graphics processor, during execution, detecting that each of the multiple concurrent threads of the processing resource are blocked from execution, selecting a victim thread from the multiple concurrent threads, and suspending the victim thread. The thread state is stored to a thread scratch space in memory along with a blocking event associated with the victim thread.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: March 8, 2022
    Assignee: Intel Corporation
    Inventors: Murali Ramadoss, Balaji Vembu, Eric C. Samson, Kun Tian, David J. Cowperthwaite, Altug Koker, Zhi Wang, Joydeep Ray, Subramaniam M. Maiyuran, Abhishek R. Appu
  • Patent number: 11269643
    Abstract: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
    Type: Grant
    Filed: April 9, 2017
    Date of Patent: March 8, 2022
    Assignee: Intel Corporation
    Inventors: Liwei Ma, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Eriko Nurvitadhi, Abhishek R. Appu, Altug Koker, Kamal Sinha, Joydeep Ray, Balaji Vembu, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20220066970
    Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to monitor a thread switching overhead parameter for an application executing in a processing system and in response to a determination that the thread switching overhead parameter exceeds a threshold, to activate a thread management algorithm to reduce thread switching in the processing system. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 2, 2021
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kiran C. Veernapu, Balaji Vembu, Vasanth Ranganathan, Prasoonkumar Surti
  • Publication number: 20220067874
    Abstract: In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an input from one or more detectors proximate a display to present an output from a graphics pipeline, determine that a user is not interacting with the display, and in response to a determination that the user is not interacting with the display, to reduce a frame rendering rate of the graphics pipeline. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 10, 2021
    Publication date: March 3, 2022
    Applicant: INTEL CORPORATION
    Inventors: Balaji Vembu, Nikos Kaburlasos, Josh B. Mastronarde
  • Publication number: 20220066726
    Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
    Type: Application
    Filed: August 11, 2021
    Publication date: March 3, 2022
    Inventors: Eric J. Asperheim, Subramaniam M. Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker