Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210158471
    Abstract: Embodiments described herein provide techniques enable a compute unit to continue processing operations when all dispatched threads are blocked. One embodiment provides for a method comprising executing multiple concurrent threads on a processing resource of a graphics processor, during execution, detecting that each of the multiple concurrent threads of the processing resource are blocked from execution, selecting a victim thread from the multiple concurrent threads, and suspending the victim thread. The thread state is stored to a thread scratch space in memory along with a blocking event associated with the victim thread.
    Type: Application
    Filed: November 16, 2020
    Publication date: May 27, 2021
    Applicant: Intel Corporation
    Inventors: Murali Ramadoss, Balaji Vembu, Eric C. Samson, Kun Tian, David J. Cowperthwaite, Altug Koker, Zhi Wang, Joydeep Ray, Subramaniam M. Maiyuran, Abhishek R. Appu
  • Patent number: 11018863
    Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: May 25, 2021
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Vidhya Krishnan, Sandeep S. Sodhi, Scott Janus, Daniel Nemiroff
  • Publication number: 20210149724
    Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 20, 2021
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Balaji Vembu, James A. Valerio, Abhishek R. Appu
  • Patent number: 11010659
    Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: May 18, 2021
    Assignee: INTEL CORPORATION
    Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev S. Jahagirdar
  • Patent number: 10997686
    Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
    Type: Grant
    Filed: January 9, 2019
    Date of Patent: May 4, 2021
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
  • Publication number: 20210124579
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
    Type: Application
    Filed: December 9, 2020
    Publication date: April 29, 2021
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20210125581
    Abstract: A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
    Type: Application
    Filed: October 5, 2020
    Publication date: April 29, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, James A. Valerio, Prasoonkumar Surti, Abhishek R. Appu, Vasanth Ranganathan, Kalyan K. Bhiravabhatla, Arthur D. Hunter, JR., Wei-Yu Chen, Subramaniam M. Maiyuran
  • Patent number: 10991075
    Abstract: Systems, apparatuses and methods may provide away to blend two or more of the scene surfaces based on the focus area and an offload threshold. More particularly, systems, apparatuses and methods may provide a way to blend, by a display engine, two or more of the focus area scene surfaces and blended non-focus area scene surfaces. The systems, apparatuses and methods may include a graphics engine to render the focus area surfaces at a higher sample rate than the non-focus area scene surfaces.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Prasoonkumar Surti
  • Patent number: 10977762
    Abstract: One embodiment provides for a general-purpose graphics processing unit multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group is associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: April 13, 2021
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Altug Koker, Joydeep Ray
  • Patent number: 10963986
    Abstract: Methods and apparatus relating to techniques for power management. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive one or more frames for a workload, determine one or more compute resource parameters for the workload, and store the one or more compute resource parameters for the workload in a memory in association with workload context data for the workload. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 26, 2019
    Date of Patent: March 30, 2021
    Assignee: INTEL CORPORATION
    Inventors: Balaji Vembu, Josh B. Mastronarde, Altug Koker, Nikos Kaburlasos, Abhishek R. Appu, Joydeep Ray
  • Patent number: 10956223
    Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive a completion acknowledgment from the plurality of graphics processing units and in response to a determination that the workload is finished, to terminate one or more communication connections on the interconnect bridge. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: June 18, 2020
    Date of Patent: March 23, 2021
    Assignee: INTEL CORPORATION
    Inventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu
  • Patent number: 10955896
    Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive data for a current write operation to a memory, determine a number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory and in response to a determination that the number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory exceeds a threshold, to toggle a plurality of bits in the data for the current write operation to create an encoded data set and set an indicator bit to a value which indicates that the plurality of bits have been toggled. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: March 23, 2021
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, Altug Koker, Eric J. Hoekstra, Kiran C. Veernapu, Prasoonkumar Surti, Vasanth Ranganathan, Kamal Sinha, Balaji Vembu, Eric J. Asperheim, Sanjeev S. Jahagirdar, Joydeep Ray
  • Patent number: 10956330
    Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 26, 2019
    Date of Patent: March 23, 2021
    Assignee: INTEL CORPORATION
    Inventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
  • Publication number: 20210081774
    Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.
    Type: Application
    Filed: October 28, 2020
    Publication date: March 18, 2021
    Applicant: Intel Corporation
    Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
  • Patent number: 10943325
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Grant
    Filed: July 16, 2020
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Publication number: 20210065329
    Abstract: A mechanism is described for facilitating dynamic merging of atomic operations in computing devices. A method of embodiments, as described herein, includes facilitating detecting atomic messages and a plurality of slot addresses. The method further includes comparing one or more slot addresses of the plurality of slot addresses with other slot addresses of the plurality of slot addresses to seek one or more matched slot addresses, where the one or more matched slot addresses are merged into one or more merged groups. The method may further include generating one or more merged atomic operations based on and corresponding to the one or more merged groups.
    Type: Application
    Filed: September 29, 2020
    Publication date: March 4, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Abhishek R. Appu, Balaji Vembu
  • Patent number: 10937123
    Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: March 2, 2021
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Joydeep Ray, Altug Koker, Balaji Vembu, Pattabhiraman K, Matthew B. Callaway
  • Patent number: 10936214
    Abstract: Briefly, in accordance with one or more embodiments, an apparatus comprises a memory comprising one or more physical memory chips, and a processor to implement a working set monitor to monitor a working set resident in the one or more physical memory chips. The working set monitor is to adjust a number of the physical memory chips that are powered on based on a size of the working set.
    Type: Grant
    Filed: June 14, 2019
    Date of Patent: March 2, 2021
    Assignee: Intel Corporation
    Inventors: Travis T. Schluessler, Prasoonkumar Surti, Aravindh V. Anantaraman, Abhishek R. Appu, Joydeep Ray, Altug Koker, Balaji Vembu
  • Publication number: 20210056051
    Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
    Type: Application
    Filed: September 1, 2020
    Publication date: February 25, 2021
    Inventors: NIRANJAN L. COORAY, ABHISHEK R. APPU, ALTUG KOKER, JOYDEEP RAY, BALAJI VEMBU, PATTABHIRAMAN K, DAVID PUFFER, DAVID J. COWPERTHWAITE, RAJESH M. SANKARAN, SATYESHWAR SINGH, SAMEER KP, ANKUR N. SHAH, KUN TIAN
  • Publication number: 20210056033
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 20, 2020
    Publication date: February 25, 2021
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K