Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

COMPUTE CLUSTER PREEMPTION WITHIN A GENERAL-PURPOSE GRAPHICS PROCESSING UNIT

Publication number: 20210158471

Abstract: Embodiments described herein provide techniques enable a compute unit to continue processing operations when all dispatched threads are blocked. One embodiment provides for a method comprising executing multiple concurrent threads on a processing resource of a graphics processor, during execution, detecting that each of the multiple concurrent threads of the processing resource are blocked from execution, selecting a victim thread from the multiple concurrent threads, and suspending the victim thread. The thread state is stored to a thread scratch space in memory along with a blocking event associated with the victim thread.

Type: Application

Filed: November 16, 2020

Publication date: May 27, 2021

Applicant: Intel Corporation

Inventors: Murali Ramadoss, Balaji Vembu, Eric C. Samson, Kun Tian, David J. Cowperthwaite, Altug Koker, Zhi Wang, Joydeep Ray, Subramaniam M. Maiyuran, Abhishek R. Appu
Graphics processor with encrypted kernels

Patent number: 11018863

Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.

Type: Grant

Filed: June 7, 2019

Date of Patent: May 25, 2021

Assignee: Intel Corporation

Inventors: Balaji Vembu, Vidhya Krishnan, Sandeep S. Sodhi, Scott Janus, Daniel Nemiroff
MEMORY-BASED SOFTWARE BARRIERS

Publication number: 20210149724

Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.

Type: Application

Filed: November 24, 2020

Publication date: May 20, 2021

Applicant: Intel Corporation

Inventors: Altug Koker, Joydeep Ray, Balaji Vembu, James A. Valerio, Abhishek R. Appu
Dynamic precision for neural network compute operations

Patent number: 11010659

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: April 24, 2017

Date of Patent: May 18, 2021

Assignee: INTEL CORPORATION

Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev S. Jahagirdar
Workload scheduling and distribution on a distributed graphics device

Patent number: 10997686

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.

Type: Grant

Filed: January 9, 2019

Date of Patent: May 4, 2021

Assignee: Intel Corporation

Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20210124579

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

Type: Application

Filed: December 9, 2020

Publication date: April 29, 2021

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
REGISTER SPILL/FILL USING SHARED LOCAL MEMORY SPACE

Publication number: 20210125581

Abstract: A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.

Type: Application

Filed: October 5, 2020

Publication date: April 29, 2021

Applicant: Intel Corporation

Inventors: Joydeep Ray, Altug Koker, Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, James A. Valerio, Prasoonkumar Surti, Abhishek R. Appu, Vasanth Ranganathan, Kalyan K. Bhiravabhatla, Arthur D. Hunter, JR., Wei-Yu Chen, Subramaniam M. Maiyuran
Display engine surface blending and adaptive texel to pixel ratio sample rate system, apparatus and method

Patent number: 10991075

Abstract: Systems, apparatuses and methods may provide away to blend two or more of the scene surfaces based on the focus area and an offload threshold. More particularly, systems, apparatuses and methods may provide a way to blend, by a display engine, two or more of the focus area scene surfaces and blended non-focus area scene surfaces. The systems, apparatuses and methods may include a graphics engine to render the focus area surfaces at a higher sample rate than the non-focus area scene surfaces.

Type: Grant

Filed: September 7, 2018

Date of Patent: April 27, 2021

Assignee: Intel Corporation

Inventors: Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Prasoonkumar Surti
Handling pipeline submissions across many compute units

Patent number: 10977762

Abstract: One embodiment provides for a general-purpose graphics processing unit multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group is associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.

Type: Grant

Filed: March 30, 2020

Date of Patent: April 13, 2021

Assignee: Intel Corporation

Inventors: Balaji Vembu, Altug Koker, Joydeep Ray
Adaptive compute size per workload

Patent number: 10963986

Abstract: Methods and apparatus relating to techniques for power management. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive one or more frames for a workload, determine one or more compute resource parameters for the workload, and store the one or more compute resource parameters for the workload in a memory in association with workload context data for the workload. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: December 26, 2019

Date of Patent: March 30, 2021

Assignee: INTEL CORPORATION

Inventors: Balaji Vembu, Josh B. Mastronarde, Altug Koker, Nikos Kaburlasos, Abhishek R. Appu, Joydeep Ray
High bandwidth connection between processor dies

Patent number: 10956223

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive a completion acknowledgment from the plurality of graphics processing units and in response to a determination that the workload is finished, to terminate one or more communication connections on the interconnect bridge. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: June 18, 2020

Date of Patent: March 23, 2021

Assignee: INTEL CORPORATION

Inventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu
Power consumption management for communication bus

Patent number: 10955896

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive data for a current write operation to a memory, determine a number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory and in response to a determination that the number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory exceeds a threshold, to toggle a plurality of bits in the data for the current write operation to create an encoded data set and set an indicator bit to a value which indicates that the plurality of bits have been toggled. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: February 5, 2020

Date of Patent: March 23, 2021

Assignee: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Eric J. Hoekstra, Kiran C. Veernapu, Prasoonkumar Surti, Vasanth Ranganathan, Kamal Sinha, Balaji Vembu, Eric J. Asperheim, Sanjeev S. Jahagirdar, Joydeep Ray
Extend GPU/CPU coherency to multi-GPU cores

Patent number: 10956330

Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: December 26, 2019

Date of Patent: March 23, 2021

Assignee: INTEL CORPORATION

Inventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

Publication number: 20210081774

Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.

Type: Application

Filed: October 28, 2020

Publication date: March 18, 2021

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Machine learning sparse computation mechanism

Patent number: 10943325

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

Type: Grant

Filed: July 16, 2020

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
EFFICIENT MERGING OF ATOMIC OPERATIONS AT COMPUTING DEVICES

Publication number: 20210065329

Abstract: A mechanism is described for facilitating dynamic merging of atomic operations in computing devices. A method of embodiments, as described herein, includes facilitating detecting atomic messages and a plurality of slot addresses. The method further includes comparing one or more slot addresses of the plurality of slot addresses with other slot addresses of the plurality of slot addresses to seek one or more matched slot addresses, where the one or more matched slot addresses are merged into one or more merged groups. The method may further include generating one or more merged atomic operations based on and corresponding to the one or more merged groups.

Type: Application

Filed: September 29, 2020

Publication date: March 4, 2021

Applicant: Intel Corporation

Inventors: Joydeep Ray, Altug Koker, Abhishek R. Appu, Balaji Vembu
Apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor

Patent number: 10937123

Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.

Type: Grant

Filed: July 8, 2019

Date of Patent: March 2, 2021

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Joydeep Ray, Altug Koker, Balaji Vembu, Pattabhiraman K, Matthew B. Callaway
Power management of memory chips based on working set size

Patent number: 10936214

Abstract: Briefly, in accordance with one or more embodiments, an apparatus comprises a memory comprising one or more physical memory chips, and a processor to implement a working set monitor to monitor a working set resident in the one or more physical memory chips. The working set monitor is to adjust a number of the physical memory chips that are powered on based on a size of the working set.

Type: Grant

Filed: June 14, 2019

Date of Patent: March 2, 2021

Assignee: Intel Corporation

Inventors: Travis T. Schluessler, Prasoonkumar Surti, Aravindh V. Anantaraman, Abhishek R. Appu, Joydeep Ray, Altug Koker, Balaji Vembu
APPARATUS AND METHOD FOR MEMORY MANAGEMENT IN A GRAPHICS PROCESSING ENVIRONMENT

Publication number: 20210056051

Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.

Type: Application

Filed: September 1, 2020

Publication date: February 25, 2021

Inventors: NIRANJAN L. COORAY, ABHISHEK R. APPU, ALTUG KOKER, JOYDEEP RAY, BALAJI VEMBU, PATTABHIRAMAN K, DAVID PUFFER, DAVID J. COWPERTHWAITE, RAJESH M. SANKARAN, SATYESHWAR SINGH, SAMEER KP, ANKUR N. SHAH, KUN TIAN
SECTOR CACHE FOR COMPRESSION

Publication number: 20210056033

Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.

Type: Application

Filed: September 20, 2020

Publication date: February 25, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K

prev … 7 8 9 10 11 12 13 14 15 … next