Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reduce power by frame skipping

Patent number: 11094033

Abstract: In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an input from one or more detectors proximate a display to present an output from a graphics pipeline, determine that a user is not interacting with the display, and in response to a determination that the user is not interacting with the display, to reduce a frame rendering rate of the graphics pipeline. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: February 14, 2020

Date of Patent: August 17, 2021

Assignee: INTEL CORPORATION

Inventors: Balaji Vembu, Nikos Kaburlasos, Josh B. Mastronarde
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS

Publication number: 20210241417

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.

Type: Application

Filed: January 11, 2021

Publication date: August 5, 2021

Applicant: Intel Corporation

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
WORKLOAD SCHEDULING AND DISTRIBUTION ON A DISTRIBUTED GRAPHICS DEVICE

Publication number: 20210241418

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.

Type: Application

Filed: April 19, 2021

Publication date: August 5, 2021

Applicant: Intel Corporation

Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
Apparatus and method for dynamic provisioning, quality of service, and scheduling in a graphics processor

Patent number: 11080213

Abstract: An apparatus and method for dynamic provisioning and traffic control on a memory fabric.

Type: Grant

Filed: December 2, 2019

Date of Patent: August 3, 2021

Assignee: INTEL CORPORATION

Inventors: Balaji Vembu, Altug Koker, Joydeep Ray, Abhishek R. Appu, Pattabhiraman K, Niranjan L. Cooray
Instructions and logic to perform floating point and integer operations for machine learning

Patent number: 11080046

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

Type: Grant

Filed: February 5, 2021

Date of Patent: August 3, 2021

Assignee: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
DUAL PATH SEQUENTIAL ELEMENT TO REDUCE TOGGLES IN DATA PATH

Publication number: 20210232204

Abstract: Methods and apparatus relating to techniques for a dual path sequential element to reduce toggles in data path are described. In an embodiment, switching logic causes signals for a single data path of a processor to be directed to at least two separate data paths. At least one of the two separate data paths is power gated to reduce signal toggles in the at least one data path. Other embodiments are also disclosed and claimed.

Type: Application

Filed: December 1, 2020

Publication date: July 29, 2021

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Kiran C. Veernapu, Eric J. Asperheim, Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
Compute optimizations for neural networks using bipolar binary weight

Patent number: 11074072

Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.

Type: Grant

Filed: July 8, 2019

Date of Patent: July 27, 2021

Assignee: Intel Corporation

Inventors: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
MACHINE LEARNING SPARSE COMPUTATION MECHANISM

Publication number: 20210217130

Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.

Type: Application

Filed: March 5, 2021

Publication date: July 15, 2021

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
CACHE OPTIMIZATION FOR GRAPHICS SYSTEMS

Publication number: 20210216467

Abstract: A mechanism is described for facilitating optimization of cache associated with graphics processors at computing devices. A method of embodiments, as described herein, includes introducing coloring bits to contents of a cache associated with a processor including a graphics processor, wherein the coloring bits to represent a signal identifying one or more caches available for use, while avoiding explicit invalidations and flushes.

Type: Application

Filed: November 19, 2020

Publication date: July 15, 2021

Applicant: Intel Corporation

Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, Abhishek R. Appu
Avoid thread switching in cache management

Patent number: 11055248

Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to monitor a thread switching overhead parameter for an application executing in a processing system and in response to a determination that the thread switching overhead parameter exceeds a threshold, to activate a thread management algorithm to reduce thread switching in the processing system. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: October 11, 2019

Date of Patent: July 6, 2021

Assignee: INTEL CORPORATION

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kiran C. Veernapu, Balaji Vembu, Vasanth Ranganathan, Prasoonkumar Surti
APPARATUS AND METHOD FOR EFFICIENT GRAPHICS VIRTUALIZATION

Publication number: 20210201556

Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.

Type: Application

Filed: January 5, 2021

Publication date: July 1, 2021

Inventors: JOYDEEP RAY, ABHISHEK R. APPU, PATTABHIRAMAN K, BALAJI VEMBU, ALTUG KOKER, NIRANJAN L. COORAY, JOSH B. MASTRONARDE
COORDINATION AND INCREASED UTILIZATION OF GRAPHICS PROCESSORS DURING INFERENCE

Publication number: 20210201438

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

Type: Application

Filed: January 7, 2021

Publication date: July 1, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, DUKHWAN Kim
Dynamic voltage-frequency curve mangement

Patent number: 11048605

Abstract: Methods and apparatus relating to techniques for power management. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to generate a voltage/frequency curve for at least one of a core or a sub-core in a processor and manage an operating voltage level of the at least one of a core or a sub-core using the voltage/frequency curve. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: July 30, 2019

Date of Patent: June 29, 2021

Assignee: INTEL CORPORATION

Inventors: Nikos Kaburlasos, Balaji Vembu, Josh B. Mastronarde, Altug Koker, Eric C. Samson, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Vasanth Ranganathan, Sanjeev S. Jahagirdar
Compute mechanism for sparse matrix data

Patent number: 11049284

Abstract: An apparatus to facilitate compute compression is disclosed. The apparatus includes a graphics processing unit including mapping logic to map a first block of integer pixel data to a compression block and compression logic to compress the compression block.

Type: Grant

Filed: July 15, 2019

Date of Patent: June 29, 2021

Assignee: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Nadathur Rajagoplan Satish, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Farshad Akhbari
HANDLING PIPELINE SUBMISSIONS ACROSS MANY COMPUTE UNITS

Publication number: 20210192676

Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (TO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.

Type: Application

Filed: March 10, 2021

Publication date: June 24, 2021

Applicant: Intel Corporation

Inventors: Balaji Vembu, Altug Koker, Joydeep Ray
SECTOR CACHE FOR COMPRESSION

Publication number: 20210191872

Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.

Type: Application

Filed: March 3, 2021

Publication date: June 24, 2021

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
DE-CENTRALIZED LOAD-BALANCING AT PROCESSORS

Publication number: 20210182120

Abstract: A mechanism is described for facilitating localized load-balancing for processors in computing devices. A method of embodiments, as described herein, includes facilitating hosting, at a processor of a computing device, a local load-balancing mechanism. The method may further include monitoring balancing of loads at the processor and serving as a local scheduler to maintain de-centralized load-balancing at the processor and between the processor and other one or more processors.

Type: Application

Filed: December 22, 2020

Publication date: June 17, 2021

Applicant: Intel Corporation

Inventors: Prasoonkumar Surti, David Cowperthwaite, Abhishek R. Appu, Joydeep Ray, Vasanth Ranganathan, Altug Koker, Balaji Vembu
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20210182058

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

Type: Application

Filed: February 5, 2021

Publication date: June 17, 2021

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
COMPUTE CLUSTER PREEMPTION WITHIN A GENERAL-PURPOSE GRAPHICS PROCESSING UNIT

Publication number: 20210158471

Abstract: Embodiments described herein provide techniques enable a compute unit to continue processing operations when all dispatched threads are blocked. One embodiment provides for a method comprising executing multiple concurrent threads on a processing resource of a graphics processor, during execution, detecting that each of the multiple concurrent threads of the processing resource are blocked from execution, selecting a victim thread from the multiple concurrent threads, and suspending the victim thread. The thread state is stored to a thread scratch space in memory along with a blocking event associated with the victim thread.

Type: Application

Filed: November 16, 2020

Publication date: May 27, 2021

Applicant: Intel Corporation

Inventors: Murali Ramadoss, Balaji Vembu, Eric C. Samson, Kun Tian, David J. Cowperthwaite, Altug Koker, Zhi Wang, Joydeep Ray, Subramaniam M. Maiyuran, Abhishek R. Appu
Graphics processor with encrypted kernels

Patent number: 11018863

Abstract: An embodiment of a graphics apparatus may include a graphics processor including a kernel executor, and a security engine communicatively coupled to the graphics processor. The security engine may be configured to create a kernel security key, encrypt an executable kernel for the kernel executor in accordance with the kernel security key, and share the kernel security key with the graphics processor.

Type: Grant

Filed: June 7, 2019

Date of Patent: May 25, 2021

Assignee: Intel Corporation

Inventors: Balaji Vembu, Vidhya Krishnan, Sandeep S. Sodhi, Scott Janus, Daniel Nemiroff

prev … 6 7 8 9 10 11 12 13 14 … next