Patents by Inventor Balaji Vembu

Balaji Vembu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11609856
    Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: March 21, 2023
    Assignee: INTEL CORPORATION
    Inventors: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
  • Patent number: 11593910
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11593269
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Patent number: 11592817
    Abstract: A mechanism is described for facilitating storage management for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting one or more components associated with machine learning, where the one or more components include memory and a processor coupled to the memory, and where the processor includes a graphics processor. The method may further include allocating a storage portion of the memory and a hardware portion of the processor to a machine learning training set, where the storage and hardware portions are precise for implementation and processing of the training set.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: February 28, 2023
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Altug Koker, Farshad Akhbari, Feng Chen, Dukhwan Kim, Narayan Srinivasa, Nadathur Rajagopalan Satish, Kamal Sinha, Joydeep Ray, Balaji Vembu, Mike B. Macpherson, Linda L. Hurd, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Patent number: 11586548
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: February 21, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20230046506
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
    Type: Application
    Filed: October 17, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20230039729
    Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in an autonomous vehicle/driving. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: October 11, 2022
    Publication date: February 9, 2023
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. MacPherson, John C. Weast, Justin E. Gottschlich, Jingyi Jin, Barath Lakshmanan, Chandrasekaran Sakthivel, Michael S. Strickland, Joydeep Ray, Kamal Sinha, Prasoonkumar Surti, Balaji Vembu, Ping T. Tang, Anbang Yao, Tatiana Shpeisman, Xiaoming Chen
  • Publication number: 20230040631
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Application
    Filed: August 5, 2022
    Publication date: February 9, 2023
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Publication number: 20230039853
    Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
    Type: Application
    Filed: October 18, 2022
    Publication date: February 9, 2023
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
  • Patent number: 11574386
    Abstract: Systems, apparatuses and methods may provide away to blend two or more of the scene surfaces based on the focus area and an offload threshold. More particularly, systems, apparatuses and methods may provide a way to blend, by a display engine, two or more of the focus area scene surfaces and blended non-focus area scene surfaces. The systems, apparatuses and methods may include a graphics engine to render the focus area surfaces at a higher sample rate than the non-focus area scene surfaces.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 7, 2023
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Prasoonkumar Surti
  • Patent number: 11562461
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes one or more processing units to provide a first set of shader operations associated with a shader stage of a graphics pipeline, a scheduler to schedule shader threads for processing, and a field-programmable gate array (FPGA) dynamically configured to provide a second set of shader operations associated with the shader stage of the graphics pipeline.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Publication number: 20230014066
    Abstract: A method for securely terminating a distributed trusted execution environment (TEE) spanning a plurality of work accelerators. After wiping sensitive data from the memory of its accelerator, a root of trust for each accelerator is configured to receive confirmation that the data has been wiped from the processor memory in relevant other accelerators prior to moving on to the next stage at which the TEE on its associated accelerator is terminated. Since the data has been wiped from the other accelerators, even if a third party were to inject malicious code into the accelerator, they would be unable to read out the secret data from the other accelerators since the data has been wiped from those other accelerators. In this way, a mechanism is provided for ensuring that when the distributed TEE is terminated, malicious third parties are unable to read out confidential data from the accelerators.
    Type: Application
    Filed: July 13, 2021
    Publication date: January 19, 2023
    Inventors: Daniel John Pelham Wilkinson, Stavros Volos, Kapil Vaswani, Balaji Vembu
  • Publication number: 20230020255
    Abstract: A method for securely terminating a distributed trusted execution environment spanning a plurality of work accelerators. Each accelerator is configured to self-isolate upon determining that the distributed TEE is to be terminated across the system of accelerators. The data is also wiped from the processor memory of each accelerator, such that the data cannot be read out from the processor memory once the accelerator's links are re-enabled. The self-isolation is performed on each accelerator prior to the step of terminating the TEE on that accelerator. An accelerator only re-enables its links to other accelerators once the data is wiped from its processor memory such that the secret data is removed from the accelerator memory.
    Type: Application
    Filed: July 13, 2021
    Publication date: January 19, 2023
    Inventors: Daniel John Pelham WILKINSON, Stavros VOLOS, Kapil VASWANI, Balaji VEMBU
  • Publication number: 20230020838
    Abstract: In various examples there is a computing device comprising: a first microcontroller comprising a first immutable bootloader and first mutable firmware. The first immutable bootloader uses a unique device secret burnt into hardware of the computing device in order to generate an attestation of the first mutable firmware. The computing device has a second microcontroller. There is second mutable firmware at the second microcontroller. There is a second immutable bootloader at the second microcontroller which sends a measurement of the second mutable firmware to the first immutable bootloader whenever the second microcontroller restarts, such that the first microcontroller is able to include the measurement in the attestation.
    Type: Application
    Filed: July 13, 2021
    Publication date: January 19, 2023
    Inventors: Stavros VOLOS, Colin DOAK, Simon Douglas CHAMBERS, David RUGGLES, Richard NEAL, C├ędric Alain Marie FOURNET, Kapil VASWANI, Balaji VEMBU
  • Publication number: 20220413869
    Abstract: An apparatus to facilitate thread scheduling is disclosed. In one embodiment the apparatus includes a processor comprising a plurality of multiprocessors comprising single-instruction multiple thread (SIMT) execution circuitry to simultaneously execute multiple threads, a shared local memory to be shared by the multiple threads, and scheduling hardware logic to schedule the multiple threads in a thread group for execution across the plurality of multiprocessors in accordance with barrier data. The instructions of the multiple threads are to produce shared data to be stored in the shared local memory when executed by the plurality of multiprocessors, wherein additional instructions of at least a first thread of the multiple threads are to use the shared data, and wherein, in accordance with the barrier data, the first thread is to wait for other threads of the multiple threads to finish producing the shared data before executing the additional instructions.
    Type: Application
    Filed: August 31, 2022
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
  • Publication number: 20220405877
    Abstract: An apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor. For example, one embodiment of an apparatus comprises a graphics processing unit (GPU) comprising a plurality of graphics processing resources; slice configuration hardware logic to logically subdivide the graphics processing resources into a plurality of slices; and slice allocation hardware logic to allocate a designated number of slices to each virtual machine (VM) of a plurality of VMs running in a virtualized execution environment, the slice allocation hardware logic to allocate different numbers of slices to different VMs based on graphics processing requirements and/or priorities of each of the VMs.
    Type: Application
    Filed: May 31, 2022
    Publication date: December 22, 2022
    Inventors: Abhishek R. APPU, Joydeep RAY, Altug KOKER, Balaji VEMBU, Pattabhiraman K, Matthew B. CALLAWAY
  • Publication number: 20220405876
    Abstract: An apparatus to facilitate partitioning of a graphics device is disclosed. The apparatus includes a plurality of engines and logic to partition the plurality of engines to facilitate independent access to each engine within the plurality of engines.
    Type: Application
    Filed: May 6, 2022
    Publication date: December 22, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Balaji Vembu, Altug Koker, Bryan R. White, David J. Cowperthwaite, Joydeep Ray, Murali Ramadoss
  • Patent number: 11531510
    Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: December 20, 2022
    Assignee: Intel Corporation
    Inventors: Eric J. Asperheim, Subramaniam M. Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
  • Publication number: 20220398101
    Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.
    Type: Application
    Filed: June 24, 2022
    Publication date: December 15, 2022
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
  • Publication number: 20220398147
    Abstract: Apparatus and method for scalable error reporting. For example, one embodiment of an apparatus comprises error detection circuitry to detect an error in a component of a first tile within a tile-based hierarchy of a processing device; error classification circuitry to classify the error and record first error data based on the classification; a first tile interface to combine the first error data with second error data received from one or more other components associated with the first tile to generate first accumulated error data; and a master tile interface to combine the first accumulated error data with second accumulated error data received from at least one other tile interface to generate second accumulated error data and to provide the second accumulated error data to a host executing an application to process the second accumulated error data.
    Type: Application
    Filed: June 24, 2022
    Publication date: December 15, 2022
    Inventors: Balaji VEMBU, Bryan WHITE, Ankur SHAH, Murali RAMADOSS, David PUFFER, Altug KOKER, Aditya NAVALE, Mahesh NATU