Patents by Inventor Vasanth Ranganathan

Vasanth Ranganathan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134797
    Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
    Type: Application
    Filed: October 24, 2022
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
  • Patent number: 11961179
    Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray
  • Patent number: 11954062
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
  • Publication number: 20240111609
    Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Biju George, Supratim Pal, James Valerio, Vasanth Ranganathan, Fangwen Fu, Chunhui Mei
  • Publication number: 20240111590
    Abstract: An apparatus to facilitate ordered thread dispatch for thread teams is disclosed. The apparatus includes one or more processors including a graphic processor, the graphics processor including a plurality of processing resources, and wherein the graphics processor is to: allocate a thread team local identifier (ID) for respective threads of a thread team comprising a plurality of hardware threads that are to be executed solely by a processing resource of the plurality of processing resources; and dispatch the respective threads together into the processing resource, the respective threads having the thread team local ID allocated.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Biju George, Vasanth Ranganathan, Fangwen Fu, Ben Ashbaugh, Roland Schulz
  • Patent number: 11934342
    Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: March 19, 2024
    Assignee: INTEL CORPORATION
    Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
  • Publication number: 20240086357
    Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
    Type: Application
    Filed: November 21, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
  • Publication number: 20240078630
    Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
    Type: Application
    Filed: October 19, 2023
    Publication date: March 7, 2024
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang
  • Patent number: 11915357
    Abstract: Apparatus and method for stack throttling.
    Type: Grant
    Filed: March 16, 2020
    Date of Patent: February 27, 2024
    Assignee: Intel Corporation
    Inventors: Karthik Vaidyanathan, Abhishek Appu, Vasanth Ranganathan, Joydeep Ray, Prasoonkumar Surti
  • Publication number: 20240054595
    Abstract: Embodiments described herein provide a system of concurrent compute queues that enable the scheduling of a large number of compute contexts simultaneously on graphics processor hardware. One embodiment provides an apparatus comprising a system interface and a general-purpose graphics processor coupled with the system interface. The general-purpose graphics processor comprises a plurality of graphics processor hardware resources configured to be partitioned into a plurality of isolated partitions, each of the plurality of isolated partitions including a first command streamer, a second command streamer, and circuitry configured to schedule general-purpose graphics compute workloads submitted to a first plurality of command queues associated with the first command streamer and a second plurality of command queues associated with the second command streamer.
    Type: Application
    Filed: August 10, 2022
    Publication date: February 15, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Vasanth Ranganathan, James Valerio, Jeffery S. Boles, Hema Chand Nalluri, Aditya Navale, Ben J. Ashbaugh, Michal Mrozek, Murali Ramadoss, Hong Jiang, Ankur Shah
  • Patent number: 11892950
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Grant
    Filed: July 15, 2022
    Date of Patent: February 6, 2024
    Assignee: INTEL CORPORATION
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
  • Publication number: 20240013337
    Abstract: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim
  • Publication number: 20240012767
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Patent number: 11868264
    Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: January 9, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20240005136
    Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 12, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11861761
    Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
    Type: Grant
    Filed: November 11, 2020
    Date of Patent: January 2, 2024
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang
  • Publication number: 20230418617
    Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
    Type: Application
    Filed: June 22, 2023
    Publication date: December 28, 2023
    Applicant: INTEL CORPORATION
    Inventors: Christopher J. HUGHES, Prasoonkumar SURTI, Guei-Yuan LUEH, Adam T. LAKE, Jill BOYCE, Subramaniam MAIYURAN, Lidong XU, James M. HOLLAND, Vasanth RANGANATHAN, Nikos KABURLASOS, Altug KOKER, Abhishek R. Appu
  • Publication number: 20230401064
    Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for the read from a first cache, broadcast an event within the graphics processor device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by the second thread.
    Type: Application
    Filed: July 6, 2023
    Publication date: December 14, 2023
    Applicant: Intel Corporation
    Inventors: JAMES VALERIO, VASANTH RANGANATHAN, JOYDEEP RAY, PRADEEP RAMANI
  • Patent number: 11842423
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 12, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Publication number: 20230386130
    Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.
    Type: Application
    Filed: April 24, 2023
    Publication date: November 30, 2023
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray