Patents by Inventor Vasanth Ranganathan

Vasanth Ranganathan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12045658
    Abstract: Apparatus and method for stack access throttling for synchronous ray tracing. For example, one embodiment of an apparatus comprises: ray tracing acceleration hardware to manage active ray tracing stack allocations to ensure that a size of the active ray tracing stack allocations remains within a threshold; and an execution unit to execute a thread to explicitly request a new ray tracing stack allocation from the ray tracing acceleration hardware, the ray tracing acceleration hardware to permit the new ray tracing stack allocation if the size of the active ray tracing stack allocations will remain within the threshold after permitting the new ray tracing stack allocation.
    Type: Grant
    Filed: January 31, 2022
    Date of Patent: July 23, 2024
    Assignee: Intel Corporation
    Inventors: Pawel Majewski, Prasoonkumar Surti, Karthik Vaidyanathan, Joshua Barczak, Vasanth Ranganathan, Vikranth Vemulapalli
  • Patent number: 12039331
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: July 16, 2024
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20240232094
    Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
    Type: Application
    Filed: January 5, 2024
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20240232088
    Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
    Type: Application
    Filed: October 25, 2022
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
  • Patent number: 12032496
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Grant
    Filed: July 25, 2023
    Date of Patent: July 9, 2024
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Publication number: 20240220254
    Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
  • Publication number: 20240221295
    Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.
    Type: Application
    Filed: February 8, 2024
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray
  • Publication number: 20240220335
    Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
  • Publication number: 20240211403
    Abstract: One embodiment provides a graphics processor comprising memory access circuitry configured to generate a virtual address for pixel data at a pixel coordinate on a surface in memory to facilitate the caching of the pixel data in a cache memory before the actual memory address of the pixel coordinate is able to be determined.
    Type: Application
    Filed: December 21, 2022
    Publication date: June 27, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Joydeep Ray, Karthik Vaidyanathan, Sreedhar Chalasani, Eric Liskay, Prathamesh Raghunath Shinde, Vasanth Ranganathan, Michael J. Norris, Rajasekhar Pantangi, Altug Koker
  • Patent number: 12020370
    Abstract: Embodiments described herein provide for a technique to improve the culling efficiency of coarse depth testing. One embodiment provides for a graphics processor that is configured to perform a method to track a history of source fragments that are tested against a destination tile. When a combination of partial fragments sum to full coverage, the most conservative source far depth value is used instead of the previous destination far depth value. When the combination sums to partial coverage, the previous destination far depth value is retained.
    Type: Grant
    Filed: March 24, 2023
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Saikat Mandal, Vasanth Ranganathan
  • Patent number: 12013808
    Abstract: Embodiments are generally directed to a multi-tile architecture for graphics operations. An embodiment of an apparatus includes a multi-tile architecture for graphics operations including a multi-tile graphics processor, the multi-tile processor includes one or more dies; multiple processor tiles installed on the one or more dies; and a structure to interconnect the processor tiles on the one or more dies, wherein the structure to enable communications between processor tiles the processor tiles.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: June 18, 2024
    Assignee: INTEL CORPORATION
    Inventors: Altug Koker, Ben Ashbaugh, Scott Janus, Aravindh Anantaraman, Abhishek R. Appu, Niranjan Cooray, Varghese George, Arthur Hunter, Brent E. Insko, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Kamal Sinha, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
  • Publication number: 20240184739
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Application
    Filed: February 5, 2024
    Publication date: June 6, 2024
    Applicant: INTEL CORPORATION
    Inventors: Joydeep RAY, Niranjan COORAY, Subramaniam MAIYURAN, Altug KOKER, Prasoonkumar SURTI, Varghese GEORGE, Valentin ANDREI, Abhishek APPU, Guadalupe GARCIA, Pattabhiraman K, Sungye KIM, Sanjay KUMAR, Pratik MAROLIA, Elmoustapha OULD-AHMED-VALL, Vasanth RANGANATHAN, William SADLER, Lakshminarayanan STRIRAMASSARMA
  • Publication number: 20240184572
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
    Type: Application
    Filed: December 4, 2023
    Publication date: June 6, 2024
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 12001209
    Abstract: A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
    Type: Grant
    Filed: May 23, 2022
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Patent number: 11995029
    Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: May 28, 2024
    Assignee: Intel Corporation
    Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
  • Publication number: 20240161227
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Application
    Filed: December 7, 2023
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Publication number: 20240160478
    Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
    Type: Application
    Filed: November 15, 2022
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Chunhui Mei, Ben J. Ashbaugh, Naveen Matam, Joydeep Ray, Timothy Bauer, Guei-Yuan Lueh, Vasanth Ranganathan, Prashant Chaudhari, Vikranth Vemulapalli, Nishanth Reddy Pendluru, Piotr Reiter, Jain Philip, Marek Rudniewski, Christopher Spencer, Parth Damani, Prathamesh Raghunath Shinde, John Wiegert, Fataneh Ghodrat
  • Publication number: 20240134797
    Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
    Type: Application
    Filed: October 24, 2022
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
  • Patent number: 11961179
    Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray
  • Patent number: 11954062
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma