Patents by Inventor Lakshminarayanan Striramassarma

Lakshminarayanan Striramassarma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954062
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
  • Patent number: 11934342
    Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: March 19, 2024
    Assignee: INTEL CORPORATION
    Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
  • Patent number: 11908542
    Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead in time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either upper or lower half of the cache line, thus saving dynamic capacitance.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
  • Patent number: 11892950
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Grant
    Filed: July 15, 2022
    Date of Patent: February 6, 2024
    Assignee: INTEL CORPORATION
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
  • Publication number: 20240013338
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: September 20, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20240012767
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Patent number: 11868264
    Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: January 9, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20240005443
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: August 24, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20230351543
    Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
    Type: Application
    Filed: May 2, 2023
    Publication date: November 2, 2023
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Valentin Andrei, Ashutosh Garg, Yoav Harel, Arthur Hunter, JR., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
  • Patent number: 11763416
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Grant
    Filed: October 13, 2021
    Date of Patent: September 19, 2023
    Assignee: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Patent number: 11755501
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: September 12, 2023
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Patent number: 11756150
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Grant
    Filed: February 17, 2022
    Date of Patent: September 12, 2023
    Assignee: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20230259458
    Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
    Type: Application
    Filed: February 13, 2023
    Publication date: August 17, 2023
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20230205704
    Abstract: A graphics processor includes multiple levels of memory units, including a memory device and a cache device located near a graphics component. The graphics processor includes distributed compression/decompression, including a module between the cache device and the memory device. The module can perform compression of write data when the write data is moved from the cache device to the memory device, and perform decompression of read data when the read data is moved from the memory device to the cache device. The graphics processor can include a second level of cache with another compression module between the first level of cache and the second level of cache.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Prasoonkumar SURTI, Vidhya KRISHNAN, Abhishek R. APPU, Karol A. SZERSZEN, Lakshminarayanan STRIRAMASSARMA
  • Patent number: 11676239
    Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to skip computational operations for zero filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse aware logic unit.
    Type: Grant
    Filed: June 3, 2021
    Date of Patent: June 13, 2023
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Andrei Valentin, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
  • Patent number: 11620256
    Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
    Type: Grant
    Filed: April 28, 2022
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Ben Ashbaugh, Jonathan Pearce, Abhishek Appu, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Elmoustapha Ould-Ahmed-Vall, Aravindh Anantaraman, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Yoav Harel, Arthur Hunter, Jr., Brent Insko, Scott Janus, Pattabhiraman K, Mike Macpherson, Subramaniam Maiyuran, Marian Alin Petre, Murali Ramadoss, Shailesh Shah, Kamal Sinha, Prasoonkumar Surti, Vikranth Vemulapalli
  • Patent number: 11593269
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Patent number: 11586548
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: February 21, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20230051190
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Application
    Filed: July 15, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, JR., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
  • Publication number: 20220350751
    Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
    Type: Application
    Filed: July 12, 2022
    Publication date: November 3, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian