Patents by Inventor Lakshminarayanan Striramassarma
Lakshminarayanan Striramassarma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250173308
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
Type: Application
Filed: November 25, 2024
Publication date: May 29, 2025
Applicant: Intel Corporation
Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
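As a rough illustration of the software-assisted prefetch flow this abstract describes, the sketch below combines an address produced by earlier computation with a software-supplied hint to generate candidate prefetch addresses. The `SoftwareHint` fields (stride and depth) and all names are assumptions for illustration, not terminology from the filing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical software-supplied prefetch hint.
struct SoftwareHint {
    uint64_t stride_bytes;  // assumed distance between successive accesses
    int      depth;         // how many addresses ahead to prefetch
};

// Sketch of a prefetch generator: the load-store unit's result (a base address)
// plus the software hint yields a list of prefetch addresses for the cache.
std::vector<uint64_t> generate_prefetch_addresses(uint64_t result_address,
                                                  const SoftwareHint& hint) {
    std::vector<uint64_t> addresses;
    for (int i = 1; i <= hint.depth; ++i) {
        addresses.push_back(result_address +
                            static_cast<uint64_t>(i) * hint.stride_bytes);
    }
    return addresses;
}
```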
-
Patent number: 12293431
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
Type: Grant
Filed: May 2, 2023
Date of Patent: May 6, 2025
Assignee: Intel Corporation
Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Valentin Andrei, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
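A minimal sketch of the zero-detection and metadata idea from this abstract: scan a packed vector, record which lanes hold zero in a bitmask, and keep the non-zero values compacted. The `SparseMetadata` layout is an assumption, not the patented encoding.

```cpp
#include <cstdint>
#include <vector>

// Illustrative metadata for a packed vector of up to 32 elements.
struct SparseMetadata {
    uint32_t zero_mask;           // bit i set when element i is zero
    std::vector<float> nonzeros;  // compacted non-zero values
};

SparseMetadata detect_zero_elements(const std::vector<float>& packed) {
    SparseMetadata meta{0, {}};
    for (size_t i = 0; i < packed.size() && i < 32; ++i) {
        if (packed[i] == 0.0f) {
            meta.zero_mask |= (1u << i);     // record the location of the zero element
        } else {
            meta.nonzeros.push_back(packed[i]);
        }
    }
    return meta;
}
```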
-
Publication number: 20250117356
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
Type: Application
Filed: October 15, 2024
Publication date: April 10, 2025
Applicant: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayanan Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
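The toy model below traces the access path the abstract lays out: the compute chiplet's cache is checked first, then the memory-side cache on the memory chiplet, and finally the HBM device behind the memory controller. Class and member names are illustrative, not from the patent.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Simplified cache: address -> data.
struct Cache {
    std::unordered_map<uint64_t, uint64_t> lines;
    std::optional<uint64_t> lookup(uint64_t addr) const {
        auto it = lines.find(addr);
        return it == lines.end() ? std::nullopt : std::optional<uint64_t>(it->second);
    }
};

// Read path across the chiplet interface: compute-side cache, then memory-side
// cache, then (simulated) HBM via the memory controller; caches fill on miss.
uint64_t read_with_memory_side_cache(uint64_t addr, Cache& compute_cache,
                                     Cache& memory_side_cache,
                                     const std::unordered_map<uint64_t, uint64_t>& hbm) {
    if (auto hit = compute_cache.lookup(addr)) return *hit;
    if (auto hit = memory_side_cache.lookup(addr)) {
        compute_cache.lines[addr] = *hit;   // fill the nearer cache
        return *hit;
    }
    uint64_t data = hbm.at(addr);           // access through the memory controller
    memory_side_cache.lines[addr] = data;   // memory-side cache fills on miss
    compute_cache.lines[addr] = data;
    return data;
}
```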
-
Publication number: 20250103546
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
Type: Application
Filed: October 4, 2024
Publication date: March 27, 2025
Applicant: Intel Corporation
Inventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
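A minimal sketch of the overfetch behavior described here: on a miss, the demanded line is fetched together with the following lines up to the current overfetch boundary. The 64-byte line size is an assumption; how the boundary itself is tuned is not shown.

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kCacheLineBytes = 64;  // assumed line size for illustration

// Returns the line addresses to fetch on a miss: the requested line plus the
// overfetched lines extending to the current overfetch boundary.
std::vector<uint64_t> lines_to_fetch_on_miss(uint64_t miss_address,
                                             uint64_t overfetch_boundary) {
    std::vector<uint64_t> lines;
    uint64_t line = miss_address & ~(kCacheLineBytes - 1);  // align to a line
    lines.push_back(line);                                  // the demanded line
    while (line + kCacheLineBytes <= overfetch_boundary) {  // overfetch region
        line += kCacheLineBytes;
        lines.push_back(line);
    }
    return lines;
}
```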
-
Publication number: 20250103547
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared register file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
Type: Application
Filed: October 4, 2024
Publication date: March 27, 2025
Applicant: Intel Corporation
Inventors: Prasoonkumar Surti, Subramaniam Maiyuran, Valentin Andrei, Abhishek Appu, Varghese George, Altug Koker, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Lakshminarayanan Striramassarma, SungYe Kim
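To illustrate the packed compress and expand operations the abstract mentions, the sketch below keeps elements whose mask bit is set and later scatters them back to their original positions. This is a generic mask-based compress/expand pair, not the GPGPU instruction semantics from the patent.

```cpp
#include <cstdint>
#include <vector>

// Compress: keep only the elements whose mask bit is set (up to 32 lanes).
std::vector<float> packed_compress(const std::vector<float>& src, uint32_t mask) {
    std::vector<float> dst;
    for (size_t i = 0; i < src.size() && i < 32; ++i)
        if (mask & (1u << i)) dst.push_back(src[i]);
    return dst;
}

// Expand: scatter the compressed values back into a zero-filled vector at the
// positions indicated by the same mask.
std::vector<float> packed_expand(const std::vector<float>& compressed,
                                 uint32_t mask, size_t width) {
    std::vector<float> dst(width, 0.0f);
    size_t j = 0;
    for (size_t i = 0; i < width && i < 32; ++i)
        if (mask & (1u << i)) dst[i] = compressed[j++];
    return dst;
}
```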
-
Publication number: 20250103511
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
Type: Application
Filed: October 3, 2024
Publication date: March 27, 2025
Applicant: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
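A minimal sketch of an aging-field policy of the kind this abstract describes: each line carries an age, a hint or instruction can override that age, and eviction picks the oldest line. The field and function names are assumptions, not the patent's terminology.

```cpp
#include <cstdint>
#include <vector>

// Cache line with an aging field; 0 = most recently useful, larger = older.
struct CacheLine {
    uint64_t tag;
    uint8_t  age;
};

// A received hint or instruction indicating a level of aging overrides the
// default aging policy for this line.
void apply_age_hint(CacheLine& line, uint8_t hinted_age) {
    line.age = hinted_age;
}

// Pick the eviction victim: the oldest line in the set (assumes a non-empty set).
size_t select_victim(const std::vector<CacheLine>& set) {
    size_t victim = 0;
    for (size_t i = 1; i < set.size(); ++i)
        if (set[i].age > set[victim].age) victim = i;
    return victim;
}
```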
-
Publication number: 20250103548
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
Type: Application
Filed: November 14, 2024
Publication date: March 27, 2025
Applicant: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Ben Ashbaugh, Jonathan Pearce, Abhishek Appu, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Elmoustapha Ould-Ahmed-Vall, Aravindh Anantaraman, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Yoav Harel, Arthur Hunter, Jr., Brent Insko, Scott Janus, Pattabhiraman K, Mike Macpherson, Subramaniam Maiyuran, Marian Alin Petre, Murali Ramadoss, Shailesh Shah, Kamal Sinha, Prasoonkumar Surti, Vikranth Vemulapalli
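The "default settings versus instruction" decision in this abstract can be pictured as a simple per-request override, as in the hedged sketch below; the enum values and request layout are illustrative assumptions.

```cpp
#include <optional>

enum class CachePriority { Low, Normal, High };

// A memory request may carry an instruction-specified cache-control override.
struct MemoryRequest {
    std::optional<CachePriority> instruction_override;
};

// Instruction-specified control wins; otherwise the default settings apply.
CachePriority resolve_priority(const MemoryRequest& req,
                               CachePriority default_priority) {
    return req.instruction_override.value_or(default_priority);
}
```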
-
Publication number: 20250104179
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform to the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
Type: Application
Filed: October 3, 2024
Publication date: March 27, 2025
Applicant: Intel Corporation
Inventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
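The encapsulation idea in this abstract can be sketched as packets whose payload the inter-chiplet fabric never interprets, so routing depends only on the header and any conforming chiplet can be swapped in. The packet fields below are assumptions for illustration.

```cpp
#include <cstdint>
#include <vector>

// Fabric packet: the payload is opaque to the fabric.
struct FabricPacket {
    uint16_t source_chiplet;
    uint16_t dest_chiplet;
    std::vector<uint8_t> payload;
};

FabricPacket encapsulate(uint16_t src, uint16_t dst, std::vector<uint8_t> data) {
    return FabricPacket{src, dst, std::move(data)};
}

// Routing decision uses only the header; payload contents are never inspected.
uint16_t route(const FabricPacket& pkt) {
    return pkt.dest_chiplet;
}
```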
-
Publication number: 20250061535
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
Type: Application
Filed: August 30, 2024
Publication date: February 20, 2025
Applicant: Intel Corporation
Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
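A toy composition model of the disaggregation described here: a package chassis assembled from separately manufactured chiplets, each contributing a distinct kind of IP logic. The types and fields are purely illustrative, not the patented packaging scheme.

```cpp
#include <string>
#include <vector>

enum class ChipletKind { Compute, Graphics, MemoryCache, Dram, Io };

// A separately manufactured chiplet identified by the IP logic it contributes.
struct Chiplet {
    ChipletKind kind;
    std::string ip_vendor;  // chiplets from different IP designers can be mixed
};

// The common chassis onto which diverse chiplets are assembled.
struct PackageChassis {
    std::vector<Chiplet> slots;
    void install(const Chiplet& c) { slots.push_back(c); }
};
```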
-
Publication number: 20250046001
Abstract: Embodiments are generally directed to multi-tile graphics processor rendering. An embodiment of an apparatus includes a memory for storage of data; and one or more processors including a graphics processing unit (GPU) to process data, wherein the GPU includes a plurality of GPU tiles, wherein, upon geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the geometric data to the plurality of GPU tiles.
Type: Application
Filed: August 12, 2024
Publication date: February 6, 2025
Applicant: Intel Corporation
Inventors: Prasoonkumar Surti, Arthur Hunter, Kamal Sinha, Scott Janus, Brent Insko, Vasanth Ranganathan, Lakshminarayanan Striramassarma
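A minimal sketch of the transfer step this abstract describes: once geometry has been binned to screen tiles, each screen tile's geometry is handed to a GPU tile. The modulo ownership rule is an assumption used only to make the example concrete.

```cpp
#include <cstdint>
#include <vector>

// Geometry already binned to a screen tile.
struct ScreenTile {
    uint32_t id;
    std::vector<uint32_t> geometry_ids;
};

// Distribute each screen tile's geometry to the GPU tile that owns it
// (ownership here is an assumed id-modulo mapping).
std::vector<std::vector<uint32_t>> distribute_to_gpu_tiles(
        const std::vector<ScreenTile>& screen_tiles, uint32_t gpu_tile_count) {
    std::vector<std::vector<uint32_t>> per_gpu_tile(gpu_tile_count);
    for (const ScreenTile& st : screen_tiles) {
        uint32_t owner = st.id % gpu_tile_count;
        per_gpu_tile[owner].insert(per_gpu_tile[owner].end(),
                                   st.geometry_ids.begin(), st.geometry_ids.end());
    }
    return per_gpu_tile;
}
```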
-
Patent number: 12210477
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
Type: Grant
Filed: March 14, 2020
Date of Patent: January 28, 2025
Assignee: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Ben Ashbaugh, Jonathan Pearce, Abhishek Appu, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Elmoustapha Ould-Ahmed-Vall, Aravindh Anantaraman, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Yoav Harel, Arthur Hunter, Jr., Brent Insko, Scott Janus, Pattabhiraman K, Mike Macpherson, Subramaniam Maiyuran, Marian Alin Petre, Murali Ramadoss, Shailesh Shah, Kamal Sinha, Prasoonkumar Surti, Vikranth Vemulapalli
-
Patent number: 12204487
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
Type: Grant
Filed: January 17, 2024
Date of Patent: January 21, 2025
Assignee: Intel Corporation
Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 12182062
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
Type: Grant
Filed: October 7, 2022
Date of Patent: December 31, 2024
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayanan Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
-
Patent number: 12182035
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
Type: Grant
Filed: March 14, 2020
Date of Patent: December 31, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20240411717
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
Type: Application
Filed: June 10, 2024
Publication date: December 12, 2024
Applicant: Intel Corporation
Inventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Patent number: 12153541
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
Type: Grant
Filed: February 17, 2022
Date of Patent: November 26, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Publication number: 20240385975
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions; an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page and to set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
Type: Application
Filed: May 22, 2024
Publication date: November 21, 2024
Applicant: Intel Corporation
Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
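A minimal sketch of the on-page detection step: walk a burst of requests and, when consecutive requests target the same memory page, mark the first request of that run so arbitration can keep servicing the same processing resource until the group completes. The 4 KiB page size and field names are assumptions.

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kPageBytes = 4096;  // assumed page size

struct MemoryRequest {
    uint64_t address;
    bool same_page_marker = false;  // set on the first request of a same-page set
};

// Mark the first request of each run of requests that access the same page.
void mark_same_page_groups(std::vector<MemoryRequest>& requests) {
    for (size_t i = 0; i + 1 < requests.size(); ++i) {
        uint64_t page = requests[i].address / kPageBytes;
        bool starts_run = (i == 0) || (requests[i - 1].address / kPageBytes != page);
        bool next_on_page = (requests[i + 1].address / kPageBytes == page);
        if (starts_run && next_on_page) {
            requests[i].same_page_marker = true;
        }
    }
}
```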
-
Patent number: 12141890
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform to the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
Type: Grant
Filed: March 2, 2022
Date of Patent: November 12, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
-
Patent number: 12141094
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared register file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
Type: Grant
Filed: March 14, 2020
Date of Patent: November 12, 2024
Assignee: Intel Corporation
Inventors: Prasoonkumar Surti, Subramaniam Maiyuran, Valentin Andrei, Abhishek Appu, Varghese George, Altug Koker, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Lakshminarayanan Striramassarma, SungYe Kim
-
Patent number: 12124383
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory is configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
Type: Grant
Filed: July 12, 2022
Date of Patent: October 22, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
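To illustrate the partitioning detail added in this grant, the sketch below models cache regions that each carry a configurable persistence level, so lines in a high-persistence region must age further before they become eviction candidates. The threshold comparison is an assumed policy, not the patented mechanism.

```cpp
#include <cstdint>

// A cache region spanning a range of sets, with its own persistence setting.
struct CacheRegion {
    uint32_t first_set;
    uint32_t last_set;
    uint8_t  persistence_level;  // higher = data retained more aggressively
};

// A line may be evicted only once its age exceeds the region's persistence threshold.
bool eviction_allowed(const CacheRegion& region, uint8_t line_age) {
    return line_age > region.persistence_level;
}
```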