Patents by Inventor Lakshminarayanan Striramassarma

Lakshminarayanan Striramassarma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220114096
    Abstract: Multi-tile memory management for detecting cross-tile access, providing multi-tile inference scaling with multicasting of data via copy operation, and providing page migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second GPU having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
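    The detection step in this abstract can be illustrated with a minimal sketch. It is not taken from the application: the class name `CrossTileMonitor`, the per-page counter, the fixed threshold used as the definition of "frequent", and the message fields are all assumptions made for illustration.

```python
from collections import Counter


class CrossTileMonitor:
    """Hypothetical model: count remote accesses per (tile, page) pair."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold    # assumed definition of "frequent"
        self.remote_hits = Counter()  # (requesting tile, page) -> remote access count
        self.messages = []            # transfer requests issued so far

    def access(self, tile: int, page: int, owner: int) -> bool:
        """Record one access; return True if it triggers a transfer message."""
        if tile == owner:
            return False              # local access, nothing to track
        key = (tile, page)
        self.remote_hits[key] += 1
        if self.remote_hits[key] == self.threshold:
            # Assumed message shape: ask for the page to move toward the requester.
            self.messages.append({"page": page, "src": owner, "dst": tile})
            return True
        return False


monitor = CrossTileMonitor(threshold=3)
results = [monitor.access(tile=0, page=7, owner=1) for _ in range(3)]
print(results)
print(monitor.messages)
```

    In this sketch the message is sent once, on the access that reaches the threshold. Real hardware would more likely use sampled or decaying counters; the abstract does not specify the mechanism.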
  • Patent number: 11301384
    Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
    Type: Grant
    Filed: October 12, 2020
    Date of Patent: April 12, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, James Valerio, Ben Ashbaugh, Lakshminarayanan Striramassarma
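    The partial write rule stated in this abstract can be sketched as a small decision function. This is a simplified model under stated assumptions, not the patented design: the dictionary-based cache and memory, the 64-byte line size, and the name `handle_partial_write` are hypothetical, and the sketch does not model what happens to an exclusive or shared line after the write is forwarded, since the abstract does not say.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"


LINE_BYTES = 64  # assumed cache line size


def handle_partial_write(cache, hbm, addr, offset, data):
    """Apply a partial write; return "local" or "forwarded"."""
    line = cache.get(addr)  # None models a cache miss
    if line is not None and line["state"] is State.MODIFIED:
        # Line is already dirty in this tile: merge the bytes locally.
        line["bytes"][offset:offset + len(data)] = data
        return "local"
    # Miss, exclusive, or shared: forward the write data to shared memory.
    target = hbm.setdefault(addr, bytearray(LINE_BYTES))
    target[offset:offset + len(data)] = data
    return "forwarded"


cache = {
    0x40: {"state": State.MODIFIED, "bytes": bytearray(LINE_BYTES)},
    0x80: {"state": State.SHARED, "bytes": bytearray(LINE_BYTES)},
}
hbm = {}
outcomes = [
    handle_partial_write(cache, hbm, 0x40, 0, b"\x01\x02"),  # modified line
    handle_partial_write(cache, hbm, 0x80, 4, b"\x03"),      # shared line
    handle_partial_write(cache, hbm, 0xC0, 0, b"\x04"),      # miss
]
print(outcomes)
```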
  • Publication number: 20220107914
    Abstract: Embodiments are generally directed to a multi-tile architecture for graphics operations. An embodiment of an apparatus includes a multi-tile architecture for graphics operations including a multi-tile graphics processor, the multi-tile processor includes one or more dies; multiple processor tiles installed on the one or more dies; and a structure to interconnect the processor tiles on the one or more dies, wherein the structure is to enable communications between the processor tiles.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 7, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Ben Ashbaugh, Scott Janus, Aravindh Anantaraman, Abhishek R. Appu, Niranjan Cooray, Varghese George, Arthur Hunter, Brent E. Insko, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Kamal Sinha, Lakshminarayanan Striramassarma, Surti Prasoonkumar, Saurabh Tangri
  • Publication number: 20220066931
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Application
    Filed: March 14, 2020
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, SungYe Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
  • Patent number: 11263141
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: September 20, 2020
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20220058852
    Abstract: Embodiments are generally directed to multi-tile graphics processor rendering. An embodiment of an apparatus includes a memory for storage of data; and one or more processors including a graphics processing unit (GPU) to process data, wherein the GPU includes a plurality of GPU tiles, wherein, upon geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the geometric data to the plurality of GPU tiles.
    Type: Application
    Filed: October 8, 2021
    Publication date: February 24, 2022
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Arthur Hunter, Kamal Sinha, Scott Janus, Brent Insko, Vasanth Ranganathan, Lakshminarayanan Striramassarma
  • Publication number: 20220036500
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: October 13, 2021
    Publication date: February 3, 2022
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20210374062
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 12, 2021
    Publication date: December 2, 2021
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20210374897
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provide techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to skip computational operations for zero-filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse-aware logic unit.
    Type: Application
    Filed: June 3, 2021
    Publication date: December 2, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Andrei Valentin, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
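    The idea of skipping computation for zero-filled sub-matrices can be illustrated in software with a blocked matrix multiply that skips any block product where one operand block is all zeros. This is only a sketch of the general technique, not the systolic hardware described in the application; the function names, the square-matrix restriction, and the block size are assumptions.

```python
def is_zero_block(m, r0, c0, size):
    """True if the size x size block of m starting at (r0, c0) is all zeros."""
    return all(m[r][c] == 0
               for r in range(r0, r0 + size)
               for c in range(c0, c0 + size))


def blocked_matmul_skip_zero(a, b, block=2):
    """Multiply square matrices a and b, skipping zero block products.

    Assumes both are n x n with n divisible by block.
    Returns (product, number of block products skipped).
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    skipped = 0
    for i in range(0, n, block):
        for k in range(0, n, block):
            if is_zero_block(a, i, k, block):
                skipped += n // block  # one skipped product per column block j
                continue
            for j in range(0, n, block):
                if is_zero_block(b, k, j, block):
                    skipped += 1
                    continue
                for ii in range(i, i + block):
                    for kk in range(k, k + block):
                        aik = a[ii][kk]
                        for jj in range(j, j + block):
                            c[ii][jj] += aik * b[kk][jj]
    return c, skipped


a = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
c, skipped = blocked_matmul_skip_zero(a, b)
print(c, skipped)
```

    The sketch rescans blocks for zeros on every use. A hardware design would more plausibly carry sparsity metadata alongside the compressed data, but the abstract does not give that level of detail.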
  • Patent number: 11145105
    Abstract: Embodiments are generally directed to multi-tile graphics processor rendering. An embodiment of an apparatus includes a memory for storage of data; and one or more processors including a graphics processing unit (GPU) to process data, wherein the GPU includes a plurality of GPU tiles, wherein, upon geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the geometric data to the plurality of GPU tiles.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: October 12, 2021
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Arthur Hunter, Jr., Kamal Sinha, Scott Janus, Brent Insko, Vasanth Ranganathan, Lakshminarayanan Striramassarma
  • Publication number: 20210303481
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Application
    Filed: March 25, 2021
    Publication date: September 30, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Patent number: 11113784
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provide techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to skip computational operations for zero-filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse-aware logic unit.
    Type: Grant
    Filed: October 6, 2020
    Date of Patent: September 7, 2021
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Andrei Valentin, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
  • Publication number: 20210256654
    Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
    Type: Application
    Filed: January 29, 2021
    Publication date: August 19, 2021
    Applicant: Intel Corporation
    Inventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20210255957
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
    Type: Application
    Filed: January 28, 2021
    Publication date: August 19, 2021
    Applicant: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
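    The threshold rule in this abstract reduces to a single comparison, sketched below. The function name `prefetch_target`, the default threshold of 0.8, and the behavior when no accesses have been measured yet are assumptions; the abstract does not give a threshold value or a measurement window.

```python
def prefetch_target(l1_hits: int, l1_accesses: int, threshold: float = 0.8) -> str:
    """Choose which cache level a prefetch may fill, based on the L1 hit rate."""
    if l1_accesses == 0:
        return "L1"  # assumption: with no history, allow prefetch into L1
    hit_rate = l1_hits / l1_accesses
    # Hit rate at or above the threshold: L1 is already serving well,
    # so limit the prefetch to L3. Otherwise allow the prefetch into L1.
    return "L3" if hit_rate >= threshold else "L1"


decisions = [
    prefetch_target(90, 100),  # hit rate above the threshold
    prefetch_target(80, 100),  # hit rate exactly at the threshold
    prefetch_target(50, 100),  # hit rate below the threshold
]
print(decisions)
```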
  • Publication number: 20210191872
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 3, 2021
    Publication date: June 24, 2021
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20210193196
    Abstract: Prior knowledge of the access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of the access pattern allows for burst read and/or write operations. As such, a burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either the upper or lower half of the cache line, thus saving dynamic capacitance.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 24, 2021
    Applicant: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
  • Publication number: 20210133913
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: October 13, 2020
    Publication date: May 6, 2021
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20210056028
    Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
    Type: Application
    Filed: October 12, 2020
    Publication date: February 25, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, James Valerio, Ben Ashbaugh, Lakshminarayanan Striramassarma
  • Publication number: 20210056033
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 20, 2020
    Publication date: February 25, 2021
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
  • Publication number: 20210035258
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provide techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to skip computational operations for zero-filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse-aware logic unit.
    Type: Application
    Filed: October 6, 2020
    Publication date: February 4, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Andrei Valentin, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli