Patents by Inventor Varghese George
Varghese George has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250028675Abstract: Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end to end contracts for workload scheduling on multiple GPUs.Type: ApplicationFiled: August 1, 2024Publication date: January 23, 2025Applicant: Intel CorporationInventors: JOYDEEP RAY, SELVAKUMAR PANNEER, SAURABH TANGRI, BEN ASHBAUGH, SCOTT JANUS, ABHISHEK APPU, VARGHESE GEORGE, RAVISHANKAR IYER, NILESH JAIN, PATTABHIRAMAN K, ALTUG KOKER, MIKE MACPHERSON, JOSH MASTRONARDE, ELMOUSTAPHA OULD-AHMED-VALL, JAYAKRISHNA P. S, ERIC SAMSON
-
Patent number: 12204487Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.Type: GrantFiled: January 17, 2024Date of Patent: January 21, 2025Assignee: INTEL CORPORATIONInventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 12198222Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.Type: GrantFiled: December 7, 2023Date of Patent: January 14, 2025Assignee: Intel CorporationInventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
-
Publication number: 20250011380Abstract: Provided herein are methods of killing or reducing the number of naturally-occurring and/or treatment-induced senescent cells and diseased cells in a subject in need thereof, decreasing the accumulation of naturally-occurring and/or treatment-induced senescent cells and diseased cells in a subject in need thereof, that include administering to the subject a therapeutically effective amount of one or more common gamma-chain family cytokine receptor activating agent(s) and/or one or more agent(s) that result(s) in a decrease in the activation of a TGF-? receptor.Type: ApplicationFiled: July 1, 2024Publication date: January 9, 2025Applicant: HCW Biologics, Inc.Inventors: Hing C. Wong, Xiaoyun Zhu, Bai Liu, Pallavi Chaturvedi, Varghese George
-
Publication number: 20250004981Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.Type: ApplicationFiled: August 2, 2024Publication date: January 2, 2025Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayana Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
-
Patent number: 12182062Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.Type: GrantFiled: October 7, 2022Date of Patent: December 31, 2024Assignee: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayanan Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
-
Patent number: 12182035Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.Type: GrantFiled: March 14, 2020Date of Patent: December 31, 2024Assignee: Intel CorporationInventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20240427847Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.Type: ApplicationFiled: June 27, 2024Publication date: December 26, 2024Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
-
Patent number: 12174783Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.Type: GrantFiled: June 24, 2021Date of Patent: December 24, 2024Assignee: Intel CorporationInventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
-
Publication number: 20240411717Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: ApplicationFiled: June 10, 2024Publication date: December 12, 2024Applicant: Intel CorporationInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Publication number: 20240403259Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor receive a data compression instruction for a memory segment; and in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: July 22, 2024Publication date: December 5, 2024Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Mike MacPherson, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Subramaniam Maiyuran, Vasanth Ranganathan, Jayakrishna P S, Pattabhiraman K, Sudhakar Kamma
-
Patent number: 12153541Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: GrantFiled: February 17, 2022Date of Patent: November 26, 2024Assignee: INTEL CORPORATIONInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Publication number: 20240385975Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.Type: ApplicationFiled: May 22, 2024Publication date: November 21, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
-
Patent number: 12141094Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.Type: GrantFiled: March 14, 2020Date of Patent: November 12, 2024Assignee: Intel CorporationInventors: Prasoonkumar Surti, Subramaniam Maiyuran, Valentin Andrei, Abhishek Appu, Varghese George, Altug Koker, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Lakshminarayanan Striramassarma, SungYe Kim
-
Patent number: 12141890Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnects logic on the chiplet can then be configured to confirm with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enable interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.Type: GrantFiled: March 2, 2022Date of Patent: November 12, 2024Assignee: Intel CorporationInventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
-
Publication number: 20240362180Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.Type: ApplicationFiled: April 26, 2024Publication date: October 31, 2024Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 12124383Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.Type: GrantFiled: July 12, 2022Date of Patent: October 22, 2024Assignee: Intel CorporationInventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K, SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
-
Publication number: 20240345990Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.Type: ApplicationFiled: April 4, 2024Publication date: October 17, 2024Applicant: Intel CorporationInventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
-
Patent number: 12117962Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.Type: GrantFiled: August 16, 2023Date of Patent: October 15, 2024Assignee: INTEL CORPORATIONInventors: Joydeep Ray, Aravindh Anantaraman, Abhishek R. Appu, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Subramaniam Maiyuran, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Ben Ashbaugh, Murali Ramadoss, Vikranth Vemulapalli, William Sadler, Jonathan Pearce, Sungye Kim
-
Patent number: 12112398Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.Type: GrantFiled: September 20, 2023Date of Patent: October 8, 2024Assignee: Intel CorporationInventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier