Patents by Inventor Abhishek R. Appu
Abhishek R. Appu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12373912Abstract: Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.Type: GrantFiled: November 16, 2023Date of Patent: July 29, 2025Assignee: INTEL CORPORATIONInventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
-
Patent number: 12361600Abstract: Embodiments described herein provided for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides a parallel processor comprising a processing cluster coupled with the cache memory. The processing cluster includes a plurality of multiprocessors coupled with a data interconnect, where a multiprocessor of the plurality of multiprocessors includes a tensor core configured to load tensor data and metadata associated with the tensor data from the cache memory, wherein the metadata indicates a first numerical transform applied to the tensor data, perform an inverse transform of the first numerical transform, perform a tensor operation on the tensor data after the inverse transform is performed, and write output of the tensor operation to a memory coupled with the processing cluster.Type: GrantFiled: May 23, 2023Date of Patent: July 15, 2025Assignee: Intel CorporationInventors: Abhishek R. Appu, Prasoonkumar Surti, Jill Boyce, Subramaniam Maiyuran, Michael Apodaca, Adam T. Lake, James Holland, Vasanth Ranganathan, Altug Koker, Lidong Xu, Nikos Kaburlasos
-
Patent number: 12346798Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.Type: GrantFiled: July 12, 2023Date of Patent: July 1, 2025Assignee: INTEL CORPORATIONInventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Publication number: 20250200697Abstract: Apparatuses including general-purpose graphics processing units having on chip dense memory for temporal buffering are disclosed. In one embodiment, a graphics multiprocessor includes a plurality of compute engines to perform first computations to generate a first set of data, cache for storing data, and a high density memory that is integrated on chip with the plurality of compute engines and the cache. The high density memory to receive the first set of data, to temporarily store the first set of data, and to provide the first set of data to the cache during a first time period that is prior to a second time period when the plurality of compute engines will use the first set of data for second computations.Type: ApplicationFiled: February 7, 2025Publication date: June 19, 2025Applicant: Intel CorporationInventors: Varghese George, Altug Koker, Aravindh Anantaraman, Subramaniam Maiyuran, SungYe Kim, Valentin Andrei, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Abhishek R. Appu, Nicolas C. Galoppo von Borries, Prasoonkumar Surti, Mike Macpherson
-
Publication number: 20250182709Abstract: Often when there is a glare on a display screen the user may be able to mitigate the glare by tilting or otherwise moving the screen or changing their viewing position. However, when driving a car there are limited options for overcoming glares on the dashboard, especially when you are driving for a long distance in the same direction. Embodiments are directed to eliminating such glare. Other embodiments are related to mixed reality (MR) and filling in occluded areas.Type: ApplicationFiled: January 8, 2025Publication date: June 5, 2025Inventors: Arthur J. Runyan, Richmond Hicks, Nausheen Ansari, Narayan Biswal, Ya-Ti Peng, Abhishek R. Appu, Wen-Fu Kao, Sang-Hee Lee, Joydeep Ray, Changliang Wang, Satyanarayana Avadhanam, Scott Janus, Gary Smith, Nilesh V. Shah, Keith W. Rowe, Robert J. Johnston
-
Patent number: 12321310Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.Type: GrantFiled: October 20, 2023Date of Patent: June 3, 2025Assignee: Intel CorporationInventors: Joydeep Ray, Altug Koker, Varghese George, Mike Macpherson, Aravindh Anantaraman, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Nicolas Galoppo von Borries, Ben J. Ashbaugh
-
Publication number: 20250173308Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.Type: ApplicationFiled: November 25, 2024Publication date: May 29, 2025Applicant: Intel CorporationInventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 12299766Abstract: Systems and methods for supporting generic pointers in hardware of a graphics processing unit (GPU) are provided. In various examples, a GPU includes multiple sub-cores each having a processing resource and a load/store pipeline. The processing resource is operable to receive a memory access message including a pointer and a memory type identifier indicative of the pointer representing a generic pointer. The processing resource is further operable to output a load or store operation to the load/store pipeline based on the memory access message, including computing an address for the load or store operation by adding a base address of a named memory type of a plurality of named memory types referenced by the generic pointer to an offset into a memory of the named memory type. The load/store pipeline is operable to, responsive to receipt of the load or store operation, access the memory at the address.Type: GrantFiled: September 24, 2021Date of Patent: May 13, 2025Assignee: Intel CorporationInventors: Joydeep Ray, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Wei-Yu Chen, Abhishek R. Appu, Vasanth Ranganathan, Dmitry Yurievich Babokin, Ankur N. Shah
-
Patent number: 12293430Abstract: Methods, systems and apparatuses provide for graphics processor technology that routes untyped unordered access view (UAV) messages to a next level memory cache, routes typed UAV messages and render target messages to a pixel pipeline, and processes, via the pixel pipeline, the typed UAV messages. The technology can also provide for the pixel pipeline to perform a format conversion of one or more pixels associated with a typed UAV message based on a surface format of a UAV resource, calculate a memory address for each pixel associated with the typed UAV message, and collect a plurality of fragments from processed typed UAV messages.Type: GrantFiled: June 23, 2021Date of Patent: May 6, 2025Assignee: Intel CorporationInventors: Jay Jardosh, Prasoonkumar Surti, Abhishek R. Appu
-
Patent number: 12288287Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The graphics subsystem may include a first graphics engine to process a graphics workload, and a second graphics engine to offload at least a portion of the graphics workload from the first graphics engine. The second graphics engine may include a low precision compute engine. The system may further include a wearable display housing the second graphics engine. Other embodiments are disclosed and claimed.Type: GrantFiled: February 8, 2024Date of Patent: April 29, 2025Assignee: Intel CorporationInventors: Atsuo Kuwahara, Deepak S. Vembar, Chandrasekaran Sakthivel, Radhakrishnan Venkataraman, Brent E. Insko, Anupreet S. Kalra, Hugues Labbe, Abhishek R. Appu, Ankur N. Shah, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Prasoonkumar Surti, Murali Ramadoss
-
Patent number: 12288284Abstract: Dynamic routing of texture-load in graphics processing is described. An example of a processor includes one or more processing resources, the one or more processing resources to load a message including a texture load; a texture sampler and a data port; and a message router to route the texture load to a destination, wherein the destination may be either the texture sampler or the data port; wherein the message router includes arbitration circuitry to select the destination for the texture load, the arbitration circuitry to base selection of the destination at least in part on support by the data port for a format of a memory surface for the texture load; and a utilization metric for the data port representing availability of the data port.Type: GrantFiled: September 24, 2021Date of Patent: April 29, 2025Assignee: INTEL CORPORATIONInventors: Carlos Nava Rodriguez, Yoav Harel, Joydeep Ray, Abhishek R. Appu, Vamsee Vardhan Chivukula, Benjamin R. Pletcher
-
Publication number: 20250130848Abstract: An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.Type: ApplicationFiled: November 1, 2024Publication date: April 24, 2025Applicant: Intel CorporationInventors: Vasanth Ranganathan, James Valerio, Joydeep Ray, Abhishek R. Appu, Alan Curtis, Prathamesh Raghunath Shinde, Brandon Fliflet, Ben J. Ashbaugh, John Wiegert
-
Patent number: 12282809Abstract: A graphics processing apparatus includes graphics processors connected by a network connection, where the graphics processors pass compressed data. A first graphics processor stores data blocks as compressed data in a memory. The compressed data has data blocks of variable size, where a size of a block of compressed data depends on a compression ratio of the block of compressed data. A second graphics processor also stores data blocks as compressed data. The first graphics processor concatenates a variable number of blocks of compressed data into a packet of fixed size to send to the second graphics processor. The packet has a variable number of blocks of compressed data depending on the compression ratios of the multiple blocks of compressed data.Type: GrantFiled: September 24, 2021Date of Patent: April 22, 2025Assignee: Intel CorporationInventors: Karol A. Szerszen, Prasoonkumar Surti, Abhishek R. Appu
-
Publication number: 20250117356Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.Type: ApplicationFiled: October 15, 2024Publication date: April 10, 2025Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayanan Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
-
Publication number: 20250117874Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.Type: ApplicationFiled: October 7, 2024Publication date: April 10, 2025Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Publication number: 20250117060Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: September 12, 2024Publication date: April 10, 2025Applicant: INTEL CORPORATIONInventors: Altug Koker, Abhishek R. Appu, Kiran C. Veernapu, Joydeep Ray, Balaji Vembu, Prasoonkumar Surti, Kamal Sinha, Eric J. Hoekstra, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Travis T. Schluessler, Ankur N. Shah, Jonathan Kennedy
-
Publication number: 20250103430Abstract: Apparatuses including a graphics processing unit, graphics multiprocessor, or graphics processor having an error detection correction logic for cache memory or shared memory are disclosed. In one embodiment, a graphics multiprocessor includes cache or local memory for storing data and error detection correction circuitry integrated with or coupled to the cache or local memory. The error detection correction circuitry is configured to perform a tag read for data of the cache or local memory to check error detection correction information.Type: ApplicationFiled: October 4, 2024Publication date: March 27, 2025Applicant: Intel CorporationInventors: Vasanth Ranganathan, Joydeep Ray, Abhishek R. Appu, Nikos Kaburlasos, Lidong Xu, Subramaniam Maiyuran, Altug Koker, Naveen Matam, James Holland, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Durgaprasad Bilagi, Xinmin Tian
-
Publication number: 20250103343Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.Type: ApplicationFiled: November 21, 2024Publication date: March 27, 2025Applicant: INTEL CORPORATIONInventors: Christopher J. HUGHES, Prasoonkumar SURTI, Guei-Yuan LUEH, Adam T. LAKE, Jill BOYCE, Subramaniam MAIYURAN, Lidong XU, James M. HOLLAND, Vasanth RANGANATHAN, Nikos KABURLASOS, Altug KOKER, Abhishek R. Appu
-
Publication number: 20250103546Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: ApplicationFiled: October 4, 2024Publication date: March 27, 2025Applicant: Intel CorporationInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Publication number: 20250104326Abstract: One embodiment provides a graphics processor comprising an interface to a system interconnect and a graphics processor coupled to the interface, the graphics processor comprising circuitry configured to compact sample data for multiple sample locations of a pixel, map the multiple sample locations to memory locations that store compacted sample data, the memory locations in a memory of the graphics processor, apply lossless compression to the compacted sample data, and update a compression control surface associated with the memory locations, the compression control surface to specify a compression status for the memory locationsType: ApplicationFiled: September 11, 2024Publication date: March 27, 2025Applicant: Intel CorporationInventors: Abhishek R. Appu, Prasoonkumar Surti, Joydeep Ray, Michael J. Norris