Patents by Inventor Joydeep Ray
Joydeep Ray has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240134797Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.Type: ApplicationFiled: October 24, 2022Publication date: April 25, 2024Applicant: Intel CorporationInventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
-
Publication number: 20240134527Abstract: Embodiments described herein provide a technique to enable access to entries in a surface state or sampler state using 64-bit virtual addresses. One embodiment provides a graphics core that includes memory access circuitry configured to facilitate access to the memory by functional units of the graphics core. The memory access circuitry is configured to receive a message to access an entry in a surface state or a sampler state associated with a parallel processing operation. The message specifies a base address for a surface state entry or sampler state entry. The circuitry can add the base address and the offset to determine a 64-bit virtual address for the entry in the surface state entry or the sampler state and submit a memory access request to the memory to access the entry of the surface state or sampler state.Type: ApplicationFiled: October 20, 2022Publication date: April 25, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Michael Apodaca, Yoav Harel, Guei-Yuan Lueh, John A. Wiegert
-
Publication number: 20240122998Abstract: The present invention relates to a composition for use in the prevention of respiratory allergy, in particular allergic rhinitis, dust mite allergy and/or asthma. The invention also provides a kit comprising a combination of at least one polyphenol and at least one Bifidobacterium longum in one or more containers.Type: ApplicationFiled: February 16, 2022Publication date: April 18, 2024Inventors: CARINE BLANCHARD, FRANCINE LENFANT, ULRIKE REYMONDIN, GAELLE MARIE LAURE SCHLUP-OLLIVIER, JOYDEEP RAY, SEBASTIEN HOLVOET, CHRISTOPHE BARDE
-
Patent number: 11961179Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.Type: GrantFiled: April 24, 2023Date of Patent: April 16, 2024Assignee: Intel CorporationInventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray
-
Patent number: 11954062Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.Type: GrantFiled: March 14, 2020Date of Patent: April 9, 2024Assignee: Intel CorporationInventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
-
Patent number: 11954783Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The graphics subsystem may include a first graphics engine to process a graphics workload, and a second graphics engine to offload at least a portion of the graphics workload from the first graphics engine. The second graphics engine may include a low precision compute engine. The system may further include a wearable display housing the second graphics engine. Other embodiments are disclosed and claimed.Type: GrantFiled: December 29, 2021Date of Patent: April 9, 2024Assignee: Intel CorporationInventors: Atsuo Kuwahara, Deepak S. Vembar, Chandrasekaran Sakthivel, Radhakrishnan Venkataraman, Brent E. Insko, Anupreet S. Kalra, Hugues Labbe, Abhishek R. Appu, Ankur N. Shah, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Prasoonkumar Surti, Murali Ramadoss
-
Publication number: 20240112295Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Intel CorporationInventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray
-
Patent number: 11948224Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.Type: GrantFiled: November 1, 2022Date of Patent: April 2, 2024Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Publication number: 20240095038Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.Type: ApplicationFiled: September 21, 2022Publication date: March 21, 2024Applicant: Intel CorporationInventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
-
Patent number: 11934342Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.Type: GrantFiled: March 14, 2020Date of Patent: March 19, 2024Assignee: INTEL CORPORATIONInventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 11934934Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.Type: GrantFiled: April 17, 2017Date of Patent: March 19, 2024Assignee: Intel CorporationInventors: Liwei Ma, Elmoustapha Ould- Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
-
Publication number: 20240086199Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.Type: ApplicationFiled: August 4, 2023Publication date: March 14, 2024Applicant: Intel CorporationInventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
-
Publication number: 20240086356Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.Type: ApplicationFiled: October 20, 2023Publication date: March 14, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Altug Koker, Varghese George, Mike Macpherson, Aravindh Anantaraman, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Nicolas Galoppo von Borries, Ben J. Ashbaugh
-
Publication number: 20240086064Abstract: Embodiments described herein enable the offload of address calculations required to access a data element within an array of data elements from primary compute resources of a graphics processor to the memory access circuitry of the graphics processor. The memory access circuitry is configured to receive a message to access a data element of an array of data elements in the memory, the message to include an index of the data element in the array of data elements, calculate a byte address for the data element based in part on the index of the data element in the array of data elements, and submit a memory access request to the memory to access the data element at the byte address.Type: ApplicationFiled: September 14, 2022Publication date: March 14, 2024Applicant: Intel CorporationInventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
-
Publication number: 20240086138Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.Type: ApplicationFiled: September 26, 2023Publication date: March 14, 2024Inventors: Eric J. Asperheim, Subramaniam Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
-
Publication number: 20240086357Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.Type: ApplicationFiled: November 21, 2023Publication date: March 14, 2024Applicant: Intel CorporationInventors: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
-
Publication number: 20240087077Abstract: Embodiments described herein provide a technique to merge partial cache line writes to a cache memory. One embodiment provides a graphics processor comprising a graphics core, a cache coupled with the graphics core, and memory access circuitry to process memory access messages received from the graphics core. The memory access circuitry includes partial cache line write merge circuitry configured to merge a first partial write to a cache line of the cache with a second partial write to the cache line of the cache.Type: ApplicationFiled: September 14, 2022Publication date: March 14, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Abhishek R. Appu, Prathamesh Raghunath Shinde, John Wiegert
-
Publication number: 20240078630Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.Type: ApplicationFiled: October 19, 2023Publication date: March 7, 2024Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang
-
Patent number: 11922535Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.Type: GrantFiled: February 13, 2023Date of Patent: March 5, 2024Assignee: Intel CorporationInventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
-
Publication number: 20240069914Abstract: Embodiments described herein provide a system to enable access to an n-dimensional tensor in memory of a graphics processor via a batch of two-dimensional block access messages. One embodiment provides a graphics processor comprising general-purpose graphics execution resources coupled with the system interface, the general-purpose graphics execution resources including a matrix accelerator. The matrix accelerator is configured to perform a matrix operation on a plurality of tensors stored in a memory. Circuitry is included to facilitate access to the memory by the general-purpose graphics execution resources. The circuitry is configured to receive a request to access a tensor of the plurality of tensors and generate a batch of two-dimensional block access messages along a dimension of n>2 of the tensor. The batch of two-dimensional block access messages enables access to the tensor by the matrix accelerator.Type: ApplicationFiled: August 23, 2022Publication date: February 29, 2024Applicant: Intel CorporationInventors: Biju George, Fangwen Fu, Joydeep Ray