Patents by Inventor Joydeep Ray
Joydeep Ray has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12007824
Abstract: In one embodiment, a processor includes: a graphics processor to execute a workload; and a power controller coupled to the graphics processor. The power controller may include a voltage ramp circuit to receive a request for the graphics processor to operate at a first performance state having a first operating voltage and a first operating frequency and cause an output voltage of a voltage regulator to increase to the first operating voltage. The voltage ramp circuit may be configured to enable the graphics processor to execute the workload at an interim performance state having an interim operating voltage and an interim operating frequency when the output voltage reaches a minimum operating voltage. Other embodiments are described and claimed.
Type: Grant
Filed: November 2, 2021
Date of Patent: June 11, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Abhishek R. Appu, Bhushan M. Borole, Wenyin Fu, Kamal Sinha, Joydeep Ray
-
Publication number: 20240184572
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
Type: Application
Filed: December 4, 2023
Publication date: June 6, 2024
Applicant: Intel Corporation
Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
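The precision behavior described in this abstract (16-bit multiplicands, intermediate products widened and accumulated into a 32-bit sum) can be pictured with a small scalar sketch. The matrix size, integer types, and function names below are illustrative assumptions, not the claimed SIMT hardware.

```cpp
// Illustrative sketch of a 2D multiply-accumulate with 16-bit operands and a
// 32-bit accumulator, as described in the abstract above. Types and names are
// assumptions for demonstration; the claim covers a hardware compute unit.
#include <array>
#include <cstdint>
#include <iostream>

constexpr int N = 4;
using Mat16 = std::array<std::array<int16_t, N>, N>;
using Mat32 = std::array<std::array<int32_t, N>, N>;

// C += A * B: each intermediate product of two 16-bit operands is widened
// to 32 bits before being added to the 32-bit running sum.
void mma(const Mat16& a, const Mat16& b, Mat32& c) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                c[i][j] += static_cast<int32_t>(a[i][k]) * static_cast<int32_t>(b[k][j]);
}

int main() {
    Mat16 a{}, b{};
    Mat32 c{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) { a[i][j] = int16_t(i + j); b[i][j] = int16_t(i - j); }
    mma(a, b, c);
    std::cout << "c[0][0] = " << c[0][0] << "\n";
    return 0;
}
```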
-
Patent number: 12001209
Abstract: A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
Type: Grant
Filed: May 23, 2022
Date of Patent: June 4, 2024
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Balaji Vembu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Sanjeev Jahagirdar, Vasanth Ranganathan
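The synchronization semantics described above can be modeled on the host with a standard barrier: threads of one group, spread across compute elements standing in for two dies, all reach a common point before any proceeds. The use of std::barrier and the die/thread counts are assumptions purely for illustration.

```cpp
// Sketch: one thread group whose threads are spread across two "dies";
// all threads must reach the barrier before any advances to the next phase.
// This models the synchronization behavior only, not the hardware mechanism.
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int kDies = 2;
    constexpr int kThreadsPerDie = 4;
    std::barrier sync(kDies * kThreadsPerDie);  // group-wide barrier

    auto worker = [&](int die, int tid) {
        std::printf("die %d thread %d: phase 1\n", die, tid);
        sync.arrive_and_wait();                 // cross-die synchronization point
        std::printf("die %d thread %d: phase 2\n", die, tid);
    };

    std::vector<std::thread> threads;
    for (int d = 0; d < kDies; ++d)
        for (int t = 0; t < kThreadsPerDie; ++t)
            threads.emplace_back(worker, d, t);
    for (auto& th : threads) th.join();
    return 0;
}
```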
-
Patent number: 11995029
Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
Type: Grant
Filed: March 14, 2020
Date of Patent: May 28, 2024
Assignee: Intel Corporation
Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
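One way to picture the "frequent cross tile memory accesses" trigger is a per-page counter that, past a threshold, requests a data transfer (for example a page migration or multicast copy). The threshold value, counter placement, and names in this sketch are assumptions, not the claimed memory controller logic.

```cpp
// Sketch of a counter-based detector for frequent cross-tile (cross-GPU)
// accesses. When accesses from one GPU to a page owned by another GPU exceed
// a threshold, a data-transfer mechanism (e.g., migration) is requested.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct CrossTileDetector {
    std::unordered_map<uint64_t, uint32_t> counts;  // page -> cross-tile access count
    uint32_t threshold = 64;                         // illustrative threshold

    // Returns true when a transfer should be initiated for this page.
    bool record_access(uint64_t page, int accessor_gpu, int owner_gpu) {
        if (accessor_gpu == owner_gpu) return false;   // local access, nothing to do
        uint32_t& c = ++counts[page];
        if (c == threshold) {
            std::printf("page 0x%llx: frequent cross-tile access, initiate transfer\n",
                        static_cast<unsigned long long>(page));
            return true;
        }
        return false;
    }
};

int main() {
    CrossTileDetector det;
    for (int i = 0; i < 100; ++i)
        det.record_access(/*page=*/0x1000, /*accessor_gpu=*/0, /*owner_gpu=*/1);
    return 0;
}
```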
-
Patent number: 11995737
Abstract: Thread dispatch circuitry is configured to dispatch threads of a two-dimensional (2D) thread group based on data access locality associated with the threads. The thread dispatch circuitry can dispatch a first 2D sub-group of the 2D thread group to a compute block of the multiple compute blocks, the first 2D sub-group associated with a first 2D tile of memory, and dispatch a second 2D sub-group of the 2D thread group to the compute block of the multiple compute blocks, the second 2D sub-group associated with a second 2D tile of memory.
Type: Grant
Filed: November 16, 2021
Date of Patent: May 28, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Balaji Vembu, Joydeep Ray, James A. Valerio, Abhishek R. Appu
-
Publication number: 20240160478
Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
Type: Application
Filed: November 15, 2022
Publication date: May 16, 2024
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Chunhui Mei, Ben J. Ashbaugh, Naveen Matam, Joydeep Ray, Timothy Bauer, Guei-Yuan Lueh, Vasanth Ranganathan, Prashant Chaudhari, Vikranth Vemulapalli, Nishanth Reddy Pendluru, Piotr Reiter, Jain Philip, Marek Rudniewski, Christopher Spencer, Parth Damani, Prathamesh Raghunath Shinde, John Wiegert, Fataneh Ghodrat
-
Publication number: 20240161226
Abstract: Embodiments are generally directed to memory prefetching in a multi-GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
Type: Application
Filed: November 16, 2023
Publication date: May 16, 2024
Applicant: Intel Corporation
Inventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
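A minimal sketch of the ownership rule described above: before issuing a prefetch, the prefetcher consults page ownership and drops requests for pages owned by neither the requesting GPU nor the host. The ownership table, enum, and function names are illustrative assumptions.

```cpp
// Sketch of the prefetch filter: a prefetch for a page is allowed only if the
// page is owned by the requesting GPU or by the host processor.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

enum class Owner { Host, Gpu0, Gpu1 };

struct PrefetchFilter {
    std::unordered_map<uint64_t, Owner> page_owner;  // page -> current owner

    bool may_prefetch(uint64_t page, Owner requester) const {
        auto it = page_owner.find(page);
        if (it == page_owner.end()) return false;        // unknown page: do not prefetch
        return it->second == requester || it->second == Owner::Host;
    }
};

int main() {
    PrefetchFilter f;
    f.page_owner[0x1000] = Owner::Host;   // host-owned: any GPU may prefetch
    f.page_owner[0x2000] = Owner::Gpu1;   // owned by GPU1 only
    std::printf("GPU0 prefetch of host page: %d\n", f.may_prefetch(0x1000, Owner::Gpu0));  // 1
    std::printf("GPU0 prefetch of GPU1 page: %d\n", f.may_prefetch(0x2000, Owner::Gpu0));  // 0
    return 0;
}
```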
-
Publication number: 20240161227
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
Type: Application
Filed: December 7, 2023
Publication date: May 16, 2024
Applicant: Intel Corporation
Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
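The block-sparse dot product idea can be sketched with a bitmask that records which blocks are non-zero, so multiplications are only performed for blocks present in both operands. The 4-element block size, the packed encoding, and the names below are assumptions; the publication describes hardware instructions, not this software form.

```cpp
// Sketch of a block-sparse dot product: a bitmask marks which 4-element
// blocks are non-zero, and only blocks present in both operands contribute.
#include <cstdint>
#include <cstdio>
#include <vector>

struct BlockSparseVec {
    uint32_t mask = 0;                 // bit i set => block i is non-zero
    std::vector<float> blocks;         // packed non-zero blocks, 4 floats each
};

float block_sparse_dot(const BlockSparseVec& a, const BlockSparseVec& b) {
    float sum = 0.0f;
    uint32_t common = a.mask & b.mask;               // blocks present in both operands
    for (uint32_t bit = 0, ia = 0, ib = 0; bit < 32; ++bit) {
        bool in_a = a.mask & (1u << bit);
        bool in_b = b.mask & (1u << bit);
        if (common & (1u << bit))
            for (int k = 0; k < 4; ++k)
                sum += a.blocks[ia * 4 + k] * b.blocks[ib * 4 + k];
        ia += in_a;                                   // advance packed block indices
        ib += in_b;
    }
    return sum;
}

int main() {
    BlockSparseVec a{0b0101, {1, 1, 1, 1,  2, 2, 2, 2}};   // blocks 0 and 2
    BlockSparseVec b{0b0110, {3, 3, 3, 3,  4, 4, 4, 4}};   // blocks 1 and 2
    std::printf("dot = %f\n", block_sparse_dot(a, b));      // only block 2 overlaps: 4 * (2*4) = 32
    return 0;
}
```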
-
Publication number: 20240163631
Abstract: Systems, apparatuses and methods may provide a way to render augmented reality (AR) and/or virtual reality (VR) sensory enhancements using ray tracing. More particularly, systems, apparatuses and methods may provide a way to normalize environment information captured by multiple capture devices, and calculate, for an observer, the vector paths of sound sources or sensed events. The systems, apparatuses and methods may detect and/or manage one or more capture devices and assign one or more of the capture devices based on one or more conditions to provide the observer an immersive VR/AR experience.
Type: Application
Filed: November 22, 2023
Publication date: May 16, 2024
Inventors: Joydeep Ray, Travis T. Schluessler, Prasoonkumar Surti, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, James M. Holland, Jeffery S. Boles, Jonathan Kennedy, Louis Feng, Atsuo Kuwahara, Barnan Das, Narayan Biswal, Stanley J. Baran, Gokcen Cilingir, Nilesh V. Shah, Archie Sharma, Mayuresh M. Varerkar
-
Publication number: 20240134527
Abstract: Embodiments described herein provide a technique to enable access to entries in a surface state or sampler state using 64-bit virtual addresses. One embodiment provides a graphics core that includes memory access circuitry configured to facilitate access to the memory by functional units of the graphics core. The memory access circuitry is configured to receive a message to access an entry in a surface state or a sampler state associated with a parallel processing operation. The message specifies a base address for a surface state entry or sampler state entry. The circuitry can add the base address and the offset to determine a 64-bit virtual address for the entry in the surface state or the sampler state and submit a memory access request to the memory to access the entry of the surface state or sampler state.
Type: Application
Filed: October 20, 2022
Publication date: April 25, 2024
Applicant: Intel Corporation
Inventors: Joydeep Ray, Michael Apodaca, Yoav Harel, Guei-Yuan Lueh, John A. Wiegert
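The address formation step described above reduces to simple arithmetic: a 64-bit base for the state heap plus the entry's offset. The structure, entry size, and values in this sketch are illustrative assumptions.

```cpp
// Sketch of forming a 64-bit virtual address for a surface-state or
// sampler-state entry: base address of the state heap plus the entry offset.
#include <cstdint>
#include <cstdio>

struct StateAccessMessage {
    uint64_t base_address;   // 64-bit base of the surface/sampler state heap
    uint64_t entry_offset;   // byte offset of the requested entry
};

uint64_t entry_virtual_address(const StateAccessMessage& msg) {
    return msg.base_address + msg.entry_offset;   // full 64-bit VA of the entry
}

int main() {
    // Example: entry 12, assuming 64-byte state entries (illustrative only).
    StateAccessMessage msg{0x00007f0000000000ull, 64ull * 12};
    std::printf("entry VA = 0x%llx\n",
                static_cast<unsigned long long>(entry_virtual_address(msg)));
    return 0;
}
```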
-
Publication number: 20240134797
Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
Type: Application
Filed: October 24, 2022
Publication date: April 25, 2024
Applicant: Intel Corporation
Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
-
Patent number: 11961179
Abstract: One embodiment provides for a graphics processing unit comprising a processing cluster to perform multi-rate shading via coarse pixel shading and output shaded coarse pixels for processing by a post-shader pixel processing pipeline.
Type: Grant
Filed: April 24, 2023
Date of Patent: April 16, 2024
Assignee: Intel Corporation
Inventors: Prasoonkumar Surti, Abhishek R. Appu, Subhajit Dasgupta, Srivallaba Mysore, Michael J. Norris, Vasanth Ranganathan, Joydeep Ray
-
Patent number: 11954062
Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
Type: Grant
Filed: March 14, 2020
Date of Patent: April 9, 2024
Assignee: Intel Corporation
Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
-
Patent number: 11954783
Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The graphics subsystem may include a first graphics engine to process a graphics workload, and a second graphics engine to offload at least a portion of the graphics workload from the first graphics engine. The second graphics engine may include a low precision compute engine. The system may further include a wearable display housing the second graphics engine. Other embodiments are disclosed and claimed.
Type: Grant
Filed: December 29, 2021
Date of Patent: April 9, 2024
Assignee: Intel Corporation
Inventors: Atsuo Kuwahara, Deepak S. Vembar, Chandrasekaran Sakthivel, Radhakrishnan Venkataraman, Brent E. Insko, Anupreet S. Kalra, Hugues Labbe, Abhishek R. Appu, Ankur N. Shah, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Prasoonkumar Surti, Murali Ramadoss
-
Publication number: 20240112295
Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space, which may be directly referenced in ISA instructions, to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.
Type: Application
Filed: September 30, 2022
Publication date: April 4, 2024
Applicant: Intel Corporation
Inventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray
-
Patent number: 11948224
Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
Type: Grant
Filed: November 1, 2022
Date of Patent: April 2, 2024
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
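The precision-tracking idea above can be sketched as: after producing results at a first precision (here float), evaluate a metric such as the maximum magnitude, and if the value range permits, transform the data to a narrower representation for the next stage. The specific metric, threshold, and int8 target format are assumptions for illustration.

```cpp
// Sketch: evaluate a metric (max magnitude) over result-matrix elements and,
// if the range permits, transform them to a narrower representation for the
// next network stage. Metric, threshold, and target type are assumptions.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Returns true and fills `out` with int8 values if the fp32 results fit after
// scaling; otherwise the data stays at the first (fp32) precision.
bool maybe_narrow(const std::vector<float>& in, std::vector<int8_t>& out, float& scale) {
    float max_abs = 0.0f;
    for (float v : in) max_abs = std::max(max_abs, std::fabs(v));   // precision-tracking metric
    if (max_abs == 0.0f || max_abs > 1.0e4f) return false;          // illustrative policy
    scale = 127.0f / max_abs;
    out.resize(in.size());
    for (size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<int8_t>(std::lround(in[i] * scale));   // numerical transform
    return true;
}

int main() {
    std::vector<float> results{0.5f, -1.25f, 3.0f, 0.0f};
    std::vector<int8_t> narrowed;
    float scale = 1.0f;
    if (maybe_narrow(results, narrowed, scale))
        std::printf("narrowed with scale %f, first element %d\n", scale, narrowed[0]);
    return 0;
}
```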
-
Publication number: 20240095038
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
Type: Application
Filed: September 21, 2022
Publication date: March 21, 2024
Applicant: Intel Corporation
Inventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
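A minimal sketch of the decomposition described above: take the per-lane 64-bit addresses, choose a common base (here simply the minimum), transmit narrower per-lane offsets, and recombine base and offset on the receiving side. The 32-bit offset width and the choice of base are assumptions for illustration.

```cpp
// Sketch: decompose 64-bit per-lane addresses into one base plus per-lane
// offsets, then reconstruct them. The base selection and 32-bit offsets are
// illustrative; they only work when the lanes address a nearby region.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct DecomposedAddresses {
    uint64_t base;
    std::vector<uint32_t> offsets;   // per-lane offsets relative to base
};

DecomposedAddresses decompose(const std::vector<uint64_t>& lane_addrs) {
    uint64_t base = *std::min_element(lane_addrs.begin(), lane_addrs.end());
    DecomposedAddresses d{base, {}};
    for (uint64_t a : lane_addrs)
        d.offsets.push_back(static_cast<uint32_t>(a - base));  // assumes range fits 32 bits
    return d;
}

uint64_t reconstruct(const DecomposedAddresses& d, size_t lane) {
    return d.base + d.offsets[lane];   // memory access circuitry recombines base + offset
}

int main() {
    std::vector<uint64_t> addrs{0x7f0000001000ull, 0x7f0000001040ull, 0x7f0000001080ull};
    DecomposedAddresses d = decompose(addrs);
    for (size_t lane = 0; lane < addrs.size(); ++lane)
        std::printf("lane %zu: 0x%llx\n", lane,
                    static_cast<unsigned long long>(reconstruct(d, lane)));
    return 0;
}
```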
-
Patent number: 11934342
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
Type: Grant
Filed: March 14, 2020
Date of Patent: March 19, 2024
Assignee: Intel Corporation
Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 11934934
Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
Type: Grant
Filed: April 17, 2017
Date of Patent: March 19, 2024
Assignee: Intel Corporation
Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
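The pruning step described above amounts to walking the instruction list and dropping entries whose weight is zero, so no work is generated for them. The Instruction struct below is an illustrative stand-in for the CNN model's instruction list, not the claimed optimization logic.

```cpp
// Sketch of the pruning pass: remove entries whose weight is 0 from the
// instruction list so no multiply is emitted for them.
#include <cstdio>
#include <vector>

struct Instruction {
    int input_index;   // which input element this instruction consumes
    float weight;      // weight applied to that input
};

void prune_zero_weights(std::vector<Instruction>& instrs) {
    std::erase_if(instrs, [](const Instruction& i) { return i.weight == 0.0f; });
}

int main() {
    std::vector<Instruction> instrs{{0, 0.5f}, {1, 0.0f}, {2, -1.2f}, {3, 0.0f}};
    prune_zero_weights(instrs);
    std::printf("instructions after pruning: %zu\n", instrs.size());   // 2
    return 0;
}
```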
-
Publication number: 20240087077
Abstract: Embodiments described herein provide a technique to merge partial cache line writes to a cache memory. One embodiment provides a graphics processor comprising a graphics core, a cache coupled with the graphics core, and memory access circuitry to process memory access messages received from the graphics core. The memory access circuitry includes partial cache line write merge circuitry configured to merge a first partial write to a cache line of the cache with a second partial write to the cache line of the cache.
Type: Application
Filed: September 14, 2022
Publication date: March 14, 2024
Applicant: Intel Corporation
Inventors: Joydeep Ray, Abhishek R. Appu, Prathamesh Raghunath Shinde, John Wiegert
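The merge described in this abstract can be pictured as combining the byte-enable masks and data of two partial writes that target the same cache line before the line is written. The 64-byte line size, the structures, and the later-write-wins policy below are illustrative assumptions.

```cpp
// Sketch of merging two partial writes to the same cache line: the byte
// enables are OR-ed and the enabled bytes of the second write overlay the
// first. Line size and structures are illustrative assumptions.
#include <array>
#include <bitset>
#include <cstdint>
#include <cstdio>

constexpr size_t kLineBytes = 64;

struct PartialWrite {
    uint64_t line_address;                    // cache-line-aligned address
    std::bitset<kLineBytes> byte_enable;      // which bytes this write touches
    std::array<uint8_t, kLineBytes> data{};   // data for enabled bytes
};

// Merge `b` into `a`; both must target the same line. Later write wins on overlap.
bool merge(PartialWrite& a, const PartialWrite& b) {
    if (a.line_address != b.line_address) return false;   // different lines: no merge
    for (size_t i = 0; i < kLineBytes; ++i)
        if (b.byte_enable[i]) a.data[i] = b.data[i];
    a.byte_enable |= b.byte_enable;
    return true;
}

int main() {
    PartialWrite w1{0x1000, {}, {}}, w2{0x1000, {}, {}};
    w1.byte_enable[0] = true;  w1.data[0] = 0xAA;   // first partial write: byte 0
    w2.byte_enable[4] = true;  w2.data[4] = 0xBB;   // second partial write: byte 4
    if (merge(w1, w2))
        std::printf("merged: %zu bytes enabled\n", w1.byte_enable.count());  // 2
    return 0;
}
```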