Patents by Inventor James A. Valerio

James A. Valerio has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11977895
    Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to the memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: May 7, 2024
    Assignee: Intel Corporation
    Inventors: Sabareesh Ganapathy, Fangwen Fu, Hong Jiang, James Valerio
  • Publication number: 20240111609
    Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Biju George, Supratim Pal, James Valerio, Vasanth Ranganathan, Fangwen Fu, Chunhui Mei
  • Publication number: 20240095038
    Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
    Type: Application
    Filed: September 21, 2022
    Publication date: March 21, 2024
    Applicant: Intel Corporation
    Inventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
  • Publication number: 20240086357
    Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
    Type: Application
    Filed: November 21, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
  • Publication number: 20240086064
    Abstract: Embodiments described herein enable the offload of address calculations required to access a data element within an array of data elements from primary compute resources of a graphics processor to the memory access circuitry of the graphics processor. The memory access circuitry is configured to receive a message to access a data element of an array of data elements in the memory, the message to include an index of the data element in the array of data elements, calculate a byte address for the data element based in part on the index of the data element in the array of data elements, and submit a memory access request to the memory to access the data element at the byte address.
    Type: Application
    Filed: September 14, 2022
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
  • Publication number: 20240054595
    Abstract: Embodiments described herein provide a system of concurrent compute queues that enable the scheduling of a large number of compute contexts simultaneously on graphics processor hardware. One embodiment provides an apparatus comprising a system interface and a general-purpose graphics processor coupled with the system interface. The general-purpose graphics processor comprises a plurality of graphics processor hardware resources configured to be partitioned into a plurality of isolated partitions, each of the plurality of isolated partitions including a first command streamer, a second command streamer, and circuitry configured to schedule general-purpose graphics compute workloads submitted to a first plurality of command queues associated with the first command streamer and a second plurality of command queues associated with the second command streamer.
    Type: Application
    Filed: August 10, 2022
    Publication date: February 15, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Vasanth Ranganathan, James Valerio, Jeffery S. Boles, Hema Chand Nalluri, Aditya Navale, Ben J. Ashbaugh, Michal Mrozek, Murali Ramadoss, Hong Jiang, Ankur Shah
  • Publication number: 20240045725
    Abstract: Apparatus and method for concurrent performance monitoring. For example, one embodiment of an apparatus comprises: compute hardware logic to concurrently process a number of workloads, the compute hardware logic to be subdivided into a plurality of compute hardware contexts based on the number of workloads; and programmable performance monitoring circuitry to be dynamically partitioned to perform parallel performance monitoring operations to monitor performance of each of the plurality of compute hardware contexts while the number of workloads are concurrently processed, the programmable performance monitoring circuitry to differentiate between performance monitoring data of different compute hardware contexts based on a unique identifier associated with each of the compute hardware contexts.
    Type: Application
    Filed: August 4, 2022
    Publication date: February 8, 2024
    Inventors: Prashant CHAUDHARI, Jain PHILIP, James VALERIO, Murali RAMADOSS, Ankur SHAH, Jeffery S. BOLES, Aditya NAVALE
  • Publication number: 20230401064
    Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for the read from a first cache, broadcast an event within the graphics processor device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by the second thread.
    Type: Application
    Filed: July 6, 2023
    Publication date: December 14, 2023
    Applicant: Intel Corporation
    Inventors: JAMES VALERIO, VASANTH RANGANATHAN, JOYDEEP RAY, PRADEEP RAMANI
  • Publication number: 20230359499
    Abstract: Examples are described here that can be used to allocate commands from multiple sources to performance by one or more segments of a processing device. For example, a processing device can be segmented into multiple portions and each portion is allocated to process commands from a particular source. In the event a single source provides commands, the entire processing device (all segments) can be allocated to process commands from the single source. When a second source provides commands, some segments can be allocated to perform commands from the first source and other segments can be allocated to perform commands from the second source. Accordingly, commands from multiple applications can be executed by a processing unit at the same time.
    Type: Application
    Filed: May 9, 2023
    Publication date: November 9, 2023
    Inventors: James VALERIO, Vasanth RANGANATHAN, Joydeep RAY, Rahul A. KULKARNI, Abhishek R. APPU, Jeffery S. BOLES, Hema C. NALLURI
  • Patent number: 11762662
    Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: September 19, 2023
    Assignee: Intel Corporation
    Inventors: James Valerio, Vasanth Ranganathan, Joydeep Ray, Pradeep Ramani
  • Patent number: 11726826
    Abstract: Examples are described here that can be used to allocate commands from multiple sources to performance by one or more segments of a processing device. For example, a processing device can be segmented into multiple portions and each portion is allocated to process commands from a particular source. In the event a single source provides commands, the entire processing device (all segments) can be allocated to process commands from the single source. When a second source provides commands, some segments can be allocated to perform commands from the first source and other segments can be allocated to perform commands from the second source. Accordingly, commands from multiple applications can be executed by a processing unit at the same time.
    Type: Grant
    Filed: June 4, 2021
    Date of Patent: August 15, 2023
    Assignee: Intel Corporation
    Inventors: James Valerio, Vasanth Ranganathan, Joydeep Ray, Rahul A. Kulkarni, Abhishek R. Appu, Jeffery S. Boles, Hema C. Nalluri
  • Publication number: 20230153176
    Abstract: An apparatus to facilitate facilitating forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising a barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.
    Type: Application
    Filed: November 17, 2021
    Publication date: May 18, 2023
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, James Valerio, Supratim Pal, Guei-Yuan Lueh, Hong Jiang
  • Publication number: 20230094002
    Abstract: Dynamic routing of texture-load in graphics processing is described. An example of an apparatus includes a graphics processor including a plurality of processing engines of a class of processing engines of the graphic processor; a set of queues for the plurality of processing engines; and a unified submit port for the plurality of processing engines, wherein the unified submit port is to notify a scheduler regarding availability of slots in the set of queues for receipt of workload contexts; and wherein, upon the unified submit port receiving a workload context for processing by the plurality of processing engines, the unified submit port is to detect an available processing engine of the plurality of processing engines and direct the received context to a slot of the set of queues for processing by the available processing engine.
    Type: Application
    Filed: September 24, 2021
    Publication date: March 30, 2023
    Applicant: Intel Corporation
    Inventors: Hema Chand Nalluri, Jeffery S. Boles, Joseph Koston, Ankur N. Shah, Vidhya Krishnan, Vasanth Ranganathan, Joydeep Ray, Aditya Navale, Murali Ramadoss, James Valerio
  • Publication number: 20230090973
    Abstract: One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
    Type: Application
    Filed: September 21, 2021
    Publication date: March 23, 2023
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Abhishek R. Appu, Timothy R. Bauer, James Valerio, Weiyu Chen, Subramaniam Maiyuran, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Sven Woop, Jiasheng Chen
  • Publication number: 20230039853
    Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
    Type: Application
    Filed: October 18, 2022
    Publication date: February 9, 2023
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
  • Publication number: 20220413899
    Abstract: An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Vasanth Ranganathan, James Valerio, Joydeep Ray, Abhishek R. Appu, Alan Curtis, Prathamesh Raghunath Shinde, Brandon Fliflet, Ben J. Ashbaugh, John Wiegert
  • Patent number: 11508338
    Abstract: A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, James A. Valerio, Prasoonkumar Surti, Abhishek R. Appu, Vasanth Ranganathan, Kalyan K. Bhiravabhatla, Arthur D. Hunter, Jr., Wei-Yu Chen, Subramaniam M. Maiyuran
  • Patent number: 11494232
    Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Balaji Vembu, James A. Valerio, Abhishek R. Appu
  • Patent number: 11481864
    Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
    Type: Grant
    Filed: April 19, 2021
    Date of Patent: October 25, 2022
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
  • Patent number: 11436695
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Grant
    Filed: March 12, 2021
    Date of Patent: September 6, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins