Patents by Inventor Jerome F. Duluk, Jr.

Jerome F. Duluk, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210019185
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Application
    Filed: October 5, 2020
    Publication date: January 21, 2021
    Inventors: Jerome F. DULUK, JR., Lacky V. SHAH, Sean J. TREICHLER
  • Patent number: 10817338
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Grant
    Filed: January 31, 2018
    Date of Patent: October 27, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 10795722
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: October 6, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
  • Publication number: 20200151016
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.
    Type: Application
    Filed: January 17, 2020
    Publication date: May 14, 2020
    Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
  • Publication number: 20190340145
    Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.
    Type: Application
    Filed: June 24, 2019
    Publication date: November 7, 2019
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Brian FAHS, Mark HAIRGROVE, John MASHEY
  • Patent number: 10445243
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Grant
    Filed: October 16, 2013
    Date of Patent: October 15, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove, Chenghuan Jia, John Mashey
  • Patent number: 10438314
    Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: October 8, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
  • Patent number: 10409730
    Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
    Type: Grant
    Filed: August 27, 2013
    Date of Patent: September 10, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Cameron Buschardt, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, James Leroy Deming, Brian Fahs
  • Publication number: 20190243652
    Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.
    Type: Application
    Filed: April 23, 2018
    Publication date: August 8, 2019
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, JR.
  • Publication number: 20190235928
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, JR., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Publication number: 20190235924
    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
  • Patent number: 10331603
    Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.
    Type: Grant
    Filed: April 9, 2018
    Date of Patent: June 25, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, James Leroy Deming, Brian Fahs, Mark Hairgrove, John Mashey
  • Patent number: 10310973
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: June 4, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove
  • Patent number: 10303616
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Grant
    Filed: October 16, 2013
    Date of Patent: May 28, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., Chenghuan Jia, John Mashey, Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove
  • Patent number: 10223122
    Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
    Type: Grant
    Filed: April 9, 2017
    Date of Patent: March 5, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
  • Patent number: 10217183
    Abstract: A system, method, and computer program product are provided for allocating processor resources to process compute workloads and graphics workloads substantially simultaneously. The method includes the steps of allocating a plurality of processing units to process tasks associated with a graphics pipeline, receiving a request to allocate at least one processing unit in the plurality of processing units to process tasks associated with a compute pipeline, and reallocating the at least one processing unit to process tasks associated with the compute pipeline.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: February 26, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Gregory S. Palmer, Jerome F. Duluk, Jr., Karim Maher Abdalla, Jonathon S. Evans, Adam Clark Weitkemper, Lacky Vasant Shah, Philip Browning Johnson, Gentaro Hirota
  • Patent number: 10216413
    Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPU, and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.
    Type: Grant
    Filed: May 1, 2017
    Date of Patent: February 26, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
  • Patent number: 10169091
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: January 1, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove
  • Patent number: 10169072
    Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: January 1, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Henry Packard Moreton, Patrick R. Brown
  • Patent number: 10133677
    Abstract: Techniques are disclosed for transitioning a memory page between memories in a virtual memory subsystem. A unified virtual memory (UVM) driver detects a page fault in response to a memory access request associated with a first memory page, where a local page table does not include an entry corresponding to a virtual memory address included in the memory access request. The UVM driver, in response to the page fault, executes a page fault sequence. The page fault sequence includes modifying the ownership state associated with the first memory page to be central-processing-unit-shared. The page fault sequence further includes scheduling the first memory page for migration from a system memory associated with a central processing unit (CPU) to a local memory associated with a parallel processing unit (PPU). One advantage of the disclosed approach is that the PPU accesses memory pages with greater efficiency.
    Type: Grant
    Filed: December 18, 2013
    Date of Patent: November 20, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, James Leroy Deming, Lucien Dunning, Brian Fahs, Mark Hairgrove, John Mashey