Patents by Inventor Jerome F. Duluk, Jr.

Jerome F. Duluk, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

COMPUTE TASK STATE ENCAPSULATION

Publication number: 20210019185

Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.

Type: Application

Filed: October 5, 2020

Publication date: January 21, 2021

Inventors: Jerome F. DULUK, JR., Lacky V. SHAH, Sean J. TREICHLER
Dynamic partitioning of execution resources

Patent number: 10817338

Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.

Type: Grant

Filed: January 31, 2018

Date of Patent: October 27, 2020

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
Compute task state encapsulation

Patent number: 10795722

Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.

Type: Grant

Filed: November 9, 2011

Date of Patent: October 6, 2020

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
TECHNIQUE FOR COMPUTATIONAL NESTED PARALLELISM

Publication number: 20200151016

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Application

Filed: January 17, 2020

Publication date: May 14, 2020

Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
PCIE TRAFFIC TRACKING HARDWARE IN A UNIFIED VIRTUAL MEMORY SYSTEM

Publication number: 20190340145

Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.

Type: Application

Filed: June 24, 2019

Publication date: November 7, 2019

Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Brian FAHS, Mark HAIRGROVE, John MASHEY
Fault buffer for resolving page faults in unified virtual memory system

Patent number: 10445243

Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.

Type: Grant

Filed: October 16, 2013

Date of Patent: October 15, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove, Chenghuan Jia, John Mashey
Two-pass cache tile processing for visibility testing in a tile-based architecture

Patent number: 10438314

Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.

Type: Grant

Filed: April 23, 2018

Date of Patent: October 8, 2019

Assignee: NVIDIA CORPORATION

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
Microcontroller for memory management unit

Patent number: 10409730

Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.

Type: Grant

Filed: August 27, 2013

Date of Patent: September 10, 2019

Assignee: NVIDIA CORPORATION

Inventors: Cameron Buschardt, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, James Leroy Deming, Brian Fahs
TWO-PASS CACHE TILE PROCESSING FOR VISIBILITY TESTING IN A TILE-BASED ARCHITECTURE

Publication number: 20190243652

Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.

Type: Application

Filed: April 23, 2018

Publication date: August 8, 2019

Inventors: Ziyad S. HAKURA, Jerome F. DULUK, JR.
DYNAMIC PARTITIONING OF EXECUTION RESOURCES

Publication number: 20190235928

Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.

Type: Application

Filed: January 31, 2018

Publication date: August 1, 2019

Inventors: Jerome F. Duluk, JR., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
DYNAMIC PARTITIONING OF EXECUTION RESOURCES

Publication number: 20190235924

Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.

Type: Application

Filed: January 31, 2018

Publication date: August 1, 2019

Inventors: Jerome F. Duluk, Jr., Luke Durant, Ramon Matas Navarro, Alan Menezes, Jeffrey Tuckey, Gentaro Hirota, Brian Pharris
PCIe traffic tracking hardware in a unified virtual memory system

Patent number: 10331603

Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.

Type: Grant

Filed: April 9, 2018

Date of Patent: June 25, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, James Leroy Deming, Brian Fahs, Mark Hairgrove, John Mashey
Efficient memory virtualization in multi-threaded processing units

Patent number: 10310973

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Grant

Filed: October 25, 2012

Date of Patent: June 4, 2019

Assignee: NVIDIA CORPORATION

Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove
Migration scheme for unified virtual memory system

Patent number: 10303616

Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.

Type: Grant

Filed: October 16, 2013

Date of Patent: May 28, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Chenghuan Jia, John Mashey, Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove
Managing event count reports in a tile-based architecture

Patent number: 10223122

Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.

Type: Grant

Filed: April 9, 2017

Date of Patent: March 5, 2019

Assignee: NVIDIA CORPORATION

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
System, method, and computer program product for simultaneous execution of compute and graphics workloads

Patent number: 10217183

Abstract: A system, method, and computer program product are provided for allocating processor resources to process compute workloads and graphics workloads substantially simultaneously. The method includes the steps of allocating a plurality of processing units to process tasks associated with a graphics pipeline, receiving a request to allocate at least one processing unit in the plurality of processing units to process tasks associated with a compute pipeline, and reallocating the at least one processing unit to process tasks associated with the compute pipeline.

Type: Grant

Filed: December 20, 2013

Date of Patent: February 26, 2019

Assignee: NVIDIA CORPORATION

Inventors: Gregory S. Palmer, Jerome F. Duluk, Jr., Karim Maher Abdalla, Jonathon S. Evans, Adam Clark Weitkemper, Lacky Vasant Shah, Philip Browning Johnson, Gentaro Hirota
Migration of peer-mapped memory pages

Patent number: 10216413

Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPU, and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.

Type: Grant

Filed: May 1, 2017

Date of Patent: February 26, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
Efficient memory virtualization in multi-threaded processing units

Patent number: 10169091

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Grant

Filed: October 25, 2012

Date of Patent: January 1, 2019

Assignee: NVIDIA CORPORATION

Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove
Hardware for parallel command list generation

Patent number: 10169072

Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.

Type: Grant

Filed: August 9, 2010

Date of Patent: January 1, 2019

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Henry Packard Moreton, Patrick R. Brown
Opportunistic migration of memory pages in a unified virtual memory system

Patent number: 10133677

Abstract: Techniques are disclosed for transitioning a memory page between memories in a virtual memory subsystem. A unified virtual memory (UVM) driver detects a page fault in response to a memory access request associated with a first memory page, where a local page table does not include an entry corresponding to a virtual memory address included in the memory access request. The UVM driver, in response to the page fault, executes a page fault sequence. The page fault sequence includes modifying the ownership state associated with the first memory page to be central-processing-unit-shared. The page fault sequence further includes scheduling the first memory page for migration from a system memory associated with a central processing unit (CPU) to a local memory associated with a parallel processing unit (PPU). One advantage of the disclosed approach is that the PPU accesses memory pages with greater efficiency.

Type: Grant

Filed: December 18, 2013

Date of Patent: November 20, 2018

Assignee: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, James Leroy Deming, Lucien Dunning, Brian Fahs, Mark Hairgrove, John Mashey

prev 1 2 3 4 5 6 … next