Patents by Inventor Jerome F. Duluk, Jr.

Jerome F. Duluk, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Managing memory regions to support sparse mappings

Patent number: 9754561

Abstract: One embodiment of the present invention includes a memory management unit (MMU) that is configured to manage sparse mappings. The MMU processes requests to translate virtual addresses to physical addresses based on page table entries (PTEs) that indicate a sparse status. If the MMU determines that the PTE does not include a mapping from a virtual address to a physical address, then the MMU responds to the request based on the sparse status. If the sparse status is active, then the MMU determines the physical address based on whether the type of the request is a write operation and, subsequently, generates an acknowledgement of the request. By contrast, if the sparse status is not active, then the MMU generates a page fault. Advantageously, the disclosed embodiments enable the computer system to manage sparse mappings without incurring the performance degradation associated with both page faults and conventional software-based sparse mapping management.

Type: Grant

Filed: October 4, 2013

Date of Patent: September 5, 2017

Assignee: NVIDIA CORPORATION

Inventors: Jonathan Dunaisky, Henry Packard Moreton, Jeffrey A. Bolz, Yury Y. Uralsky, James Leroy Deming, Rui M. Bastos, Patrick R. Brown, Amanpreet Grewal, Christian Amsinck, Poornachandra Rao, Jerome F. Duluk, Jr., Andrew J. Tao
COMMON POINTERS IN UNIFIED VIRTUAL MEMORY SYSTEM

Publication number: 20170249254

Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.

Type: Application

Filed: October 16, 2013

Publication date: August 31, 2017

Applicant: NVIDIA Corporation

Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
Software methods in a GPU

Patent number: 9734545

Abstract: One embodiment of the present invention sets forth a technique for executing a software method within a graphics processing unit (GPU) that minimizes the number of clock cycles during which the graphics engine is idled. The function of the software method is performed by a firmware method that is executed by a processor within the GPU. The firmware method is executed to access and optionally update the state stored in the GPU. Unlike execution of a conventional software method, execution of the firmware method does not require an exchange of information between a CPU and the GPU. Therefore, the CPU is not interrupted and throughput of the CPU is not reduced.

Type: Grant

Filed: October 7, 2010

Date of Patent: August 15, 2017

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., John Christopher Cook, Fred Gruner, Gregory Scott Palmer
Execution state analysis for assigning tasks to streaming multiprocessors

Patent number: 9715413

Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.

Type: Grant

Filed: January 18, 2012

Date of Patent: July 25, 2017

Assignee: NVIDIA Corporation

Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
Methods and apparatus for auto-throttling encapsulated compute tasks

Patent number: 9710306

Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.

Type: Grant

Filed: April 9, 2012

Date of Patent: July 18, 2017

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
FRAME BUFFER ACCESS TRACKING VIA A SLIDING WINDOW IN A UNIFIED VIRTUAL MEMORY SYSTEM

Publication number: 20170199689

Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes the selection a memory page in a memory page group that has fallen into disuse from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.

Type: Application

Filed: May 31, 2016

Publication date: July 13, 2017

Inventors: John MASHEY, Cameron BUSCHARDT, James Leroy DEMING, Jerome F. DULUK, JR., Brian FAHS
MIGRATION SCHEME FOR UNIFIED VIRTUAL MEMORY SYSTEM

Publication number: 20170185526

Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.

Type: Application

Filed: October 16, 2013

Publication date: June 29, 2017

Applicant: NVIDIA CORPORATION

Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
Techniques for optimizing stencil buffers

Patent number: 9679350

Abstract: One embodiment sets forth a method for associating each stencil value included in a stencil buffer with multiple fragments. Components within a graphics processing pipeline use a set of stencil masks to partition the bits of each stencil value. Each stencil mask selects a different subset of bits, and each fragment is strategically associated with both a stencil value and a stencil mask. Before performing stencil actions associated with a fragment, the raster operations unit performs stencil mask operations on the operands. No fragments are associated with both the same stencil mask and the same stencil value. Consequently, no fragments are associated with the same stencil bits included in the stencil buffer. Advantageously, by reducing the number of stencil bits associated with each fragment, certain classes of software applications may reduce the wasted memory associated with stencil buffers in which each stencil value is associated with a single fragment.

Type: Grant

Filed: August 3, 2015

Date of Patent: June 13, 2017

Assignee: NVDIA Corporation

Inventors: Eric B. Lum, Jerome F. Duluk, Jr.
REPLAYING MEMORY TRANSACTIONS WHILE RESOLVING MEMORY ACCESS FAULTS

Publication number: 20170161206

Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.

Type: Application

Filed: February 20, 2017

Publication date: June 8, 2017

Inventors: James Leroy DEMING, Jerome F. DULUK, JR., John MASHEY, Mark HAIRGROVE, Lucien DUNNING, Jonathon Stuart Ramsey EVANS, Samuel H. DUNCAN, Cameron BUSCHARDT, Brian FAHS
Migration of peer-mapped memory pages

Patent number: 9639474

Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPU, and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.

Type: Grant

Filed: December 19, 2013

Date of Patent: May 2, 2017

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
Managing event count reports in a tile-based architecture

Patent number: 9639367

Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.

Type: Grant

Filed: October 4, 2013

Date of Patent: May 2, 2017

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
Higher accuracy Z-culling in a tile-based architecture

Patent number: 9612839

Abstract: A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprising a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode.

Type: Grant

Filed: October 23, 2013

Date of Patent: April 4, 2017

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
Microcontroller for memory management unit

Patent number: 9588903

Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.

Type: Grant

Filed: August 27, 2013

Date of Patent: March 7, 2017

Assignee: NVIDIA Corporation

Inventors: Cameron Buschardt, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, James Leroy Deming, Brian Fahs
Methods to facilitate primitive batching

Patent number: 9589310

Abstract: One embodiment of the present invention sets forth a technique for splitting a set of vertices into a plurality of batches for processing. The method includes receiving one or more primitives each containing an associated set of vertices. For each of the one or more primitives, one or more vertices are gathered from the set of vertices, the vertices are arranged into one or more batches, the batch is routed to a processing pipeline line to process each batch as a separate primitive, and the one or more batches are processed to produce results identical to those of processing the entire primitive as a single entity.

Type: Grant

Filed: October 5, 2010

Date of Patent: March 7, 2017

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Thomas Roell, Patrick R. Brown
Replaying memory transactions while resolving memory access faults

Patent number: 9575892

Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.

Type: Grant

Filed: December 17, 2013

Date of Patent: February 21, 2017

Assignee: NVIDIA Corporation

Inventors: James Leroy Deming, Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Lucien Dunning, Jonathon Stuart Ramsey Evans, Samuel H. Duncan, Cameron Buschardt, Brian Fahs
MIGRATING PAGES OF DIFFERENT SIZES BETWEEN HETEROGENEOUS PROCESSORS

Publication number: 20160357482

Abstract: One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.

Type: Application

Filed: August 22, 2016

Publication date: December 8, 2016

Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, James Leroy DEMING, Lucien DUNNING, Brian FAHS, Mark HAIRGROVE, Chenghuan JIA, John MASHEY, James M. VAN DYKE
Technique for computational nested parallelism

Patent number: 9513975

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Grant

Filed: May 2, 2012

Date of Patent: December 6, 2016

Assignee: NVIDIA Corporation

Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb
Compute work distribution reference counters

Patent number: 9507638

Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.

Type: Grant

Filed: November 8, 2011

Date of Patent: November 29, 2016

Assignee: NVIDIA Corporation

Inventors: Philip Alexander Cuadra, Karim M. Abdalla, Jerome F. Duluk, Jr., Luke Durant, Gerald F. Luiz, Timothy John Purcell, Lacky V. Shah
Efficient super-sampling with per-pixel shader threads

Patent number: 9495721

Abstract: Techniques for dispatching pixel information in a graphics processing pipeline. A fragment processing unit generates a pixel that includes multiple samples based on a first portion of a graphics primitive received by a first thread. The fragment processing unit calculates a first value for the first pixel, where the first value is calculated only once for the pixel. The fragment processing unit calculates a first set of values for the samples, where each value in the first set of values corresponds to a different sample and is calculated only once for the corresponding sample. The fragment processing unit combines the first value with each value in the first set of values to create a second set of values. The fragment processing unit creates one or more dispatch messages to store the second set of values in a set of output registers.

Type: Grant

Filed: December 21, 2012

Date of Patent: November 15, 2016

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Rouslan Dimitrov, Eric Lum, Rui Bastos
Concurrent execution of independent streams in multi-channel time slice groups

Patent number: 9442759

Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.

Type: Grant

Filed: December 9, 2011

Date of Patent: September 13, 2016

Assignee: NVIDIA Corporation

Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans

prev 1 2 3 4 5 6 7 8 … next