Patents by Inventor Jerome F. Duluk, Jr.

Jerome F. Duluk, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MANAGING PER-TILE EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE

Publication number: 20140118370

Abstract: A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count.

Type: Application

Filed: October 23, 2013

Publication date: May 1, 2014

Applicant: NVIDIA Corporation

Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.

TWO-PASS CACHE TILE PROCESSING FOR VISIBILITY TESTING IN A TILE-BASED ARCHITECTURE

Publication number: 20140118347

Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.

Type: Application

Filed: October 1, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140122829

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA Corporation

Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE

HIGHER ACCURACY Z-CULLING IN A TILE-BASED ARCHITECTURE

Publication number: 20140118348

Abstract: A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprises a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode.

Type: Application

Filed: October 23, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.

MANAGING EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE

Publication number: 20140118369

Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.

Type: Application

Filed: October 4, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Jerome F. DULUK, JR.

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140123146

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140123145

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, Jr., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE

Sparse texture systems and methods

Patent number: 8681169

Abstract: Systems and methods for texture processing are presented. In one embodiment a texture method includes creating a sparse texture residency translation map; performing a probe process utilizing the sparse texture residency translation map information to return a finest LOD that contains the texels for a texture lookup operation; and performing the texture lookup operation utilizing the finest LOD. In one exemplary implementation, the finest LOD is utilized as a minimum LOD clamp during the texture lookup operation. A finest LOD number indicates a minimum resident LOD and a sparse texture residency translation map includes one finest LOD number per tile of a sparse texture. The sparse texture residency translation can indicate a minimum resident LOD.

Type: Grant

Filed: December 31, 2009

Date of Patent: March 25, 2014

Assignee: Nvidia Corporation

Inventors: Jesse D. Hall, Jerome F. Duluk, Jr., Andrew Tao, Henry Moreton

Inter-shader attribute buffer optimization

Patent number: 8619087

Abstract: One embodiment of the present invention sets forth a technique for reducing the amount of memory required to store vertex data processed within a processing pipeline that includes a plurality of shading engines. The method includes determining a first active shading engine and a second active shading engine included within the processing pipeline, wherein the second active shading engine receives vertex data output by the first active shading engine. An output map is received and indicates one or more attributes that are included in the vertex data and output by the first active shading engine. An input map is received and indicates one or more attributes that are included in the vertex data and received by the second active shading engine from the first active shading engine.

Type: Grant

Filed: September 30, 2010

Date of Patent: December 31, 2013

Assignee: Nvidia Corporation

Inventors: Jerome F. Duluk, Jr., Gernot Schaufler

TECHNIQUE FOR COMPUTATIONAL NESTED PARALLELISM

Publication number: 20130298133

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Application

Filed: May 2, 2012

Publication date: November 7, 2013

Inventors: Stephen JONES, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, JR., Christopher Lamb

METHODS AND APPARATUS FOR AUTO-THROTTLING ENCAPSULATED COMPUTE TASKS

Publication number: 20130268942

Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.

Type: Application

Filed: April 9, 2012

Publication date: October 10, 2013

Inventors: Jerome F. DULUK, JR., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla

System and method for utilizing semaphores in a graphics pipeline

Patent number: 8525842

Abstract: A semaphore system, method, and computer program product are provided for use in a graphics environment. In operation, a semaphore is operated upon utilizing a plurality of graphics processing modules for a variety of graphics processing-related purposes (e.g., controlling access to graphics data by the graphics processing modules).

Type: Grant

Filed: June 16, 2006

Date of Patent: September 3, 2013

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Richard A. Silkebakken

AUTOMATIC DEPENDENT TASK LAUNCH

Publication number: 20130198760

Abstract: One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes, a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.

Type: Application

Filed: January 27, 2012

Publication date: August 1, 2013

Inventors: Philip Alexander CUADRA, Lacky V. Shah, Timothy John Purcell, Gerald F. Luiz, Jerome F. Duluk, JR.

Hardware override of application programming interface programmed state

Patent number: 8493395

Abstract: A method and system for overriding state information programmed into a processor using an application programming interface (API) avoids introducing error conditions in the processor. An override monitor unit within the processor stores the programmed state for any setting that is overridden so that the programmed state can be restored when the error condition no longer exists. The override monitor unit overrides the programmed state by forcing the setting to a legal value that does not cause an error condition. The processor is able to continue operating without notifying a device driver that an error condition has occurred since the error condition is avoided.

Type: Grant

Filed: July 16, 2012

Date of Patent: July 23, 2013

Assignee: Nvidia Corporation

Inventors: Jerome F. Duluk, Jr., Henry P. Moreton, Steven E. Molnar, John S. Montrym

SCHEDULING AND EXECUTION OF COMPUTE TASKS

Publication number: 20130185728

Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates the relative time at which the compute task was added to the first table. The technique further involves selecting a first compute task from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.

Type: Application

Filed: January 18, 2012

Publication date: July 18, 2013

Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota

SCHEDULING AND EXECUTION OF COMPUTE TASKS

Publication number: 20130185725

Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.

Type: Application

Filed: January 18, 2012

Publication date: July 18, 2013

Inventors: Karim M. ABDALLA, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota

SIGNALING, ORDERING, AND EXECUTION OF DYNAMICALLY GENERATED TASKS IN A PROCESSING SYSTEM

Publication number: 20130160021

Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system. The technique allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.

Type: Application

Filed: December 16, 2011

Publication date: June 20, 2013

Inventors: Timothy John PURCELL, Lacky V. Shah, Jerome F. Duluk, JR., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris

ERROR CHECKING IN OUT-OF-ORDER TASK SCHEDULING

Publication number: 20130152094

Abstract: One embodiment of the present invention sets forth a technique for error-checking a compute task. The technique involves receiving a pointer to a compute task, storing the pointer in a scheduling queue, determining that the compute task should be executed, retrieving the pointer from the scheduling queue, determining via an error-check procedure that the compute task is eligible for execution, and executing the compute task.

Type: Application

Filed: December 9, 2011

Publication date: June 13, 2013

Inventors: Jerome F. Duluk, JR., Timothy John Purcell, Jesse David Hall, Philip Alexander Cuadra

Multi-Channel Time Slice Groups

Publication number: 20130152093

Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.

Type: Application

Filed: December 9, 2011

Publication date: June 13, 2013

Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS

COMPUTE TASK STATE ENCAPSULATION

Publication number: 20130117751

Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.

Type: Application

Filed: November 9, 2011

Publication date: May 9, 2013

Inventors: Jerome F. DULUK, JR., Lacky V. SHAH, Sean J. TREICHLER
