Patents by Inventor Jerome F. Duluk

Jerome F. Duluk has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TECHNIQUE FOR STORING SHARED VERTICES

Publication number: 20140176589

Abstract: A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written.

Type: Application

Filed: December 20, 2012

Publication date: June 26, 2014

Applicant: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Ziyad S. Hakura, Henry Packard Moreton
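
The base-address coordination in this abstract can be pictured as an exclusive prefix sum over the amounts each unit has buffered. The Python sketch below is illustrative only: `assign_base_addresses` and the `(vertex_count, index_count)` input are hypothetical names, and the abstract does not spell out how the stream output synchronization unit actually allocates addresses.

```python
from itertools import accumulate

def assign_base_addresses(counts):
    """Give each unit a disjoint base offset in the global vertex and index buffers.

    counts: list of (vertex_count, index_count), one pair per geometry
    processing unit (hypothetical input format).
    Returns a list of (vertex_base, index_base), one pair per unit.
    """
    vertex_counts = [v for v, _ in counts]
    index_counts = [i for _, i in counts]
    # Exclusive prefix sum: unit k starts where units 0..k-1 end.
    vertex_bases = list(accumulate(vertex_counts, initial=0))[:-1]
    index_bases = list(accumulate(index_counts, initial=0))[:-1]
    return list(zip(vertex_bases, index_bases))

# Three units that buffered different amounts of data.
bases = assign_base_addresses([(4, 6), (3, 3), (5, 9)])
print(bases)
```

Because the resulting ranges do not overlap, each unit could write its vertices and indices without waiting for the others, which is the property the abstract relies on.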

PROGRAMMABLE BLENDING IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140176568

Abstract: A technique for efficiently rendering content reduces each complex blend mode to a series of basic blend operations. The series of basic blend operations are executed within a recirculating pipeline until a final blended value is computed. The recirculating pipeline is positioned within a color raster operations unit of a graphics processing unit for efficient access to image buffer data.

Type: Application

Filed: December 20, 2012

Publication date: June 26, 2014

Applicant: NVIDIA CORPORATION

Inventors: Rui Bastos, Mark J. Kilgard, William Craig McKnight, Jerome F. Duluk, Pierre Souillot, Dale L. Kirkland, Christian Amsinck, Joseph Detmer, Christian Rouet, Don Bittel
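
One way to picture reducing a complex blend mode to basic operations is a short program of add, subtract, and multiply steps that recirculates an accumulator. This sketch assumes a hypothetical three-operation instruction set and a hypothetical decomposition of the common "screen" mode, s + d - s*d; neither is taken from the application.

```python
import operator

# Hypothetical set of basic blend operations.
BASIC_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Hypothetical program for the "screen" mode: s + d - s*d.
# Each step is (operation, first operand register, second operand register).
SCREEN = [
    ("mul", "src", "dst"),  # acc = s * d
    ("sub", "dst", "acc"),  # acc = d - s*d
    ("add", "src", "acc"),  # acc = s + d - s*d
]

def run_blend(program, src, dst):
    """Run each basic step in turn, feeding the accumulator back in."""
    regs = {"src": src, "dst": dst, "acc": 0.0}
    for op, a, b in program:
        regs["acc"] = BASIC_OPS[op](regs[a], regs[b])
    return regs["acc"]

blended = run_blend(SCREEN, 0.5, 0.25)
print(blended)
```

The loop stands in for the recirculating pipeline: the same small set of operations is reused on every pass until the final blended value is ready.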

TECHNIQUE FOR STORING SHARED VERTICES

Publication number: 20140176588

Abstract: A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written.

Type: Application

Filed: December 20, 2012

Publication date: June 26, 2014

Applicant: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Ziyad S. Hakura, Henry Packard Moreton

PROGRAMMABLE BLENDING VIA MULTIPLE PIXEL SHADER DISPATCHES

Publication number: 20140176547

Abstract: Techniques are disclosed for dispatching pixel information in a graphics processing pipeline. A fragment processing unit in the graphics processing pipeline generates a pixel that includes multiple samples based on a portion of a graphics primitive received by a thread. The fragment processing unit calculates a set of source values, where each source value corresponds to a different sample of the pixel. The fragment processing unit retrieves a set of destination values from a render target, where each destination value corresponds to a different source value. The fragment processing unit blends each source value with a corresponding destination value to create a set of final values, and creates one or more dispatch messages to store the set of final values in a set of output registers. One advantage of the disclosed techniques is that pixel shader programs perform per-sample operations with increased efficiency.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: Nvidia Corporation

Inventors: Jerome F. Duluk, Jr., Jesse David Hall

EFFICIENT SUPER-SAMPLING WITH PER-PIXEL SHADER THREADS

Publication number: 20140176579

Abstract: Techniques are disclosed for dispatching pixel information in a graphics processing pipeline. A fragment processing unit generates a pixel that includes multiple samples based on a first portion of a graphics primitive received by a first thread. The fragment processing unit calculates a first value for the first pixel, where the first value is calculated only once for the pixel. The fragment processing unit calculates a first set of values for the samples, where each value in the first set of values corresponds to a different sample and is calculated only once for the corresponding sample. The fragment processing unit combines the first value with each value in the first set of values to create a second set of values. The fragment processing unit creates one or more dispatch messages to store the second set of values in a set of output registers. One advantage of the disclosed techniques is that pixel shader programs perform per-sample operations with increased efficiency.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Applicant: NVIDIA CORPORATION

Inventors: Jerome F. Duluk, Jr., Rouslan Dimitrov, Eric Lum, Rui Bastos
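
The split between once-per-pixel and once-per-sample work in this abstract can be sketched as follows. All names (`shade_pixel`, `lighting`, `coverage`) are hypothetical, and the way the two values are combined is left as a parameter because the abstract does not specify it; multiplication is used here only as an example.

```python
def shade_pixel(samples, per_pixel, per_sample, combine):
    """Evaluate per-pixel work once, per-sample work once per sample, then combine."""
    pixel_value = per_pixel()                          # once for the whole pixel
    sample_values = [per_sample(s) for s in samples]   # once for each sample
    return [combine(pixel_value, v) for v in sample_values]

calls = {"per_pixel": 0}

def lighting():
    # Stand-in for an expensive computation shared by all samples.
    calls["per_pixel"] += 1
    return 0.5

# Hypothetical per-sample values for a pixel with four samples.
coverage = {"s0": 1.0, "s1": 0.5, "s2": 0.0, "s3": 1.0}

out = shade_pixel(list(coverage), lighting, coverage.get, lambda p, s: p * s)
print(out, calls["per_pixel"])
```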

Restart index that sets a topology

Patent number: 8760455

Abstract: One embodiment of the present invention sets forth a technique for reducing overhead associated with transmitting primitive draw commands from memory to a graphics processing unit (GPU). Command pairs comprising an end draw command and a begin draw command associated with a conventional graphics application programming interface (API) are selectively replaced with a new construct. The new construct is a reset topology index, which implements a combined function of the end draw command and begin draw command. The new construct improves efficiency by reducing total data transmitted from memory to the GPU.

Type: Grant

Filed: October 4, 2010

Date of Patent: June 24, 2014

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Thomas Roell, James C. Bowman
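
The effect of a reset topology index can be sketched as a decoder that walks one index stream and starts a new primitive run whenever it meets a reserved value. The reserved values and topology names below are hypothetical; the abstract does not define an encoding.

```python
# Hypothetical reserved index values, each naming the topology that follows.
RESET_TO = {0xFFFFFFF0: "triangle_strip", 0xFFFFFFF1: "line_strip"}

def split_draws(indices, initial_topology):
    """Split one index stream into (topology, indices) runs.

    A reserved index ends the current run and begins the next one,
    standing in for an end draw / begin draw command pair.
    """
    draws = []
    topology, run = initial_topology, []
    for idx in indices:
        if idx in RESET_TO:
            if run:
                draws.append((topology, run))
            topology, run = RESET_TO[idx], []
        else:
            run.append(idx)
    if run:
        draws.append((topology, run))
    return draws

draws = split_draws([0, 1, 2, 3, 0xFFFFFFF1, 4, 5, 6], "triangle_strip")
print(draws)
```

In this model a single index value replaces two separate commands, which is where the reduction in transmitted data would come from.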

Sharing binding groups between shaders

Patent number: 8749562

Abstract: A system and method for sharing binding groups between shaders allows for efficient use of shader state data storage resources. In contrast with conventional graphics processors and Application Programming Interfaces that specify a set of binding points for each shader that are exclusive to that shader, two or more shaders may reference the same binding group that includes multiple binding points. As the number and variety of different shaders increases, the number of binding groups may increase at a slower rate since some binding groups may be shared between different shaders.

Type: Grant

Filed: September 23, 2009

Date of Patent: June 10, 2014

Assignee: NVIDIA Corporation

Inventor: Jerome F. Duluk, Jr.

MANAGING EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE

Publication number: 20140118369

Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.

Type: Application

Filed: October 4, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140122829

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA Corporation

Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove
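
The role of the ASID in translation can be sketched with a small software model in which TLB entries are keyed by both the ASID and the virtual page. The class name, the dictionary-based page tables, and the 4096-byte page size are assumptions made for illustration; the application describes hardware, not this code.

```python
class Tlb:
    """Toy TLB whose entries are tagged with (ASID, virtual page number)."""

    def __init__(self, page_tables, page_size=4096):
        self.page_tables = page_tables  # asid -> {virtual page -> physical page}
        self.page_size = page_size
        self.entries = {}               # (asid, virtual page) -> physical page
        self.misses = 0

    def translate(self, asid, vaddr):
        vpn, offset = divmod(vaddr, self.page_size)
        key = (asid, vpn)
        if key not in self.entries:
            # Miss: the ASID selects which page table to walk.
            self.misses += 1
            self.entries[key] = self.page_tables[asid][vpn]
        return self.entries[key] * self.page_size + offset

# Two tasks map the same virtual page 0 to different physical pages.
tlb = Tlb({1: {0: 7}, 2: {0: 3}})
addr_task1 = tlb.translate(1, 0x10)
addr_task2 = tlb.translate(2, 0x10)
addr_task1_again = tlb.translate(1, 0x20)  # hits the entry cached for ASID 1
print(addr_task1, addr_task2, addr_task1_again, tlb.misses)
```

Tagging entries this way lets translations for several address spaces sit in the TLB at once, so the same virtual address from different tasks resolves to different physical addresses.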

MANAGING PER-TILE EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE

Publication number: 20140118370

Abstract: A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count.

Type: Application

Filed: October 23, 2013

Publication date: May 1, 2014

Applicant: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.

TWO-PASS CACHE TILE PROCESSING FOR VISIBILITY TESTING IN A TILE-BASED ARCHITECTURE

Publication number: 20140118347

Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.

Type: Application

Filed: October 1, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
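
The benefit of sending the same work through twice can be sketched at the fragment level: a first pass resolves depth without shading, and a second pass shades only what remains visible. This is a simplification that assumes pre-rasterized fragments and a smaller-is-nearer depth convention; `render_tile` and its inputs are hypothetical, and the application operates on primitives in hardware.

```python
def render_tile(fragments, shade):
    """Two-pass rendering of one tile.

    fragments: list of (pixel, depth, payload); smaller depth is nearer.
    Returns (color per pixel, number of shader invocations).
    """
    depth = {}
    # Pass 1 (z-only): find the nearest depth per pixel, no shading.
    for pixel, z, _ in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z

    color, shaded = {}, 0
    # Pass 2 (normal): shade only fragments that match the resolved depth.
    for pixel, z, payload in fragments:
        if z == depth[pixel]:
            color[pixel] = shade(payload)
            shaded += 1
    return color, shaded

frags = [((0, 0), 0.9, "far"), ((0, 0), 0.2, "near"), ((1, 0), 0.5, "mid")]
color, shaded = render_tile(frags, str.upper)
print(color, shaded)
```

The hidden "far" fragment is never shaded, even though it arrives before the fragment that covers it.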

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140123145

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove

HIGHER ACCURACY Z-CULLING IN A TILE-BASED ARCHITECTURE

Publication number: 20140118348

Abstract: A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprising a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode.

Type: Application

Filed: October 23, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.

EFFICIENT MEMORY VIRTUALIZATION IN MULTI-THREADED PROCESSING UNITS

Publication number: 20140123146

Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nick Barrow-Williams, Brian Fahs, Jerome F. Duluk, Jr., James Leroy Deming, Timothy John Purcell, Lucien Dunning, Mark Hairgrove

Sparse texture systems and methods

Patent number: 8681169

Abstract: Systems and methods for texture processing are presented. In one embodiment a texture method includes creating a sparse texture residency translation map; performing a probe process utilizing the sparse texture residency translation map information to return a finest LOD that contains the texels for a texture lookup operation; and performing the texture lookup operation utilizing the finest LOD. In one exemplary implementation, the finest LOD is utilized as a minimum LOD clamp during the texture lookup operation. A finest LOD number indicates a minimum resident LOD and a sparse texture residency translation map includes one finest LOD number per tile of a sparse texture. The sparse texture residency translation can indicate a minimum resident LOD.

Type: Grant

Filed: December 31, 2009

Date of Patent: March 25, 2014

Assignee: Nvidia Corporation

Inventors: Jesse D. Hall, Jerome F. Duluk, Jr., Andrew Tao, Henry Moreton
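
The minimum LOD clamp described in this abstract can be sketched as a per-tile lookup followed by a `max`. The sketch assumes the usual convention that LOD 0 is the most detailed level and larger numbers are coarser; `residency_map` and `clamped_lod` are hypothetical names.

```python
def clamped_lod(residency_map, tile, requested_lod):
    """Return the LOD to sample, never finer than what is resident for the tile.

    residency_map: tile -> finest resident LOD number (one entry per tile).
    LOD 0 is assumed to be the most detailed level.
    """
    finest_resident = residency_map[tile]  # the "probe" step
    return max(requested_lod, finest_resident)

# Tile (0, 0) has every level resident; tile (1, 0) only has LOD 2 and coarser.
residency_map = {(0, 0): 0, (1, 0): 2}

lod_full = clamped_lod(residency_map, (0, 0), 1)     # requested level is resident
lod_clamped = clamped_lod(residency_map, (1, 0), 0)  # falls back to a coarser level
print(lod_full, lod_clamped)
```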

Inter-shader attribute buffer optimization

Patent number: 8619087

Abstract: One embodiment of the present invention sets forth a technique for reducing the amount of memory required to store vertex data processed within a processing pipeline that includes a plurality of shading engines. The method includes determining a first active shading engine and a second active shading engine included within the processing pipeline, wherein the second active shading engine receives vertex data output by the first active shading engine. An output map is received and indicates one or more attributes that are included in the vertex data and output by the first active shading engine. An input map is received and indicates one or more attributes that are included in the vertex data and received by the second active shading engine from the first active shading engine.

Type: Grant

Filed: September 30, 2010

Date of Patent: December 31, 2013

Assignee: Nvidia Corporation

Inventors: Jerome F. Duluk, Jr., Gernot Schaufler

TECHNIQUE FOR COMPUTATIONAL NESTED PARALLELISM

Publication number: 20130298133

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

Type: Application

Filed: May 2, 2012

Publication date: November 7, 2013

Inventors: Stephen Jones, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, Jr., Christopher Lamb

METHODS AND APPARATUS FOR AUTO-THROTTLING ENCAPSULATED COMPUTE TASKS

Publication number: 20130268942

Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.

Type: Application

Filed: April 9, 2012

Publication date: October 10, 2013

Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla

System and method for utilizing semaphores in a graphics pipeline

Patent number: 8525842

Abstract: A semaphore system, method, and computer program product are provided for use in a graphics environment. In operation, a semaphore is operated upon utilizing a plurality of graphics processing modules for a variety of graphics processing-related purposes (e.g., controlling access to graphics data by the graphics processing modules).

Type: Grant

Filed: June 16, 2006

Date of Patent: September 3, 2013

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Richard A. Silkebakken

AUTOMATIC DEPENDENT TASK LAUNCH

Publication number: 20130198760

Abstract: One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes, a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.

Type: Application

Filed: January 27, 2012

Publication date: August 1, 2013

Inventors: Philip Alexander Cuadra, Lacky V. Shah, Timothy John Purcell, Gerald F. Luiz, Jerome F. Duluk, Jr.
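
The metadata layout named in this abstract, an enable flag plus a pointer to the dependent task, can be sketched with a small sequential model. `TaskMeta` and `run_chain` are hypothetical names, and the loop stands in for the task scheduling unit, which in the application is hardware.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TaskMeta:
    """Hypothetical task metadata carrying the dependent-task information."""
    name: str
    work: Callable[[], None]
    dependent_enable: bool = False
    dependent: Optional["TaskMeta"] = None
    complete: bool = False

def run_chain(first, log):
    """Run a task, then launch its dependent directly from the metadata."""
    task = first
    while task is not None:
        task.work()
        log.append(task.name)
        # The enable flag gates the pointer; no semaphore is involved.
        next_task = task.dependent if task.dependent_enable else None
        # Mark complete only after the dependent has been selected for launch,
        # so the metadata slot could then be reused.
        task.complete = True
        task = next_task

log = []
second = TaskMeta("second", lambda: None)
first = TaskMeta("first", lambda: None, dependent_enable=True, dependent=second)
run_chain(first, log)
print(log)
```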
