Patents by Inventor Jerome F. Duluk

Jerome F. Duluk has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140176589
    Abstract: A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, JR., Ziyad S. Hakura, Henry Packard MORETON
  • Publication number: 20140176568
    Abstract: A technique for efficiently rendering content reduces each complex blend mode to a series of basic blend operations. The series of basic blend operations are executed within a recirculating pipeline until a final blended value is computed. The recirculating pipeline is positioned within a color raster operations unit of a graphics processing unit for efficient access to image buffer data.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Rui BASTOS, Mark J. Kilgard, William Craig McKnight, Jerome F. Duluk, Pierre Souillot, Dale L. Kirkland, Christian Amsinck, Joseph Detmer, Christian Rouet, Don Bittel
  • Publication number: 20140176588
    Abstract: A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, JR., Ziyad S. Hakura, Henry Packard MORETON
  • Publication number: 20140176547
    Abstract: Techniques are disclosed for dispatching pixel information in a graphics processing pipeline. A fragment processing unit in the graphics processing pipeline generates a pixel that includes multiple samples based on a portion of a graphics primitive received by a thread. The fragment processing unit calculates a set of source values, where each source value corresponds to a different sample of the pixel. The fragment processing unit retrieves a set of destination values from a render target, where each destination value corresponds to a different source value. The fragment processing unit blends each source value with a corresponding destination value to create a set of final values, and creates one or more dispatch messages to store the set of final values in a set of output registers. One advantage of the disclosed techniques is that pixel shader programs perform per-sample operations with increased efficiency.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Nvidia Corporation
    Inventors: JEROME F. DULUK, JR., Jesse David Hall
  • Publication number: 20140176579
    Abstract: Techniques are disclosed for dispatching pixel information in a graphics processing pipeline. A fragment processing unit generates a pixel that includes multiple samples based on a first portion of a graphics primitive received by a first thread. The fragment processing unit calculates a first value for the first pixel, where the first value is calculated only once for the pixel. The fragment processing unit calculates a first set of values for the samples, where each value in the first set of values corresponds to a different sample and is calculated only once for the corresponding sample. The fragment processing unit combines the first value with each value in the first set of values to create a second set of values. The fragment processing unit creates one or more dispatch messages to store the second set of values in a set of output registers. One advantage of the disclosed techniques is that pixel shader programs perform per-sample operations with increased efficiency.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. Duluk, JR., Rouslan DIMITROV, Eric LUM, Rui BASTOS
  • Patent number: 8760455
    Abstract: One embodiment of the present invention sets forth a technique for reducing overhead associated with transmitting primitive draw commands from memory to a graphics processing unit (GPU). Command pairs comprising an end draw command and a begin draw command associated with a conventional graphics application programming interface (API) are selectively replaced with a new construct. The new construct is a reset topology index, which implements a combined function of the end draw command and begin draw command. The new construct improves efficiency by reducing total data transmitted from memory to the GPU.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: June 24, 2014
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Thomas Roell, James C. Bowman
  • Patent number: 8749562
    Abstract: A system and method for sharing binding groups between shaders allows for efficient use of shader state data storage resources. In contrast with conventional graphics processors and Application Programming Interfaces that specify a set of binding points for each shader that are exclusive to that shader, two or more shaders may reference the same binding group that includes multiple binding points. As the number and variety of different shaders increases, the number of binding groups may increase at a slower rate since some binding groups may be shared between different shaders.
    Type: Grant
    Filed: September 23, 2009
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventor: Jerome F. Duluk, Jr.
  • Publication number: 20140118369
    Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
    Type: Application
    Filed: October 4, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, JR.
  • Publication number: 20140122829
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140118370
    Abstract: A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count.
    Type: Application
    Filed: October 23, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.
  • Publication number: 20140118347
    Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.
    Type: Application
    Filed: October 1, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.
  • Publication number: 20140123145
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, Jr., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140118348
    Abstract: A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprising a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode.
    Type: Application
    Filed: October 23, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.
  • Publication number: 20140123146
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Patent number: 8681169
    Abstract: Systems and methods for texture processing are presented. In one embodiment a texture method includes creating a sparse texture residency translation map; performing a probe process utilizing the sparse texture residency translation map information to return a finest LOD that contains the texels for a texture lookup operation; and performing the texture lookup operation utilizing the finest LOD. In one exemplary implementation, the finest LOD is utilized as a minimum LOD clamp during the texture lookup operation. A finest LOD number indicates a minimum resident LOD and a sparse texture residency translation map includes one finest LOD number per tile of a sparse texture. The sparse texture residency translation can indicate a minimum resident LOD.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: March 25, 2014
    Assignee: Nvidia Corporation
    Inventors: Jesse D. Hall, Jerome F. Duluk, Jr., Andrew Tao, Henry Moreton
  • Patent number: 8619087
    Abstract: One embodiment of the present invention sets forth a technique for reducing the amount of memory required to store vertex data processed within a processing pipeline that includes a plurality of shading engines. The method includes determining a first active shading engine and a second active shading engine included within the processing pipeline, wherein the second active shading engine receives vertex data output by the first active shading engine. An output map is received and indicates one or more attributes that are included in the vertex data and output by the first active shading engine. An input map is received and indicates one or more attributes that are included in the vertex data and received by the second active shading engine from the first active shading engine.
    Type: Grant
    Filed: September 30, 2010
    Date of Patent: December 31, 2013
    Assignee: Nvidia Corporation
    Inventors: Jerome F. Duluk, Jr., Gernot Schaufler
  • Publication number: 20130298133
    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.
    Type: Application
    Filed: May 2, 2012
    Publication date: November 7, 2013
    Inventors: Stephen JONES, Philip Alexander Cuadra, Daniel Elliot Wexler, Ignacio Llamas, Lacky V. Shah, Jerome F. Duluk, JR., Christopher Lamb
  • Publication number: 20130268942
    Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
    Type: Application
    Filed: April 9, 2012
    Publication date: October 10, 2013
    Inventors: Jerome F. DULUK, JR., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
  • Patent number: 8525842
    Abstract: A semaphore system, method, and computer program product are provided for use in a graphics environment. In operation, a semaphore is operated upon utilizing a plurality of graphics processing modules for a variety of graphics processing-related purposes (e.g. for example, controlling access to graphics data by the graphics processing modules, etc.).
    Type: Grant
    Filed: June 16, 2006
    Date of Patent: September 3, 2013
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Richard A. Silkebakken
  • Publication number: 20130198760
    Abstract: One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.
    Type: Application
    Filed: January 27, 2012
    Publication date: August 1, 2013
    Inventors: Philip Alexander CUADRA, Lacky V. Shah, Timothy John Purcell, Gerald F. Luiz, Jerome F. Duluk, JR.