Patents by Inventor Emmett M. Kilgariff

Emmett M. Kilgariff has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Distributed stream output in a parallel processing unit

Patent number: 8817031

Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.

Type: Grant

Filed: September 29, 2010

Date of Patent: August 26, 2014

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
SHADER PROGRAM ATTRIBUTE STORAGE

Publication number: 20140204106

Abstract: A system, method, and computer program product are provided for determining a size of an attribute storage buffer. Input attributes read by a shader program to generate output attributes are identified. A portion of the output attributes to be consumed by a destination shader program is identified. The size of the attribute storage buffer that is allocated for execution of the shader program is computed based on the input attributes and the portion of the output attributes.

Type: Application

Filed: January 18, 2013

Publication date: July 24, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad Sami Hakura, Emmett M. Kilgariff
MID-PRIMITIVE GRAPHICS EXECUTION PREEMPTION

Publication number: 20140184617

Abstract: One embodiment of the present invention sets forth a technique for mid-primitive execution preemption. When preemption is initiated, no new instructions are issued, in-flight instructions progress to an execution unit boundary, and the execution state is unloaded from the processing pipeline. The execution units within the processing pipeline, including the coarse rasterization unit, complete execution of in-flight instructions and become idle. However, rasterization of a triangle may be preempted at a coarse raster region boundary. The amount of context state to be stored is reduced because the execution units are idle. Preempting at the mid-primitive level during rasterization reduces the time from when preemption is initiated to when another process can execute because the entire triangle is not rasterized.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Gregory Scott PALMER, Ziyad S. HAKURA, Emmett M. KILGARIFF, Dale L. KIRKLAND, Lacky V. SHAH
Hardware-managed virtual buffers using a shared memory for load distribution

Patent number: 8760460

Abstract: One embodiment of the present invention sets forth a technique for using a shared memory to store hardware-managed virtual buffers. A circular buffer is allocated within a general-purpose multi-use cache for storage of primitive attribute data rather than having a dedicated buffer for the storage of the primitive attribute data. The general-purpose multi-use cache is also configured to store other graphics data since the space requirement for primitive attribute data storage is highly variable, depending on the number of attributes and the size of primitives. Entries in the circular buffer are allocated as needed and released and invalidated after the primitive attribute data has been consumed. An address to the circular buffer entry is transmitted along with primitive descriptors from object-space processing to the distributed processing in screen-space.

Type: Grant

Filed: May 4, 2010

Date of Patent: June 24, 2014

Assignee: NVIDIA Corporation

Inventors: Emmett M. Kilgariff, Steven E. Molnar, Sean J. Treichler, Johnny S. Rhoades, Gernot Schaufler, Dale L. Kirkland, Cynthia Ann Edgeworth Allison, Karl M. Wurstner, Timothy John Purcell
COMPUTING TESSELLATION COORDINATES USING DEDICATED HARDWARE

Publication number: 20140160126

Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.

Type: Application

Filed: December 2, 2013

Publication date: June 12, 2014

Applicant: NVIDIA CORPORATION

Inventors: Justin S. LEGAKIS, Emmett M. KILGARIFF, Michael C. SHEBANOW
ORDER-PRESERVING DISTRIBUTED RASTERIZER

Publication number: 20140152652

Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.

Type: Application

Filed: November 18, 2013

Publication date: June 5, 2014

Applicant: NVIDIA CORPORATION

Inventors: Steven E. MOLNAR, Emmett M. KILGARIFF, John S. RHOADES, Timothy John PURCELL, Sean J. TREICHLER, Ziyad S. HAKURA, Franklin C. CROW, James C. BOWMAN
SETTING DOWNSTREAM RENDER STATE IN AN UPSTREAM SHADER

Publication number: 20140125669

Abstract: Techniques are disclosed for processing graphics objects in a stage of a graphics processing pipeline. The techniques include receiving a graphics primitive associated with the graphics object, and determining a plurality of attributes corresponding to one or more vertices associated with the graphics primitive. The techniques further include determining values for one or more state parameters associated with a downstream stage of the graphics processing pipeline based on a visual effect associated with the graphics primitive. The techniques further include transmitting the state parameter values to the downstream stage of the graphics processing pipeline. One advantage of the disclosed techniques is that visual effects are flexibly and efficiently performed.

Type: Application

Filed: November 7, 2012

Publication date: May 8, 2014

Applicant: NVIDIA CORPORATION

Inventors: Emmett M. KILGARIFF, Morgan McGUIRE, Yury Y. URALSKY, Ziyad S. HAKURA
TILED CACHE INVALIDATION

Publication number: 20140122812

Abstract: One embodiment of the present invention sets forth a graphics subsystem. The graphics subsystem includes a first tiling unit associated with a first set of raster tiles and a crossbar unit. The crossbar unit is configured to transmit a first set of primitives to the first tiling unit and to transmit a first cache invalidate command to the first tiling unit. The first tiling unit is configured to determine that a second bounding box associated with primitives included in the first set of primitives overlaps a first cache tile and that the first bounding box overlaps the first cache tile. The first tiling unit is further configured to transmit the primitives and the first cache invalidate command to a first screen-space pipeline associated with the first tiling unit for processing. The screen-space pipeline processes the cache invalidate command to invalidate cache lines specified by the cache invalidate command.

Type: Application

Filed: September 3, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Emmett M. KILGARIFF
ON-CHIP ANTI-ALIAS RESOLVE IN A CACHE TILING ARCHITECTURE

Publication number: 20140118352

Abstract: One embodiment of the present invention includes a graphics subsystem for processing multi-sample anti-aliasing work. The graphics subsystem includes a cache unit, a tiling unit, and a screen-space pipeline coupled to the cache unit and to the tiling unit. The tiling unit is configured to organize multi-sample anti-aliasing commands into cache tile batches. The screen-space pipeline includes a pixel shader and a raster operations unit, and receives cache tile batches from the tiling unit. The pixel shader is configured to generate sample data based on a set of primitives and to generate resolved data based on the sample data. The raster operations unit is configured to store the sample data in the cache unit and to invalidate the sample data after the pixel shader generates the resolved data.

Type: Application

Filed: June 25, 2013

Publication date: May 1, 2014

Inventors: Ziyad S. HAKURA, Emmett M. KILGARIFF
CACHING OF ADAPTIVELY SIZED CACHE TILES IN A UNIFIED L2 CACHE WITH SURFACE COMPRESSION

Publication number: 20140118379

Abstract: One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The device driver detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based on the second render target configuration. One advantage of the disclosed approach is that the cache tile size is adaptively sized, resulting in fewer cache tiles for less complex render target configurations. Adaptively sizing cache tiles leads to more efficient processor utilization and reduced power requirements. In addition, a unified L2 cache tile allows dynamic partitioning of cache memory between cache tile data and other data.

Type: Application

Filed: August 28, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Rouslan DIMITROV, Emmett M. KILGARIFF, Andrei KHODAKOVSKY
HEURISTICS FOR IMPROVING PERFORMANCE IN A TILE BASED ARCHITECTURE

Publication number: 20140118376

Abstract: One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing.

Type: Application

Filed: October 4, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Walter R. STEINER, Cynthia Ann Edgeworth ALLISON, Rouslan DIMITROV, Karim M. ABDALLA, Dale L. KIRKLAND, Emmett M. KILGARIFF
Distributed clip, cull, viewport transform and perspective correction

Patent number: 8704835

Abstract: A parallel processing subsystem includes a plurality of general processing clusters (GPCs). Each GPC includes one or more clipping, culling, viewport transformation, and perspective correction engines (VPC). Since VPCs are distributed per GPC, each VPC can process graphics primitives in parallel with the other VPCs processing graphics primitives.

Type: Grant

Filed: October 8, 2009

Date of Patent: April 22, 2014

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, Emmett M. Kilgariff
Distributing primitives to multiple rasterizers

Patent number: 8704836

Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.

Type: Grant

Filed: October 19, 2009

Date of Patent: April 22, 2014

Assignee: NVIDIA Corporation

Inventors: Johnny S. Rhoades, Steven E. Molnar, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
Calculation of plane equations after determination of Z-buffer visibility

Patent number: 8692829

Abstract: One embodiment of the present invention sets forth a technique for computing plane equations for primitive shading after non-visible pixels are removed by z culling operations and pixel coverage has been determined. The z plane equations are computed before the plane equations for non-z primitive attributes are computed. The z plane equations are then used to perform screen-space z culling of primitives during and following rasterization. Culling of primitives is also performed based on pixel sample coverage. Consequently, primitives that have visible pixels after z culling operations reach the primitive shading unit. The non-z plane equations are only computed for geometry that is visible after the z culling operations. The primitive shading unit does not need to fetch vertex attributes from memory and does not need to compute non-z plane equations for the culled primitives.

Type: Grant

Filed: September 7, 2010

Date of Patent: April 8, 2014

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, Emmett M. Kilgariff
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR PROVIDING A DYNAMIC DISPLAY REFRESH

Publication number: 20140092113

Abstract: A system, method, and computer program product are provided for a dynamic display refresh. In use, a state of a display device is identified in which an entirety of an image frame is currently displayed by the display device. In response to the identification of the state, it is determined whether an entirety of a next image frame to be displayed has been rendered to memory. The next image frame is transmitted to the display device for display thereof, when it is determined that the entirety of the next image frame to be displayed has been rendered to the memory. Further, a refresh of the display device is delayed, when it is determined that the entirety of the next image frame to be displayed has not been rendered to the memory.

Type: Application

Filed: September 11, 2013

Publication date: April 3, 2014

Applicant: NVIDIA Corporation

Inventors: Tom Petersen, David Wyatt, Paul van der Kouwe, Emmett M. Kilgariff, Laurence Harrison, Jensen Huang, Tony Tamasi, Gerrit A. Slavenburg, Thomas F. Fox, David Matthew Stears, Robert Jan Schutten, Ross Cunniff, Ajay Kamalvanshi, Robert Osborne, Rouslan L. Dimitrov
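
The decision described in the dynamic display refresh abstract above can be sketched as a small function. This is only an illustrative reading of the abstract, not the patented implementation: the names `Action` and `choose_action` are hypothetical, and the `WAIT_FOR_SCANOUT` case (what happens while the current frame is still being displayed) is an assumption the abstract does not spell out.

```python
from enum import Enum


class Action(Enum):
    WAIT_FOR_SCANOUT = "wait_for_scanout"  # assumption: not stated in the abstract
    SEND_NEXT_FRAME = "send_next_frame"
    DELAY_REFRESH = "delay_refresh"


def choose_action(current_frame_fully_displayed: bool,
                  next_frame_fully_rendered: bool) -> Action:
    """Pick the display action for the two conditions named in the abstract."""
    # The abstract only describes the state in which the whole current
    # frame has been displayed; before that we assume the display keeps going.
    if not current_frame_fully_displayed:
        return Action.WAIT_FOR_SCANOUT
    # Next frame is completely rendered to memory: transmit it for display.
    if next_frame_fully_rendered:
        return Action.SEND_NEXT_FRAME
    # Next frame is not finished: hold off the refresh instead of
    # re-displaying stale content or showing a partial frame.
    return Action.DELAY_REFRESH
```

Under these assumptions, `choose_action(True, False)` yields `Action.DELAY_REFRESH`, which corresponds to the delayed refresh in the last sentence of the abstract.
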
Alpha-to-coverage value determination using virtual samples

Patent number: 8669999

Abstract: One embodiment of the present invention sets forth a technique for converting alpha values into pixel coverage masks. Geometric coverage is sampled at a number of “real” sample positions within each pixel. Color and depth values are computed for each of these real samples. Fragment alpha values are used to determine an alpha coverage mask for the real samples and additional “virtual” samples, in which the number of bits set in the mask bits is proportional to the alpha value. An alpha-to-coverage mode uses the virtual samples to increase the number of transparency levels for each pixel compared with using only real samples. The alpha-to-coverage mode may be used in conjunction with virtual coverage anti-aliasing to provide higher-quality transparency for rendering anti-aliased images.

Type: Grant

Filed: October 14, 2010

Date of Patent: March 11, 2014

Assignee: NVIDIA Corporation

Inventors: Walter E. Donovan, Emmett M. Kilgariff, Steven E. Molnar, Christian Amsinck, Robert Ohannessian
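
The alpha-to-coverage abstract above states that the number of bits set in the coverage mask is proportional to the alpha value, and that adding virtual samples increases the number of transparency levels. A minimal sketch of that relationship follows. The function names, the round-to-nearest rule, and the choice to set the lowest-order bits are all assumptions made for illustration; the abstract does not say which samples are selected, and real hardware typically spreads or dithers the set bits.

```python
def alpha_to_coverage_mask(alpha: float, num_samples: int) -> int:
    """Return a bitmask whose population count is proportional to alpha.

    num_samples is the total sample count (real plus virtual).
    Assumption: the lowest-order bits are set; a real design would
    choose sample positions more carefully.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    clamped = min(max(alpha, 0.0), 1.0)
    # Round to the nearest whole number of covered samples.
    bits_set = int(clamped * num_samples + 0.5)
    return (1 << bits_set) - 1


def transparency_levels(num_samples: int) -> int:
    """Distinct coverage counts available: 0 through num_samples."""
    return num_samples + 1
```

With 4 real samples only, `transparency_levels(4)` gives 5 levels; with 4 real plus 12 virtual samples (a hypothetical configuration), `transparency_levels(16)` gives 17, which is the kind of increase the abstract describes.
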
RELAXED COHERENCY BETWEEN DIFFERENT CACHES

Publication number: 20140025891

Abstract: One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions, relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second cache are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.

Type: Application

Filed: July 20, 2012

Publication date: January 23, 2014

Inventors: Joel James MCCORMACK, Rajesh KOTA, Olivier GIROUX, Emmett M. KILGARIFF
Computing tessellation coordinates using dedicated hardware

Patent number: 8599202

Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.

Type: Grant

Filed: September 29, 2008

Date of Patent: December 3, 2013

Assignee: Nvidia Corporation

Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
Order-preserving distributed rasterizer

Patent number: 8587581

Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.

Type: Grant

Filed: October 15, 2009

Date of Patent: November 19, 2013

Assignee: Nvidia Corporation

Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
Cull before vertex attribute fetch and vertex lighting

Patent number: 8564616

Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

Type: Grant

Filed: July 17, 2009

Date of Patent: October 22, 2013

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
