Patents by Inventor Jeffrey A. Bolz
Jeffrey A. Bolz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250021622Abstract: Disclosed are systems and techniques for efficient vector-matrix multiply operations across parallel processing unit threads. The techniques include receiving first data of a first thread, the first data comprising a first input vector and a first matrix. The techniques further include receiving second data of a second thread, the second data comprising a second input vector and a second matrix. The techniques further include combining the first input vector and the second input vector into an input matrix and generating a result matrix at least by multiplying the input matrix by the first matrix using a matrix-multiply circuit. The techniques further include separating the result matrix into a first result value and a second result value, the first result value corresponding to the first thread and the second result value corresponding to the second thread.Type: ApplicationFiled: July 11, 2024Publication date: January 16, 2025Inventors: Sean Jeffrey Treichler, Yury Uralsky, Karthik Vaidyanathan, Franz Petrik Clarberg, Jeffrey Alan Bolz, John Matthew Burgess, Ajay Sudarshan Tirumala
-
Patent number: 10607390Abstract: A device driver is configured to identify a group of compute shaders to be executed in multiple traversals of a graphics processing pipeline. Each such compute shader accesses a compute tile of data having particular dimensions. The device driver interoperates with a tiling unit to determines dimension for a cache tile so that an integer multiple of each compute tile will fit evenly within the cache tile. Thus, when executing compute shaders in different traversals of the graphics processing pipeline, the data processed by those compute shaders can be cached in the cache tile between passes.Type: GrantFiled: December 14, 2016Date of Patent: March 31, 2020Assignee: NVIDIA CorporationInventor: Jeffrey A. Bolz
-
Patent number: 10535114Abstract: In one embodiment of the present invention a driver configures a graphics pipeline implemented in a cache tiling architecture to perform dynamically-defined multi-pass rendering sequences. In operation, based on sequence-specific configuration data, the driver determines an optimized tile size and, for each pixel in each pass, the set of pixels in each previous pass that influence the processing of the pixel. The driver then configures the graphics pipeline to perform per-tile rendering operations in a region that is translated by a pass-specific offset backward—vertically and/or horizontally—along a tiled caching traversal line. Notably, the offset ensures that the required pixel data from previous passes is available. The driver further configures the graphics pipeline to store the rendered data in cache lines.Type: GrantFiled: August 18, 2015Date of Patent: January 14, 2020Assignee: NVIDIA CORPORATIONInventor: Jeffrey A. Bolz
-
Patent number: 10453168Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: August 17, 2018Date of Patent: October 22, 2019Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10430989Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.Type: GrantFiled: November 25, 2015Date of Patent: October 1, 2019Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Cynthia Allison, Dale Kirkland, Jeffrey Bolz, Yury Uralsky, Jonah Alben
-
Patent number: 10417990Abstract: A method of binding graphics resources is provided that includes: (1) identifying graphics resources for binding, (2) generating a bind group for the graphics resources, (3) organizing the bind group into a bind group memory using a bind group layout and (4) providing bind group control for processing of the bind group. A method of organizing graphics resources and a resource organizing unit are also provided.Type: GrantFiled: September 16, 2015Date of Patent: September 17, 2019Assignee: Nvidia CorporationInventor: Jeffrey A. Bolz
-
Patent number: 10187663Abstract: A subsystem configured to encode an RGBA8 data stream assembles sequences of four-byte groups from the data stream. The subsystem decorrelates the red and blue channels, and computes a difference between each four-byte group and an anchor value. The anchor is encoded at full value. The subsystem then assigns each group a five-bit header based on the number and location of non-zero bytes and on the data content of the non-zero bytes within the group. The subsystem favors zero valued bytes. Thus, when a group includes only zero valued bytes, the header is sufficient to encode the group; no data bits are necessary. Further, two successive groups of zero-valued bytes may be encoded as a single header with no data bits, achieving further data reduction. Finally, the subsystem concatenates all the headers with associated data to yield the source data stream compressed to some ratio, e.g. four-to-one.Type: GrantFiled: August 20, 2015Date of Patent: January 22, 2019Assignee: NVIDIA CORPORATIONInventors: Jeffrey A. Bolz, Jeffrey Pool
-
Publication number: 20180374185Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: ApplicationFiled: August 17, 2018Publication date: December 27, 2018Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10147222Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.Type: GrantFiled: November 25, 2015Date of Patent: December 4, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Cynthia Allison, Dale Kirkland, Jeffrey Bolz, Yury Uralsky, Jonah Alben
-
Patent number: 10134169Abstract: One embodiment of the present invention sets forth a method for accessing texture objects stored within a texture memory. The method comprises the steps of receiving a texture bind request from an application program, wherein the texture bind request includes an object identifier that identifies a first texture object stored in the texture memory and an image identifier that identifies a first image unit, binding the first texture object to the first image unit based on the texture bind request, receiving, within a shader engine, a first shading program command from the application program for performing a first memory access operation on the first texture object, wherein the memory access operation is a store operation or atomic operation to an arbitrary location in the image, and performing, within the shader engine, the first memory access operation on the first texture object via the first image unit.Type: GrantFiled: August 12, 2010Date of Patent: November 20, 2018Assignee: NVIDIA CORPORATIONInventors: Jeffrey A. Bolz, Patrick R. Brown
-
Patent number: 10121220Abstract: A parallel processor and a method of reducing texture cache invalidation are disclosed. In one embodiment, the parallel processor includes a cache configured to receive lines of data; and a parallel execution unit associated with the cache and configured to execute parallel counterparts of an operation. The parallel counterparts, when executed, are configured to create, in the cache, corresponding aliases of a line of data pertaining to the operation such that the parallel counterparts are operable to invalidate only the corresponding aliases.Type: GrantFiled: April 28, 2015Date of Patent: November 6, 2018Assignee: Nvidia CorporationInventor: Jeffrey Bolz
-
Patent number: 10083514Abstract: One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges.Type: GrantFiled: October 10, 2016Date of Patent: September 25, 2018Assignee: NVIDIA CORPORATIONInventors: Mark J. Kilgard, Jeffrey A. Bolz
-
Patent number: 10083036Abstract: One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.Type: GrantFiled: October 3, 2013Date of Patent: September 25, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad S. Hakura, Cynthia Ann Edgeworth Allison, Dale L. Kirkland, Andrei Khodakovsky, Jeffrey A. Bolz
-
Patent number: 10055806Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: August 21, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10032242Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.Type: GrantFiled: October 1, 2013Date of Patent: July 24, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad S. Hakura, Jeffrey A. Bolz, Amanpreet Grewal, Matthew Johnson, Andrei Khodakovsky
-
Patent number: 10032245Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: July 24, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Patent number: 10019776Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.Type: GrantFiled: October 27, 2015Date of Patent: July 10, 2018Assignee: NVIDIA CORPORATIONInventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
-
Publication number: 20180165787Abstract: A device driver is configured to identify a group of compute shaders to be executed in multiple traversals of a graphics processing pipeline. Each such compute shader accesses a compute tile of data having particular dimensions. The device driver interoperates with a tiling unit to determines dimension for a cache tile so that an integer multiple of each compute tile will fit evenly within the cache tile. Thus, when executing compute shaders in different traversals of the graphics processing pipeline, the data processed by those compute shaders can be cached in the cache tile between passes.Type: ApplicationFiled: December 14, 2016Publication date: June 14, 2018Inventor: Jeffrey A. Bolz
-
Patent number: 9773341Abstract: One embodiment of the present invention includes techniques for rasterizing geometries. First, a processing unit defines a bounding primitive that covers the geometry and does not include any internal edges. If the bounding primitive intersects any enabled clip plane, then the processing unit generates fragments to fill a current viewport. Alternatively, the processing unit generates fragments to fill the bounding primitive. Because the rasterized region includes no internal edges, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel.Type: GrantFiled: August 20, 2013Date of Patent: September 26, 2017Assignee: NVIDIA CorporationInventors: Jeffrey A. Bolz, Mark J. Kilgard
-
Patent number: 9767600Abstract: A graphics processing pipeline within a parallel processing unit (PPU) is configured to perform path rendering by generating a collection of graphics primitives that represent each path to be rendered. The graphics processing pipeline determines the coverage of each primitive at a number of stencil sample locations within each different pixel. Then, the graphics processing pipeline reduces the number of stencil samples down to a smaller number of color samples, for each pixel. The graphics processing pipeline is configured to modulate a given color sample associated with a given pixel based on the color values of any graphics primitives that cover the stencil samples from which the color sample was reduced. The final color of the pixel is determined by downsampling the color samples associated with the pixel.Type: GrantFiled: September 5, 2013Date of Patent: September 19, 2017Assignee: NVIDIA CorporationInventors: Jeffrey A. Bolz, Mark J. Kilgard, Henry Packard Moreton, Rui M. Bastos, Eric B. Lum