Patents by Inventor Ziyad Hakura

Ziyad Hakura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for maintaining atomicity and ordering for pixel shader operations

Patent number: 10019776

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Grant

Filed: October 27, 2015

Date of Patent: July 10, 2018

Assignee: NVIDIA CORPORATION

Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
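
Illustrative sketch (not taken from the patent): the abstract says the tile coalescer includes coverage data in a tile only once for each XY position. One way to picture that rule is a software model that closes the current tile whenever an incoming primitive touches a position the tile already holds. The function name `coalesce`, the dictionary tile representation, and the close-on-overlap policy are all assumptions made for this example.

```python
def coalesce(coverages):
    """Group per-primitive coverage into tiles, in API order.

    coverages: iterable of (primitive_id, set of (x, y)) in API order.
    Returns a list of tiles; each tile maps (x, y) -> primitive_id.
    Assumption: a tile is closed as soon as a new primitive overlaps
    a position already present in it, so no position appears twice.
    """
    tiles = []
    current = {}
    for prim_id, positions in coverages:
        if any(xy in current for xy in positions):
            tiles.append(current)   # close the tile to avoid a duplicate XY
            current = {}
        for xy in positions:
            current[xy] = prim_id
    if current:
        tiles.append(current)
    return tiles


coverages = [
    (0, {(0, 0), (1, 0)}),
    (1, {(2, 0)}),
    (2, {(1, 0)}),          # overlaps primitive 0 at (1, 0)
]
tiles = coalesce(coverages)
```

Because tiles are emitted in order and each holds a position at most once, processing tiles sequentially visits every XY position in the API order of the primitives covering it.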

DISTRIBUTED INDEX FETCH, PRIMITIVE ASSEMBLY, AND PRIMITIVE BATCHING

Publication number: 20170178401

Abstract: One embodiment of the present invention includes a technique for distributing work slices associated with a graphics processing unit for processing. A primitive distribution system receives a draw command related to a graphics object associated with a plurality of indices. The primitive distribution system creates a plurality of work slices, where each work slice is associated with a different subset of the indices included in the plurality of indices. The primitive distribution system scans a first subset of indices to identify a first set of characteristics that is needed to process a second subset of indices. The primitive distribution system processes the second subset of indices based at least in part on the one or more characteristics.

Type: Application

Filed: December 22, 2015

Publication date: June 22, 2017

Inventors: Niket Agrawal, Amit Jain, Dale Kirkland, Karim Abdalla, Ziyad Hakura, Haren Kethareswaran

MULTI-PASS RENDERING IN A SCREEN SPACE PIPELINE

Publication number: 20170148204

Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.

Type: Application

Filed: November 25, 2015

Publication date: May 25, 2017

Inventors: Ziyad HAKURA, Cynthia ALLISON, Dale KIRKLAND, Jeffrey BOLZ, Yury URALSKY, Jonah ALBEN
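
Illustrative sketch (not taken from the application): the abstract describes primitives and state settings interleaved in one stream, each tagged with a mask naming the passes it belongs to. A minimal software model of that selection, assuming bit i of a mask stands for pass i and using hypothetical tuple records and a hypothetical `run_pass` function, might look like this.

```python
def run_pass(stream, pass_index):
    """Replay an interleaved stream for one pass.

    stream: list of ("state", mask, settings_dict) or ("prim", mask, name).
    Assumption: bit i of mask set means the item takes part in pass i.
    Returns (primitive name, state in effect) for each processed primitive.
    """
    bit = 1 << pass_index
    state = {}
    processed = []
    for kind, mask, payload in stream:
        if not mask & bit:
            continue                      # item not selected for this pass
        if kind == "state":
            state.update(payload)         # configure the pipeline
        else:
            processed.append((payload, dict(state)))
    return processed


stream = [
    ("state", 0b01, {"depth_test": True}),
    ("prim", 0b11, "tri0"),
    ("state", 0b10, {"depth_test": False}),
    ("prim", 0b10, "tri1"),
]
pass0 = run_pass(stream, 0)
pass1 = run_pass(stream, 1)
```

The same buffered stream is walked once per pass; only the mask test changes which items take effect.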

MULTI-PASS RENDERING IN A SCREEN SPACE PIPELINE

Publication number: 20170148203

Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.

Type: Application

Filed: November 25, 2015

Publication date: May 25, 2017

Inventors: Ziyad HAKURA, Cynthia ALLISON, Dale KIRKLAND, Jeffrey BOLZ, Yury URALSKY, Jonah ALBEN

TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116699

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ

TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116700

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ

TECHNIQUES FOR MAINTAINING ATOMICITY AND ORDERING FOR PIXEL SHADER OPERATIONS

Publication number: 20170116698

Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.

Type: Application

Filed: October 27, 2015

Publication date: April 27, 2017

Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ

Automated generation of theoretical performance analysis based upon workload and design configuration

Patent number: 7765500

Abstract: A method of more efficiently, easily and cost-effectively analyzing the performance of a device model is disclosed. Embodiments enable automated generation of theoretical performance analysis for a device model based upon a workload associated with rendering graphical data and a configuration of the device model. The workload may be independent of design configuration, thereby enabling determination of the workload without simulating the device model. Additionally, the design configuration may be updated or changed without re-determining the workload. Accordingly, the graphical data may comprise a general or random test which is relatively large in size and covers a relatively large operational scope of the design. Additionally, the workload may comprise graphical information determined based upon the graphical data. Further, the theoretical performance analysis may indicate a graphics pipeline unit of the device model causing a bottleneck in a graphics pipeline of the device model.

Type: Grant

Filed: November 8, 2007

Date of Patent: July 27, 2010

Assignee: NVIDIA Corporation

Inventors: Ziyad Hakura, John Tynefield, Thomas Green
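
Illustrative sketch (not taken from the patent): the abstract separates a workload, determined once from the graphical data, from a design configuration that can change without re-determining the workload. A simple way to picture this is to divide per-unit work counts by per-unit throughput and report the slowest unit as the bottleneck. The unit names, the work-per-clock model, and the function `theoretical_bottleneck` are assumptions for this example.

```python
def theoretical_bottleneck(workload, config):
    """Estimate cycles per pipeline unit and pick the slowest one.

    workload: unit name -> amount of work (independent of the design).
    config:   unit name -> work items the unit completes per clock.
    Returns (bottleneck unit, dict of estimated cycles per unit).
    """
    cycles = {unit: workload[unit] / config[unit] for unit in workload}
    bottleneck = max(cycles, key=cycles.get)
    return bottleneck, cycles


# The workload is computed once and reused across design configurations.
workload = {"vertex": 1000, "raster": 4000, "pixel": 8000}
config_a = {"vertex": 1, "raster": 4, "pixel": 4}
config_b = {"vertex": 1, "raster": 8, "pixel": 16}

unit_a, cycles_a = theoretical_bottleneck(workload, config_a)
unit_b, cycles_b = theoretical_bottleneck(workload, config_b)
```

Changing only the configuration moves the reported bottleneck from the pixel unit to the vertex unit without touching the workload.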

System, apparatus and method for issuing predictions from an inventory to access a memory

Publication number: 20060095677

Abstract: A system, apparatus, and method are disclosed for managing predictive accesses to memory. In one embodiment, an exemplary apparatus is configured as a prediction inventory that stores predictions in a number of queues. Each queue is configured to maintain predictions until a subset of the predictions is either issued to access a memory or filtered out as redundant. In another embodiment, an exemplary prefetcher predicts accesses to a memory. The prefetcher comprises a speculator for generating a number of predictions and a prediction inventory, which includes queues each configured to maintain a group of items. The group of items typically includes a triggering address that corresponds to the group. Each item of the group is of one type of prediction. Also, the prefetcher includes an inventory filter configured to compare the number of predictions against one of the queues having either the same or a different prediction type as the number of predictions.

Type: Application

Filed: August 17, 2004

Publication date: May 4, 2006

Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslav Danilak, Brad Simeral
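
Illustrative sketch (not taken from the application): the abstract describes queues that hold predictions until they are either issued or filtered out as redundant. The model below keeps one queue per triggering address and drops any new prediction that is already pending in some queue. The class name `PredictionInventory`, its methods, and the exact redundancy test are assumptions for this example.

```python
from collections import deque


class PredictionInventory:
    """Toy inventory: one FIFO queue of predicted addresses per trigger."""

    def __init__(self):
        self.queues = {}

    def add(self, trigger, predictions):
        """Queue predictions for a trigger, filtering out redundant ones.

        Assumption: a prediction is redundant if the same address is
        already pending in any queue. Returns the predictions kept.
        """
        queue = self.queues.setdefault(trigger, deque())
        pending = {p for q in self.queues.values() for p in q}
        kept = []
        for address in predictions:
            if address not in pending:
                pending.add(address)
                queue.append(address)
                kept.append(address)
        return kept

    def issue(self, trigger):
        """Remove and return the oldest pending prediction, or None."""
        queue = self.queues.get(trigger)
        return queue.popleft() if queue else None


inventory = PredictionInventory()
first = inventory.add(0x100, [0x140, 0x180])
second = inventory.add(0x140, [0x180, 0x1C0])   # 0x180 is already pending
issued = inventory.issue(0x100)
```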

System, apparatus and method for generating nonsequential predictions to access a memory

Publication number: 20060041721

Abstract: A system, apparatus, and method are disclosed for storing and prioritizing predictions to anticipate nonsequential accesses to a memory. In one embodiment, an exemplary apparatus is configured as a prefetcher for predicting accesses to a memory. The prefetcher includes a prediction generator configured to generate a prediction that is unpatternable to an address. The prefetcher can also include a target cache coupled to the prediction generator to maintain the prediction in a manner that determines a priority for the prediction. In another embodiment, the prefetcher can also include a priority adjuster. The priority adjuster sets a priority for a prediction relative to other predictions. In some cases, the placement of the prediction is indicative of the priority relative to priorities for the other predictions. In yet another embodiment, the prediction generator uses the priority to determine that the prediction is to be generated before other predictions.

Type: Application

Filed: August 17, 2004

Publication date: February 23, 2006

Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslav Danilak, Brad Simeral

System, apparatus and method for predicting accesses to a memory

Publication number: 20060041723

Abstract: A system, apparatus, and method are disclosed for predicting accesses to memory. In one embodiment, an exemplary apparatus comprises a processor configured to execute program instructions and process program data, a memory including the program instructions and the program data, and a memory processor. The memory processor can include a speculator configured to receive an address containing the program instructions or the program data. Such a speculator can comprise a sequential predictor for generating a configurable number of sequential addresses. The speculator can also include a nonsequential predictor configured to associate a subset of addresses to the address and to predict a group of addresses based on at least one address of the subset, wherein at least one address of the subset is unpatternable to the address.

Type: Application

Filed: August 17, 2004

Publication date: February 23, 2006

Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslav Danilak, Brad Simeral
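
Illustrative sketch (not taken from the application): the abstract describes a speculator that pairs a sequential predictor, which generates a configurable number of sequential addresses, with a nonsequential predictor that associates a subset of addresses with a given address. The model below assumes a fixed line size for the sequential step and a simple lookup table of learned targets for the nonsequential part. The class `Speculator`, its `learn` method, and the 64-byte line size are assumptions for this example.

```python
class Speculator:
    """Toy speculator combining sequential and nonsequential prediction."""

    def __init__(self, num_sequential, line_size=64):
        self.num_sequential = num_sequential   # configurable prediction count
        self.line_size = line_size             # assumed stride in bytes
        self.targets = {}                      # trigger address -> target list

    def learn(self, trigger, target):
        """Associate a target with a trigger (hypothetical training step)."""
        known = self.targets.setdefault(trigger, [])
        if target not in known:
            known.append(target)

    def predict(self, address):
        """Return sequential predictions followed by learned targets."""
        sequential = [
            address + i * self.line_size
            for i in range(1, self.num_sequential + 1)
        ]
        nonsequential = list(self.targets.get(address, []))
        return sequential + nonsequential


speculator = Speculator(num_sequential=2)
speculator.learn(0x1000, 0x8000)
with_target = speculator.predict(0x1000)
without_target = speculator.predict(0x2000)
```

The learned target 0x8000 has no arithmetic relationship to 0x1000, which is the kind of address the sequential predictor alone could not produce.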

System, apparatus and method for performing look-ahead lookup on predictive information in a cache memory

Publication number: 20060041722

Abstract: A system, apparatus, and method are disclosed for storing predictions as well as examining and using one or more caches for anticipating accesses to a memory. In one embodiment, an exemplary apparatus is a prefetcher for managing predictive accesses with a memory. The prefetcher can include a speculator to generate a range of predictions, and multiple caches. For example, the prefetcher can include a first cache and a second cache to store predictions. An entry of the first cache is addressable by a first representation of an address from the range of predictions, whereas an entry of the second cache is addressable by a second representation of the address. The first and the second representations are compared in parallel against the stored predictions of either the first cache and the second cache, or both.

Type: Application

Filed: August 17, 2004

Publication date: February 23, 2006

Inventors: Ziyad Hakura, Radoslav Danilak, Brad Simeral, Brian Langendorf, Stefano Pescador, Dmitry Vyshetsky

System and method for accelerating a special purpose processor

Publication number: 20050128203

Abstract: Embodiments of the invention accelerate at least one special purpose processor, such as a GPU, or a driver managing a special purpose processor, by using at least one co-processor. Advantageously, embodiments of the invention are fault-tolerant in that the at least one GPU or other special purpose processor is able to execute all computations, although perhaps at a lower level of performance, if the at least one co-processor is rendered inoperable. The co-processor may also be used selectively, based on performance considerations.

Type: Application

Filed: December 11, 2003

Publication date: June 16, 2005

Inventors: Jen-Hsun Huang, Michael Cox, Ziyad Hakura, John Montrym, Brad Simeral, Brian Langendorf, Blanton Kephart, Franck Diard
