Patents by Inventor Ziyad Hakura

Ziyad Hakura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10019776
    Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
    Type: Grant
    Filed: October 27, 2015
    Date of Patent: July 10, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Ziyad Hakura, Eric Lum, Dale Kirkland, Jack Choquette, Patrick R. Brown, Yury Y. Uralsky, Jeffrey Bolz
  • Publication number: 20170178401
    Abstract: One embodiment of the present invention includes a technique for distributing work slices associated with a graphics processing unit for processing. A primitive distribution system receives a draw command related to a graphics object associated with a plurality of indices. The primitive distribution system creates a plurality of work slices, where each work slice is associated with a different subset of the indices included in the plurality of indices. The primitive distribution system scans a first subset of indices to identify a first set of characteristics that is needed to process a second subset of indices. The primitive distribution system processes the second subset of indices based at least in part on the one or more characteristics.
    Type: Application
    Filed: December 22, 2015
    Publication date: June 22, 2017
    Inventors: Niket Agrawal, Amit Jain, Dale Kirkland, Karim Abdalla, Ziyad Hakura, Haren Kethareswaran
  • Publication number: 20170148204
    Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.
    Type: Application
    Filed: November 25, 2015
    Publication date: May 25, 2017
    Inventors: Ziyad HAKURA, Cynthia ALLISON, Dale KIRKLAND, Jeffrey BOLZ, Yury URALSKY, Jonah ALBEN
  • Publication number: 20170148203
    Abstract: A multi-pass unit interoperates with a device driver to configure a screen space pipeline to perform multiple processing passes with buffered graphics primitives. The multi-pass unit receives primitive data and state bundles from the device driver. The primitive data includes a graphics primitive and a primitive mask. The primitive mask indicates the specific passes when the graphics primitive should be processed. The state bundles include one or more state settings and a state mask. The state mask indicates the specific passes where the state settings should be applied. The primitives and state settings are interleaved. For a given pass, the multi-pass unit extracts the interleaved state settings for that pass and configures the screen space pipeline according to those state settings. The multi-pass unit also extracts the interleaved graphics primitives to be processed in that pass. Then, the multi-pass unit causes the screen space pipeline to process those graphics primitives.
    Type: Application
    Filed: November 25, 2015
    Publication date: May 25, 2017
    Inventors: Ziyad HAKURA, Cynthia ALLISON, Dale KIRKLAND, Jeffrey BOLZ, Yury URALSKY, Jonah ALBEN
  • Publication number: 20170116699
    Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
    Type: Application
    Filed: October 27, 2015
    Publication date: April 27, 2017
    Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
  • Publication number: 20170116700
    Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
    Type: Application
    Filed: October 27, 2015
    Publication date: April 27, 2017
    Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
  • Publication number: 20170116698
    Abstract: A tile coalescer within a graphics processing pipeline coalesces coverage data into tiles. The coverage data indicates, for a set of XY positions, whether a graphics primitive covers those XY positions. The tile indicates, for a larger set of XY positions, whether one or more graphics primitives cover those XY positions. The tile coalescer includes coverage data in the tile only once for each XY position, thereby allowing the API ordering of the graphics primitives covering each XY position to be preserved. The tile is then distributed to a set of streaming multiprocessors for shading and blending operations. The different streaming multiprocessors execute thread groups to process the tile. In doing so, those thread groups may perform read-modify-write operations with data stored in memory. Each such thread group is scheduled to execute via atomic operations, and according to the API order of the associated graphics primitives.
    Type: Application
    Filed: October 27, 2015
    Publication date: April 27, 2017
    Inventors: ZIYAD HAKURA, ERIC LUM, DALE KIRKLAND, JACK CHOQUETTE, PATRICK R. BROWN, YURY Y. URALSKY, JEFFREY BOLZ
  • Patent number: 7765500
    Abstract: A method of more efficiently, easily and cost-effectively analyzing the performance of a device model is disclosed. Embodiments enable automated generation of theoretical performance analysis for a device model based upon a workload associated with rendering graphical data and a configuration of the device model. The workload may be independent of design configuration, thereby enabling determination of the workload without simulating the device model. Additionally, the design configuration may be updated or changed without re-determining the workload. Accordingly, the graphical data may comprise a general or random test which is relatively large in size and covers a relatively large operational scope of the design. Additionally, the workload may comprise graphical information determined based upon the graphical data. Further, the theoretical performance analysis may indicate a graphics pipeline unit of the device model causing a bottleneck in a graphics pipeline of the device model.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: July 27, 2010
    Assignee: NVIDIA Corporation
    Inventors: Ziyad Hakura, John Tynefield, Thomas Green
  • Publication number: 20060095677
    Abstract: A system, apparatus, and method are disclosed for managing predictive accesses to memory. In one embodiment, an exemplary apparatus is configured as a prediction inventory that stores predictions in a number of queues. Each queue is configured to maintain predictions until a subset of the predictions is either issued to access a memory or filtered out as redundant. In another embodiment, an exemplary prefetcher predicts accesses to a memory. The prefetcher comprises a speculator for generating a number of predictions and a prediction inventory, which includes queues each configured to maintain a group of items. The group of items typically includes a triggering address that corresponds to the group. Each item of the group is of one type of prediction. Also, the prefetcher includes an inventory filter configured to compare the number of predictions against one of the queues having the either the same or different prediction type as the number of predictions.
    Type: Application
    Filed: August 17, 2004
    Publication date: May 4, 2006
    Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslay Danilak, Brad Simeral
  • Publication number: 20060041721
    Abstract: A system, apparatus, and method are disclosed for storing and prioritizing predictions to anticipate nonsequential accesses to a memory. In one embodiment, an exemplary apparatus is configured as a prefetcher for predicting accesses to a memory. The prefetcher includes a prediction generator configured to generate a prediction that is unpatternable to an address. Also, the prefetcher also can include a target cache coupled to the prediction generator to maintain the prediction in a manner that determines a priority for the prediction. In another embodiment, the prefetcher can also include a priority adjuster. The priority adjuster sets a priority for a prediction relative to other predictions. In some cases, the placement of the prediction is indicative of the priority relative to priorities for the other predictions. In yet another embodiment, the prediction generator uses the priority to determine that the prediction is to be generated before other predictions.
    Type: Application
    Filed: August 17, 2004
    Publication date: February 23, 2006
    Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslav Danilak, Brad Simeral
  • Publication number: 20060041723
    Abstract: A system, apparatus, and method are disclosed for predicting accesses to memory. In one embodiment, an exemplary apparatus comprises a processor configured to execute program instructions and process program data, a memory including the program instructions and the program data, and a memory processor. The memory processor can include a speculator configured to receive an address containing the program instructions or the program data. Such a speculator can comprise a sequential predictor for generating a configurable number of sequential addresses. The speculator can also include a nonsequential predictor configured to associate a subset of addresses to the address and to predict a group of addresses based on at least one address of the subset, wherein at least one address of the subset is unpatternable to the address.
    Type: Application
    Filed: August 17, 2004
    Publication date: February 23, 2006
    Inventors: Ziyad Hakura, Brian Langendorf, Stefano Pescador, Radoslav Danilak, Brad Simeral
  • Publication number: 20060041722
    Abstract: A system, apparatus, and method are disclosed for storing predictions as well as examining and using one or more caches for anticipating accesses to a memory. In one embodiment, an exemplary apparatus is a prefetcher for managing predictive accesses with a memory. The prefetcher can include a speculator to generate a range of predictions, and multiple caches. For example, the prefetcher can include a first cache and a second cache to store predictions. An entry of the first cache is addressable by a first representation of an address from the range of predictions, whereas an entry of the second cache is addressable by a second representation of the address. The first and the second representations are compared in parallel against the stored predictions of either the first cache and the second cache, or both.
    Type: Application
    Filed: August 17, 2004
    Publication date: February 23, 2006
    Inventors: Ziyad Hakura, Radoslav Danilak, Brad Simeral, Brian Langendorf, Stefano Pescador, Dmitry Vyshetsky
  • Publication number: 20050128203
    Abstract: Embodiments of the invention accelerate at least one special purpose processor, such as a GPU, or a driver managing a special purpose processor, by using at least one co-processor. Advantageously, embodiments of the invention are fault-tolerant in that the at least one GPU or other special purpose processor is able to execute all computations, although perhaps at a lower level of performance, if the at least one co-processor is rendered inoperable. The co-processor may also be used selectively, based on performance considerations.
    Type: Application
    Filed: December 11, 2003
    Publication date: June 16, 2005
    Inventors: Jen-Hsun Huang, Michael Cox, Ziyad Hakura, John Montrym, Brad Simeral, Brian Langendorf, Blanton Kephart, Franck Diard