Patents Assigned to NVidia
  • Publication number: 20140118393
    Abstract: One embodiment of the present invention includes a method for performing a multi-pass tiling test. The method includes combining a plurality of bounding boxes to generate a coarse bounding box. The method further includes identifying a first cache tile associated with a render surface and determining that the coarse bounding box intersects the first cache tile. The method further includes comparing each bounding box included in the plurality of bounding boxes against the first cache tile to determine that a first set of one or more bounding boxes included in the plurality of bounding boxes intersects the first cache tile. Finally, the method includes, for each bounding box included in the first set of one or more bounding boxes, processing one or more graphics primitives associated with the bounding box. One advantage of the disclosed technique is that the number of intersection calculations performed for each cache tile is reduced.
    Type: Application
    Filed: August 14, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Ziyad S. HAKURA
  • Publication number: 20140123146
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140118373
    Abstract: One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay.
    Type: Application
    Filed: October 3, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Andrei KHODAKOVSKY, Jeffrey A. BOLZ
  • Publication number: 20140118361
    Abstract: One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed tiled caching. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization.
    Type: Application
    Filed: October 18, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Walter R. STEINER
  • Publication number: 20140118370
    Abstract: A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count.
    Type: Application
    Filed: October 23, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ziyad S. HAKURA, Jerome F. DULUK, Jr.
  • Publication number: 20140118374
    Abstract: One embodiment of the present invention sets forth a technique for managing buffer entries in a tile-based architecture. The technique includes receiving a first plurality of graphics primitives and a first buffer address at which attributes associated with the first plurality of graphics primitives are stored. The technique further includes, for each tile included in a plurality of tiles, transmitting the first plurality of graphics primitives and the first buffer address to a screen space pipeline and receiving an acknowledgement from the screen space pipeline indicating that processing the first plurality of graphics primitives has completed. The technique further includes determining that processing the first plurality of graphics primitives has completed for a last tile included in the plurality of tiles and that the acknowledgement has been received for each tile included in the plurality of tiles, and, in response, releasing a buffer entry associated with the first buffer address.
    Type: Application
    Filed: October 3, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND
  • Publication number: 20140122548
    Abstract: A system, method, and computer program product are provided for evaluating an integral utilizing a low discrepancy sequence. In use, a low discrepancy sequence that includes at least one component that is a (0,1)-sequence in base b is determined. Additionally, an integral is evaluated, utilizing the low discrepancy sequence.
    Type: Application
    Filed: November 1, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Alexander Keller
  • Publication number: 20140122558
    Abstract: Embodiments of the invention provide techniques for offloading certain classes of compute operations from battery-powered handheld devices operating in a wireless private area network (WPAN) to devices with relatively greater computing capabilities operating in the WPAN that are not power-limited by batteries. In order to offload certain classes of compute operations, a handheld device may discover an offload device within a local network via a discovery mechanism, and offload large or complex compute operations to the offload device by utilizing one or more low-latency communications protocols, such as Wi-Fi Direct or a combination of Wi-Fi Direct and real-time transport protocol (RTP). One advantage of the disclosed techniques is that the techniques allow handheld devices to perform complex operations without substantially impacting battery life.
    Type: Application
    Filed: October 29, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Hassane S. AZAR
  • Publication number: 20140118366
    Abstract: A tile-based system for processing graphics data. The tile based system includes a first screen-space pipeline, a cache unit, and a first tiling unit. The first tiling unit is configured to transmit a first set of primitives that overlap a first cache tile and a first prefetch command to the first screen-space pipeline for processing, and transmit a second set of primitives that overlap a second cache tile to the first screen-space pipeline for processing. The first prefetch command is configured to cause the cache unit to fetch data associated with the second cache tile from an external memory unit. The first tiling unit may also be configured to transmit a first flush command to the screen-space pipeline for processing with the first set of primitives. The first flush command is configured to cause the cache unit to flush data associated with the first cache tile.
    Type: Application
    Filed: October 1, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Ziyad S. HAKURA, Rouslan DIMITROV
  • Publication number: 20140118021
    Abstract: A test system and method for selecting a derating factor to be applied to a ratio of transistors having disparate electrical characteristics in a wafer fabrication process. In one embodiment, the test system includes: (1) structural at-speed automated test equipment (ATE) operable to iterate structural at-speed tests at multiple clock frequencies over integrated circuit (IC) samples fabricated under different process conditions and (2) derating factor selection circuitry coupled to the structural at-speed ATE and configured to employ results of the structural at-speed tests to identify performance deterioration in the samples, the performance deterioration indicating the derating factor to be employed in a subsequent wafer fabrication process.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Gunaseelan Ponnuvel
  • Publication number: 20140118351
    Abstract: A system, method, and computer program product are provided for inputting modified coverage data into a pixel shader. In use, coverage data modified by a depth/stencil test is input into a pixel shader. Additionally, one or more actions are performed at the pixel shader, utilizing the modified coverage data.
    Type: Application
    Filed: November 1, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Yury Uralsky, Henry Packard Moreton
  • Publication number: 20140122805
    Abstract: Embodiments related to selecting a runahead poison policy from a plurality of runahead poison policies during microprocessor operation are provided. The example method includes causing the microprocessor to enter runahead upon detection of a runahead event and implementing a first runahead poison policy selected from a plurality of runahead poison policies operative to manage runahead poison injection during runahead. The example method also includes during microprocessor operation, selecting a second runahead poison policy operative to manage runahead poison injection differently from the first runahead poison policy.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Magnus Ekman, James van Zoeren, Paul Serris
  • Publication number: 20140122829
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, JR., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140118380
    Abstract: One embodiment of the present invention includes a graphics subsystem that includes a tiling unit, a crossbar unit, and a screen-space pipeline. The crossbar unit is configured to transmit primitives interleaved with state change commands to the tiling unit. The tiling unit is configured to record an initial state associated with the primitives and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a first cache tile. The tiling unit is further configured to transmit the initial state to the screen-space pipeline and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a second cache tile. The tiling unit includes a state filter block configured to determine that a first state change in the state change commands is followed by a second state change, without an intervening primitive, and to forego transmitting the first state change in response.
    Type: Application
    Filed: September 3, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Pierre SOUILLOT, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Walter R. STEINER
  • Publication number: 20140118379
    Abstract: One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based on the second render target configuration. One advantage of the disclosed approach is that the cache tile size is adaptively sized, resulting in fewer cache tiles for less complex render target configurations. Adaptively sizing cache tiles leads to more efficient processor utilization and reduced power requirements. In addition, a unified L2 cache tile allows dynamic partitioning of cache memory between cache tile data and other data.
    Type: Application
    Filed: August 28, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Rouslan DIMITROV, Emmett M. KILGARIFF, Andrei KHODAKOVSKY
  • Publication number: 20140123145
    Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nick BARROW-WILLIAMS, Brian FAHS, Jerome F. DULUK, Jr., James Leroy DEMING, Timothy John PURCELL, Lucien DUNNING, Mark HAIRGROVE
  • Publication number: 20140122809
    Abstract: One embodiment of the present invention sets forth a technique for processing commands received by an intermediary cache from one or more clients. The technique involves receiving a first write command from an arbiter unit, where the first write command specifies a first memory address, determining that a first cache line related to a set of cache lines included in the intermediary cache is associated with the first memory address, causing data associated with the first write command to be written into the first cache line, and marking the first cache line as dirty.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: James Patrick ROBERTSON, Gregory Alan MUTHLER, Hemayet HOSSAIN, Timothy John PURCELL, Karan MEHRA, Peter B. HOLMQVIST, George R. LYNCH
  • Publication number: 20140118391
    Abstract: One embodiment of the present invention includes a method for generating accumulated bounding boxes for graphics primitives. The method includes generating a first bounding box associated with a first graphics primitive. The method further includes, for each graphics primitive included in a first set of one or more additional graphics primitives, determining that the graphics primitive is within a threshold distance of the first bounding box, and adding the graphics primitive to the first bounding box. The method further includes determining not to add a second graphics primitive to the first bounding box. The method further includes generating a second bounding box associated with the second graphics primitive. Finally, the method includes transmitting the first bounding box to a tiling unit via a crossbar. One advantage of the disclosed embodiments is that multiple bounding boxes are combined to generate an accumulated bounding box that is then transferred across the crossbar.
    Type: Application
    Filed: August 14, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Dale L. KIRKLAND
  • Publication number: 20140123144
    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ignacio LLAMAS, Craig Ross DUTTWEILER, Jeffrey A. BOLZ, Daniel Elliot WEXLER
  • Publication number: 20140122949
    Abstract: Systems and methods for latches are presented. In one embodiment a system includes scan in propagation component, data propagation component, and control component. The scan in propagation component is operable to select between a scan in value and a recirculation value. The data propagation component is operable to select between a data value and results forwarded from the scan in propagation component, wherein results of the data propagation component are forwarded as the recirculation value to the scan in propagation component. The control component is operable to control an indication of a selection by the scan in propagation component and the data propagation component.
    Type: Application
    Filed: October 29, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ilyas Elkin, Ge Yang