Abstract: In one embodiment, a microprocessor is provided. The microprocessor includes instruction memory and a branch prediction unit. The branch prediction unit is configured to use information from the instruction memory to selectively power up the branch prediction unit from a powered-down state when fetched instruction data includes a branch instruction and maintain the branch prediction unit in the powered-down state when the fetched instruction data does not include a branch instruction in order to reduce power consumption of the microprocessor during instruction fetch operations.
Type:
Grant
Filed:
April 27, 2012
Date of Patent:
January 24, 2017
Assignee:
NVIDIA CORPORATION
Inventors:
Aneesh Aggarwal, Ross Segelken, Kevin Koschoreck, Paul Wasson
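The power-gating idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each unit of fetched instruction data carries a one-bit hint saying whether it contains a branch. The names `FetchLine`, `has_branch`, and `simulate_fetch` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class FetchLine:
    """One unit of fetched instruction data (hypothetical model)."""
    address: int
    has_branch: bool  # assumed hint bit kept alongside the instruction memory


def simulate_fetch(lines):
    """Return (fetches with the predictor powered up, total fetches)."""
    powered = 0
    for line in lines:
        # Power the predictor up only when the fetched data contains a branch;
        # otherwise it stays in the powered-down state for this fetch.
        if line.has_branch:
            powered += 1
    return powered, len(lines)


lines = [
    FetchLine(0x00, False),
    FetchLine(0x10, True),
    FetchLine(0x20, False),
    FetchLine(0x30, False),
]
powered, total = simulate_fetch(lines)
print(f"predictor powered on {powered} of {total} fetches")
```

In this toy trace the predictor is powered for only one of the four fetches. The sketch does not model how the hint is produced or the latency of powering up.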
Abstract: A system, method, and computer program product for implementing a tree traversal operation for a tree data structure is disclosed. The method includes the steps of receiving at least a portion of a tree data structure that represents a tree having a plurality of nodes and processing, via a tree traversal operation algorithm executed by a processor, one or more nodes of the tree data structure by intersecting the one or more nodes of the tree data structure with a query data structure. A first node of the tree data structure is associated with a first local coordinate system and a second node of the tree data structure is associated with a second local coordinate system, the first node being an ancestor of the second node, and the first local coordinate system and the second local coordinate system are both specified relative to a global coordinate system.
Type:
Grant
Filed:
January 5, 2015
Date of Patent:
January 24, 2017
Assignee:
NVIDIA Corporation
Inventors:
Samuli Matias Laine, Timo Oskari Aila, Tero Tapani Karras
Abstract: A system, method, and computer program product are provided for remapping registers based on a change in execution mode. A sequence of instructions is received for execution by a processor and a change in an execution mode from a first execution mode to a second execution mode within the sequence of instructions is identified, where a first register mapping is associated with the first execution mode and a second register mapping is associated with the second execution mode. Data stored in a set of registers within a processor is reorganized based on the first register mapping and the second register mapping in response to the change in the execution mode.
Type:
Grant
Filed:
December 20, 2013
Date of Patent:
January 24, 2017
Assignee:
NVIDIA Corporation
Inventors:
Ben Hertzberg, Guillermo Juan Rozas, Alexander Christian Klaiber, Nickolas Andrew Fortino
Abstract: One embodiment of the present invention includes a parallel processing unit (PPU) that performs pixel shading at variable granularities. For effects that vary at a low frequency across a pixel block, a coarse shading unit performs the associated shading operations on a subset of the pixels in the pixel block. By contrast, for effects that vary at a high frequency across the pixel block, fine shading units perform the associated shading operations on each pixel in the pixel block. Because the PPU implements coarse shading units and fine shading units, the PPU may tune the shading rate per-effect based on the frequency of variation across each pixel group. By contrast, conventional PPUs typically compute all effects per-pixel, performing redundant shading operations for low frequency effects. Consequently, to produce similar image quality, the PPU consumes less power and increases the rendering frame rate compared to a conventional PPU.
Type:
Grant
Filed:
December 13, 2013
Date of Patent:
January 24, 2017
Assignee:
NVIDIA Corporation
Inventors:
Yong He, Eric B. Lum, Eric Enderton, Henry Packard Moreton, Kayvon Fatahalian
Abstract: A mobile communications system comprises a first group of one or more base stations which are arranged to communicate signals with mobile units via a wireless access interface by transmitting and/or receiving radio signals within a first frequency band; a second group of one or more base stations which are arranged to communicate signals with mobile units via a wireless access interface by transmitting and/or receiving radio signals within a second frequency band; and a controller.
Abstract: A system, method, and computer program product are provided for generating anti-aliased images. The method includes the steps of assigning one or more samples to a plurality of clusters, each cluster in the plurality of clusters corresponding to an aggregate stored in an aggregate geometry buffer, where each of the one or more samples is covered by a visible fragment and rasterizing three-dimensional geometry to generate material parameters for each sample of the one or more samples. For each cluster in the plurality of clusters, the material parameters for each sample assigned to the cluster are combined to produce the aggregate. The combined material parameters for each cluster are stored in an aggregate geometry buffer. An anti-aliased image may then be generated by shading the combined material parameters.
Type:
Grant
Filed:
May 5, 2015
Date of Patent:
January 17, 2017
Assignee:
NVIDIA Corporation
Inventors:
Cyril Jean-Francois Crassin, Morgan McGuire, Aaron Eliot Lefohn, David Patrick Luebke
Abstract: One or more embodiments of the invention set forth techniques to create a process in a graphical processing unit (GPU) that has access to memory buffers in the system memory of a computer system that are shared among a plurality of GPUs in the computer system. The GPU of the process is able to engage in Direct Memory Access (DMA) with any of the shared memory buffers thereby eliminating additional copying steps that have been needed to combine data output of the various GPUs without such shared access.
Abstract: A system, method, and computer program product are provided for splitting primitives. A plurality of primitives is received for a scene and a pre-determined plane that intersects the scene is identified. Bounding volumes of the plurality of primitives that are intersected by the pre-determined plane are split, where a bounding volume that encloses each intersected primitive of the plurality of primitives is split into a first bounding volume and a second bounding volume at an intersection of the bounding volume and the pre-determined plane.
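As a rough illustration of the splitting step, the sketch below assumes axis-aligned bounding boxes and an axis-aligned splitting plane. The abstract specifies neither, so `AABB` and `split_aabb` are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional, Tuple

Vec3 = Tuple[float, float, float]


class AABB(NamedTuple):
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3


def split_aabb(box: AABB, axis: int, position: float) -> Optional[Tuple[AABB, AABB]]:
    """Split `box` where the plane `coordinate[axis] == position` cuts it.

    Returns (first, second) boxes, or None when the plane does not pass
    strictly through the interior of the box.
    """
    if not (box.lo[axis] < position < box.hi[axis]):
        return None
    first_hi = list(box.hi)
    first_hi[axis] = position   # first box ends at the plane
    second_lo = list(box.lo)
    second_lo[axis] = position  # second box starts at the plane
    return AABB(box.lo, tuple(first_hi)), AABB(tuple(second_lo), box.hi)


box = AABB((0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
first, second = split_aabb(box, axis=0, position=0.5)
print(first)
print(second)
```

A box that the plane misses is returned as `None`, so the caller can leave that primitive's bounding volume unchanged.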
Abstract: In one embodiment, a microprocessor is provided. The microprocessor includes a branch prediction unit. The branch prediction unit is configured to track the presence of branches in instruction data that is fetched from an instruction memory after a redirection at a target of a predicted taken branch. The branch prediction unit is selectively powered up from a powered-down state when the fetched instruction data includes a branch instruction and is maintained in the powered-down state when the fetched instruction data does not include a branch instruction in order to reduce power consumption of the microprocessor during instruction fetch operations.
Type:
Grant
Filed:
April 27, 2012
Date of Patent:
January 17, 2017
Assignee:
NVIDIA CORPORATION
Inventors:
Aneesh Aggarwal, Ross Segelken, Paul Wasson
Abstract: A system and method of producing a frame of a video image from an interlaced field. In one embodiment, the method includes: (1) creating an equal-intensity trace from present samples in the field, (2) recognizing an equal-intensity path in the equal-intensity trace, (3) at least partially straightening the equal-intensity path and (4) using the equal-intensity path to determine an intensity value for a missing sample in the frame.
Abstract: Presented systems and methods can facilitate efficient information storage and tracking operations, including translation lookaside buffer (TLB) operations. In one embodiment, the systems and methods effectively allow the caching of invalid entries (with the attendant benefits, e.g., regarding power, resource usage, stalls, etc.), while maintaining the illusion that the TLBs do not in fact cache invalid entries (e.g., act in compliance with architectural rules). In one exemplary implementation, an “unreal” TLB entry effectively serves as a hint that the linear address in question currently has no valid mapping. In one exemplary implementation, speculative operations that hit an unreal entry are discarded; architectural operations that hit an unreal entry discard the entry and perform a normal page walk, either obtaining a valid entry or raising an architectural fault.
Type:
Grant
Filed:
March 14, 2013
Date of Patent:
January 17, 2017
Assignee:
NVIDIA CORPORATION
Inventors:
Alexander Klaiber, Guillermo Juan Rozas
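The handling of "unreal" entries described in this abstract can be sketched as a small software model. The abstract does not say when an unreal entry is installed, so this sketch assumes it happens when a speculative page walk finds no mapping. `UnrealTLB`, `PageFault`, and the dictionary-based page table are hypothetical stand-ins, not the patented design.

```python
class PageFault(Exception):
    """Raised when an architectural access has no valid mapping."""


class UnrealTLB:
    _UNREAL = object()  # sentinel marking a "no valid mapping" hint

    def __init__(self, page_table):
        self.page_table = page_table  # dict: linear page -> physical frame
        self.entries = {}             # cached translations and unreal hints
        self.walks = 0                # number of page walks performed

    def _page_walk(self, page):
        self.walks += 1
        return self.page_table.get(page)

    def lookup(self, page, speculative):
        entry = self.entries.get(page)
        if entry is self._UNREAL:
            if speculative:
                return None           # speculative hit on a hint: discard the operation
            del self.entries[page]    # architectural hit: drop the hint, then walk
            entry = None
        if entry is not None:
            return entry              # ordinary TLB hit
        frame = self._page_walk(page)
        if frame is not None:
            self.entries[page] = frame
            return frame
        if speculative:
            # Assumption: remember the missing mapping as an unreal entry.
            self.entries[page] = self._UNREAL
            return None
        raise PageFault(page)


page_table = {1: 100}
tlb = UnrealTLB(page_table)
tlb.lookup(2, speculative=True)   # walk finds nothing, hint installed
tlb.lookup(2, speculative=True)   # hint hit, no second walk
print("walks after two speculative misses:", tlb.walks)
page_table[2] = 200               # mapping appears later
print("architectural lookup:", tlb.lookup(2, speculative=False))
```

Because an architectural access never trusts the hint, a mapping created after the hint was installed is still found by the normal page walk, which is how the model keeps up the appearance that invalid entries are not cached.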
Abstract: One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from a world-space pipeline, and transmitting the first plurality of graphics primitives to a screen-space pipeline for processing while a tiling function is enabled. The technique further includes storing, in the buffer, a second plurality of graphics primitives and a second plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the tiling function should be disabled and that the second plurality of graphics primitives should be flushed from the buffer, and transmitting the second plurality of graphics primitives to the screen-space pipeline for processing while the tiling function is disabled.
Type:
Grant
Filed:
October 4, 2013
Date of Patent:
January 10, 2017
Assignee:
NVIDIA Corporation
Inventors:
Ziyad S. Hakura, Cynthia Ann Edgeworth Allison, Joseph Cavanaugh, Dale L. Kirkland, Emmett M. Kilgariff
Abstract: A static random access memory (SRAM) cell includes a storage unit configured to store a data bit in a storage node. The SRAM cell further includes an access unit coupled to the storage unit. The access unit is configured to transfer current to the storage node when a word line is asserted. The SRAM cell further includes a row header configured to provide current from a power supply when the word line is not asserted, and to not provide current from the power supply when the word line is asserted. The SRAM cell further includes a column header configured to provide current from a power supply when a write column line is not asserted, and to not provide current from the power supply when the write column line is asserted.
Type:
Grant
Filed:
April 18, 2013
Date of Patent:
January 10, 2017
Assignee:
NVIDIA Corporation
Inventors:
Hwong-Kwo Lin, Ge Yang, Fei Song, Xi Zhang, Haiyan Gong
Abstract: The server based graphics processing techniques described herein include loading a given instance of a guest shim layer and loading a given instance of a guest display device interface that calls back into the given instance of the guest shim layer, in response to loading the given instance of the guest shim layer, wherein the guest shim layer and the guest display device interface are executing under control of a virtual machine guest operating system. The given instance of the shim layer requests a communication channel between the given instance of the guest shim layer and a host-guest communication manager (D3D HGCM) service module from a host-guest communication manager (HGCM). In response to the request for the communication channel loading, the D3D HGCM service module is loaded and a communication channel between the given instance of the shim layer and the D3D HGCM service module is created by the HGCM.
Abstract: A method for executing an application program using streams. A device driver receives a first command within an application program and parses the first command to identify a first stream token that is associated with a first stream. The device driver checks a memory location associated with the first stream for a first semaphore, and determines whether the first semaphore has been released. Once the first semaphore has been released, a second command within the application program is executed. Advantageously, embodiments of the invention provide a technique for developers to take advantage of the parallel execution capabilities of a GPU.
Type:
Grant
Filed:
August 15, 2008
Date of Patent:
January 10, 2017
Assignee:
NVIDIA Corporation
Inventors:
Nicholas Patrick Wilt, Ian Buck, Philip Cuadra
Abstract: One embodiment of the present invention sets forth a technique for dynamically allocating memory using one or more lock-free FIFOs. One or more lock-free FIFOs are populated with FIFO nodes, where each FIFO node represents a memory allocation of a predetermined size. Each particular lock-free FIFO includes memory allocations of a single size. Different lock-free FIFOs may include memory allocations for different sizes to service allocation requests for different size memory allocations. A lock-free mechanism is used to pop FIFO nodes from the FIFO. The use of the lock-free FIFO allows multiple consumers to simultaneously attempt to pop the head FIFO node without first obtaining a lock to ensure exclusive access of the FIFO.
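The per-size FIFO organization in this abstract can be sketched as follows. This is only an illustration of the size-class layout: it does not reproduce the lock-free pop, and `collections.deque` stands in for the FIFO. Falling back to the next larger size class when one is empty is an assumption of this sketch, and `FifoPoolAllocator` is a hypothetical name.

```python
from collections import deque


class FifoPoolAllocator:
    """One FIFO per allocation size; each node is an (offset, size) pair."""

    def __init__(self, sizes, nodes_per_size):
        self.fifos = {}
        offset = 0
        for size in sorted(sizes):
            fifo = deque()
            for _ in range(nodes_per_size):
                fifo.append((offset, size))  # pre-populate with fixed-size nodes
                offset += size
            self.fifos[size] = fifo

    def alloc(self, nbytes):
        """Pop the head node of the smallest size class that fits `nbytes`."""
        for size in sorted(self.fifos):
            if size < nbytes:
                continue
            try:
                return self.fifos[size].popleft()
            except IndexError:
                continue  # assumption: try the next larger class when empty
        return None

    def free(self, node):
        """Return a node to the tail of the FIFO for its size."""
        _, size = node
        self.fifos[size].append(node)


pool = FifoPoolAllocator(sizes=[64, 256], nodes_per_size=2)
node = pool.alloc(48)
print(node)
pool.free(node)
```

A hardware or native implementation would replace `popleft` with an atomic compare-and-swap on the head pointer so that multiple consumers can race for the head node without taking a lock.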
Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type:
Grant
Filed:
October 19, 2009
Date of Patent:
January 3, 2017
Assignee:
NVIDIA Corporation
Inventors:
Johnny S. Rhoades, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
Abstract: A passive cooling system is provided for dissipating heat from an electronic component. The system includes a printed circuit board including a first dielectric layer and a first conductive layer, an electronic component coupled to the printed circuit board via a plurality of electrical contacts, and a cooling component thermally coupled to the electronic component through the first conductive layer by a micro via thermal array.
Abstract: A system, method, and computer program product are provided for collecting trace information based on a computational workload. The method includes the steps of compiling source code to generate a program, launching a workload to be executed by the parallel processing unit, collecting one or more records of trace information associated with a plurality of threads configured to execute the program, and correlating the one or more records to one or more corresponding instructions included in the source code. Each record in the one or more records includes at least a value of a program counter and a scheduler state of the thread.
Type:
Grant
Filed:
June 4, 2014
Date of Patent:
January 3, 2017
Assignee:
NVIDIA Corporation
Inventors:
Gregory Paul Smith, Lars Siegfried Nyland
Abstract: A system and method uses the capabilities of a geometry shader unit within the multi-threaded graphics processor to implement algorithms with variable input and output.