Patents Assigned to NVIDIA
-
Patent number: 9934145
Abstract: In one embodiment of the present invention, a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection, referred to as virtual tiles, enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
Type: Grant
Filed: October 28, 2015
Date of Patent: April 3, 2018
Assignee: NVIDIA Corporation
Inventors: Praveen Krishnamurthy, Peter B. Holmquist, Wishwesh Gandhi, Timothy Purcell, Karan Mehra, Lacky Shah
-
Patent number: 9928642
Abstract: A system and method uses the capabilities of a geometry shader unit within the multi-threaded graphics processor to implement algorithms with variable input and output.
Type: Grant
Filed: January 3, 2017
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Franck Diard
-
Patent number: 9928033
Abstract: One embodiment of the present invention performs a parallel prefix scan in a single pass that incorporates variable look-back. A parallel processing unit (PPU) subdivides a list of inputs into sequentially-ordered segments and assigns each segment to a streaming multiprocessor (SM) included in the PPU. Notably, the SMs may operate in parallel. Each SM executes write operations on a segment descriptor that includes the status, aggregate, and inclusive prefix associated with the assigned segment. Further, each SM may execute read operations on segment descriptors associated with other segments. In operation, each SM may perform reduction operations to determine a segment-wide aggregate, may perform look-back operations across multiple preceding segments to determine an exclusive prefix, and may perform a scan seeded with the exclusive prefix to generate output data.
Type: Grant
Filed: October 1, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Duane Merrill
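This single-pass scan with variable look-back can be illustrated with a sequential sketch. The function and descriptor field names below are illustrative rather than taken from the patent: each segment publishes a descriptor, and later segments accumulate predecessors' aggregates until they reach one whose inclusive prefix is already available.

```python
AGGREGATE, PREFIX = "aggregate", "prefix"

def single_pass_scan(values, seg_size):
    segments = [values[i:i + seg_size] for i in range(0, len(values), seg_size)]
    descriptors = []  # one descriptor per already-processed segment
    out = []
    for seg in segments:
        aggregate = sum(seg)  # segment-wide reduction
        # Variable look-back: walk preceding descriptors, summing aggregates,
        # until one already holds an inclusive prefix. (In the parallel
        # version a descriptor may still be in the AGGREGATE-only state.)
        exclusive = 0
        for d in reversed(descriptors):
            if d["status"] == PREFIX:
                exclusive += d["inclusive_prefix"]
                break
            exclusive += d["aggregate"]
        descriptors.append({"status": PREFIX,
                            "aggregate": aggregate,
                            "inclusive_prefix": exclusive + aggregate})
        running = exclusive  # scan seeded with the exclusive prefix
        for v in seg:
            running += v
            out.append(running)
    return out
```

In the actual parallel setting the look-back runs while other SMs are still publishing their descriptors, which is what makes the single pass possible; the sequential loop above only shows the descriptor bookkeeping.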
-
Patent number: 9928109
Abstract: One embodiment of the present disclosure sets forth a technique for enforcing cross-stream dependencies in a parallel processing subsystem such as a graphics processing unit. The technique involves queuing waiting events to create cross-stream dependencies and signaling events to indicate completion to the waiting events. A scheduler kernel examines a task status data structure from a corresponding stream and updates dependency counts for tasks and events within the stream. When each task dependency for a waiting event is satisfied, an associated task may execute.
Type: Grant
Filed: May 9, 2012
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Luke Durant
-
Patent number: 9928644
Abstract: A solution is proposed for efficiently determining whether or not a set of elements (such as convex shapes) in a multi-dimensional space mutually intersects. The solution may be applied to elements in any closed subset of real numbers for any number of spatial dimensions of the multi-dimensional space. The solutions provided herein include iterative processes for calculating the point displacement from boundaries of the elements (shapes), and devices for implementing the iterative process(es). The processes and devices herein may be extended to abstract (functional) definitions of convex shapes, allowing for simple and economical representations. As an embodiment of the present invention, an object called a “void simplex” may be determined, allowing the process to terminate even earlier when found, thereby avoiding unnecessary computation without excess memory requirements.
Type: Grant
Filed: July 1, 2015
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Bryan Galdrikian
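As a loose illustration of an iterative intersection test over functionally defined convex shapes (this is a textbook alternating-projection sketch, not the patented process), two convex sets given only by projection operators can be tested for overlap by bouncing a point between them and watching the displacement shrink:

```python
def project_box(p, lo, hi):
    # Projection operator for an axis-aligned box: clamp each coordinate.
    return tuple(min(max(c, l), h) for c, l, h in zip(p, lo, hi))

def intersects(proj_a, proj_b, start, iters=100, eps=1e-9):
    # Alternate projections between the two convex sets. If they overlap,
    # the iterate settles on a point inside both, and the displacement
    # between consecutive projections shrinks to zero.
    p = start
    for _ in range(iters):
        q = proj_a(p)
        p = proj_b(q)
        if sum((a - b) ** 2 for a, b in zip(p, q)) < eps:
            return True
    return False  # displacement stayed positive: likely disjoint
```

Because the shapes enter only through their projection operators, the same driver works for any convex set with a computable projection, which mirrors the abstract's point about functional definitions enabling economical representations.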
-
Patent number: 9928104
Abstract: A system, method, and computer program product are provided for accessing a queue. The method includes receiving a first request to reserve a data record entry in a queue, updating a queue state block based on the first request, and returning a response to the request. A second request is received to commit the data record entry, and the queue state block is updated based on the second request.
Type: Grant
Filed: June 19, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventors: William J. Dally, James David Balfour, Ignacio Llamas Ubieto
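A minimal sketch of this two-phase (reserve, then commit) access pattern, with invented names for the state-block fields:

```python
class QueueStateBlock:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0                    # next entry index to hand out
        self.records = [None] * capacity
        self.committed = [False] * capacity  # entry valid only after commit

    def reserve(self):
        """First request: reserve a data record entry, return its index."""
        if self.reserved >= self.capacity:
            return None  # queue full; response indicates failure
        idx = self.reserved
        self.reserved += 1
        return idx

    def commit(self, idx, record):
        """Second request: commit the data record into the reserved entry."""
        self.records[idx] = record
        self.committed[idx] = True
```

Splitting reservation from commit lets many producers claim slots quickly and fill them later, with consumers honoring only committed entries.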
-
Patent number: 9928639
Abstract: A system and method for facilitating increased graphics processing without deadlock. Embodiments of the present invention provide storage for execution unit pipeline results (e.g., texture pipeline results). The storage allows increased processing of multiple threads as a texture unit may be used to store information while corresponding locations of the register file are available for reallocation to other threads. Embodiments further provide for preventing deadlock by limiting the number of requests and ensuring that a set of requests is not issued unless there are resources available to complete each request of the set of requests. Embodiments of the present invention thus provide for deadlock-free increased performance.
Type: Grant
Filed: November 27, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventors: Michael Toksvig, Erik Lindholm
-
Patent number: 9930082
Abstract: A system and method for network-driven automatic adaptive rendering impedance are presented. Embodiments of the present invention are operable to dynamically throttle the frame rate associated with an application using a server-based graphics processor based on determined communication network conditions between a server-based application and a remote client. Embodiments of the present invention are operable to monitor network conditions between the server and the client using a network monitoring module and correspondingly adjust the frame rate for a graphics processor used by an application through the use of a throttling signal in response to the determined network conditions. By throttling the application in the manner described by embodiments of the present invention, power resources of the server may be conserved, computational efficiency of the server may be promoted, and user density of the server may be increased.
Type: Grant
Filed: November 20, 2012
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Lawrence Ibarria
-
Patent number: 9928034
Abstract: A method, computer-readable medium, and system are disclosed for processing a segmented data set. The method includes the steps of receiving a data structure storing a plurality of values segmented into a plurality of sequences; assigning a plurality of processing elements to process the plurality of values; and processing the plurality of values by the plurality of processing elements according to a merge-based algorithm. Each processing element in the plurality of processing elements identifies a portion of values in the plurality of values allocated to the processing element based on the merge-based algorithm. In one embodiment, the processing elements are threads executed in parallel by a parallel processing unit.
Type: Grant
Filed: December 16, 2015
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Duane George Merrill, III
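One way to picture a merge-based allocation (a hypothetical sketch in the style of merge-path partitioning; the patent's actual procedure may differ) is that each processing element binary-searches one diagonal of a conceptual merge between the segment end-offsets and the value indices, so every element receives an equal share of combined segment-boundary and value work:

```python
def partition(seg_ends, num_values, num_pes):
    # seg_ends[i] is the end offset of segment i within the value array.
    # Returns one (segment index, value index) split point per element
    # boundary, found by binary-searching equally spaced merge-path diagonals.
    total = len(seg_ends) + num_values  # total merge-path length
    splits = []
    for p in range(num_pes + 1):
        diag = p * total // num_pes     # diagonal assigned to element p
        lo = max(0, diag - num_values)
        hi = min(diag, len(seg_ends))
        while lo < hi:                  # count segment ends before the diagonal
            mid = (lo + hi) // 2
            if seg_ends[mid] <= diag - mid - 1:
                lo = mid + 1            # this boundary falls before the diagonal
            else:
                hi = mid
        splits.append((lo, diag - lo))
    return splits
```

Each element then processes the values and segment boundaries between its split point and the next, which balances work even when segment lengths are highly skewed.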
-
Patent number: 9922457
Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed-function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed-function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level-of-detail values, and a spacing mode.
Type: Grant
Filed: December 2, 2013
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
-
Patent number: 9921847
Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
Type: Grant
Filed: January 21, 2014
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventor: John Erik Lindholm
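The node tree can be sketched as follows (the class and function names are invented for illustration; the real SM tracks this state in dedicated hardware structures):

```python
class Node:
    def __init__(self, threads, pc):
        self.threads = set(threads)  # active thread ids on this path
        self.pc = pc                 # program counter for this path
        self.children = []           # child nodes while paths are divergent

def diverge(parent, taken_pc, not_taken_pc, predicate):
    """Partition the parent's active threads on a conditional branch."""
    taken = {t for t in parent.threads if predicate(t)}
    if taken:
        parent.children.append(Node(taken, taken_pc))
    if parent.threads - taken:
        parent.children.append(Node(parent.threads - taken, not_taken_pc))
    # The parent stops issuing instructions; the children run concurrently.
    return parent.children

def reconverge(parent, join_pc):
    """All divergent paths reached join_pc: resume the parent node."""
    parent.children.clear()
    parent.pc = join_pc
    return parent
```

Each thread lands in at most one child, and the parent's full thread set is restored at reconvergence, matching the abstract's description of pausing the parent while children execute in parallel.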
-
Patent number: 9921873
Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a “cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
Type: Grant
Filed: January 31, 2012
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventors: Lacky V. Shah, Karim M. Abdalla, Sean J. Treichler, Abraham B. de Waal
-
Patent number: 9916674
Abstract: One embodiment of the present invention sets forth a technique for improving path rendering on computer systems with an available graphics processing unit. The technique involves reducing complex path objects to simpler geometric objects suitable for rendering on a graphics processing unit. The process involves a central processing unit “baking” a set of complex path rendering objects to generate a set of simpler graphics objects. A graphics processing unit then renders the simpler graphics objects. This division of processing load can advantageously yield higher overall rendering performance.
Type: Grant
Filed: May 19, 2011
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventor: Mark J. Kilgard
-
Patent number: 9916680
Abstract: Techniques are disclosed for suppressing access to a depth processing unit associated with a graphics processing pipeline. The method includes receiving a graphics primitive from a first pipeline stage associated with the graphics processing pipeline. The method further includes determining that the graphics primitive is visible over one or more graphics primitives previously rendered to a frame buffer, and determining that the depth buffer is in a read-only mode. The method further includes suppressing an operation to transmit the graphics primitive to the depth processing unit. One advantage of the disclosed technique is that power consumption is reduced within the GPU by avoiding unnecessary accesses to the depth processing unit.
Type: Grant
Filed: October 12, 2012
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventors: Christian Amsinck, Christian Rouet, Tony Louca
-
Patent number: 9918098
Abstract: In the claimed approach, a high efficiency video coding codec optimizes the memory resources used during motion vector (MV) prediction. As the codec processes blocks of pixels, known as coding units (CUs), the codec performs read and write operations on a fixed-sized neighbor union buffer representing the MVs associated with processed CUs. In operation, for each CU, the codec determines the indices at which proximally-located “neighbor” MVs are stored within the neighbor union buffer. The codec then uses these neighbor MVs to compute new MVs. Subsequently, the codec deterministically updates the neighbor union buffer, replacing irrelevant MVs with those new MVs that are useful for computing the MVs of unprocessed CUs. By contrast, many conventional codecs not only redundantly store MVs, but also retain irrelevant MVs. Consequently, the codec reduces memory usage and memory operations compared to conventional codecs, thereby decreasing power consumption and improving codec efficiency.
Type: Grant
Filed: January 23, 2014
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventors: Stefan Eckart, Yu Xinyang
-
Patent number: 9910589
Abstract: A virtual keyboard with dynamically adjusted recognition zones for predicted user-intended characters. When a user interaction with the virtual keyboard is received, a character in a recognition zone encompassing the detected interaction location is selected as the current input character. Characters likely to be the next input character are predicted based on the current input character. The recognition zones of the predicted next input characters are adjusted to be larger than their original sizes.
Type: Grant
Filed: October 30, 2014
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Zhen Jia, Jing Guo, Lina Yu, Yuqi Cui
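A toy sketch of this zone adjustment (the key layout, radii, and prediction table below are all invented): after "q" is typed, the zone of a likely successor such as "u" grows, so a tap midway between "u" and a neighboring key now resolves to "u".

```python
KEYS = {"q": (0.0, 0.0), "u": (6.0, 0.0), "i": (7.0, 0.0)}  # invented centers
BASE_RADIUS, BOOSTED_RADIUS = 1.0, 1.6
PREDICT = {"q": ["u"]}  # invented model: "u" usually follows "q"

def adjust_zones(current_char):
    # Enlarge the recognition zones of the predicted next characters.
    likely = PREDICT.get(current_char, [])
    return {k: BOOSTED_RADIUS if k in likely else BASE_RADIUS for k in KEYS}

def select_key(tap, radii):
    # Resolve a tap to the key whose zone it falls in most deeply:
    # distances are discounted by each key's current zone radius.
    def score(k):
        dx, dy = tap[0] - KEYS[k][0], tap[1] - KEYS[k][1]
        return (dx * dx + dy * dy) ** 0.5 / radii[k]
    return min(KEYS, key=score)
```

With uniform radii the same tap resolves to the geometrically nearer key, which is exactly the misrecognition the dynamic zones are meant to avoid.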
-
Patent number: 9911470
Abstract: A memory circuit that presents input data at a data output promptly on receiving a clock pulse includes upstream and downstream memory logic and selection logic. The upstream memory logic is configured to latch the input data on receiving the clock pulse. The downstream memory logic is configured to store the latched input data. The selection logic is configured to expose a logic level dependent on whether the upstream memory logic has latched the input data, the exposed logic level derived from the input data before the input data is latched, and from the latched input data after the input data is latched.
Type: Grant
Filed: April 13, 2012
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Venkata Kottapalli, Scott Pitkethly, Christian Klingner, Matthew Gerlach
-
Clock generation circuit that tracks critical path across process, voltage and temperature variation
Patent number: 9912322
Abstract: Clock generation circuit that tracks the critical path across process, voltage, and temperature variation. In accordance with a first embodiment of the present invention, an integrated circuit device includes an oscillator electronic circuit on the integrated circuit device configured to produce an oscillating signal and a receiving electronic circuit configured to use the oscillating signal as a system clock. The oscillating signal tracks a frequency-voltage characteristic of the receiving electronic circuit across process, voltage, and temperature variations. The oscillating signal may be independent of any off-chip oscillating reference signal.
Type: Grant
Filed: September 12, 2016
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Kalyana Bollapalli, Tezaswi Raja
-
Patent number: 9910865
Abstract: A method for storing digital images is presented. The method includes capturing an image using a digital camera system. It also comprises capturing metadata associated with the image or a moment of capture of the image. Further, it comprises storing the metadata in at least one field within a file format, wherein the file format defines a structure for the image, and wherein the at least one field is located within an extensible segment of the file format. In one embodiment, the metadata is selected from a group that comprises audio data, GPS data, time data, related image information, heat sensor data, gyroscope data, annotated text, and annotated audio.
Type: Grant
Filed: August 5, 2013
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Peter Mikolajczyk, Patrick Shehane, Guanghua Gary Zhang
-
Patent number: 9910760
Abstract: An aspect of the present invention proposes a solution for correctly intercepting, capturing, and replaying tasks (such as functions and methods) in an interception layer operating between an application programming interface (API) and the driver of a processor by using synchronization objects such as fences. According to one or more embodiments of the present invention, the application will use what appears to the application to be a single synchronization object to signal (from a processor) and to wait (on a processor), but will actually be two separate synchronization objects in the interception layer. According to one or more embodiments, the solution proposed herein may be implemented as part of a module or tool that works as an interception layer between an application and an API exposed by a device driver of a resource, and allows for an efficient and effective approach to frame debugging and live capture and replay of function bundles.
Type: Grant
Filed: September 3, 2015
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Jeffrey Kiel, Dan Price, Mike Strauss