Patents Assigned to NVIDIA
-
Patent number: 9934145
Abstract: In one embodiment of the present invention, a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection, referred to as virtual tiles, enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
Type: Grant
Filed: October 28, 2015
Date of Patent: April 3, 2018
Assignee: NVIDIA Corporation
Inventors: Praveen Krishnamurthy, Peter B. Holmquist, Wishwesh Gandhi, Timothy Purcell, Karan Mehra, Lacky Shah
-
Patent number: 9928642
Abstract: A system and method uses the capabilities of a geometry shader unit within the multi-threaded graphics processor to implement algorithms with variable input and output.
Type: Grant
Filed: January 3, 2017
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Franck Diard
-
Patent number: 9928033
Abstract: One embodiment of the present invention performs a parallel prefix scan in a single pass that incorporates variable look-back. A parallel processing unit (PPU) subdivides a list of inputs into sequentially-ordered segments and assigns each segment to a streaming multiprocessor (SM) included in the PPU. Notably, the SMs may operate in parallel. Each SM executes write operations on a segment descriptor that includes the status, aggregate, and inclusive prefix associated with the assigned segment. Further, each SM may execute read operations on segment descriptors associated with other segments. In operation, each SM may perform reduction operations to determine a segment-wide aggregate, may perform look-back operations across multiple preceding segments to determine an exclusive prefix, and may perform a scan seeded with the exclusive prefix to generate output data.
Type: Grant
Filed: October 1, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Duane Merrill
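This single-pass scan with variable look-back can be illustrated with a sequential sketch. The function and descriptor field names below are illustrative rather than taken from the patent: each segment publishes a descriptor, and later segments accumulate predecessors' aggregates until they reach one whose inclusive prefix is already available.

```python
AGGREGATE, PREFIX = "aggregate", "prefix"

def single_pass_scan(values, seg_size):
    segments = [values[i:i + seg_size] for i in range(0, len(values), seg_size)]
    descriptors = []  # one descriptor per already-processed segment
    out = []
    for seg in segments:
        aggregate = sum(seg)  # segment-wide reduction
        # Variable look-back: walk preceding descriptors, summing aggregates,
        # until one already holds an inclusive prefix. (In the parallel
        # version a descriptor may still be in the AGGREGATE-only state.)
        exclusive = 0
        for d in reversed(descriptors):
            if d["status"] == PREFIX:
                exclusive += d["inclusive_prefix"]
                break
            exclusive += d["aggregate"]
        descriptors.append({"status": PREFIX,
                            "aggregate": aggregate,
                            "inclusive_prefix": exclusive + aggregate})
        running = exclusive  # scan seeded with the exclusive prefix
        for v in seg:
            running += v
            out.append(running)
    return out
```

In the actual parallel setting the look-back runs while other SMs are still publishing their descriptors, which is what makes the single pass possible; the sequential loop above only shows the descriptor bookkeeping.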
-
Patent number: 9928109
Abstract: One embodiment of the present disclosure sets forth a technique for enforcing cross-stream dependencies in a parallel processing subsystem such as a graphics processing unit. The technique involves queuing waiting events to create cross-stream dependencies and signaling events to indicate completion to the waiting events. A scheduler kernel examines a task status data structure from a corresponding stream and updates dependency counts for tasks and events within the stream. When each task dependency for a waiting event is satisfied, an associated task may execute.
Type: Grant
Filed: May 9, 2012
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Luke Durant
-
Patent number: 9928644
Abstract: A solution is proposed for efficiently determining whether or not a set of elements (such as convex shapes) in a multi-dimensional space mutually intersects. The solution may be applied to elements in any closed subset of real numbers for any number of spatial dimensions of the multi-dimensional space. The solutions provided herein include iterative processes for calculating the point displacement from boundaries of the elements (shapes), and devices for implementing the iterative process(es). The processes and devices herein may be extended to abstract (functional) definitions of convex shapes, allowing for simple and economical representations. As an embodiment of the present invention, an object called a “void simplex” may be determined, allowing the process to terminate even earlier when found, thereby avoiding unnecessary computation without excess memory requirements.
Type: Grant
Filed: July 1, 2015
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Bryan Galdrikian
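As a loose illustration of an iterative intersection test over functionally defined convex shapes (this is a textbook alternating-projection sketch, not the patented process), two convex sets given only by projection operators can be tested for overlap by bouncing a point between them and watching the displacement shrink:

```python
def project_box(p, lo, hi):
    # Projection operator for an axis-aligned box: clamp each coordinate.
    return tuple(min(max(c, l), h) for c, l, h in zip(p, lo, hi))

def intersects(proj_a, proj_b, start, iters=100, eps=1e-9):
    # Alternate projections between the two convex sets. If they overlap,
    # the iterate settles on a point inside both, and the displacement
    # between consecutive projections shrinks to zero.
    p = start
    for _ in range(iters):
        q = proj_a(p)
        p = proj_b(q)
        if sum((a - b) ** 2 for a, b in zip(p, q)) < eps:
            return True
    return False  # displacement stayed positive: likely disjoint
```

Because the shapes enter only through their projection operators, the same driver works for any convex set with a computable projection, which mirrors the abstract's point about functional definitions enabling economical representations.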
-
Patent number: 9928104
Abstract: A system, method, and computer program product are provided for accessing a queue. The method includes receiving a first request to reserve a data record entry in a queue, updating a queue state block based on the first request, and returning a response to the request. A second request is received to commit the data record entry, and the queue state block is updated based on the second request.
Type: Grant
Filed: June 19, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventors: William J. Dally, James David Balfour, Ignacio Llamas Ubieto
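A minimal sketch of this two-phase (reserve, then commit) access pattern, with invented names for the state-block fields:

```python
class QueueStateBlock:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0                    # next entry index to hand out
        self.records = [None] * capacity
        self.committed = [False] * capacity  # entry valid only after commit

    def reserve(self):
        """First request: reserve a data record entry, return its index."""
        if self.reserved >= self.capacity:
            return None  # queue full; response indicates failure
        idx = self.reserved
        self.reserved += 1
        return idx

    def commit(self, idx, record):
        """Second request: commit the data record into the reserved entry."""
        self.records[idx] = record
        self.committed[idx] = True
```

Splitting reservation from commit lets many producers claim slots quickly and fill them later, with consumers honoring only committed entries.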
-
Patent number: 9928639
Abstract: A system and method for facilitating increased graphics processing without deadlock. Embodiments of the present invention provide storage for execution unit pipeline results (e.g., texture pipeline results). The storage allows increased processing of multiple threads as a texture unit may be used to store information while corresponding locations of the register file are available for reallocation to other threads. Embodiments further provide for preventing deadlock by limiting the number of requests and ensuring that a set of requests is not issued unless there are resources available to complete each request of the set of requests. Embodiments of the present invention thus provide for deadlock-free increased performance.
Type: Grant
Filed: November 27, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventors: Michael Toksvig, Erik Lindholm
-
Patent number: 9930082
Abstract: A system and method for network-driven automatic adaptive rendering impedance are presented. Embodiments of the present invention are operable to dynamically throttle the frame rate associated with an application using a server-based graphics processor based on determined communication network conditions between a server-based application and a remote client. Embodiments of the present invention are operable to monitor network conditions between the server and the client using a network monitoring module and correspondingly adjust the frame rate for a graphics processor used by an application through the use of a throttling signal in response to the determined network conditions. By throttling the application in the manner described by embodiments of the present invention, power resources of the server may be conserved, computational efficiency of the server may be promoted, and user density of the server may be increased.
Type: Grant
Filed: November 20, 2012
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Lawrence Ibarria
-
Patent number: 9928034
Abstract: A method, computer-readable medium, and system are disclosed for processing a segmented data set. The method includes the steps of receiving a data structure storing a plurality of values segmented into a plurality of sequences; assigning a plurality of processing elements to process the plurality of values; and processing the plurality of values by the plurality of processing elements according to a merge-based algorithm. Each processing element in the plurality of processing elements identifies a portion of values in the plurality of values allocated to the processing element based on the merge-based algorithm. In one embodiment, the processing elements are threads executed in parallel by a parallel processing unit.
Type: Grant
Filed: December 16, 2015
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventor: Duane George Merrill, III
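One way to picture a merge-based allocation (a hypothetical sketch in the style of merge-path partitioning; the patent's actual procedure may differ) is that each processing element binary-searches one diagonal of a conceptual merge between the segment end-offsets and the value indices, so every element receives an equal share of combined segment-boundary and value work:

```python
def partition(seg_ends, num_values, num_pes):
    # seg_ends[i] is the end offset of segment i within the value array.
    # Returns one (segment index, value index) split point per element
    # boundary, found by binary-searching equally spaced merge-path diagonals.
    total = len(seg_ends) + num_values  # total merge-path length
    splits = []
    for p in range(num_pes + 1):
        diag = p * total // num_pes     # diagonal assigned to element p
        lo = max(0, diag - num_values)
        hi = min(diag, len(seg_ends))
        while lo < hi:                  # count segment ends before the diagonal
            mid = (lo + hi) // 2
            if seg_ends[mid] <= diag - mid - 1:
                lo = mid + 1            # this boundary falls before the diagonal
            else:
                hi = mid
        splits.append((lo, diag - lo))
    return splits
```

Each element then processes the values and segment boundaries between its split point and the next, which balances work even when segment lengths are highly skewed.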
-
Patent number: 9922457
Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed-function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed-function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level-of-detail values, and a spacing mode.
Type: Grant
Filed: December 2, 2013
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
-
Patent number: 9921847
Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
Type: Grant
Filed: January 21, 2014
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventor: John Erik Lindholm
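The node tree can be sketched as follows (the class and function names are invented for illustration; the real SM tracks this state in dedicated hardware structures):

```python
class Node:
    def __init__(self, threads, pc):
        self.threads = set(threads)  # active thread ids on this path
        self.pc = pc                 # program counter for this path
        self.children = []           # child nodes while paths are divergent

def diverge(parent, taken_pc, not_taken_pc, predicate):
    """Partition the parent's active threads on a conditional branch."""
    taken = {t for t in parent.threads if predicate(t)}
    if taken:
        parent.children.append(Node(taken, taken_pc))
    if parent.threads - taken:
        parent.children.append(Node(parent.threads - taken, not_taken_pc))
    # The parent stops issuing instructions; the children run concurrently.
    return parent.children

def reconverge(parent, join_pc):
    """All divergent paths reached join_pc: resume the parent node."""
    parent.children.clear()
    parent.pc = join_pc
    return parent
```

Each thread lands in at most one child, and the parent's full thread set is restored at reconvergence, matching the abstract's description of pausing the parent while children execute in parallel.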
-
Patent number: 9921873
Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a “cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
Type: Grant
Filed: January 31, 2012
Date of Patent: March 20, 2018
Assignee: NVIDIA Corporation
Inventors: Lacky V. Shah, Karim M. Abdalla, Sean J. Treichler, Abraham B. de Waal
-
Patent number: 9916674
Abstract: One embodiment of the present invention sets forth a technique for improving path rendering on computer systems with an available graphics processing unit. The technique involves reducing complex path objects to simpler geometric objects suitable for rendering on a graphics processing unit. The process involves a central processing unit “baking” a set of complex path rendering objects to generate a set of simpler graphics objects. A graphics processing unit then renders the simpler graphics objects. This division of processing load can advantageously yield higher overall rendering performance.
Type: Grant
Filed: May 19, 2011
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventor: Mark J. Kilgard
-
Patent number: 9916680
Abstract: Techniques are disclosed for suppressing access to a depth processing unit associated with a graphics processing pipeline. The method includes receiving a graphics primitive from a first pipeline stage associated with the graphics processing pipeline. The method further includes determining that the graphics primitive is visible over one or more graphics primitives previously rendered to a frame buffer, and determining that the depth buffer is in a read-only mode. The method further includes suppressing an operation to transmit the graphics primitive to the depth processing unit. One advantage of the disclosed technique is that power consumption is reduced within the GPU by avoiding unnecessary accesses to the depth processing unit.
Type: Grant
Filed: October 12, 2012
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventors: Christian Amsinck, Christian Rouet, Tony Louca
-
Patent number: 9918098
Abstract: In the claimed approach, a high efficiency video coding codec optimizes the memory resources used during motion vector (MV) prediction. As the codec processes blocks of pixels, known as coding units (CUs), the codec performs read and write operations on a fixed-sized neighbor union buffer representing the MVs associated with processed CUs. In operation, for each CU, the codec determines the indices at which proximally-located “neighbor” MVs are stored within the neighbor union buffer. The codec then uses these neighbor MVs to compute new MVs. Subsequently, the codec deterministically updates the neighbor union buffer, replacing irrelevant MVs with those new MVs that are useful for computing the MVs of unprocessed CUs. By contrast, many conventional codecs not only redundantly store MVs, but also retain irrelevant MVs. Consequently, the codec reduces memory usage and memory operations compared to conventional codecs, thereby decreasing power consumption and improving codec efficiency.
Type: Grant
Filed: January 23, 2014
Date of Patent: March 13, 2018
Assignee: NVIDIA Corporation
Inventors: Stefan Eckart, Yu Xinyang
-
Patent number: 9910589
Abstract: A virtual keyboard with dynamically adjusted recognition zones for predicted user-intended characters. When a user interaction with the virtual keyboard is received, a character in a recognition zone encompassing the detected interaction location is selected as the current input character. Characters likely to be the next input character are predicted based on the current input character. The recognition zones of the predicted next input characters are adjusted to be larger than their original sizes.
Type: Grant
Filed: October 30, 2014
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Zhen Jia, Jing Guo, Lina Yu, Yuqi Cui
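A toy sketch of this zone adjustment (the key layout, radii, and prediction table below are all invented): after "q" is typed, the zone of a likely successor such as "u" grows, so a tap midway between "u" and a neighboring key now resolves to "u".

```python
KEYS = {"q": (0.0, 0.0), "u": (6.0, 0.0), "i": (7.0, 0.0)}  # invented centers
BASE_RADIUS, BOOSTED_RADIUS = 1.0, 1.6
PREDICT = {"q": ["u"]}  # invented model: "u" usually follows "q"

def adjust_zones(current_char):
    # Enlarge the recognition zones of the predicted next characters.
    likely = PREDICT.get(current_char, [])
    return {k: BOOSTED_RADIUS if k in likely else BASE_RADIUS for k in KEYS}

def select_key(tap, radii):
    # Resolve a tap to the key whose zone it falls in most deeply:
    # distances are discounted by each key's current zone radius.
    def score(k):
        dx, dy = tap[0] - KEYS[k][0], tap[1] - KEYS[k][1]
        return (dx * dx + dy * dy) ** 0.5 / radii[k]
    return min(KEYS, key=score)
```

With uniform radii the same tap resolves to the geometrically nearer key, which is exactly the misrecognition the dynamic zones are meant to avoid.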
-
Patent number: 9911470
Abstract: A memory circuit that presents input data at a data output promptly on receiving a clock pulse includes upstream and downstream memory logic and selection logic. The upstream memory logic is configured to latch the input data on receiving the clock pulse. The downstream memory logic is configured to store the latched input data. The selection logic is configured to expose a logic level dependent on whether the upstream memory logic has latched the input data, the exposed logic level derived from the input data before the input data is latched, and from the latched input data after the input data is latched.
Type: Grant
Filed: April 13, 2012
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Venkata Kottapalli, Scott Pitkethly, Christian Klingner, Matthew Gerlach
-
Clock generation circuit that tracks critical path across process, voltage and temperature variation
Patent number: 9912322
Abstract: Clock generation circuit that tracks the critical path across process, voltage, and temperature variation. In accordance with a first embodiment of the present invention, an integrated circuit device includes an oscillator electronic circuit on the integrated circuit device configured to produce an oscillating signal and a receiving electronic circuit configured to use the oscillating signal as a system clock. The oscillating signal tracks a frequency-voltage characteristic of the receiving electronic circuit across process, voltage, and temperature variations. The oscillating signal may be independent of any off-chip oscillating reference signal.
Type: Grant
Filed: September 12, 2016
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Kalyana Bollapalli, Tezaswi Raja
-
Patent number: 9910865
Abstract: A method for storing digital images is presented. The method includes capturing an image using a digital camera system. It also comprises capturing metadata associated with the image or a moment of capture of the image. Further, it comprises storing the metadata in at least one field within a file format, wherein the file format defines a structure for the image, and wherein the at least one field is located within an extensible segment of the file format. In one embodiment, the metadata is selected from a group that comprises audio data, GPS data, time data, related image information, heat sensor data, gyroscope data, annotated text, and annotated audio.
Type: Grant
Filed: August 5, 2013
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Peter Mikolajczyk, Patrick Shehane, Guanghua Gary Zhang
-
Patent number: 9910760
Abstract: An aspect of the present invention proposes a solution for correctly intercepting, capturing, and replaying tasks (such as functions and methods) in an interception layer operating between an application programming interface (API) and the driver of a processor by using synchronization objects such as fences. According to one or more embodiments of the present invention, the application will use what appears to the application to be a single synchronization object to signal (from a processor) and to wait (on a processor), but will actually be two separate synchronization objects in the interception layer. According to one or more embodiments, the solution proposed herein may be implemented as part of a module or tool that works as an interception layer between an application and an API exposed by a device driver of a resource, and allows for an efficient and effective approach to frame debugging and live capture and replay of function bundles.
Type: Grant
Filed: September 3, 2015
Date of Patent: March 6, 2018
Assignee: NVIDIA Corporation
Inventors: Jeffrey Kiel, Dan Price, Mike Strauss