Patents Assigned to NVIDIA
-
Patent number: 9639336
Abstract: One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
Type: Grant
Filed: October 25, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Vinod Grover, Manjunath Kudlur, Michael Murphy
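A minimal sketch of the kind of pass the abstract describes, assuming a simplified instruction representation: adjacent scalar loads from the same base register are recognized as vectorizable and replaced with a single vector load. The Instr type, opcodes, and pairing rule are illustrative only, not NVIDIA's implementation.

```cpp
// Illustrative vectorization pass over a topologically ordered node list.
#include <optional>
#include <string>
#include <vector>

struct Instr {
    std::string opcode;   // e.g. "ld.f32" or "ld.v2.f32" (hypothetical names)
    std::string base;     // base register of the address
    int         offset;   // byte offset from the base register
};

// Two scalar loads are "vectorizable" here if they read adjacent 4-byte
// words from the same base register; a real pass would also check
// alignment, dependences, and register classes.
static std::optional<Instr> tryVectorize(const Instr& a, const Instr& b) {
    if (a.opcode == "ld.f32" && b.opcode == "ld.f32" &&
        a.base == b.base && b.offset == a.offset + 4) {
        return Instr{"ld.v2.f32", a.base, a.offset};
    }
    return std::nullopt;
}

// Replace each vectorizable pair with the single vectorized instruction.
std::vector<Instr> vectorizePass(const std::vector<Instr>& nodes) {
    std::vector<Instr> out;
    for (size_t i = 0; i < nodes.size(); ++i) {
        if (i + 1 < nodes.size()) {
            if (auto merged = tryVectorize(nodes[i], nodes[i + 1])) {
                out.push_back(*merged);
                ++i;                 // both original instructions are consumed
                continue;
            }
        }
        out.push_back(nodes[i]);
    }
    return out;
}
```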
-
Patent number: 9639474
Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPUs and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.
Type: Grant
Filed: December 19, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
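The patent covers UVM driver internals, but the user-visible analogue in CUDA is steering managed-memory pages between GPUs with cudaMemAdvise and cudaMemPrefetchAsync. The sketch below shows only that analogue, assuming a system with at least two GPUs (device IDs 0 and 1); it does not show the driver's peer transition sequences.

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    float* data = nullptr;
    cudaMallocManaged(&data, bytes);            // pages in unified memory

    // Hint that GPU 1 should own the pages, then migrate them there.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, 1);
    cudaMemPrefetchAsync(data, bytes, 1, 0);    // copy pages to GPU 1's memory

    // Later, migrate the same pages back to GPU 0.
    cudaMemPrefetchAsync(data, bytes, 0, 0);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```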
-
Patent number: 9639365
Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
Type: Grant
Filed: November 12, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
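A hedged illustration of why SIMT hardware benefits from an indirect branch instruction: device code that calls through a per-thread function pointer cannot be lowered efficiently to a chain of tests and branches. The kernel and function names below are hypothetical; the compiler, not the programmer, emits the underlying indirect branch or call.

```cpp
#include <cuda_runtime.h>

__device__ float addOne(float x) { return x + 1.0f; }
__device__ float twice (float x) { return x * 2.0f; }

typedef float (*UnaryOp)(float);

__global__ void applyOps(const int* which, float* data, int n) {
    // Table of device function pointers; the call below is indirect,
    // and different threads in a warp may select different entries.
    UnaryOp ops[2] = { addOne, twice };
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = ops[which[i] & 1](data[i]);
    }
}
```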
-
Patent number: 9639494
Abstract: One embodiment of the present invention includes a hard-coded first device ID. The embodiment also includes a set of fuses that represents a second device ID. The hard-coded device ID and the set of fuses each designate a separate device ID for the device, and each device ID corresponds to a specific operating configuration of the device. The embodiment also includes selection logic to select between the hard-coded device ID and the set of fuses to set the device ID for the device. One advantage of the disclosed embodiments is providing flexibility for engineers who develop the devices while also reducing the likelihood that a third party can counterfeit the device.
Type: Grant
Filed: November 1, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Jesse Max Guss, Philip Browning Johnson, Chris Marriott, Wojciech Jan Truty
-
Patent number: 9639466
Abstract: One embodiment of the present invention sets forth a technique for processing commands received by an intermediary cache from one or more clients. The technique involves receiving a first write command from an arbiter unit, where the first write command specifies a first memory address, determining that a first cache line related to a set of cache lines included in the intermediary cache is associated with the first memory address, causing data associated with the first write command to be written into the first cache line, and marking the first cache line as dirty.
Type: Grant
Filed: October 30, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: James Patrick Robertson, Gregory Alan Muthler, Hemayet Hossain, Timothy John Purcell, Karan Mehra, Peter B. Holmqvist, George R. Lynch
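A software model, not the hardware, of the write path the abstract describes: locate the cache line associated with the write address within its set, write the data, and mark the line dirty. The geometry (line size, ways, sets) and the omitted eviction/fill path are simplifying assumptions.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr int kLineBytes = 128;
constexpr int kWays      = 4;
constexpr int kSets      = 64;

struct CacheLine {
    uint64_t tag   = 0;
    bool     valid = false;
    bool     dirty = false;
    std::array<uint8_t, kLineBytes> data{};
};

struct Cache {
    CacheLine lines[kSets][kWays];

    // Handle a write command for a full line at 'addr' (simplified: no
    // eviction or fill path is shown).
    void write(uint64_t addr, const uint8_t* src) {
        uint64_t lineAddr = addr / kLineBytes;
        int      set      = lineAddr % kSets;
        uint64_t tag      = lineAddr / kSets;
        for (auto& line : lines[set]) {
            if (!line.valid || line.tag == tag) {     // reuse or allocate a way
                line.valid = true;
                line.tag   = tag;
                std::memcpy(line.data.data(), src, kLineBytes);
                line.dirty = true;                    // mark the line dirty
                return;
            }
        }
        // A real cache would evict a victim (writing back dirty data) here.
    }
};
```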
-
Patent number: 9639367
Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
Type: Grant
Filed: October 4, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
-
Patent number: 9639102
Abstract: A system and method are provided for estimating current. A current source is configured to generate a current and a pulsed sense enable signal is generated. An estimate of the current is generated and the estimate of the current is updated based on a first signal that is configured to couple the current source to an electric power supply and a second signal that is configured to couple the current source to a load. A system includes the current source and a current prediction unit. The current source is configured to generate a current. The current prediction unit is coupled to the current source and is configured to generate the estimate of the current and update the estimate of the current based on the first signal and the second signal.
Type: Grant
Filed: February 19, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventor: William J. Dally
-
Patent number: 9640249
Abstract: A write-assist memory includes a memory supply voltage and a column of SRAM cells that is controlled by a pair of bit lines during a write operation. Additionally, the write-assist memory includes a write-assist unit that is coupled to the memory supply voltage and the column of SRAM cells and has a separable conductive line located between the pair of bit lines that provides a collapsible SRAM supply voltage to the column of SRAM cells based on a capacitive coupling of a control signal in the pair of bit lines during the write operation. A method of operating a write-assist memory is also provided.
Type: Grant
Filed: May 20, 2014
Date of Patent: May 2, 2017
Assignee: Nvidia Corporation
Inventors: Gang Chen, Jing Guo, Jun Yang
-
Patent number: 9639479
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Grant
Filed: September 22, 2010
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
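A sketch of the same idea at the CUDA source level, assuming the cache-hint load/store intrinsics exposed by recent CUDA toolkits (__ldcg and __stcs, which map to the PTX cache operators ld.global.cg and st.global.cs). This is only an analogy to the patented per-instruction cache operations modifier, not its implementation.

```cpp
#include <cuda_runtime.h>

__global__ void copyStreaming(const float* __restrict__ in,
                              float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __ldcg(&in[i]);   // load with "cache global" hint (bypass L1)
        __stcs(&out[i], v);         // store with streaming (evict-first) hint
    }
}
```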
-
Patent number: 9633458
Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.
Type: Grant
Filed: January 23, 2012
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Walter R. Steiner, Eric Lum, Dale L. Kirkland, Steven James Heinrich, David Charles Patrick
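A plain C++ sketch of the geometric test, assuming the polygon is given as a vertex loop: for each edge, consider the line through the sample point parallel to that edge; if every vertex lies strictly on one side of such a line, the polygon cannot cover the sample, so the bounding box may be shrunk to exclude it. This models the math only, not the hardware.

```cpp
#include <vector>

struct Vec2 { float x, y; };

static float cross(const Vec2& a, const Vec2& b) { return a.x * b.y - a.y * b.x; }

bool sampleCanBeExcluded(const std::vector<Vec2>& poly, const Vec2& s) {
    const size_t n = poly.size();
    for (size_t i = 0; i < n; ++i) {
        // Direction of edge i; the test line passes through s with this direction.
        Vec2 d{poly[(i + 1) % n].x - poly[i].x, poly[(i + 1) % n].y - poly[i].y};
        bool allPos = true, allNeg = true;
        for (const Vec2& v : poly) {
            float side = cross(d, Vec2{v.x - s.x, v.y - s.y});
            allPos &= (side > 0.0f);
            allNeg &= (side < 0.0f);
        }
        if (allPos || allNeg) return true;   // all vertices on one side of this line
    }
    return false;
}
```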
-
Patent number: 9633469
Abstract: A system, method, and computer program product are provided for conservative rasterization of primitives using an error term. In use, an edge equation is determined for each edge of a primitive, the edge equation having coefficients defining the edge of the primitive. Each edge of the primitive is shifted to enlarge the primitive by modifying coefficients of the edge equation defining the edge by an error term that is a predetermined amount. Pixels that intersect the primitive are then determined using the enlarged primitive.
Type: Grant
Filed: March 15, 2013
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Eric Brian Lum, Walter Robert Steiner, Henry Packard Moreton, Justin L. Cobb, Barry Nolan Rodgers, Yury Uralsky, Timo Oskari Aila, Tero Tapani Karras
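A minimal sketch of shifting edge equations outward by an error term, assuming the usual form E(x, y) = a*x + b*y + c with E >= 0 on the interior side. The specific offset used here (err scaled by |a| + |b|) is an illustrative choice; the patent's predetermined error term may differ.

```cpp
#include <cmath>

struct Edge { float a, b, c; };

// Shift the edge so that points within 'err' (in x or y) of the original
// interior still evaluate as inside, enlarging the primitive.
Edge enlargeEdge(Edge e, float err) {
    e.c += err * (std::fabs(e.a) + std::fabs(e.b));
    return e;
}

// A pixel sample (x, y) intersects the enlarged primitive when it lies
// inside every shifted edge.
bool insideEnlarged(const Edge* edges, int numEdges, float x, float y, float err) {
    for (int i = 0; i < numEdges; ++i) {
        Edge e = enlargeEdge(edges[i], err);
        if (e.a * x + e.b * y + e.c < 0.0f) return false;
    }
    return true;
}
```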
-
Patent number: 9632976
Abstract: Embodiments related to managing lazy runahead operations at a microprocessor are disclosed. For example, an embodiment of a method for operating a microprocessor described herein includes identifying a primary condition that triggers an unresolved state of the microprocessor. The example method also includes identifying a forcing condition that compels resolution of the unresolved state. The example method also includes, in response to identification of the forcing condition, causing the microprocessor to enter a runahead mode.
Type: Grant
Filed: December 7, 2012
Date of Patent: April 25, 2017
Assignee: NVIDIA CORPORATION
Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman
-
Patent number: 9632834
Abstract: One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem. First, the software application assigns a desired priority to a stream using a call included in the API. The API receives this call and passes it to a driver. The driver maps the desired priority to an appropriate device priority associated with the parallel processing subsystem. Subsequently, if the software application launches a particular kernel within the stream, then the driver assigns the device priority associated with the stream to the kernel before adding the kernel to the stream for execution on the parallel processing subsystem. Advantageously, by assigning priorities to streams and, subsequently, strategically launching kernels within the prioritized streams, an application developer may fine-tune the software application to increase the overall processing efficiency of the software application.
Type: Grant
Filed: May 17, 2013
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Vivek Kini, Forrest Iandola, Timothy James Murray
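The public CUDA API corresponding to this flow is cudaDeviceGetStreamPriorityRange plus cudaStreamCreateWithPriority; kernels launched into a prioritized stream inherit that stream's priority. The sketch below shows only this user-facing side, not the driver-side mapping from API priority to device priority.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummyKernel() {}

int main() {
    int leastPri = 0, greatestPri = 0;   // numerically, greatest < least
    cudaDeviceGetStreamPriorityRange(&leastPri, &greatestPri);

    cudaStream_t highPri, lowPri;
    cudaStreamCreateWithPriority(&highPri, cudaStreamNonBlocking, greatestPri);
    cudaStreamCreateWithPriority(&lowPri,  cudaStreamNonBlocking, leastPri);

    dummyKernel<<<1, 1, 0, highPri>>>();  // scheduled ahead of low-priority work
    dummyKernel<<<1, 1, 0, lowPri>>>();

    cudaDeviceSynchronize();
    cudaStreamDestroy(highPri);
    cudaStreamDestroy(lowPri);
    printf("priority range: %d (least) .. %d (greatest)\n", leastPri, greatestPri);
    return 0;
}
```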
-
Patent number: 9626191
Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
Type: Grant
Filed: December 22, 2011
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
-
Patent number: 9626216
Abstract: A technique for executing a plurality of applications on a GPU. The technique involves establishing a first connection to a first application and a second connection to a second application, establishing a universal processing context that is shared by the first application and the second application, transmitting a first workload pointer to a first queue allocated to the first application, the first workload pointer pointing to a first workload generated by the first application, transmitting a second workload pointer to a second queue allocated to the second application, the second workload pointer pointing to a second workload generated by the second application, transmitting the first workload pointer to a first GPU queue in the GPU, and transmitting the second workload pointer to a second GPU queue in the GPU, wherein the GPU is configured to execute the first workload and the second workload in accordance with the universal processing context.
Type: Grant
Filed: May 9, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Christopher Michael Cameron, Timothy James Murray
-
Patent number: 9628705
Abstract: In one embodiment, a car navigation device is provided. The device comprises: at least one wide-angle camera; a video correction unit for acquiring video data from the wide-angle camera and correcting the video data; a video merging unit for acquiring corrected video data from the video correction unit and merging the corrected video data; an image recognition unit for acquiring video from the video merging unit and performing image recognition on the video; and a driving assistant unit for acquiring data from the image recognition unit and assisting driving in accordance with the recognized content. The navigation device provided by various embodiments in accordance with the present invention can correct and recognize the images taken by a fisheye lens in real time so as to assist the driver or drive the car automatically without human intervention.
Type: Grant
Filed: November 14, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA CORPORATION
Inventor: Wenjie Zheng
-
Patent number: 9627021
Abstract: An 8-transistor SRAM (static random access memory) storage cell provides differential read bit lines that are precharged to a low voltage level for read operations. The 8-transistor storage cell provides separate ports for read and write operations, including differential read bit lines. Prior to each read operation, the differential read bit lines are precharged to the low voltage level. During read operations, one of the two differential read bit lines is pulled high towards a high voltage level while the complementary bit line remains at the low voltage level resulting from the precharge. The difference in voltage between the differential read bit lines is sensed to determine the value stored in each 8-transistor SRAM storage cell and complete the read operation.
Type: Grant
Filed: October 9, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: John W. Poulton, Brian Zimmer
-
Patent number: 9626320
Abstract: A transmitter is configured to scale up a low bandwidth delivered by a first processing element to match a higher bandwidth associated with an interconnect. A receiver is configured to scale down the high bandwidth delivered by the interconnect to match the lower bandwidth associated with a second processing element. The first processing element and the second processing element may thus communicate with one another across the interconnect via the transmitter and the receiver, respectively, despite the bandwidth mismatch between those processing elements and the interconnect.
Type: Grant
Filed: September 19, 2013
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Marvin A. Denman, Dennis K. Ma, Stephen David Glaser
-
Patent number: 9619930
Abstract: A light transport simulator and a method of constructing and classifying light transport paths. One embodiment of the light transport simulator is operable to construct and classify a light transport path between two points in a scene and includes: (1) a memory configured to store dual deterministic finite automata (DFA) based on a light path expression (LPE) that defines criteria for accepting the light transport path, and (2) a processor configured to employ the dual DFA to construct opposing light subpaths originating from the two points, and employ a correspondence among states of the dual DFA to unite the opposing light subpaths to form the light transport path.
Type: Grant
Filed: May 3, 2013
Date of Patent: April 11, 2017
Assignee: Nvidia Corporation
Inventor: Daniel Siebert
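A toy sketch of classifying a path's scattering events with a DFA derived from a light path expression, assuming the LPE "E D* L" (eye vertex, any number of diffuse bounces, light vertex); the event alphabet and LPE are illustrative choices. The patent's dual-DFA scheme additionally runs a second automaton from the light end and unites the two subpaths; only a single forward automaton is shown.

```cpp
#include <string>

// States: 0 = expect 'E', 1 = saw 'E' (accepting diffuse bounces),
//         2 = accepted, -1 = reject.
int step(int state, char event) {
    switch (state) {
        case 0:  return event == 'E' ? 1 : -1;
        case 1:  return event == 'D' ? 1 : (event == 'L' ? 2 : -1);
        default: return -1;          // no events allowed after acceptance
    }
}

// Classify a path by its event string, e.g. "EDDL" matches "E D* L".
bool matchesLpe(const std::string& pathEvents) {
    int state = 0;
    for (char e : pathEvents) {
        state = step(state, e);
        if (state < 0) return false;
    }
    return state == 2;
}
```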
-
Patent number: 9619204
Abstract: A system and method for performing sorting. The method includes partitioning a plurality of keys needing sorting into a first plurality of bins, wherein the bins are sequentially sorted. The plurality of keys is capable of being sorted into a sequence of keys using a corresponding ordering system. The method includes coalescing a first pair of consecutive bins, such that the coalesced pair of bins falls below a threshold. The method also includes ordering keys in the first coalesced pair to generate a first sub-sequence of keys in the sequence of keys.
Type: Grant
Filed: June 14, 2013
Date of Patent: April 11, 2017
Assignee: Nvidia Corporation
Inventor: Duane Merrill
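A sequential sketch of the scheme under stated assumptions (keys binned by their top byte, a hypothetical 1024-element threshold): consecutive bins are coalesced while the running chunk stays below the threshold, and each coalesced chunk is sorted locally to produce one sub-sequence of the final output. Because the bins are already in sequential order, concatenating the sorted chunks yields the fully sorted sequence.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint32_t> binThenSort(const std::vector<uint32_t>& keys,
                                  size_t threshold = 1024) {
    constexpr int kBins = 256;
    std::vector<std::vector<uint32_t>> bins(kBins);
    for (uint32_t k : keys) bins[k >> 24].push_back(k);   // sequentially ordered bins

    std::vector<uint32_t> out;
    out.reserve(keys.size());
    std::vector<uint32_t> chunk;
    for (int b = 0; b < kBins; ++b) {
        // Coalesce this bin into the pending chunk only if the result stays
        // below the threshold; otherwise sort and flush the chunk first.
        if (!chunk.empty() && chunk.size() + bins[b].size() > threshold) {
            std::sort(chunk.begin(), chunk.end());
            out.insert(out.end(), chunk.begin(), chunk.end());
            chunk.clear();
        }
        chunk.insert(chunk.end(), bins[b].begin(), bins[b].end());
    }
    std::sort(chunk.begin(), chunk.end());
    out.insert(out.end(), chunk.begin(), chunk.end());
    return out;   // fully sorted, assembled from sorted sub-sequences
}
```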