Patents Assigned to NVIDIA
-
Patent number: 9639336
Abstract: One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
Type: Grant
Filed: October 25, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Vinod Grover, Manjunath Kudlur, Michael Murphy
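A minimal sketch of the kind of pass the abstract describes, assuming a simplified instruction representation: adjacent scalar loads from the same base register are recognized as vectorizable and replaced with a single vector load. The Instr type, opcodes, and pairing rule are illustrative only, not NVIDIA's implementation.

```cpp
// Illustrative vectorization pass over a topologically ordered node list.
#include <optional>
#include <string>
#include <vector>

struct Instr {
    std::string opcode;   // e.g. "ld.f32" or "ld.v2.f32" (hypothetical names)
    std::string base;     // base register of the address
    int         offset;   // byte offset from the base register
};

// Two scalar loads are "vectorizable" here if they read adjacent 4-byte
// words from the same base register; a real pass would also check
// alignment, dependences, and register classes.
static std::optional<Instr> tryVectorize(const Instr& a, const Instr& b) {
    if (a.opcode == "ld.f32" && b.opcode == "ld.f32" &&
        a.base == b.base && b.offset == a.offset + 4) {
        return Instr{"ld.v2.f32", a.base, a.offset};
    }
    return std::nullopt;
}

// Replace each vectorizable pair with the single vectorized instruction.
std::vector<Instr> vectorizePass(const std::vector<Instr>& nodes) {
    std::vector<Instr> out;
    for (size_t i = 0; i < nodes.size(); ++i) {
        if (i + 1 < nodes.size()) {
            if (auto merged = tryVectorize(nodes[i], nodes[i + 1])) {
                out.push_back(*merged);
                ++i;                 // both original instructions are consumed
                continue;
            }
        }
        out.push_back(nodes[i]);
    }
    return out;
}
```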
-
Patent number: 9639474
Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPUs and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.
Type: Grant
Filed: December 19, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Jerome F. Duluk, Jr., John Mashey, Mark Hairgrove, Chenghuan Jia, Cameron Buschardt, Lucien Dunning, Brian Fahs
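The patent covers UVM driver internals, but the user-visible analogue in CUDA is steering managed-memory pages between GPUs with cudaMemAdvise and cudaMemPrefetchAsync. The sketch below shows only that analogue, assuming a system with at least two GPUs (device IDs 0 and 1); it does not show the driver's peer transition sequences.

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    float* data = nullptr;
    cudaMallocManaged(&data, bytes);            // pages in unified memory

    // Hint that GPU 1 should own the pages, then migrate them there.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, 1);
    cudaMemPrefetchAsync(data, bytes, 1, 0);    // copy pages to GPU 1's memory

    // Later, migrate the same pages back to GPU 0.
    cudaMemPrefetchAsync(data, bytes, 0, 0);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```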
-
Patent number: 9639365
Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
Type: Grant
Filed: November 12, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
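A hedged illustration of why SIMT hardware benefits from an indirect branch instruction: device code that calls through a per-thread function pointer cannot be lowered efficiently to a chain of tests and branches. The kernel and function names below are hypothetical; the compiler, not the programmer, emits the underlying indirect branch or call.

```cpp
#include <cuda_runtime.h>

__device__ float addOne(float x) { return x + 1.0f; }
__device__ float twice (float x) { return x * 2.0f; }

typedef float (*UnaryOp)(float);

__global__ void applyOps(const int* which, float* data, int n) {
    // Table of device function pointers; the call below is indirect,
    // and different threads in a warp may select different entries.
    UnaryOp ops[2] = { addOne, twice };
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = ops[which[i] & 1](data[i]);
    }
}
```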
-
Patent number: 9639494
Abstract: One embodiment of the present invention includes a hard-coded first device ID. The embodiment also includes a set of fuses that represents a second device ID. The hard-coded device ID and the set of fuses each designate a separate device ID for the device, and each device ID corresponds to a specific operating configuration of the device. The embodiment also includes selection logic to select between the hard-coded device ID and the set of fuses to set the device ID for the device. One advantage of the disclosed embodiments is providing flexibility for engineers who develop the devices while also reducing the likelihood that a third party can counterfeit the device.
Type: Grant
Filed: November 1, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Jesse Max Guss, Philip Browning Johnson, Chris Marriott, Wojciech Jan Truty
-
Patent number: 9639466
Abstract: One embodiment of the present invention sets forth a technique for processing commands received by an intermediary cache from one or more clients. The technique involves receiving a first write command from an arbiter unit, where the first write command specifies a first memory address, determining that a first cache line related to a set of cache lines included in the intermediary cache is associated with the first memory address, causing data associated with the first write command to be written into the first cache line, and marking the first cache line as dirty.
Type: Grant
Filed: October 30, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: James Patrick Robertson, Gregory Alan Muthler, Hemayet Hossain, Timothy John Purcell, Karan Mehra, Peter B. Holmqvist, George R. Lynch
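A software model, not the hardware, of the write path the abstract describes: locate the cache line associated with the write address within its set, write the data, and mark the line dirty. The geometry (line size, ways, sets) and the omitted eviction/fill path are simplifying assumptions.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr int kLineBytes = 128;
constexpr int kWays      = 4;
constexpr int kSets      = 64;

struct CacheLine {
    uint64_t tag   = 0;
    bool     valid = false;
    bool     dirty = false;
    std::array<uint8_t, kLineBytes> data{};
};

struct Cache {
    CacheLine lines[kSets][kWays];

    // Handle a write command for a full line at 'addr' (simplified: no
    // eviction or fill path is shown).
    void write(uint64_t addr, const uint8_t* src) {
        uint64_t lineAddr = addr / kLineBytes;
        int      set      = lineAddr % kSets;
        uint64_t tag      = lineAddr / kSets;
        for (auto& line : lines[set]) {
            if (!line.valid || line.tag == tag) {     // reuse or allocate a way
                line.valid = true;
                line.tag   = tag;
                std::memcpy(line.data.data(), src, kLineBytes);
                line.dirty = true;                    // mark the line dirty
                return;
            }
        }
        // A real cache would evict a victim (writing back dirty data) here.
    }
};
```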
-
Patent number: 9639367
Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
Type: Grant
Filed: October 4, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Ziyad S. Hakura, Jerome F. Duluk, Jr.
-
Patent number: 9639102
Abstract: A system and method are provided for estimating current. A current source is configured to generate a current and a pulsed sense enable signal is generated. An estimate of the current is generated and the estimate of the current is updated based on a first signal that is configured to couple the current source to an electric power supply and a second signal that is configured to couple the current source to a load. A system includes the current source and a current prediction unit. The current source is configured to generate a current. The current prediction unit is coupled to the current source and is configured to generate the estimate of the current and update the estimate of the current based on the first signal and the second signal.
Type: Grant
Filed: February 19, 2013
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventor: William J. Dally
-
Patent number: 9640249
Abstract: A write-assist memory includes a memory supply voltage and a column of SRAM cells that is controlled by a pair of bit lines during a write operation. Additionally, the write-assist memory includes a write-assist unit that is coupled to the memory supply voltage and the column of SRAM cells and has a separable conductive line located between the pair of bit lines that provides a collapsible SRAM supply voltage to the column of SRAM cells based on a capacitive coupling of a control signal in the pair of bit lines during the write operation. A method of operating a write-assist memory is also provided.
Type: Grant
Filed: May 20, 2014
Date of Patent: May 2, 2017
Assignee: Nvidia Corporation
Inventors: Gang Chen, Jing Guo, Jun Yang
-
Patent number: 9639479
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Grant
Filed: September 22, 2010
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
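A sketch of the same idea at the CUDA source level, assuming the cache-hint load/store intrinsics exposed by recent CUDA toolkits (__ldcg and __stcs, which map to the PTX cache operators ld.global.cg and st.global.cs). This is only an analogy to the patented per-instruction cache operations modifier, not its implementation.

```cpp
#include <cuda_runtime.h>

__global__ void copyStreaming(const float* __restrict__ in,
                              float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __ldcg(&in[i]);   // load with "cache global" hint (bypass L1)
        __stcs(&out[i], v);         // store with streaming (evict-first) hint
    }
}
```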
-
Patent number: 9633458
Abstract: In a graphics processing pipeline, a processing unit establishes a bounding box around a polygon in order to identify sample points that are covered by the polygon. For a given sample point included within the bounding box, the processing unit constructs a set of lines that intersect at the sample point, where each line in the set of lines is parallel to at least one side of the polygon. When all vertices of the polygon reside on one side of at least one line in the set of lines, the processing unit may reduce the size of the bounding box to exclude the sample point.
Type: Grant
Filed: January 23, 2012
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Walter R. Steiner, Eric Lum, Dale L. Kirkland, Steven James Heinrich, David Charles Patrick
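A plain C++ sketch of the geometric test, assuming the polygon is given as a vertex loop: for each edge, consider the line through the sample point parallel to that edge; if every vertex lies strictly on one side of such a line, the polygon cannot cover the sample, so the bounding box may be shrunk to exclude it. This models the math only, not the hardware.

```cpp
#include <vector>

struct Vec2 { float x, y; };

static float cross(const Vec2& a, const Vec2& b) { return a.x * b.y - a.y * b.x; }

bool sampleCanBeExcluded(const std::vector<Vec2>& poly, const Vec2& s) {
    const size_t n = poly.size();
    for (size_t i = 0; i < n; ++i) {
        // Direction of edge i; the test line passes through s with this direction.
        Vec2 d{poly[(i + 1) % n].x - poly[i].x, poly[(i + 1) % n].y - poly[i].y};
        bool allPos = true, allNeg = true;
        for (const Vec2& v : poly) {
            float side = cross(d, Vec2{v.x - s.x, v.y - s.y});
            allPos &= (side > 0.0f);
            allNeg &= (side < 0.0f);
        }
        if (allPos || allNeg) return true;   // all vertices on one side of this line
    }
    return false;
}
```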
-
Patent number: 9633469
Abstract: A system, method, and computer program product are provided for conservative rasterization of primitives using an error term. In use, an edge equation is determined for each edge of a primitive, the edge equation having coefficients defining the edge of the primitive. Each edge of the primitive is shifted to enlarge the primitive by modifying coefficients of the edge equation defining the edge by an error term that is a predetermined amount. Pixels that intersect the primitive are then determined using the enlarged primitive.
Type: Grant
Filed: March 15, 2013
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Eric Brian Lum, Walter Robert Steiner, Henry Packard Moreton, Justin L. Cobb, Barry Nolan Rodgers, Yury Uralsky, Timo Oskari Aila, Tero Tapani Karras
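A minimal sketch of shifting edge equations outward by an error term, assuming the usual form E(x, y) = a*x + b*y + c with E >= 0 on the interior side. The specific offset used here (err scaled by |a| + |b|) is an illustrative choice; the patent's predetermined error term may differ.

```cpp
#include <cmath>

struct Edge { float a, b, c; };

// Shift the edge so that points within 'err' (in x or y) of the original
// interior still evaluate as inside, enlarging the primitive.
Edge enlargeEdge(Edge e, float err) {
    e.c += err * (std::fabs(e.a) + std::fabs(e.b));
    return e;
}

// A pixel sample (x, y) intersects the enlarged primitive when it lies
// inside every shifted edge.
bool insideEnlarged(const Edge* edges, int numEdges, float x, float y, float err) {
    for (int i = 0; i < numEdges; ++i) {
        Edge e = enlargeEdge(edges[i], err);
        if (e.a * x + e.b * y + e.c < 0.0f) return false;
    }
    return true;
}
```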
-
Patent number: 9632976
Abstract: Embodiments related to managing lazy runahead operations at a microprocessor are disclosed. For example, an embodiment of a method for operating a microprocessor described herein includes identifying a primary condition that triggers an unresolved state of the microprocessor. The example method also includes identifying a forcing condition that compels resolution of the unresolved state. The example method also includes, in response to identification of the forcing condition, causing the microprocessor to enter a runahead mode.
Type: Grant
Filed: December 7, 2012
Date of Patent: April 25, 2017
Assignee: NVIDIA CORPORATION
Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman
-
Patent number: 9632834
Abstract: One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem. First, the software application assigns a desired priority to a stream using a call included in the API. The API receives this call and passes it to a driver. The driver maps the desired priority to an appropriate device priority associated with the parallel processing subsystem. Subsequently, if the software application launches a particular kernel within the stream, then the driver assigns the device priority associated with the stream to the kernel before adding the kernel to the stream for execution on the parallel processing subsystem. Advantageously, by assigning priorities to streams and, subsequently, strategically launching kernels within the prioritized streams, an application developer may fine-tune the software application to increase the overall processing efficiency of the software application.
Type: Grant
Filed: May 17, 2013
Date of Patent: April 25, 2017
Assignee: NVIDIA Corporation
Inventors: Vivek Kini, Forrest Iandola, Timothy James Murray
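The public CUDA API corresponding to this flow is cudaDeviceGetStreamPriorityRange plus cudaStreamCreateWithPriority; kernels launched into a prioritized stream inherit that stream's priority. The sketch below shows only this user-facing side, not the driver-side mapping from API priority to device priority.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummyKernel() {}

int main() {
    int leastPri = 0, greatestPri = 0;   // numerically, greatest < least
    cudaDeviceGetStreamPriorityRange(&leastPri, &greatestPri);

    cudaStream_t highPri, lowPri;
    cudaStreamCreateWithPriority(&highPri, cudaStreamNonBlocking, greatestPri);
    cudaStreamCreateWithPriority(&lowPri,  cudaStreamNonBlocking, leastPri);

    dummyKernel<<<1, 1, 0, highPri>>>();  // scheduled ahead of low-priority work
    dummyKernel<<<1, 1, 0, lowPri>>>();

    cudaDeviceSynchronize();
    cudaStreamDestroy(highPri);
    cudaStreamDestroy(lowPri);
    printf("priority range: %d (least) .. %d (greatest)\n", leastPri, greatestPri);
    return 0;
}
```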
-
Patent number: 9626191
Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
Type: Grant
Filed: December 22, 2011
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
-
Patent number: 9626216
Abstract: A technique for executing a plurality of applications on a GPU. The technique involves establishing a first connection to a first application and a second connection to a second application, establishing a universal processing context that is shared by the first application and the second application, transmitting a first workload pointer to a first queue allocated to the first application, the first workload pointer pointing to a first workload generated by the first application, transmitting a second workload pointer to a second queue allocated to the second application, the second workload pointer pointing to a second workload generated by the second application, transmitting the first workload pointer to a first GPU queue in the GPU, and transmitting the second workload pointer to a second GPU queue in the GPU, wherein the GPU is configured to execute the first workload and the second workload in accordance with the universal processing context.
Type: Grant
Filed: May 9, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Christopher Michael Cameron, Timothy James Murray
-
Patent number: 9628705
Abstract: In one embodiment, a car navigation device is provided. The device comprises: at least one wide-angle camera; a video correction unit for acquiring video data from the wide-angle camera and correcting the video data; a video merging unit for acquiring corrected video data from the video correction unit and merging the corrected video data; an image recognition unit for acquiring video from the video merging unit and performing image recognition on the video; and a driving assistant unit for acquiring data from the image recognition unit and assisting driving in accordance with the recognized content. The navigation device provided by various embodiments in accordance with the present invention can correct and recognize the images taken by a fisheye lens in real time so as to assist the driver or drive the car automatically without human intervention.
Type: Grant
Filed: November 14, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA CORPORATION
Inventor: Wenjie Zheng
-
Patent number: 9627021
Abstract: An 8-transistor SRAM (static random access memory) storage cell provides differential read bit lines that are precharged to a low voltage level for read operations. The 8-transistor storage cell provides separate ports for read and write operations, including differential read bit lines. Prior to each read operation, the differential read bit lines are precharged to the low voltage level. During read operations, one of the two differential read bit lines is pulled high towards a high voltage level while the complementary bit line remains at the low voltage level resulting from the precharge. The difference in voltage between the differential read bit lines is sensed to determine the value stored in each 8-transistor SRAM storage cell and complete the read operation.
Type: Grant
Filed: October 9, 2012
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: John W. Poulton, Brian Zimmer
-
Patent number: 9626320
Abstract: A transmitter is configured to scale up a low bandwidth delivered by a first processing element to match a higher bandwidth associated with an interconnect. A receiver is configured to scale down the high bandwidth delivered by the interconnect to match the lower bandwidth associated with a second processing element. The first processing element and the second processing element may thus communicate with one another across the interconnect via the transmitter and the receiver, respectively, despite the bandwidth mismatch between those processing elements and the interconnect.
Type: Grant
Filed: September 19, 2013
Date of Patent: April 18, 2017
Assignee: NVIDIA Corporation
Inventors: Marvin A. Denman, Dennis K. Ma, Stephen David Glaser
-
Patent number: 9619930
Abstract: A light transport simulator and a method of constructing and classifying light transport paths. One embodiment of the light transport simulator is operable to construct and classify a light transport path between two points in a scene and includes: (1) a memory configured to store dual deterministic finite automata (DFA) based on a light path expression (LPE) that defines criteria for accepting the light transport path, and (2) a processor configured to employ the dual DFA to construct opposing light subpaths originating from the two points, and employ a correspondence among states of the dual DFA to unite the opposing light subpaths to form the light transport path.
Type: Grant
Filed: May 3, 2013
Date of Patent: April 11, 2017
Assignee: Nvidia Corporation
Inventor: Daniel Siebert
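A toy sketch of classifying a path's scattering events with a DFA derived from a light path expression, assuming the LPE "E D* L" (eye vertex, any number of diffuse bounces, light vertex); the event alphabet and LPE are illustrative choices. The patent's dual-DFA scheme additionally runs a second automaton from the light end and unites the two subpaths; only a single forward automaton is shown.

```cpp
#include <string>

// States: 0 = expect 'E', 1 = saw 'E' (accepting diffuse bounces),
//         2 = accepted, -1 = reject.
int step(int state, char event) {
    switch (state) {
        case 0:  return event == 'E' ? 1 : -1;
        case 1:  return event == 'D' ? 1 : (event == 'L' ? 2 : -1);
        default: return -1;          // no events allowed after acceptance
    }
}

// Classify a path by its event string, e.g. "EDDL" matches "E D* L".
bool matchesLpe(const std::string& pathEvents) {
    int state = 0;
    for (char e : pathEvents) {
        state = step(state, e);
        if (state < 0) return false;
    }
    return state == 2;
}
```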
-
Patent number: 9619204
Abstract: A system and method for performing sorting. The method includes partitioning a plurality of keys needing sorting into a first plurality of bins, wherein the bins are sequentially sorted. The plurality of keys is capable of being sorted into a sequence of keys using a corresponding ordering system. The method includes coalescing a first pair of consecutive bins, such that the coalesced pair of bins falls below a threshold. The method also includes ordering keys in the first coalesced pair to generate a first sub-sequence of keys in the sequence of keys.
Type: Grant
Filed: June 14, 2013
Date of Patent: April 11, 2017
Assignee: Nvidia Corporation
Inventor: Duane Merrill
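A sequential sketch of the scheme under stated assumptions (keys binned by their top byte, a hypothetical 1024-element threshold): consecutive bins are coalesced while the running chunk stays below the threshold, and each coalesced chunk is sorted locally to produce one sub-sequence of the final output. Because the bins are already in sequential order, concatenating the sorted chunks yields the fully sorted sequence.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint32_t> binThenSort(const std::vector<uint32_t>& keys,
                                  size_t threshold = 1024) {
    constexpr int kBins = 256;
    std::vector<std::vector<uint32_t>> bins(kBins);
    for (uint32_t k : keys) bins[k >> 24].push_back(k);   // sequentially ordered bins

    std::vector<uint32_t> out;
    out.reserve(keys.size());
    std::vector<uint32_t> chunk;
    for (int b = 0; b < kBins; ++b) {
        // Coalesce this bin into the pending chunk only if the result stays
        // below the threshold; otherwise sort and flush the chunk first.
        if (!chunk.empty() && chunk.size() + bins[b].size() > threshold) {
            std::sort(chunk.begin(), chunk.end());
            out.insert(out.end(), chunk.begin(), chunk.end());
            chunk.clear();
        }
        chunk.insert(chunk.end(), bins[b].begin(), bins[b].end());
    }
    std::sort(chunk.begin(), chunk.end());
    out.insert(out.end(), chunk.begin(), chunk.end());
    return out;   // fully sorted, assembled from sorted sub-sequences
}
```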