Patents Assigned to NVIDIA
  • Publication number: 20140160876
    Abstract: One embodiment of the present invention sets forth a method for accessing non-contiguous locations within a DRAM memory page by sending a first column address command to a first DRAM device using a first subset of pins and sending a second column address command to a second DRAM device using a second subset of repurposed pins. One advantage of the disclosed technique is that it requires minimal additional pins, space, and power consumption. Further, sending multiple column address commands allows for increased granularity of DRAM accesses and therefore more efficient use of pins. Thus, the disclosed technique provides a better approach for accessing non-contiguous locations within a DRAM memory page.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Alok GUPTA, Wishwesh GANDHI, Ram GUMMADI
  • Publication number: 20140164743
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
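The scheduling rule in the abstract above can be modeled very loosely in a few lines. This is an illustrative sketch only, assuming a simple cycle-ordered list; the warp names and instruction strings are invented, not from the patent:

```python
def schedule_cycles(warps, batch, non_batch):
    """Illustrative model of batch-locality scheduling: once a
    delineated batch starts, every warp executes the batched
    instructions over consecutive cycles; non-batched instructions
    wait until the batch completes, then run so their warps are
    not starved.

    warps: warp identifiers; batch: batched instructions;
    non_batch: (warp, instruction) pairs of deferred work.
    """
    cycles = []
    for instr in batch:                 # batched work first, back to back
        for warp in warps:
            cycles.append((warp, instr))
    for pair in non_batch:              # deferred non-batched work
        cycles.append(pair)
    return cycles
```

The point of the model is only the ordering guarantee: all batched issues are contiguous, so the memory accesses they make retain locality.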
  • Publication number: 20140164727
    Abstract: A system, method, and computer program product for optimizing thread stack memory allocation is disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee
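The core idea of this abstract (two objects may share one stack slot if their live ranges never overlap) can be sketched as a greedy interval assignment. The interval format and object names here are assumptions for illustration, not the patented representation:

```python
def coalesce_stack_objects(live_ranges):
    """Sketch of reusing one thread-stack slot for objects whose
    live ranges are disjoint.

    live_ranges: dict name -> (first_use, last_use) in instruction order.
    Returns dict name -> slot index; objects sharing a slot never overlap.
    """
    slots = []        # per slot: last_use of the most recent occupant
    assignment = {}
    # greedy: process objects in order of first use
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        for i, slot_end in enumerate(slots):
            if slot_end < start:        # previous occupant already dead
                slots[i] = end
                assignment[name] = i
                break
        else:
            slots.append(end)           # no reusable slot; allocate fresh
            assignment[name] = len(slots) - 1
    return assignment
```

In the patent's terms, assigning two objects the same slot corresponds to replacing references to the first object with references to the second in the intermediate representation.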
  • Publication number: 20140160871
    Abstract: A method and a system are provided for performing write assist. Write assist circuitry is initialized and voltage collapse is initiated to reduce a column supply voltage provided to a storage cell. A bitline of the storage cell is boosted to a boosted voltage level that is below a low supply voltage provided to the storage cell and data encoded by the bitline is written to the storage cell.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Brian Matthew Zimmer, Mahmut Ersin Sinangil
  • Publication number: 20140164655
    Abstract: Synthesizable code representing first-in, first-out (FIFO) memories may be used to produce FIFO memories in a hardware element or system. To more efficiently use a memory element that stores the data in a FIFO, a code generator may generate a wrapper that enables the FIFO to use a memory element with different dimensions (i.e., depth and width) than the FIFO's dimensions. For example, the wrapper enables a 128 deep, 1 bit wide FIFO to store data in a memory element with 16 rows that store 8 bits each. To any system communicating with the FIFO, the FIFO behaves like a 128×1 FIFO even though the FIFO is implemented using a 16×8 memory element. To do so, the code generator may generate a wrapper which enables the folded memory element to behave like a memory element that was not folded.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Robert A. ALFIERI
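The folding in the 128×1-onto-16×8 example reduces to splitting the logical bit address into a row index and a bit offset. A minimal behavioral sketch (the class and its read-modify-write scheme are illustrative, not the generated RTL):

```python
class FoldedFifo:
    """Model of a 128x1 FIFO folded onto a 16-row x 8-bit memory.

    Externally the FIFO pushes and pops single bits; internally each
    logical slot maps to (row, bit) in the wider, shallower memory.
    """

    LOGICAL_DEPTH = 128   # logical FIFO entries, 1 bit each
    ROWS, WIDTH = 16, 8   # physical memory: 16 rows of 8 bits

    def __init__(self):
        self.mem = [0] * self.ROWS   # each int holds WIDTH bits
        self.head = 0                # next logical slot to read
        self.tail = 0                # next logical slot to write
        self.count = 0

    def _locate(self, logical):
        # fold: row = logical // WIDTH, bit offset = logical % WIDTH
        return logical // self.WIDTH, logical % self.WIDTH

    def push(self, bit):
        assert self.count < self.LOGICAL_DEPTH, "FIFO full"
        row, off = self._locate(self.tail)
        # read-modify-write one physical row to update a single bit
        self.mem[row] = (self.mem[row] & ~(1 << off)) | ((bit & 1) << off)
        self.tail = (self.tail + 1) % self.LOGICAL_DEPTH
        self.count += 1

    def pop(self):
        assert self.count > 0, "FIFO empty"
        row, off = self._locate(self.head)
        bit = (self.mem[row] >> off) & 1
        self.head = (self.head + 1) % self.LOGICAL_DEPTH
        self.count -= 1
        return bit
```

To a client, pushes and pops behave exactly as a 128-deep, 1-bit FIFO; only `_locate` knows about the 16×8 folding.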
  • Publication number: 20140164745
    Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Mojtaba MEHRARA, Gregory DIAMOS
  • Publication number: 20140164736
    Abstract: Embodiments related to managing lazy runahead operations at a microprocessor are disclosed. For example, an embodiment of a method for operating a microprocessor described herein includes identifying a primary condition that triggers an unresolved state of the microprocessor. The example method also includes identifying a forcing condition that compels resolution of the unresolved state. The example method also includes, in response to identification of the forcing condition, causing the microprocessor to enter a runahead mode.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman
  • Publication number: 20140165072
    Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nicholas WANG, Lacky V. SHAH, Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE
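The save-and-remap step in this abstract can be sketched as copying a thread group's local memory into a global region and returning an address translation for the re-launched group. Addresses, layout, and the translation closure are illustrative assumptions, not the LSU's actual mechanism:

```python
def remap_local_to_global(local_mem, global_mem, base):
    """Sketch of suspending a thread group: per-thread local memory is
    saved into global memory at `base`, and subsequent local accesses
    are redirected there.

    Returns a translation function: local address -> global address.
    """
    for offset, value in enumerate(local_mem):
        global_mem[base + offset] = value      # save state to global
    # the LSU would now translate local address a -> base + a
    return lambda addr: base + addr
```

After re-launch, local loads and stores go through the returned translation, so the group sees its saved state even though it now lives in global memory.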
  • Publication number: 20140160124
    Abstract: A visible polygon data structure and method of use thereof. One embodiment of the visible polygon data structure includes: (1) a memory configured to store a data structure containing vertices of at least partially visible polygons of the scene but lacking vertices of at least some wholly invisible polygons of the scene, and (2) a graphics processing unit (GPU) configured to employ the vertices of the at least partially visible polygons to approximate an ambient occlusive effect on a point in the scene, the effect being independent of the wholly invisible polygons.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Louis Bavoil, Miguel Sainz
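Construction of the data structure this abstract describes amounts to keeping vertices of polygons with any visible part and dropping wholly invisible ones. A minimal sketch, assuming polygons as vertex lists and a per-vertex visibility predicate (both invented for illustration):

```python
def build_visible_polygon_set(polygons, is_visible):
    """Sketch of building the visible-polygon structure: retain the
    vertices of at least partially visible polygons; omit polygons
    whose vertices are all invisible, so a later ambient-occlusion
    estimate is independent of them.
    """
    verts = []
    for poly in polygons:
        if any(is_visible(v) for v in poly):   # at least partially visible
            verts.extend(poly)
    return verts
```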
  • Publication number: 20140160019
    Abstract: A method for enhancing user interaction with portable electronic devices is presented. The method includes determining screen orientation on the device by first detecting the presence of a user using data captured by a camera of the portable electronic device. The method further includes searching the data from the camera for a plurality of physical characteristics of the user if a user is detected. The method also includes determining a facial orientation of the user based on information regarding at least one physical characteristic of the user determined from the data. Finally, the method includes setting a screen orientation of a display device of the portable electronic device based on the determined facial orientation of the user.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Venkata R. Anda, Guanghua Zhang, Michael Lin
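The final step of the method (mapping the detected facial orientation to a screen orientation) can be sketched as snapping an angle to the nearest quarter turn. The threshold scheme is an assumption; the patent does not specify one:

```python
def screen_orientation(face_angle_deg):
    """Sketch of setting screen orientation from facial orientation:
    snap the detected face angle (degrees) to the nearest of the four
    display rotations, 0/90/180/270.
    """
    return (round(face_angle_deg / 90) % 4) * 90
```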
  • Publication number: 20140164847
    Abstract: One embodiment includes receiving a data signal transmitted to the processing unit, analyzing the data signal and generating feedback information related to the data signal, and capturing the data signal via a write enable during a plurality of clock cycles specified by a programmable controller included within the processing unit. One advantage of the disclosed technique is that the programmable controller can be used to set the capture window for one or more hardwired triggers included within the processing unit. Further, the programmable controller is able to set up additional triggers that are separate and apart from the hardwired triggers included within the processing unit and set the capture window for those triggers. Thus, the disclosed technique provides a highly flexible and adaptive approach for capturing and storing on-chip data and feedback information that can be analyzed later when performing diagnostic and debugging operations.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Peter C. Mills, Gautam Bhatia
  • Publication number: 20140164738
    Abstract: Embodiments related to methods and devices operative, in the event that execution of an instruction produces a runahead-triggering event, to cause a microprocessor to enter into and operate in a runahead without reissuing the instruction are provided. In one example, a microprocessor is provided. The example microprocessor includes fetch logic for retrieving an instruction, scheduling logic for issuing the instruction retrieved by the fetch logic for execution, and runahead control logic. The example runahead control logic is operative, in the event that execution of the instruction as scheduled by the scheduling logic produces a runahead-triggering event, to cause the microprocessor to enter into and operate in a runahead mode without reissuing the instruction, and to carry out runahead policies while the microprocessor is in the runahead mode that govern operation of the microprocessor and cause the microprocessor to operate differently than when not in the runahead mode.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Magnus Ekman, Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs
  • Publication number: 20140161173
    Abstract: A system and method are provided for a 3D modeling system with which an encoded video stream is produced. The system includes a content engine, an encoder, and a fixed function engine. The fixed function engine receives content information from the content engine. The fixed function engine produces encoder information from the content information. The encoder uses the encoder information to produce an encoded video stream having at least one of a higher quality and a lower bandwidth than a video stream encoded without the encoder information.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: Nvidia Corporation
    Inventors: Hassane S. Azar, Bryan Dudash, Rochelle Pereira, Dawid Pajak
  • Publication number: 20140165049
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Gregory DIAMOS, Mojtaba MEHRARA
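The priority-ordered conditional branches this abstract describes can be modeled as a runtime choice among the regions in a thread frontier. A minimal sketch, assuming region ids and an active-thread count per region (both illustrative):

```python
def next_region(frontier, active_threads_per_region):
    """Model of enforcing execution priority at a region exit: the
    compiler emits conditional branches ordered by region priority, so
    control transfers to the highest-priority frontier region that has
    at least one thread waiting to execute it.

    frontier: region ids, highest execution priority first.
    """
    for region in frontier:                       # priority order
        if active_threads_per_region.get(region, 0) > 0:
            return region
    return None                                   # frontier fully drained
```

The update-predicate-mask instructions mentioned in the abstract would maintain the per-region thread counts that this selection consults.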
  • Publication number: 20140160151
    Abstract: Methods of compressing (and decompressing) bounding box data and a processor incorporating one or more of the methods. In one embodiment, a method of compressing such data includes: (1) generating dimension-specific multiplicands and a floating-point shared scale multiplier from floating-point numbers representing extents of the bounding box and (2) substituting portions of floating-point numbers representing a reference point of the bounding box with the dimension-specific multiplicands to yield floating-point packed bounding box descriptors, the floating-point shared scale multiplier and the floating-point packed bounding box descriptors together constituting compressed bounding box data.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Andrei Pokrovsky
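The shared-scale idea can be sketched numerically: encode the three extents as small integer multiplicands of one shared floating-point scale, rounding up so the decompressed box never shrinks. The bit width, rounding policy, and tuple layout are assumptions for illustration; the patent packs the multiplicands into the reference-point floats, which this sketch does not model:

```python
import math

def compress_bbox(ref, maxs, bits=8):
    """Sketch of shared-scale bounding-box compression.

    ref:  (x, y, z) minimum corner (kept as floats).
    maxs: (x, y, z) maximum corner.
    Returns (ref, shared_scale, per-axis integer multiplicands).
    """
    extents = [mx - r for mx, r in zip(maxs, ref)]
    max_mult = (1 << bits) - 1
    # shared scale chosen so the largest extent maps to max_mult
    scale = max(extents) / max_mult if max(extents) > 0 else 1.0
    # ceil so the reconstructed box is conservative (never smaller)
    mults = [min(max_mult, math.ceil(e / scale)) for e in extents]
    return ref, scale, mults

def decompress_bbox(ref, scale, mults):
    """Reconstruct an enclosing box from the compressed form."""
    return ref, [r + m * scale for r, m in zip(ref, mults)]
```

Storage drops from six floats to three floats, one float scale, and three small integers, at the cost of a slightly enlarged box.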
  • Patent number: 8749561
    Abstract: A method and system for coordinated data execution in a computer system. The system includes a first graphics processor coupled to a first memory and a second graphics processor coupled to a second memory. A graphics bus is configured to couple the first graphics processor and the second graphics processor. The first graphics processor and the second graphics processor are configured for coordinated data execution via communication across the graphics bus.
    Type: Grant
    Filed: March 14, 2003
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Dwight D. Diercks, Abraham B. de Waal
  • Patent number: 8752018
    Abstract: One embodiment of the present invention sets forth a technique for emitting coherent output from multiple threads for the printf() function. Additionally, parallel (not divergent) execution of the threads for the printf() function is maintained when possible to improve run-time performance. Processing of the printf() function is separated into two tasks, gathering of the per thread data and formatting the gathered data according to the formatting codes for display. The threads emit a coherent stream of contiguous segments, where each segment includes the format string for the printf() function and the gathered data for a thread. The coherent stream is written by the threads and read by a display processor. The display processor executes a single thread to format the gathered data according to the format string for display.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Geoffrey Gerfin
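The two-task split this abstract describes (gather per-thread data, then format in a single thread) can be sketched in a few lines. The segment layout is illustrative, not the patented encoding:

```python
def gather_printf(fmt, per_thread_args):
    """Gather phase: each thread appends one contiguous segment
    (format string plus its gathered arguments) to a shared stream;
    no thread formats anything itself."""
    stream = []
    for args in per_thread_args:        # one segment per thread
        stream.append((fmt, args))
    return stream

def format_stream(stream):
    """Display phase: a single thread formats every segment in order,
    applying the format string to the gathered data."""
    return [fmt % args for fmt, args in stream]
```

Because the gather phase only copies raw arguments, the threads stay parallel; all the divergent formatting work happens once, downstream.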
  • Patent number: 8749564
    Abstract: One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both the first processing entity and the second processing entity are configured to receive first and second batches of primitives, and a barrier command in between the first and second batches of primitives. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work that is after the barrier command to stop at a barrier point until a release signal is received. The back-end unit transmits a release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity.
    Type: Grant
    Filed: July 3, 2013
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Dale L. Kirkland
  • Patent number: 8749562
    Abstract: A system and method for sharing binding groups between shaders allows for efficient use of shader state data storage resources. In contrast with conventional graphics processors and Application Programming Interfaces that specify a set of binding points for each shader that are exclusive to that shader, two or more shaders may reference the same binding group that includes multiple binding points. As the number and variety of different shaders increases, the number of binding groups may increase at a slower rate since some binding groups may be shared between different shaders.
    Type: Grant
    Filed: September 23, 2009
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventor: Jerome F. Duluk, Jr.
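The sharing this abstract describes can be modeled as interning binding groups in one table that multiple shaders index into, rather than giving each shader exclusive binding points. The class and its deduplication scheme are illustrative assumptions:

```python
class BindingGroupTable:
    """Sketch of shaders sharing binding groups: shaders hold indices
    into a shared table of groups, each group mapping binding points
    to resources."""

    def __init__(self):
        self.groups = []    # each entry: dict binding point -> resource
        self._index = {}    # dedupe identical groups

    def intern(self, bindings):
        """Return the index of a group matching `bindings`, creating
        it only if no shader has registered it yet."""
        key = tuple(sorted(bindings.items()))
        if key not in self._index:
            self._index[key] = len(self.groups)
            self.groups.append(dict(bindings))
        return self._index[key]   # shaders with equal bindings share an id
```

As the abstract notes, the table grows more slowly than the shader population, since distinct shaders frequently resolve to the same group.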
  • Patent number: 8749576
    Abstract: A rasterizer stage configured to implement multiple interpolators for a graphics pipeline. The rasterizer stage includes a plurality of simultaneously operable low precision interpolators for computing a first set of pixel parameters for pixels of a geometric primitive and a plurality of simultaneously operable high precision interpolators for computing a second set of pixel parameters for pixels of the geometric primitive. The rasterizer stage also includes an output mechanism coupled to the interpolators for routing computed pixel parameters into a memory array. Parameters may be programmably assigned to the interpolators and the results thereof may be programmably assigned to portions of a pixel packet.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: June 10, 2014
    Assignee: Nvidia Corporation
    Inventors: Edward A. Hutchins, Brian K. Angell