Patents Assigned to NVIDIA
  • Publication number: 20140160876
    Abstract: One embodiment of the present invention sets forth a method for accessing non-contiguous locations within a DRAM memory page by sending a first column address command to a first DRAM device using a first subset of pins and sending a second column address command to a second DRAM device using a second subset of repurposed pins. One advantage of the disclosed technique is that it requires minimal additional pins, space, and power consumption. Further, sending multiple column address commands allows for increased granularity of DRAM accesses and therefore more efficient use of pins. Thus, the disclosed technique provides a better approach for accessing non-contiguous locations within a DRAM memory page.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Alok GUPTA, Wishwesh GANDHI, Ram GUMMADI
  • Publication number: 20140164743
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
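The scheduling rule in the abstract above can be modeled very loosely in a few lines. This is an illustrative sketch only, assuming a simple cycle-ordered list; the warp names and instruction strings are invented, not from the patent:

```python
def schedule_cycles(warps, batch, non_batch):
    """Illustrative model of batch-locality scheduling: once a
    delineated batch starts, every warp executes the batched
    instructions over consecutive cycles; non-batched instructions
    wait until the batch completes, then run so their warps are
    not starved.

    warps: warp identifiers; batch: batched instructions;
    non_batch: (warp, instruction) pairs of deferred work.
    """
    cycles = []
    for instr in batch:                 # batched work first, back to back
        for warp in warps:
            cycles.append((warp, instr))
    for pair in non_batch:              # deferred non-batched work
        cycles.append(pair)
    return cycles
```

The point of the model is only the ordering guarantee: all batched issues are contiguous, so the memory accesses they make retain locality.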
  • Publication number: 20140164727
    Abstract: A system, method, and computer program product for optimizing thread stack memory allocation is disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee
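The core idea of this abstract (two objects may share one stack slot if their live ranges never overlap) can be sketched as a greedy interval assignment. The interval format and object names here are assumptions for illustration, not the patented representation:

```python
def coalesce_stack_objects(live_ranges):
    """Sketch of reusing one thread-stack slot for objects whose
    live ranges are disjoint.

    live_ranges: dict name -> (first_use, last_use) in instruction order.
    Returns dict name -> slot index; objects sharing a slot never overlap.
    """
    slots = []        # per slot: last_use of the most recent occupant
    assignment = {}
    # greedy: process objects in order of first use
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        for i, slot_end in enumerate(slots):
            if slot_end < start:        # previous occupant already dead
                slots[i] = end
                assignment[name] = i
                break
        else:
            slots.append(end)           # no reusable slot; allocate fresh
            assignment[name] = len(slots) - 1
    return assignment
```

In the patent's terms, assigning two objects the same slot corresponds to replacing references to the first object with references to the second in the intermediate representation.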
  • Publication number: 20140160871
    Abstract: A method and a system are provided for performing write assist. Write assist circuitry is initialized and voltage collapse is initiated to reduce a column supply voltage provided to a storage cell. A bitline of the storage cell is boosted to a boosted voltage level that is below a low supply voltage provided to the storage cell and data encoded by the bitline is written to the storage cell.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Brian Matthew Zimmer, Mahmut Ersin Sinangil
  • Publication number: 20140164655
    Abstract: Synthesizable code representing first-in, first-out (FIFO) memories may be used to produce FIFO memories in a hardware element or system. To more efficiently use a memory element that stores the data in a FIFO, a code generator may generate a wrapper that enables the FIFO to use a memory element with different dimensions (i.e., depth and width) than the FIFO's dimensions. For example, the wrapper enables a 128 deep, 1 bit wide FIFO to store data in a memory element with 16 rows that store 8 bits each. To any system communicating with the FIFO, the FIFO behaves like a 128×1 FIFO even though the FIFO is implemented using a 16×8 memory element. To do so, the code generator may generate a wrapper which enables the folded memory element to behave like a memory element that was not folded.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Robert A. ALFIERI
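The folding in the 128×1-onto-16×8 example reduces to splitting the logical bit address into a row index and a bit offset. A minimal behavioral sketch (the class and its read-modify-write scheme are illustrative, not the generated RTL):

```python
class FoldedFifo:
    """Model of a 128x1 FIFO folded onto a 16-row x 8-bit memory.

    Externally the FIFO pushes and pops single bits; internally each
    logical slot maps to (row, bit) in the wider, shallower memory.
    """

    LOGICAL_DEPTH = 128   # logical FIFO entries, 1 bit each
    ROWS, WIDTH = 16, 8   # physical memory: 16 rows of 8 bits

    def __init__(self):
        self.mem = [0] * self.ROWS   # each int holds WIDTH bits
        self.head = 0                # next logical slot to read
        self.tail = 0                # next logical slot to write
        self.count = 0

    def _locate(self, logical):
        # fold: row = logical // WIDTH, bit offset = logical % WIDTH
        return logical // self.WIDTH, logical % self.WIDTH

    def push(self, bit):
        assert self.count < self.LOGICAL_DEPTH, "FIFO full"
        row, off = self._locate(self.tail)
        # read-modify-write one physical row to update a single bit
        self.mem[row] = (self.mem[row] & ~(1 << off)) | ((bit & 1) << off)
        self.tail = (self.tail + 1) % self.LOGICAL_DEPTH
        self.count += 1

    def pop(self):
        assert self.count > 0, "FIFO empty"
        row, off = self._locate(self.head)
        bit = (self.mem[row] >> off) & 1
        self.head = (self.head + 1) % self.LOGICAL_DEPTH
        self.count -= 1
        return bit
```

To a client, pushes and pops behave exactly as a 128-deep, 1-bit FIFO; only `_locate` knows about the 16×8 folding.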
  • Publication number: 20140164745
    Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Mojtaba MEHRARA, Gregory DIAMOS
  • Publication number: 20140164736
    Abstract: Embodiments related to managing lazy runahead operations at a microprocessor are disclosed. For example, an embodiment of a method for operating a microprocessor described herein includes identifying a primary condition that triggers an unresolved state of the microprocessor. The example method also includes identifying a forcing condition that compels resolution of the unresolved state. The example method also includes, in response to identification of the forcing condition, causing the microprocessor to enter a runahead mode.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman
  • Publication number: 20140165072
    Abstract: A streaming multiprocessor (SM) included within a parallel processing unit (PPU) is configured to suspend a thread group executing on the SM and to save the operating state of the suspended thread group. A load-store unit (LSU) within the SM re-maps local memory associated with the thread group to a location in global memory. Subsequently, the SM may re-launch the suspended thread group. The LSU may then perform local memory access operations on behalf of the re-launched thread group with the re-mapped local memory that resides in global memory.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nicholas WANG, Lacky V. SHAH, Gerald F. LUIZ, Philip Alexander CUADRA, Luke DURANT, Shirish GADRE
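The save-and-remap step in this abstract can be sketched as copying a thread group's local memory into a global region and returning an address translation for the re-launched group. Addresses, layout, and the translation closure are illustrative assumptions, not the LSU's actual mechanism:

```python
def remap_local_to_global(local_mem, global_mem, base):
    """Sketch of suspending a thread group: per-thread local memory is
    saved into global memory at `base`, and subsequent local accesses
    are redirected there.

    Returns a translation function: local address -> global address.
    """
    for offset, value in enumerate(local_mem):
        global_mem[base + offset] = value      # save state to global
    # the LSU would now translate local address a -> base + a
    return lambda addr: base + addr
```

After re-launch, local loads and stores go through the returned translation, so the group sees its saved state even though it now lives in global memory.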
  • Publication number: 20140160124
    Abstract: A visible polygon data structure and method of use thereof. One embodiment of the visible polygon data structure includes: (1) a memory configured to store a data structure containing vertices of at least partially visible polygons of the scene but lacking vertices of at least some wholly invisible polygons of the scene, and (2) a graphics processing unit (GPU) configured to employ the vertices of the at least partially visible polygons to approximate an ambient occlusive effect on a point in the scene, the effect being independent of the wholly invisible polygons.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Louis Bavoil, Miguel Sainz
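Construction of the data structure this abstract describes amounts to keeping vertices of polygons with any visible part and dropping wholly invisible ones. A minimal sketch, assuming polygons as vertex lists and a per-vertex visibility predicate (both invented for illustration):

```python
def build_visible_polygon_set(polygons, is_visible):
    """Sketch of building the visible-polygon structure: retain the
    vertices of at least partially visible polygons; omit polygons
    whose vertices are all invisible, so a later ambient-occlusion
    estimate is independent of them.
    """
    verts = []
    for poly in polygons:
        if any(is_visible(v) for v in poly):   # at least partially visible
            verts.extend(poly)
    return verts
```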
  • Publication number: 20140160019
    Abstract: A method for enhancing user interaction with portable electronic devices is presented. The method includes determining screen orientation on the device by first detecting the presence of a user using data captured by a camera of the portable electronic device. The method further includes searching the data from the camera for a plurality of physical characteristics of the user if a user is detected. The method also includes determining a facial orientation of the user based on information regarding at least one physical characteristic of the user determined from the data. Finally, the method includes setting a screen orientation of a display device of the portable electronic device based on the determined facial orientation of the user.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Venkata R. Anda, Guanghua Zhang, Michael Lin
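The final step of the method (mapping the detected facial orientation to a screen orientation) can be sketched as snapping an angle to the nearest quarter turn. The threshold scheme is an assumption; the patent does not specify one:

```python
def screen_orientation(face_angle_deg):
    """Sketch of setting screen orientation from facial orientation:
    snap the detected face angle (degrees) to the nearest of the four
    display rotations, 0/90/180/270.
    """
    return (round(face_angle_deg / 90) % 4) * 90
```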
  • Publication number: 20140164847
    Abstract: One embodiment includes receiving a data signal transmitted to the processing unit, analyzing the data signal and generating feedback information related to the data signal, and capturing the data signal via a write enable during a plurality of clock cycles specified by a programmable controller included within the processing unit. One advantage of the disclosed technique is that the programmable controller can be used to set the capture window for one or more hardwired triggers included within the processing unit. Further, the programmable controller is able to set up additional triggers that are separate and apart from the hardwired triggers included within the processing unit and set the capture window for those triggers. Thus, the disclosed technique provides a highly flexible and adaptive approach for capturing and storing on-chip data and feedback information that can be analyzed later when performing diagnostic and debugging operations.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Peter C. Mills, Gautam Bhatia
  • Publication number: 20140164738
    Abstract: Embodiments related to methods and devices operative, in the event that execution of an instruction produces a runahead-triggering event, to cause a microprocessor to enter into and operate in a runahead without reissuing the instruction are provided. In one example, a microprocessor is provided. The example microprocessor includes fetch logic for retrieving an instruction, scheduling logic for issuing the instruction retrieved by the fetch logic for execution, and runahead control logic. The example runahead control logic is operative, in the event that execution of the instruction as scheduled by the scheduling logic produces a runahead-triggering event, to cause the microprocessor to enter into and operate in a runahead mode without reissuing the instruction, and to carry out runahead policies while the microprocessor is in the runahead mode that govern operation of the microprocessor and cause the microprocessor to operate differently than when not in the runahead mode.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Magnus Ekman, Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs
  • Publication number: 20140161173
    Abstract: A system and method are provided for a 3D modeling system with which an encoded video stream is produced. The system includes a content engine, an encoder, and a fixed function engine. The fixed function engine receives content information from the content engine. The fixed function engine produces encoder information from the content information. The encoder uses the encoder information to produce an encoded video stream having at least one of a higher quality and a lower bandwidth than a video stream encoded without the encoder information.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: Nvidia Corporation
    Inventors: Hassane S. Azar, Bryan Dudash, Rochelle Pereira, Dawid Pajak
  • Publication number: 20140165049
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Gregory DIAMOS, Mojtaba MEHRARA
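The priority-ordered conditional branches this abstract describes can be modeled as a runtime choice among the regions in a thread frontier. A minimal sketch, assuming region ids and an active-thread count per region (both illustrative):

```python
def next_region(frontier, active_threads_per_region):
    """Model of enforcing execution priority at a region exit: the
    compiler emits conditional branches ordered by region priority, so
    control transfers to the highest-priority frontier region that has
    at least one thread waiting to execute it.

    frontier: region ids, highest execution priority first.
    """
    for region in frontier:                       # priority order
        if active_threads_per_region.get(region, 0) > 0:
            return region
    return None                                   # frontier fully drained
```

The update-predicate-mask instructions mentioned in the abstract would maintain the per-region thread counts that this selection consults.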
  • Publication number: 20140160151
    Abstract: Methods of compressing (and decompressing) bounding box data and a processor incorporating one or more of the methods. In one embodiment, a method of compressing such data includes: (1) generating dimension-specific multiplicands and a floating-point shared scale multiplier from floating-point numbers representing extents of the bounding box and (2) substituting portions of floating-point numbers representing a reference point of the bounding box with the dimension-specific multiplicands to yield floating-point packed bounding box descriptors, the floating-point shared scale multiplier and the floating-point packed bounding box descriptors together constituting compressed bounding box data.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Andrei Pokrovsky
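The shared-scale idea can be sketched numerically: encode the three extents as small integer multiplicands of one shared floating-point scale, rounding up so the decompressed box never shrinks. The bit width, rounding policy, and tuple layout are assumptions for illustration; the patent packs the multiplicands into the reference-point floats, which this sketch does not model:

```python
import math

def compress_bbox(ref, maxs, bits=8):
    """Sketch of shared-scale bounding-box compression.

    ref:  (x, y, z) minimum corner (kept as floats).
    maxs: (x, y, z) maximum corner.
    Returns (ref, shared_scale, per-axis integer multiplicands).
    """
    extents = [mx - r for mx, r in zip(maxs, ref)]
    max_mult = (1 << bits) - 1
    # shared scale chosen so the largest extent maps to max_mult
    scale = max(extents) / max_mult if max(extents) > 0 else 1.0
    # ceil so the reconstructed box is conservative (never smaller)
    mults = [min(max_mult, math.ceil(e / scale)) for e in extents]
    return ref, scale, mults

def decompress_bbox(ref, scale, mults):
    """Reconstruct an enclosing box from the compressed form."""
    return ref, [r + m * scale for r, m in zip(ref, mults)]
```

Storage drops from six floats to three floats, one float scale, and three small integers, at the cost of a slightly enlarged box.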
  • Patent number: 8749561
    Abstract: A method and system for coordinated data execution in a computer system. The system includes a first graphics processor coupled to a first memory and a second graphics processor coupled to a second memory. A graphics bus is configured to couple the first graphics processor and the second graphics processor. The first graphics processor and the second graphics processor are configured for coordinated data execution via communication across the graphics bus.
    Type: Grant
    Filed: March 14, 2003
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Dwight D. Diercks, Abraham B. de Waal
  • Patent number: 8752018
    Abstract: One embodiment of the present invention sets forth a technique for emitting coherent output from multiple threads for the printf() function. Additionally, parallel (not divergent) execution of the threads for the printf() function is maintained when possible to improve run-time performance. Processing of the printf() function is separated into two tasks, gathering of the per thread data and formatting the gathered data according to the formatting codes for display. The threads emit a coherent stream of contiguous segments, where each segment includes the format string for the printf() function and the gathered data for a thread. The coherent stream is written by the threads and read by a display processor. The display processor executes a single thread to format the gathered data according to the format string for display.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Geoffrey Gerfin
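The two-task split this abstract describes (gather per-thread data, then format in a single thread) can be sketched in a few lines. The segment layout is illustrative, not the patented encoding:

```python
def gather_printf(fmt, per_thread_args):
    """Gather phase: each thread appends one contiguous segment
    (format string plus its gathered arguments) to a shared stream;
    no thread formats anything itself."""
    stream = []
    for args in per_thread_args:        # one segment per thread
        stream.append((fmt, args))
    return stream

def format_stream(stream):
    """Display phase: a single thread formats every segment in order,
    applying the format string to the gathered data."""
    return [fmt % args for fmt, args in stream]
```

Because the gather phase only copies raw arguments, the threads stay parallel; all the divergent formatting work happens once, downstream.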
  • Patent number: 8749564
    Abstract: One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both the first processing entity and the second processing entity are configured to receive first and second batches of primitives, and a barrier command in between the first and second batches of primitives. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work that is after the barrier command to stop at a barrier point until a release signal is received. The back-end unit transmits a release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity.
    Type: Grant
    Filed: July 3, 2013
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Dale L. Kirkland
  • Patent number: 8749562
    Abstract: A system and method for sharing binding groups between shaders allows for efficient use of shader state data storage resources. In contrast with conventional graphics processors and Application Programming Interfaces that specify a set of binding points for each shader that are exclusive to that shader, two or more shaders may reference the same binding group that includes multiple binding points. As the number and variety of different shaders increases, the number of binding groups may increase at a slower rate since some binding groups may be shared between different shaders.
    Type: Grant
    Filed: September 23, 2009
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventor: Jerome F. Duluk, Jr.
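The sharing this abstract describes can be modeled as interning binding groups in one table that multiple shaders index into, rather than giving each shader exclusive binding points. The class and its deduplication scheme are illustrative assumptions:

```python
class BindingGroupTable:
    """Sketch of shaders sharing binding groups: shaders hold indices
    into a shared table of groups, each group mapping binding points
    to resources."""

    def __init__(self):
        self.groups = []    # each entry: dict binding point -> resource
        self._index = {}    # dedupe identical groups

    def intern(self, bindings):
        """Return the index of a group matching `bindings`, creating
        it only if no shader has registered it yet."""
        key = tuple(sorted(bindings.items()))
        if key not in self._index:
            self._index[key] = len(self.groups)
            self.groups.append(dict(bindings))
        return self._index[key]   # shaders with equal bindings share an id
```

As the abstract notes, the table grows more slowly than the shader population, since distinct shaders frequently resolve to the same group.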
  • Patent number: 8749576
    Abstract: A rasterizer stage configured to implement multiple interpolators for a graphics pipeline. The rasterizer stage includes a plurality of simultaneously operable low precision interpolators for computing a first set of pixel parameters for pixels of a geometric primitive and a plurality of simultaneously operable high precision interpolators for computing a second set of pixel parameters for pixels of the geometric primitive. The rasterizer stage also includes an output mechanism coupled to the interpolators for routing computed pixel parameters into a memory array. Parameters may be programmably assigned to the interpolators and the results thereof may be programmably assigned to portions of a pixel packet.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: June 10, 2014
    Assignee: Nvidia Corporation
    Inventors: Edward A. Hutchins, Brian K. Angell