Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8732713
    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
  • Publication number: 20140123150
    Abstract: One embodiment sets forth a technique for scheduling the execution of ordered critical code sections by multiple threads. A multithreaded processor includes an instruction scheduling unit that is configured to schedule threads to process ordered critical code sections. An ordered critical code section is preceded by a barrier instruction, and when all of the threads have reached the barrier instruction, the instruction scheduling unit controls the thread execution order by selecting each thread for execution based on logical identifiers associated with the threads. The logical identifiers are mapped to physical identifiers that are referenced by the multithreaded processor during execution of the threads. The logical identifiers are used by the instruction scheduling unit to control the order in which the threads execute the ordered critical code section.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik LINDHOLM, Tero Tapani KARRAS, Samuli Matias LAINE, Timo AILA
  • Patent number: 8667256
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.
    Type: Grant
    Filed: June 1, 2009
    Date of Patent: March 4, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
  • Patent number: 8639882
    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.
    Type: Grant
    Filed: December 14, 2011
    Date of Patent: January 28, 2014
    Assignee: Nvidia Corporation
    Inventors: Jack Hilaire Choquette, Manuel Olivier Gautho, John Erik Lindholm
  • Patent number: 8624910
    Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.
    Type: Grant
    Filed: August 25, 2010
    Date of Patent: January 7, 2014
    Assignee: Nvidia Corporation
    Inventors: John Erik Lindholm, Yan Yan Tang
  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: Nvidia Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
  • Patent number: 8564589
    Abstract: A method for performing a ray-box intersection test includes forming a span extending between a first plane-ray intersection point and a second plane-ray intersection point, and increasing the span by relocating to a new position at least one of the first and second plane-ray intersection points. A box intersection span is constructed using the increased span, and the box intersection span, which corresponds to a node in a hierarchical acceleration structure, is tested for intersection with the ray.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: October 22, 2013
    Assignee: NVIDIA Corporation
    Inventors: Timo Aila, Samuli Laine, John Erik Lindholm
  • Patent number: 8564616
    Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.
    Type: Grant
    Filed: July 17, 2009
    Date of Patent: October 22, 2013
    Assignee: Nvidia Corporation
    Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
  • Patent number: 8542247
    Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.
    Type: Grant
    Filed: July 17, 2009
    Date of Patent: September 24, 2013
    Assignee: Nvidia Corporation
    Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 8502819
    Abstract: A method for performing a ray tracing node traversal operation in an image rendering process includes traversing a plurality of nodes within a spatial hierarchy that represents a scene which is to be rendered, the spatial hierarchy including two or more hierarchy levels, each hierarchy level including one or more nodes. A number representing the number of nodes traversed in each one of a plurality of different hierarchy levels is stored, wherein each number is represented by at least one bit in a multi-bit binary sequence.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: August 6, 2013
    Assignee: NVIDIA Corporation
    Inventors: Timo Aila, Samuli Laine, John Erik Lindholm
  • Publication number: 20130159628
    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Inventors: Jack Hilaire Choquette, Manuel Olivier Gautho, John Erik Lindholm
  • Patent number: 8405665
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 26, 2013
    Assignee: Nvidia Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Patent number: 8384736
    Abstract: One embodiment of the present invention sets forth a technique for generating a batch clip state stored in a clip state machine (CSM) associated with a batch of vertices. Per-vertex clip state is generated for each vertex in the batch of vertices based on the position of each vertex relative to each clip plane. For a given vertex, per-vertex clip state indicates whether the vertex is inside or outside each of the one or more clip planes. The per-vertex clip states of all the vertices in the batch of vertices are coalesced into a batch clip state by determining whether each vertex in the batch of vertices is inside every clip plane, each vertex is outside at least one clip plane, or neither. The batch clip state is stored in the CSM associated with the thread group that processes the batch of vertices that can be accessed by further stages of the graphics pipeline.
    Type: Grant
    Filed: October 14, 2009
    Date of Patent: February 26, 2013
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Ziyad S. Hakura
  • Patent number: 8312254
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: November 13, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
  • Patent number: 8264492
    Abstract: A system, method and article of manufacture are provided for programmable processing in a computer graphics pipeline. Initially, data is received from a source buffer. Thereafter, programmable operations are performed on the data in order to generate output. The operations are programmable in that a user may utilize instructions from a predetermined instruction set for generating the same. Such output is stored in a register. During operation, the output stored in the register is used in performing the programmable operations on the data.
    Type: Grant
    Filed: November 19, 2007
    Date of Patent: September 11, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, David B. Kirk, Henry P. Moreton, Simon Moy
  • Patent number: 8259122
    Abstract: A system, method and article of manufacture are provided for programmable processing in a computer graphics pipeline. Initially, data is received from a source buffer. Thereafter, programmable operations are performed on the data in order to generate output. The operations are programmable in that a user may utilize instructions from a predetermined instruction set for generating the same. Such output is stored in a register. During operation, the output stored in the register is used in performing the programmable operations on the data.
    Type: Grant
    Filed: November 19, 2007
    Date of Patent: September 4, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, David B. Kirk, Henry P. Moreton, Simon Moy
  • Publication number: 20120218267
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Patent number: 8237705
    Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.
    Type: Grant
    Filed: October 10, 2011
    Date of Patent: August 7, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
  • Patent number: 8223158
    Abstract: A method and system for connecting multiple shaders are disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of configuring a set of shaders in a user-defined sequence within a modular pipeline (MPipe), allocating resources to execute the programming instructions of each of the set of shaders in the user-defined sequence to operate on the data unit, and directing the output of the MPipe to an external sink.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: July 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Michael C. Shebanow, Jerome F. Duluk, Jr.
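
The three-step selection described in the abstract of patent 8732713 (pool of available thread groups, most senior CTA, highest credit within that CTA) can be sketched roughly as follows. This is an illustration only, not the patented implementation: the ThreadGroup record, the select_thread_group function, the cta_seniority mapping, and the tie-breaking behavior are all hypothetical, and the sketch assumes that seniority is compared only among CTAs that have at least one available thread group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ThreadGroup:
    """Hypothetical record for one thread group (names are illustrative)."""
    group_id: int
    cta_id: int      # CTA that this thread group belongs to
    credit: int      # credit value used in step (iii)
    available: bool  # whether the group can be issued this cycle


def select_thread_group(
    groups: List[ThreadGroup],
    cta_seniority: Dict[int, int],
) -> Optional[ThreadGroup]:
    """Pick the thread group to issue, following steps (i) to (iii)."""
    # (i) Identify the pool of available thread groups.
    pool = [g for g in groups if g.available]
    if not pool:
        return None

    # (ii) Identify the CTA with the greatest seniority value.
    # Assumption: only CTAs represented in the pool are considered.
    # Ties resolve to the first group encountered in the pool.
    senior_cta = max(pool, key=lambda g: cta_seniority[g.cta_id]).cta_id

    # (iii) Within that CTA, select the group with the greatest credit.
    candidates = [g for g in pool if g.cta_id == senior_cta]
    return max(candidates, key=lambda g: g.credit)


if __name__ == "__main__":
    groups = [
        ThreadGroup(group_id=0, cta_id=0, credit=5, available=True),
        ThreadGroup(group_id=1, cta_id=0, credit=9, available=False),
        ThreadGroup(group_id=2, cta_id=1, credit=7, available=True),
        ThreadGroup(group_id=3, cta_id=1, credit=3, available=True),
    ]
    chosen = select_thread_group(groups, {0: 10, 1: 4})
    print(chosen.group_id if chosen else None)
```

In this sketch a thread group with a high credit value is never chosen while it is unavailable, and credit is only compared inside the most senior CTA, so seniority always takes priority over credit.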