Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Thread group scheduler for computing on a parallel thread processor

Patent number: 8732713

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

Type: Grant

Filed: September 28, 2011

Date of Patent: May 20, 2014

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
HARDWARE SCHEDULING OF ORDERED CRITICAL CODE SECTIONS

Publication number: 20140123150

Abstract: One embodiment sets forth a technique for scheduling the execution of ordered critical code sections by multiple threads. A multithreaded processor includes an instruction scheduling unit that is configured to schedule threads to process ordered critical code sections. A ordered critical code section is preceded by a barrier instruction and when all of the threads have reached the barrier instruction, the instruction scheduling unit controls the thread execution order by selecting each thread for execution based on logical identifiers associated with the threads. The logical identifiers are mapped to physical identifiers that are referenced by the multithreaded processor during execution of the threads. The logical identifiers are used by the instruction scheduling unit to control the order in which the threads execute the ordered critical code section.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: John Erik LINDHOLM, Tero Tapani KARRAS, Samuli Matias LAINE, Timo AILA
Systems and method for managing divergent threads in a SIMD architecture

Patent number: 8667256

Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.

Type: Grant

Filed: June 1, 2009

Date of Patent: March 4, 2014

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John Erik Lindholm
Methods and apparatus for source operand collector caching

Patent number: 8639882

Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.

Type: Grant

Filed: December 14, 2011

Date of Patent: January 28, 2014

Assignee: Nvidia Corporation

Inventors: Jack Hilaire Choquette, Manuel Olivier Gautho, John Erik Lindholm
Register indexed sampler for texture opcodes

Patent number: 8624910

Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.

Type: Grant

Filed: August 25, 2010

Date of Patent: January 7, 2014

Assignee: Nvidia Corporation

Inventors: John Erik Lindholm, Yan Yan Tang
Dynamic load balancing of instructions for execution by heterogeneous processing engines

Patent number: 8578387

Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.

Type: Grant

Filed: July 31, 2007

Date of Patent: November 5, 2013

Assignee: Nvidia Corporation

Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
System and method for accelerated ray-box intersection testing

Patent number: 8564589

Abstract: A method for performing a ray-box intersection test includes forming a span extending between a first plane-ray intersection point and a second plane-ray intersection point, and increasing the span by relocating to a new position at least one of the first and second plane-ray intersection points. A box intersection span is constructed using the increased span, and the box intersection span, which corresponds to a node in a hierarchical acceleration structure, is tested for intersection with the ray.

Type: Grant

Filed: May 17, 2010

Date of Patent: October 22, 2013

Assignee: NVIDIA Corporation

Inventors: Timo Aila, Samuli Laine, John Erik Lindholm
Cull before vertex attribute fetch and vertex lighting

Patent number: 8564616

Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

Type: Grant

Filed: July 17, 2009

Date of Patent: October 22, 2013

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
Cull before vertex attribute fetch and vertex lighting

Patent number: 8542247

Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

Type: Grant

Filed: July 17, 2009

Date of Patent: September 24, 2013

Assignee: Nvidia Corporation

Inventors: Ziyad S. Hakura, John Erik Lindholm, Emmett M. Kilgariff, Robert Ohannessian, Scott R. Whitman, James C. Bowman, Patrick R. Brown, Ross A. Cunniff
Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict

Patent number: 8533435

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

Type: Grant

Filed: September 3, 2010

Date of Patent: September 10, 2013

Assignee: NVIDIA Corporation

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
System and method for performing ray tracing node traversal in image rendering

Patent number: 8502819

Abstract: A method for performing a ray tracing node traversal operation in an image rendering process includes traversing a plurality of nodes within spatial hierarchy that represents a scene which is to be rendered, the spatial hierarchy including two or more hierarchy levels, each hierarchy level including one or more nodes. A number representing the number of nodes traversed in each one of a plurality of different hierarchy levels is stored, wherein each number is represented by at least one bit in a multi-bit binary sequence.

Type: Grant

Filed: May 17, 2010

Date of Patent: August 6, 2013

Assignee: NVIDIA Corporation

Inventors: Timo Aila, Samuli Laine, John Erik Lindholm
METHODS AND APPARATUS FOR SOURCE OPERAND COLLECTOR CACHING

Publication number: 20130159628

Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.

Type: Application

Filed: December 14, 2011

Publication date: June 20, 2013

Inventors: Jack Hilaire CHOQUETTE, Manuel Olivier Gautho, John Erik Lindholm
Programmable graphics processor for multithreaded execution of programs

Patent number: 8405665

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Type: Grant

Filed: May 7, 2012

Date of Patent: March 26, 2013

Assignee: Nvidia Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
Generating clip state for a batch of vertices

Patent number: 8384736

Abstract: One embodiment of the present invention sets forth a technique for generating a batch clip state stored in clip state machine (CSM) associated with a batch of vertices. Per-vertex clip state is generated for each vertex in the batch of vertices based on the position of each vertex relative to each clip plane. For a given vertex, per-vertex clip state indicates whether the vertex is inside or outside each of the one or more clip planes. The per-vertex clip states of all the vertices in the batch of vertices are coalesced into a batch clip state by determining whether each vertex in the batch of vertices is inside every clip plane, each vertex is outside at least one clip plane or neither. The batch clip state is stored in the CSM associated with the thread group that processes the batch of vertices that can be accessed by further stages of the graphics pipeline.

Type: Grant

Filed: October 14, 2009

Date of Patent: February 26, 2013

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Ziyad S. Hakura
Indirect function call instructions in a synchronous parallel thread processor

Patent number: 8312254

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

Type: Grant

Filed: March 24, 2008

Date of Patent: November 13, 2012

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
System, method and article of manufacture for a programmable processing model with instruction set

Patent number: 8264492

Abstract: A system, method and article of manufacture are provided for programmable processing in a computer graphics pipeline. Initially, data is received from a source buffer. Thereafter, programmable operations are performed on the data in order to generate output. The operations are programmable in that a user may utilize instructions from a predetermined instruction set for generating the same. Such output is stored in a register. During operation, the output stored in the register is used in performing the programmable operations on the data.

Type: Grant

Filed: November 19, 2007

Date of Patent: September 11, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, David B. Kirk, Henry P. Moreton, Simon Moy
System, method and article of manufacture for a programmable processing model with instruction set

Patent number: 8259122

Abstract: A system, method and article of manufacture are provided for programmable processing in a computer graphics pipeline. Initially, data is received from a source buffer. Thereafter, programmable operations are performed on the data in order to generate output. The operations are programmable in that a user may utilize instructions from a predetermined instruction set for generating the same. Such output is stored in a register. During operation, the output stored in the register is used in performing the programmable operations on the data.

Type: Grant

Filed: November 19, 2007

Date of Patent: September 4, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, David B. Kirk, Henry P. Moreton, Simon Moy
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS

Publication number: 20120218267

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Type: Application

Filed: May 7, 2012

Publication date: August 30, 2012

Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
Hierarchical processor array

Patent number: 8237705

Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.

Type: Grant

Filed: October 10, 2011

Date of Patent: August 7, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
Method and system for connecting multiple shaders

Patent number: 8223158

Abstract: A method and system for connecting multiple shaders are disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of configuring a set of shaders in a user-defined sequence within a modular pipeline (MPipe), allocating resources to execute the programming instructions of each of the set of shaders in the user-defined sequence to operate on the data unit, and directing the output of the MPipe to an external sink.

Type: Grant

Filed: December 19, 2006

Date of Patent: July 17, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Michael C. Shebanow, Jerome F. Duluk, Jr.

prev 1 2 3 4 5 6 7 … next