Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dispatching of instructions for execution by heterogeneous processing engines

Patent number: 9304775

Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a second type of program instructions can only be executed by a second type of processing engine. A third type of program instructions can be executed by the first and the second type of processing engines. An instruction dispatcher is configured to identify and remove program instruction execution conflicts for the heterogeneous processing engines to improve instruction execution throughput.

Type: Grant

Filed: November 5, 2007

Date of Patent: April 5, 2016

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Jered Wierzbicki
BEAM TRACING

Publication number: 20160071310

Abstract: An apparatus, computer readable medium, and method are disclosed for performing an intersection query between a query beam and a target bounding volume. The target bounding volume may comprise an axis-aligned bounding box (AABB) associated with a bounding volume hierarchy (BVH) tree. An intersection query comprising beam information associated with the query beam and slab boundary information for a first dimension of a target bounding volume is received. Intersection parameter values are calculated for the first dimension based on the beam information and the slab boundary information and a slab intersection case is determined for the first dimension based on the beam information. A parametric variable range for the first dimension is assigned based on the slab intersection case and the intersection parameter values and it is determined whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension.

Type: Application

Filed: March 18, 2015

Publication date: March 10, 2016

Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine, John Erik Lindholm
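
The per-dimension slab idea in the abstract above can be illustrated with the simpler, well-known ray versus AABB slab test. This is only a sketch for a single ray, not the beam query the application describes; the function names `slab_range` and `ray_intersects_aabb` are hypothetical.

```python
import math

def slab_range(origin, direction, lo, hi):
    """Parametric range (t_near, t_far) for which origin + t*direction
    lies inside the slab [lo, hi] along one dimension."""
    if direction == 0.0:
        # Ray is parallel to the slab: it is either always inside or never inside.
        if lo <= origin <= hi:
            return (-math.inf, math.inf)
        return (math.inf, -math.inf)  # empty range
    t0 = (lo - origin) / direction
    t1 = (hi - origin) / direction
    return (min(t0, t1), max(t0, t1))

def ray_intersects_aabb(origin, direction, box_min, box_max):
    """Intersect the per-dimension ranges; the ray hits the box
    if the combined range is non-empty for t >= 0."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        a, b = slab_range(o, d, lo, hi)
        t_near = max(t_near, a)
        t_far = min(t_far, b)
    return t_near <= t_far
```

Each dimension contributes one parametric range, and the box is hit only if all ranges overlap, which mirrors the "parametric variable range for the first dimension" wording in the abstract.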
Using condition codes in the presence of non-numeric values

Patent number: 9195460

Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value may be encoded, not a number (NaN) representing a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.

Type: Grant

Filed: May 2, 2006

Date of Patent: November 24, 2015

Assignee: NVIDIA CORPORATION

Inventors: Robert Steven Glanville, John Erik Lindholm, Ming Y. Siu
Credit-based streaming multiprocessor warp scheduling

Patent number: 9189242

Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

Type: Grant

Filed: September 17, 2010

Date of Patent: November 17, 2015

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
Graphics processing unit with a texture return buffer and a texture queue

Patent number: 9171525

Abstract: A processor and a system are provided for performing texturing operations loaded from a texture queue that provides temporary storage of texture coordinates and texture values. The processor includes a texture queue implemented in a memory of the processor, a crossbar coupled to the texture queue, and one or more texture units coupled to the texture queue via the crossbar. The crossbar is configured to reorder texture coordinates for consumption by the one or more texture units and to reorder texture values received from the one or more texture units.

Type: Grant

Filed: February 26, 2013

Date of Patent: October 27, 2015

Assignee: NVIDIA Corporation

Inventor: John Erik Lindholm
Hardware scheduling of ordered critical code sections

Patent number: 9158595

Abstract: One embodiment sets forth a technique for scheduling the execution of ordered critical code sections by multiple threads. A multithreaded processor includes an instruction scheduling unit that is configured to schedule threads to process ordered critical code sections. An ordered critical code section is preceded by a barrier instruction and when all of the threads have reached the barrier instruction, the instruction scheduling unit controls the thread execution order by selecting each thread for execution based on logical identifiers associated with the threads. The logical identifiers are mapped to physical identifiers that are referenced by the multithreaded processor during execution of the threads. The logical identifiers are used by the instruction scheduling unit to control the order in which the threads execute the ordered critical code section.

Type: Grant

Filed: October 25, 2012

Date of Patent: October 13, 2015

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Tero Tapani Karras, Samuli Matias Laine, Timo Aila
TREE-BASED THREAD MANAGEMENT

Publication number: 20150205606

Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.

Type: Application

Filed: January 21, 2014

Publication date: July 23, 2015

Applicant: NVIDIA CORPORATION

Inventors: John Erik LINDHOLM, Michael C. SHEBANOW
TREE-BASED THREAD MANAGEMENT

Publication number: 20150205607

Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.

Type: Application

Filed: January 21, 2014

Publication date: July 23, 2015

Applicant: NVIDIA CORPORATION

Inventor: John Erik LINDHOLM
Using a pixel offset for evaluating a plane equation

Patent number: 9058672

Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.

Type: Grant

Filed: October 5, 2010

Date of Patent: June 16, 2015

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
Generating clip state for a batch of vertices

Patent number: 8976195

Abstract: One embodiment of the present invention sets forth a technique for generating a batch clip state stored in a clip state machine (CSM) associated with a batch of vertices. Per-vertex clip state is generated for each vertex in the batch of vertices based on the position of each vertex relative to each clip plane. For a given vertex, per-vertex clip state indicates whether the vertex is inside or outside each of the one or more clip planes. The per-vertex clip states of all the vertices in the batch of vertices are coalesced into a batch clip state by determining whether each vertex in the batch of vertices is inside every clip plane, each vertex is outside at least one clip plane, or neither. The batch clip state is stored in the CSM associated with the thread group that processes the batch of vertices and can be accessed by further stages of the graphics pipeline.

Type: Grant

Filed: October 14, 2009

Date of Patent: March 10, 2015

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Ziyad S. Hakura
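
The coalescing step in the abstract above can be sketched as follows. This assumes homogeneous vertices `(x, y, z, w)`, clip planes given as coefficient tuples `(a, b, c, d)`, and the convention that a non-negative dot product means "inside"; the function names and the string labels for the batch state are hypothetical.

```python
def vertex_clip_state(vertex, planes):
    """Per-vertex clip state as a bitmask: bit i is set when the
    vertex is outside clip plane i (assumed: dot product < 0)."""
    x, y, z, w = vertex
    mask = 0
    for i, (a, b, c, d) in enumerate(planes):
        if a * x + b * y + c * z + d * w < 0:
            mask |= 1 << i
    return mask

def batch_clip_state(vertices, planes):
    """Coalesce per-vertex states into one of three batch states,
    following the three cases named in the abstract."""
    masks = [vertex_clip_state(v, planes) for v in vertices]
    if all(m == 0 for m in masks):
        return "all_inside"    # every vertex is inside every plane
    if all(m != 0 for m in masks):
        return "all_outside"   # every vertex is outside at least one plane
    return "mixed"             # neither of the above

# Usage: a single clip plane x >= 0.
planes = [(1.0, 0.0, 0.0, 0.0)]
state = batch_clip_state([(1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0)], planes)
```

A later pipeline stage could read the single batch state instead of re-testing each vertex against each plane.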
Approach for a configurable phase-based priority scheduler

Patent number: 8949841

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Grant

Filed: December 27, 2012

Date of Patent: February 3, 2015

Assignee: NVIDIA Corporation

Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
Programmable graphics processor for multithreaded execution of programs

Patent number: 8860737

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Type: Grant

Filed: July 19, 2006

Date of Patent: October 14, 2014

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS

Publication number: 20140285500

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Type: Application

Filed: March 25, 2013

Publication date: September 25, 2014

Applicant: NVIDIA Corporation

Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
SYSTEM AND METHOD FOR HARDWARE SCHEDULING OF INDEXED BARRIERS

Publication number: 20140282566

Abstract: A method and a system are provided for hardware scheduling of indexed barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated and when each thread reaches the barrier instruction, the thread pauses execution of the instructions. A first sub-group of threads in the plurality of threads is associated with a first sub-barrier index and a second sub-group of threads in the plurality of threads is associated with a second sub-barrier index. When the barrier instruction can be scheduled for execution, threads in the first sub-group are executed serially and threads in the second sub-group are executed serially and at least one thread in the first sub-group is executed in parallel with at least one thread in the second sub-group.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: NVIDIA CORPORATION

Inventors: John Erik Lindholm, Tero Tapani Karras
SYSTEM AND METHOD FOR HARDWARE SCHEDULING OF CONDITIONAL BARRIERS AND IMPATIENT BARRIERS

Publication number: 20140258693

Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.

Type: Application

Filed: March 11, 2013

Publication date: September 11, 2014

Applicant: NVIDIA CORPORATION

Inventors: John Erik Lindholm, Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
GRAPHICS PROCESSING UNIT WITH A TEXTURE RETURN BUFFER AND A TEXTURE QUEUE

Publication number: 20140240337

Abstract: A processor and a system are provided for performing texturing operations loaded from a texture queue that provides temporary storage of texture coordinates and texture values. The processor includes a texture queue implemented in a memory of the processor, a crossbar coupled to the texture queue, and one or more texture units coupled to the texture queue via the crossbar. The crossbar is configured to reorder texture coordinates for consumption by the one or more texture units and to reorder texture values received from the one or more texture units.

Type: Application

Filed: February 26, 2013

Publication date: August 28, 2014

Applicant: NVIDIA CORPORATION

Inventor: John Erik Lindholm
APPROACH FOR A CONFIGURABLE PHASE-BASED PRIORITY SCHEDULER

Publication number: 20140189698

Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Applicant: NVIDIA Corporation

Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
Parallel array architecture for a graphics processor

Patent number: 8730249

Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.

Type: Grant

Filed: October 7, 2011

Date of Patent: May 20, 2014

Assignee: NVIDIA Corporation

Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
System, method and computer program product for bump mapping

Patent number: 8730252

Abstract: A system, method and computer program product are provided for bump mapping in a hardware graphics processor. Initially, a first set of texture coordinates is received. The texture coordinates are then multiplied by a matrix to generate results. A second set of texture coordinates is then offset utilizing the results. The offset second set of texture coordinates is then mapped to color.

Type: Grant

Filed: March 31, 2004

Date of Patent: May 20, 2014

Assignee: NVIDIA Corporation

Inventors: Henry P. Moreton, John Erik Lindholm, Matthew N. Papakipos, Harold Robert Feldman Zatz
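
The four steps in the abstract above (receive coordinates, multiply by a matrix, offset a second set, map to color) can be sketched as below. The 2x2 matrix shape, the nearest-texel lookup with wrap-around, and all function names are assumptions made for illustration; the listing does not specify them.

```python
def offset_texcoords(first, matrix, second):
    """Multiply the first texture coordinates by a 2x2 matrix and use
    the result to offset the second set of texture coordinates."""
    du, dv = first
    (m00, m01), (m10, m11) = matrix
    return (second[0] + m00 * du + m01 * dv,
            second[1] + m10 * du + m11 * dv)

def lookup_color(texture, coords):
    """Map (u, v) in [0, 1) to a texel with nearest-texel sampling and
    wrap-around addressing (both assumed for this sketch)."""
    height, width = len(texture), len(texture[0])
    u, v = coords
    col = int((u % 1.0) * width)
    row = int((v % 1.0) * height)
    return texture[row][col]

# Usage: a 2x2 texture of RGB tuples.
texture = [[(255, 0, 0), (0, 255, 0)],
           [(0, 0, 255), (255, 255, 255)]]
uv = offset_texcoords((1.0, 2.0), ((0.5, 0.0), (0.0, 0.25)), (0.1, 0.2))
color = lookup_color(texture, uv)
```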
Two-level scheduler for multi-threaded processing

Patent number: 8732711

Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.

Type: Grant

Filed: June 1, 2011

Date of Patent: May 20, 2014

Assignee: NVIDIA Corporation

Inventors: William James Dally, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson
