Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reduction operations in a synchronous parallel thread processing system with disabled execution threads

Patent number: 8200940

Abstract: A system and method for successfully performing reduction operations in a multi-threaded SIMD (single-instruction multiple-data) system while one or more threads are disabled allows for the reduction operations to be performed without a performance penalty compared with performing the same operation with all of the threads enabled. The source data for each intermediate computation of the reduction operation is remapped by a configurable crossbar as needed to avoid using invalid data from the disabled threads. The remapping function is transparent to the user and enables correct execution of order invariant reduction operations and order dependent prefix-reduction operations.

Type: Grant

Filed: June 30, 2008

Date of Patent: June 12, 2012

Assignee: NVIDIA Corporation

Inventor: John Erik Lindholm
Programmable graphics processor for multithreaded execution of programs

Patent number: 8174531

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Type: Grant

Filed: December 29, 2009

Date of Patent: May 8, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
THREAD GROUP SCHEDULER FOR COMPUTING ON A PARALLEL THREAD PROCESSOR

Publication number: 20120110586

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

Type: Application

Filed: September 28, 2011

Publication date: May 3, 2012

Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
Subdividing a shader program

Patent number: 8159496

Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.

Type: Grant

Filed: June 1, 2009

Date of Patent: April 17, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Gary M Tarolli
Two-Level Scheduler for Multi-Threaded Processing

Publication number: 20120079503

Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-leveler scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.

Type: Application

Filed: June 1, 2011

Publication date: March 29, 2012

Inventors: William James DALLY, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson
INSTRUCTION EXECUTION BASED ON OUTSTANDING LOAD OPERATIONS

Publication number: 20120079241

Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-leveler scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.

Type: Application

Filed: September 23, 2011

Publication date: March 29, 2012

Inventors: William James DALLY, John Erik Lindholm
PARALLEL ARRAY ARCHITECTURE FOR A GRAPHICS PROCESSOR

Publication number: 20120026171

Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.

Type: Application

Filed: October 7, 2011

Publication date: February 2, 2012

Applicant: NVIDIA Corporation

Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
HIERARCHICAL PROCESSOR ARRAY

Publication number: 20120026175

Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.

Type: Application

Filed: October 10, 2011

Publication date: February 2, 2012

Applicant: NVIDIA Corporation

Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
Thread-type-based resource allocation in a multithreaded processor

Patent number: 8108872

Abstract: Resources to be used by concurrent threads in a multithreaded processor are allocated based on thread types of the threads. For each of at least two thread types, an amount of the resource is reserved, and amounts currently allocated are tracked. When a request to allocate some of the resource to a new thread is received, a determination as to whether the allocation can be made is based on the thread type of the new thread, the amount of the resource reserved for that thread type, and the amount currently allocated to threads of that type.

Type: Grant

Filed: October 23, 2006

Date of Patent: January 31, 2012

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Bryon S. Nordquist, Simon S. Moy, Svetoslav D. Tzvetkov
Thread-type-based load balancing in a multithreaded processor

Patent number: 8087029

Abstract: Resources to be used by concurrent threads in a multithreaded processor are allocated based on thread types of the threads, and thread-type-based criteria governing resource allocation decisions are dynamically modified based on feedback information indicating the degree to which various thread types are using the resource. For each of at least two thread types, an amount of the resource is reserved, and amounts currently allocated are tracked. When an allocation request for a new thread is received, the allocation is made or not based on the new thread's type, the amount of the resource reserved for that type, and the amount currently allocated to threads of that type. If, based on feedback information from the allocation decision, the amount of the resource reserved for one thread type is determined to be insufficient, the reserved amounts are modified to better meet the demand.

Type: Grant

Filed: October 23, 2006

Date of Patent: December 27, 2011

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Bryon S. Nordquist, Simon S. Moy, Svetoslav D. Tzvetkov
Hierarchical processor array

Patent number: 8077174

Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.

Type: Grant

Filed: November 1, 2007

Date of Patent: December 13, 2011

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
Multiple simultaneous context architecture

Patent number: 7979683

Abstract: Graphics processing elements are capable of processing multiple contexts simultaneously, reducing the need to perform time consuming context switches compared with processing a single context at a time. Processing elements of a graphics processing pipeline may be configured to support all of the multiple contexts or only a portion of the multiple contexts. Each processing element may be allocated to process a particular context or a portion of the multiple contexts in order to simultaneously process more than one context. The allocation of processing elements to the multiple contexts may be determined dynamically in order to improve graphics processing throughput.

Type: Grant

Filed: April 5, 2007

Date of Patent: July 12, 2011

Assignee: NVIDIA Corporation

Inventors: John M. Danskin, John Erik Lindholm
Scheduler in multi-threaded processor prioritizing instructions passing qualification rule

Patent number: 7949855

Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.

Type: Grant

Filed: April 28, 2008

Date of Patent: May 24, 2011

Assignee: NVIDIA Corporation

Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
USING A PIXEL OFFSET FOR EVALUATING A PLANE EQUATION

Publication number: 20110081100

Abstract: One embodiment of the present invention sets forth a technique controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define to a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.

Type: Application

Filed: October 5, 2010

Publication date: April 7, 2011

Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
Architecture and Instructions for Accessing Multi-Dimensional Formatted Surface Memory

Publication number: 20110074802

Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces” stored in a user-specified data or pixel format and arranged in a graphics optimized layout are accessed by programs using surface instructions. A set of memory access instructions e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for loads and atomic surface instructions.

Type: Application

Filed: September 24, 2010

Publication date: March 31, 2011

Inventors: John R. Nickolls, Brian Fahs, Lars Nyland, John Erik Lindholm, Richard Craig Johnson
Unified Collector Structure for Multi-Bank Register File

Publication number: 20110072243

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

Type: Application

Filed: September 3, 2010

Publication date: March 24, 2011

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
Credit-Based Streaming Multiprocessor Warp Scheduling

Publication number: 20110072244

Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

Type: Application

Filed: September 17, 2010

Publication date: March 24, 2011

Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
REGISTER INDEXED SAMPLER FOR TEXTURE OPCODES

Publication number: 20110069076

Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.

Type: Application

Filed: August 25, 2010

Publication date: March 24, 2011

Inventors: John Erik LINDHOLM, Yan Yan Tang
Z-texture mapping system, method and computer program product

Patent number: 7889208

Abstract: A system, method and computer program product are provided for computer graphics processing. In use, a value is modified based on an algorithm. An operation is subsequently performed on pixel data taking into account the modified value.

Type: Grant

Filed: March 18, 2004

Date of Patent: February 15, 2011

Assignee: NVIDIA Corporation

Inventors: Henry P. Moreton, John Erik Lindholm, Matthew N. Papakipos, Harold Robert Feldman Zatz
Structured programming control flow in a SIMD architecture

Patent number: 7877585

Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.

Type: Grant

Filed: August 27, 2007

Date of Patent: January 25, 2011

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

prev 1 2 3 4 5 6 7 8 … next