Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8200940
    Abstract: A system and method for successfully performing reduction operations in a multi-threaded SIMD (single-instruction multiple-data) system while one or more threads are disabled allows for the reduction operations to be performed without a performance penalty compared with performing the same operation with all of the threads enabled. The source data for each intermediate computation of the reduction operation is remapped by a configurable crossbar as needed to avoid using invalid data from the disabled threads. The remapping function is transparent to the user and enables correct execution of order invariant reduction operations and order dependent prefix-reduction operations.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: June 12, 2012
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 8174531
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20120110586
    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
    Type: Application
    Filed: September 28, 2011
    Publication date: May 3, 2012
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
  • Patent number: 8159496
    Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
    Type: Grant
    Filed: June 1, 2009
    Date of Patent: April 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
  • Publication number: 20120079503
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Application
    Filed: June 1, 2011
    Publication date: March 29, 2012
    Inventors: William James Dally, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson
  • Publication number: 20120079241
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 29, 2012
    Inventors: William James Dally, John Erik Lindholm
  • Publication number: 20120026171
    Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.
    Type: Application
    Filed: October 7, 2011
    Publication date: February 2, 2012
    Applicant: NVIDIA Corporation
    Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
  • Publication number: 20120026175
    Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.
    Type: Application
    Filed: October 10, 2011
    Publication date: February 2, 2012
    Applicant: NVIDIA Corporation
    Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
  • Patent number: 8108872
    Abstract: Resources to be used by concurrent threads in a multithreaded processor are allocated based on thread types of the threads. For each of at least two thread types, an amount of the resource is reserved, and amounts currently allocated are tracked. When a request to allocate some of the resource to a new thread is received, a determination as to whether the allocation can be made is based on the thread type of the new thread, the amount of the resource reserved for that thread type, and the amount currently allocated to threads of that type.
    Type: Grant
    Filed: October 23, 2006
    Date of Patent: January 31, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Bryon S. Nordquist, Simon S. Moy, Svetoslav D. Tzvetkov
  • Patent number: 8087029
    Abstract: Resources to be used by concurrent threads in a multithreaded processor are allocated based on thread types of the threads, and thread-type-based criteria governing resource allocation decisions are dynamically modified based on feedback information indicating the degree to which various thread types are using the resource. For each of at least two thread types, an amount of the resource is reserved, and amounts currently allocated are tracked. When an allocation request for a new thread is received, the allocation is made or not based on the new thread's type, the amount of the resource reserved for that type, and the amount currently allocated to threads of that type. If, based on feedback information from the allocation decision, the amount of the resource reserved for one thread type is determined to be insufficient, the reserved amounts are modified to better meet the demand.
    Type: Grant
    Filed: October 23, 2006
    Date of Patent: December 27, 2011
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Bryon S. Nordquist, Simon S. Moy, Svetoslav D. Tzvetkov
  • Patent number: 8077174
    Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: December 13, 2011
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
  • Patent number: 7979683
    Abstract: Graphics processing elements are capable of processing multiple contexts simultaneously, reducing the need to perform time consuming context switches compared with processing a single context at a time. Processing elements of a graphics processing pipeline may be configured to support all of the multiple contexts or only a portion of the multiple contexts. Each processing element may be allocated to process a particular context or a portion of the multiple contexts in order to simultaneously process more than one context. The allocation of processing elements to the multiple contexts may be determined dynamically in order to improve graphics processing throughput.
    Type: Grant
    Filed: April 5, 2007
    Date of Patent: July 12, 2011
    Assignee: NVIDIA Corporation
    Inventors: John M. Danskin, John Erik Lindholm
  • Patent number: 7949855
    Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Publication number: 20110081100
    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.
    Type: Application
    Filed: October 5, 2010
    Publication date: April 7, 2011
    Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
  • Publication number: 20110074802
    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces” stored in a user-specified data or pixel format and arranged in a graphics optimized layout are accessed by programs using surface instructions. A set of memory access instructions (e.g., load, store, reduce, and atomic), referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for load and atomic surface instructions.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: John R. Nickolls, Brian Fahs, Lars Nyland, John Erik Lindholm, Richard Craig Johnson
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Publication number: 20110072244
    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 24, 2011
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
  • Publication number: 20110069076
    Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.
    Type: Application
    Filed: August 25, 2010
    Publication date: March 24, 2011
    Inventors: John Erik Lindholm, Yan Yan Tang
  • Patent number: 7889208
    Abstract: A system, method and computer program product are provided for computer graphics processing. In use, a value is modified based on an algorithm. An operation is subsequently performed on pixel data taking into account the modified value.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: February 15, 2011
    Assignee: NVIDIA Corporation
    Inventors: Henry P. Moreton, John Erik Lindholm, Matthew N. Papakipos, Harold Robert Feldman Zatz
  • Patent number: 7877585
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: January 25, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
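
The abstract of patent 8108872 above describes allocating a resource according to thread type: an amount is reserved per type, the amount currently allocated per type is tracked, and a request is granted or refused based on those two figures. The following is only a minimal software sketch of that idea, not the patented hardware design. The class name `TypedResourcePool`, its methods, the thread-type labels, and the rule that a request is refused once it would exceed the type's reservation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TypedResourcePool:
    """Hypothetical model: a per-type reservation plus a per-type usage counter."""

    reserved: dict[str, int]                       # amount reserved for each thread type
    allocated: dict[str, int] = field(default_factory=dict)  # amount currently in use

    def try_allocate(self, thread_type: str, amount: int) -> bool:
        """Grant the request only if it fits within the type's reservation (assumed rule)."""
        in_use = self.allocated.get(thread_type, 0)
        limit = self.reserved.get(thread_type, 0)
        if in_use + amount > limit:
            return False  # refuse: the type would exceed its reserved amount
        self.allocated[thread_type] = in_use + amount
        return True

    def release(self, thread_type: str, amount: int) -> None:
        """Return a previously allocated amount when a thread finishes."""
        in_use = self.allocated.get(thread_type, 0)
        self.allocated[thread_type] = max(0, in_use - amount)


# Usage with made-up thread types and reservation sizes.
pool = TypedResourcePool(reserved={"vertex": 4, "pixel": 8})
granted_first = pool.try_allocate("vertex", 3)   # fits within the vertex reservation
granted_second = pool.try_allocate("vertex", 2)  # would bring vertex usage to 5 of 4
granted_pixel = pool.try_allocate("pixel", 2)    # pixel usage is tracked separately
```

Here the second vertex request is refused even though the pixel reservation still has room, because each decision looks only at the requesting thread's own type. Patent 8087029 above additionally describes adjusting the reserved amounts dynamically from feedback, which this sketch does not model.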