Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9304775
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a second type of program instructions can only be executed by a second type of processing engine. A third type of program instructions can be executed by the first and the second type of processing engines. An instruction dispatcher is configured to identify and remove program instruction execution conflicts for the heterogeneous processing engines to improve instruction execution throughput.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: April 5, 2016
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Jered Wierzbicki
  • Publication number: 20160071310
    Abstract: An apparatus, computer readable medium, and method are disclosed for performing an intersection query between a query beam and a target bounding volume. The target bounding volume may comprise an axis-aligned bounding box (AABB) associated with a bounding volume hierarchy (BVH) tree. An intersection query comprising beam information associated with the query beam and slab boundary information for a first dimension of a target bounding volume is received. Intersection parameter values are calculated for the first dimension based on the beam information and the slab boundary information and a slab intersection case is determined for the first dimension based on the beam information. A parametric variable range for the first dimension is assigned based on the slab intersection case and the intersection parameter values and it is determined whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension.
    Type: Application
    Filed: March 18, 2015
    Publication date: March 10, 2016
    Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine, John Erik Lindholm
  • Patent number: 9195460
    Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value may be encoded, not a number (NaN) representing a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: November 24, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Robert Steven Glanville, John Erik Lindholm, Ming Y. Siu
  • Patent number: 9189242
    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Grant
    Filed: September 17, 2010
    Date of Patent: November 17, 2015
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
  • Patent number: 9171525
    Abstract: A processor and a system are provided for performing texturing operations loaded from a texture queue that provides temporary storage of texture coordinates and texture values. The processor includes a texture queue implemented in a memory of the processor, a crossbar coupled to the texture queue, and one or more texture units coupled to the texture queue via the crossbar. The crossbar is configured to reorder texture coordinates for consumption by the one or more texture units and to reorder texture values received from the one or more texture units.
    Type: Grant
    Filed: February 26, 2013
    Date of Patent: October 27, 2015
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 9158595
    Abstract: One embodiment sets forth a technique for scheduling the execution of ordered critical code sections by multiple threads. A multithreaded processor includes an instruction scheduling unit that is configured to schedule threads to process ordered critical code sections. A ordered critical code section is preceded by a barrier instruction and when all of the threads have reached the barrier instruction, the instruction scheduling unit controls the thread execution order by selecting each thread for execution based on logical identifiers associated with the threads. The logical identifiers are mapped to physical identifiers that are referenced by the multithreaded processor during execution of the threads. The logical identifiers are used by the instruction scheduling unit to control the order in which the threads execute the ordered critical code section.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: October 13, 2015
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Tero Tapani Karras, Samuli Matias Laine, Timo Aila
  • Publication number: 20150205606
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik LINDHOLM, Michael C. SHEBANOW
  • Publication number: 20150205607
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventor: John Erik LINDHOLM
  • Patent number: 9058672
    Abstract: One embodiment of the present invention sets forth a technique controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define to a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.
    Type: Grant
    Filed: October 5, 2010
    Date of Patent: June 16, 2015
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
  • Patent number: 8976195
    Abstract: One embodiment of the present invention sets forth a technique for generating a batch clip state stored in clip state machine (CSM) associated with a batch of vertices. Per-vertex clip state is generated for each vertex in the batch of vertices based on the position of each vertex relative to each clip plane. For a given vertex, per-vertex clip state indicates whether the vertex is inside or outside each of the one or more clip planes. The per-vertex clip states of all the vertices in the batch of vertices are coalesced into a batch clip state by determining whether each vertex in the batch of vertices is inside every clip plane, each vertex is outside at least one clip plane or neither. The batch clip state is stored in the CSM associated with the thread group that processes the batch of vertices that can be accessed by further stages of the graphics pipeline.
    Type: Grant
    Filed: October 14, 2009
    Date of Patent: March 10, 2015
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Ziyad S. Hakura
  • Patent number: 8949841
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
  • Patent number: 8860737
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: July 19, 2006
    Date of Patent: October 14, 2014
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20140285500
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: March 25, 2013
    Publication date: September 25, 2014
    Applicant: NVIDIA Corporation
    Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
  • Publication number: 20140282566
    Abstract: A method and a system are provided for hardware scheduling of indexed barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated and when each thread reaches the barrier instruction, the thread pauses execution of the instructions. A first sub-group of threads in the plurality of threads is associated with a first sub-barrier index and a second sub-group of threads in the plurality of threads is associated with a second sub-barrier index. When the barrier instruction can be scheduled for execution, threads in the first sub-group are executed serially and threads in the second sub-group are executed serially and at least one thread in the first sub-group is executed in parallel with at least one thread in the second sub-group.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik Lindholm, Tero Tapani Karras
  • Publication number: 20140258693
    Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik Lindholm, Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
  • Publication number: 20140240337
    Abstract: A processor and a system are provided for performing texturing operations loaded from a texture queue that provides temporary storage of texture coordinates and texture values. The processor includes a texture queue implemented in a memory of the processor, a crossbar coupled to the texture queue, and one or more texture units coupled to the texture queue via the crossbar. The crossbar is configured to reorder texture coordinates for consumption by the one or more texture units and to reorder texture values received from the one or more texture units.
    Type: Application
    Filed: February 26, 2013
    Publication date: August 28, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: John Erik Lindholm
  • Publication number: 20140189698
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA Corporation
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Patent number: 8730249
    Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.
    Type: Grant
    Filed: October 7, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
  • Patent number: 8730252
    Abstract: A system, method and computer program product are provided for bump mapping in a hardware graphics processor. Initially, a first set of texture coordinates is received. The texture coordinates are then multiplied by a matrix to generate results. A second set of texture coordinates is then offset utilizing the results. The offset second set of texture coordinates is then mapped to color.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: Henry P. Moreton, John Erik Lindholm, Matthew N. Papakipos, Harold Robert Feldman Zatz
  • Patent number: 8732711
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-leveler scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Grant
    Filed: June 1, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson