Patents by Inventor John Erik Lindholm

John Erik Lindholm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10983699
    Abstract: A queue manager apparatus converts inbound commands of a first width into scalar format commands to be queued in a command queue. Furthermore, the queue manager converts the scalar format commands residing in the command queue into outbound commands of a second width for transmission. Converting inbound commands to scalar format commands and then converting the scalar format commands to a target width for transmission allows the queue manager to advantageously provide efficient and programmable command transmission between arbitrary processing units, regardless of potentially mismatched native command widths.
    Type: Grant
    Filed: October 25, 2019
    Date of Patent: April 20, 2021
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Publication number: 20200057560
    Abstract: A queue manager apparatus converts inbound commands of a first width into scalar format commands to be queued in a command queue. Furthermore, the queue manager converts the scalar format commands residing in the command queue into outbound commands of a second width for transmission. Converting inbound commands to scalar format commands and then converting the scalar format commands to a target width for transmission allows the queue manager to advantageously provide efficient and programmable command transmission between arbitrary processing units, regardless of potentially mismatched native command widths.
    Type: Application
    Filed: October 25, 2019
    Publication date: February 20, 2020
    Inventor: John Erik Lindholm
  • Patent number: 10489056
    Abstract: A queue manager apparatus converts inbound commands of a first width into scalar format commands to be queued in a command queue. Furthermore, the queue manager converts the scalar format commands residing in the command queue into outbound commands of a second width for transmission. Converting inbound commands to scalar format commands and then converting the scalar format commands to a target width for transmission allows the queue manager to advantageously provide efficient and programmable command transmission between arbitrary processing units, regardless of potentially mismatched native command widths.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: November 26, 2019
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 10346212
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Grant
    Filed: February 3, 2015
    Date of Patent: July 9, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
  • Publication number: 20190138210
    Abstract: A queue manager apparatus converts inbound commands of a first width into scalar format commands to be queued in a command queue. Furthermore, the queue manager converts the scalar format commands residing in the command queue into outbound commands of a second width for transmission. Converting inbound commands to scalar format commands and then converting the scalar format commands to a target width for transmission allows the queue manager to advantageously provide efficient and programmable command transmission between arbitrary processing units, regardless of potentially mismatched native command widths.
    Type: Application
    Filed: November 9, 2017
    Publication date: May 9, 2019
    Inventor: John Erik Lindholm
  • Patent number: 10242485
    Abstract: An apparatus, computer readable medium, and method are disclosed for performing an intersection query between a query beam and a target bounding volume. The target bounding volume may comprise an axis-aligned bounding box (AABB) associated with a bounding volume hierarchy (BVH) tree. An intersection query comprising beam information associated with the query beam and slab boundary information for a first dimension of a target bounding volume is received. Intersection parameter values are calculated for the first dimension based on the beam information and the slab boundary information and a slab intersection case is determined for the first dimension based on the beam information. A parametric variable range for the first dimension is assigned based on the slab intersection case and the intersection parameter values and it is determined whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: March 26, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine, John Erik Lindholm
  • Patent number: 10217184
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: May 23, 2017
    Date of Patent: February 26, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20180182158
    Abstract: An apparatus, computer readable medium, and method are disclosed for performing an intersection query between a query beam and a target bounding volume. The target bounding volume may comprise an axis-aligned bounding box (AABB) associated with a bounding volume hierarchy (BVH) tree. An intersection query comprising beam information associated with the query beam and slab boundary information for a first dimension of a target bounding volume is received. Intersection parameter values are calculated for the first dimension based on the beam information and the slab boundary information and a slab intersection case is determined for the first dimension based on the beam information. A parametric variable range for the first dimension is assigned based on the slab intersection case and the intersection parameter values and it is determined whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension.
    Type: Application
    Filed: December 28, 2016
    Publication date: June 28, 2018
    Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine, John Erik Lindholm
  • Patent number: 9921847
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: March 20, 2018
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 9830161
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Michael C. Shebanow
  • Publication number: 20170256022
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: May 23, 2017
    Publication date: September 7, 2017
    Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
  • Publication number: 20170192822
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: February 3, 2015
    Publication date: July 6, 2017
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Patent number: 9659339
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: March 25, 2013
    Date of Patent: May 23, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Patent number: 9639365
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
  • Patent number: 9569559
    Abstract: An apparatus, computer readable medium, and method are disclosed for performing an intersection query between a query beam and a target bounding volume. The target bounding volume may comprise an axis-aligned bounding box (AABB) associated with a bounding volume hierarchy (BVH) tree. An intersection query comprising beam information associated with the query beam and slab boundary information for a first dimension of a target bounding volume is received. Intersection parameter values are calculated for the first dimension based on the beam information and the slab boundary information and a slab intersection case is determined for the first dimension based on the beam information. A parametric variable range for the first dimension is assigned based on the slab intersection case and the intersection parameter values and it is determined whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension.
    Type: Grant
    Filed: March 18, 2015
    Date of Patent: February 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine, John Erik Lindholm
  • Patent number: 9519947
    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces” stored in a user-specified data or pixel format and arranged in a graphics optimized layout are accessed by programs using surface instructions. A set of memory access instructions e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for loads and atomic surface instructions.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: December 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brian Fahs, Lars Nyland, John Erik Lindholm, Richard Craig Johnson
  • Publication number: 20160300319
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: March 25, 2013
    Publication date: October 13, 2016
    Applicant: NVIDIA Corporation
    Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
  • Patent number: 9448803
    Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: September 20, 2016
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
  • Patent number: 9442755
    Abstract: A method and a system are provided for hardware scheduling of indexed barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated and when each thread reaches the barrier instruction, the thread pauses execution of the instructions. A first sub-group of threads in the plurality of threads is associated with a first sub-barrier index and a second sub-group of threads in the plurality of threads is associated with a second sub-barrier index. When the barrier instruction can be scheduled for execution, threads in the first sub-group are executed serially and threads in the second sub-group are executed serially and at least one thread in the first sub-group is executed in parallel with at least one thread in the second sub-group.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: September 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Tero Tapani Karras
  • Publication number: 20160224386
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: February 3, 2015
    Publication date: August 4, 2016
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM