Patents by Inventor Brett W. Coon
Brett W. Coon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10365930
Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Grant
Filed: May 1, 2017
Date of Patent: July 30, 2019
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
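The "cache operations modifier" in this abstract corresponds to the .ca/.cg/.cs-style suffixes carried by PTX load and store instructions. As a minimal illustrative sketch (not the patented mechanism itself), CUDA's cache-hint intrinsics, available in recent toolkits, emit such modified accesses:

```cuda
// Sketch: per-instruction cache hints on loads and stores, assuming a
// CUDA 11+ toolkit. __ldcg/__stcg emit PTX ld.global.cg/st.global.cg
// ("cache global"), which caches at L2 but not L1 -- one example of the
// per-level caching policy the abstract describes.
__global__ void scale(const float* __restrict__ in, float* out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __ldcg(in + i);  // load, cached at L2 only
        __stcg(out + i, v * k);    // store, cached at L2 only
    }
}
```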
-
Patent number: 10217184
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Grant
Filed: May 23, 2017
Date of Patent: February 26, 2019
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
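A rough host-side model of the unified-pipeline idea (every name below is invented for illustration): one execution path services both work types, and only the input and output sections differ, which is what lets the hardware load-balance vertex-heavy and pixel-heavy workloads.

```cuda
// Toy model of one unified execution pipeline fed from either input section.
enum class WorkKind { Vertex, Pixel };

struct Work { WorkKind kind; float data[4]; };

void executePipeline(const Work& w, float* vertexOut, float* pixelOut)
{
    float r[4];
    for (int i = 0; i < 4; ++i)
        r[i] = w.data[i] * 0.5f;                 // stand-in for shading math
    float* dst = (w.kind == WorkKind::Vertex) ? vertexOut : pixelOut;
    for (int i = 0; i < 4; ++i)
        dst[i] = r[i];                           // matching output section
}
```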
-
Patent number: 9952977
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
Type: Grant
Filed: September 24, 2010
Date of Patent: April 24, 2018
Assignee: NVIDIA Corporation
Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Patent number: 9830197
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by that thread.
Type: Grant
Filed: August 16, 2016
Date of Patent: November 28, 2017
Assignee: NVIDIA Corporation
Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
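CUDA exposes the reduction half of this technique directly: __syncthreads_count() both synchronizes a thread block and reduces a per-thread predicate at the barrier. A minimal sketch, offered as an illustration rather than the patented datapath:

```cuda
// Count, per block, how many elements are positive. Every thread reaches
// the barrier and contributes its predicate; the reduced count is returned
// to all threads once the barrier completes.
__global__ void countPositive(const float* x, int* blockCounts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (i < n && x[i] > 0.0f);
    int count = __syncthreads_count(pred);  // barrier + reduction in one step
    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = count;
}
```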
-
Publication number: 20170256022
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Application
Filed: May 23, 2017
Publication date: September 7, 2017
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Publication number: 20170235581
Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Application
Filed: May 1, 2017
Publication date: August 17, 2017
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
-
Patent number: 9659339
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Grant
Filed: March 25, 2013
Date of Patent: May 23, 2017
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Patent number: 9639365
Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
Type: Grant
Filed: November 12, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
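Device-side function pointers in CUDA compile down to exactly this kind of indirect branch. A small sketch (the operation table and the opIndex array are invented for illustration; entries in opIndex are assumed to be 0 or 1):

```cuda
// Each thread picks its own call target; the hardware serializes over the
// distinct targets taken within a warp rather than over every thread.
__device__ float square(float x) { return x * x; }
__device__ float negate(float x) { return -x; }

typedef float (*UnaryOp)(float);

__global__ void applyOps(const float* in, float* out, const int* opIndex, int n)
{
    UnaryOp ops[2] = { square, negate };  // function-pointer table, device code
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ops[opIndex[i]](in[i]); // indirect call through a register
}
```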
-
Patent number: 9639479
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Grant
Filed: September 22, 2010
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
-
Publication number: 20160357560
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by that thread.
Type: Application
Filed: August 16, 2016
Publication date: December 8, 2016
Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
-
Publication number: 20160300319
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Application
Filed: March 25, 2013
Publication date: October 13, 2016
Applicant: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Patent number: 9417875
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by that thread.
Type: Grant
Filed: September 12, 2013
Date of Patent: August 16, 2016
Assignee: NVIDIA Corporation
Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
-
Patent number: 9223578
Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact on the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
Type: Grant
Filed: September 21, 2010
Date of Patent: December 29, 2015
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
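The three barrier levels described here map naturally onto CUDA's fence intrinsics. A minimal sketch, assuming a simple producer that publishes a value through a flag:

```cuda
// Pick the narrowest fence scope that the intended readers require.
__device__ int payload;
__device__ volatile int flag;

__global__ void producer()
{
    payload = 42;
    __threadfence_block();       // ordered for the CTA sharing this SM's L1
    // __threadfence();          // ordered for all threads on the device
    // __threadfence_system();   // ordered for the host and peer devices too
    flag = 1;                    // readers that see flag==1 also see payload
}
```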
-
Patent number: 9189242
Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
Type: Grant
Filed: September 17, 2010
Date of Patent: November 17, 2015
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
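A toy host-side model of the credit idea (every identifier here is hypothetical; the real scheduler is fixed-function hardware): warps accumulate credit while they wait, credit feeds a weight, and the scheduler issues from the heaviest eligible warp so the group advances uniformly.

```cuda
#include <cstddef>
#include <vector>

struct Warp { int credit = 0; bool eligible = true; };

// Returns the index of the warp to issue, or warps.size() if none is eligible.
std::size_t pickWarp(std::vector<Warp>& warps)
{
    std::size_t best = warps.size();
    int bestWeight = -1;
    for (std::size_t i = 0; i < warps.size(); ++i) {
        if (!warps[i].eligible) continue;
        int weight = warps[i].credit;            // weight derived from credit
        if (weight > bestWeight) { bestWeight = weight; best = i; }
    }
    if (best == warps.size()) return best;       // nothing eligible this cycle
    for (auto& w : warps)
        if (w.eligible) ++w.credit;              // waiting warps gain credit
    warps[best].credit = 0;                      // issued warp spends its credit
    return best;
}
```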
-
Patent number: 8860737
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Grant
Filed: July 19, 2006
Date of Patent: October 14, 2014
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Publication number: 20140285500
Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
Type: Application
Filed: March 25, 2013
Publication date: September 25, 2014
Applicant: NVIDIA Corporation
Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Patent number: 8751771
Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
Type: Grant
Filed: September 28, 2011
Date of Patent: June 10, 2014
Assignee: NVIDIA Corporation
Inventors: Brian Fahs, Henry Packard Moreton, Brett W. Coon, Kathleen Elliott Nickolls
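A sketch of the address arithmetic implied by an array-of-structures-of-arrays layout (the helper and its parameters are hypothetical; the abstract does not give the driver's exact formula): elements are stored in chunks of laneWidth so that the same field of laneWidth consecutive elements is contiguous, letting a SIMT group's accesses coalesce.

```cuda
#include <cstddef>

// Offset, in elements, of field `field` of logical element `index` in an
// array-of-structures-of-arrays with `numFields` fields per structure and
// a SIMT/SIMD group width of `laneWidth`.
__host__ __device__ inline
std::size_t asoaOffset(std::size_t index, std::size_t field,
                       std::size_t numFields, std::size_t laneWidth)
{
    std::size_t chunk = index / laneWidth;
    std::size_t lane  = index % laneWidth;
    return chunk * numFields * laneWidth   // skip earlier whole chunks
         + field * laneWidth               // skip earlier fields in this chunk
         + lane;                           // position within this field's run
}
```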
-
Patent number: 8732713
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
Type: Grant
Filed: September 28, 2011
Date of Patent: May 20, 2014
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
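A toy host-side model of the two-level pick described in the abstract (all names hypothetical): step (i) is the ready pool, step (ii) selects the most-senior CTA, and step (iii) selects the highest-credit thread group within it.

```cuda
#include <vector>

struct ThreadGroup { int credit; bool ready; };
struct Cta { int seniority; std::vector<ThreadGroup> groups; };

// Returns the thread group to issue next cycle, or nullptr if none is ready.
// (A real scheduler would fall back to the next-most-senior CTA when the
// oldest one has no ready groups; that refinement is omitted here.)
const ThreadGroup* selectGroup(const std::vector<Cta>& ctas)
{
    const Cta* oldest = nullptr;
    for (const auto& c : ctas)                       // (ii) greatest seniority
        if (!oldest || c.seniority > oldest->seniority) oldest = &c;

    const ThreadGroup* pick = nullptr;
    if (oldest)
        for (const auto& g : oldest->groups)         // (iii) greatest credit
            if (g.ready && (!pick || g.credit > pick->credit)) pick = &g;
    return pick;
}
```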
-
Patent number: 8667256
Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.
Type: Grant
Filed: June 1, 2009
Date of Patent: March 4, 2014
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, John Erik Lindholm
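A toy model of the token stack (field names invented; real hardware encodes this per warp): a divergent branch pushes a token recording the deferred path and its active threads, one side runs to the reconvergence point, and the pop resumes the rest while updating the active program counter.

```cuda
#include <cstdint>
#include <vector>

struct Token {
    uint32_t activeMask;  // threads waiting to run the deferred path
    uint32_t pc;          // where those threads resume execution
    bool     sync;        // set-synchronization bit: full reconvergence on pop
};

struct DivergenceStack {
    std::vector<Token> s;

    // On a divergent branch: defer one side, keep executing the other.
    void pushDivergent(uint32_t deferredMask, uint32_t deferredPc, bool sync) {
        s.push_back({deferredMask, deferredPc, sync});
    }

    // On reaching a reconvergence point: the active program counter and
    // active mask are restored from the popped token.
    Token pop() { Token t = s.back(); s.pop_back(); return t; }
};
```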
-
Patent number: 8645638
Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
Type: Grant
Filed: May 7, 2012
Date of Patent: February 4, 2014
Assignee: NVIDIA Corporation
Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
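A toy model of the serialization logic (names invented): each pass picks one pending target address, lets every request to that address proceed together, and defers the rest, so each distinct address in the group is accessed exactly once.

```cuda
#include <cstdint>
#include <vector>

struct Request { uint32_t addr; bool pending = true; };

void serializeAccesses(std::vector<Request>& reqs)
{
    for (;;) {
        const Request* first = nullptr;
        for (const auto& r : reqs)             // select one pending address
            if (r.pending) { first = &r; break; }
        if (!first) break;                     // every request satisfied
        uint32_t target = first->addr;
        for (auto& r : reqs)                   // all matching requests proceed
            if (r.pending && r.addr == target)
                r.pending = false;             // the access happens here
    }
}
```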