Patents by Inventor Brett W. Coon

Brett W. Coon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LOCK MECHANISM TO ENABLE ATOMIC UPDATES TO SHARED MEMORY

Publication number: 20120036329

Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

Type: Application

Filed: October 18, 2011

Publication date: February 9, 2012

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
HIERARCHICAL PROCESSOR ARRAY

Publication number: 20120026175

Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.

Type: Application

Filed: October 10, 2011

Publication date: February 2, 2012

Applicant: NVIDIA Corporation

Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
Shared memory with parallel access and access conflict resolution mechanism

Patent number: 8108625

Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.

Type: Grant

Filed: October 30, 2006

Date of Patent: January 31, 2012

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
Soldier-on mode to control processor error handling behavior

Patent number: 8095829

Abstract: A global processor operating mode is used select whether a processor stops processing when an error is detected or ignores the error and continues processing while overriding values as needed to recover from the error. When a soldier-on mode is enabled the system attempts to recover from the error while also recording the error state of the first error in on-chip registers for later analysis. When the soldier-on mode is not enabled and an error occurs, the system stops processing and the error is reported up to the operating system.

Type: Grant

Filed: November 2, 2007

Date of Patent: January 10, 2012

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, Bryon S. Nordquist
Hierarchical processor array

Patent number: 8077174

Abstract: Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing.

Type: Grant

Filed: November 1, 2007

Date of Patent: December 13, 2011

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, John S. Montrym, Emmett M. Kilgariff, Simon S. Moy, Sean Jeffrey Treichler, Brett W. Coon, David Kirk, John Danskin
Managing state information for a multi-threaded processor

Patent number: 8074224

Abstract: Embodiments of the present invention facilitate dynamically adapting to state information changes in a graphics processing environment. In one embodiment, a master register holds state information corresponding to units of work (threads) to be performed. The state information in the master register is copied to a per-group state register when a group of threads is to be launched. The per-group state register is coupled to processing engines configured to process the threads, so that the processing engines read state information from the per-group state register rather than the master register. In another embodiment, a number of master registers may be used to store state information for different types of threads.

Type: Grant

Filed: December 19, 2005

Date of Patent: December 6, 2011

Assignee: NVIDIA Corporation

Inventors: Bryon S. Nordquist, Brett W. Coon
Lock mechanism to enable atomic updates to shared memory

Patent number: 8055856

Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

Type: Grant

Filed: March 24, 2008

Date of Patent: November 8, 2011

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
SHARED SINGLE ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS

Publication number: 20110252204

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

Type: Application

Filed: June 21, 2011

Publication date: October 13, 2011

Applicant: NVIDIA Corporation

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
MAXIMIZED MEMORY THROUGHPUT ON PARALLEL PROCESSING DEVICES

Publication number: 20110173414

Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to memory accesses times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.

Type: Application

Filed: March 23, 2011

Publication date: July 14, 2011

Applicant: NVIDIA Corporation

Inventors: Norbert Juffa, Brett W. Coon
Scheduler in multi-threaded processor prioritizing instructions passing qualification rule

Patent number: 7949855

Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.

Type: Grant

Filed: April 28, 2008

Date of Patent: May 24, 2011

Assignee: NVIDIA Corporation

Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
Maximized memory throughput using cooperative thread arrays

Patent number: 7925860

Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to memory accesses times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.

Type: Grant

Filed: May 14, 2007

Date of Patent: April 12, 2011

Assignee: NVIDIA Corporation

Inventors: Norbert Juffa, Brett W. Coon
Support for Non-Local Returns in Parallel Thread SIMD Engine

Publication number: 20110078418

Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.

Type: Application

Filed: September 13, 2010

Publication date: March 31, 2011

Inventors: Guillermo Juan Rozas, Brett W. Coon
Unified Addressing and Instructions for Accessing Parallel Memory Spaces

Publication number: 20110078406

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

Type: Application

Filed: September 25, 2009

Publication date: March 31, 2011

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
TRAP HANDLER ARCHITECTURE FOR A PARALLEL PROCESSING UNIT

Publication number: 20110078427

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.

Type: Application

Filed: September 29, 2009

Publication date: March 31, 2011

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
COALESCING MEMORY BARRIER OPERATIONS ACROSS MULTIPLE PARALLEL THREADS

Publication number: 20110078692

Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

Type: Application

Filed: September 21, 2010

Publication date: March 31, 2011

Inventors: John R. NICKOLLS, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
CONFIGURABLE CACHE FOR MULTIPLE CLIENTS

Publication number: 20110078367

Abstract: One embodiment of the present invention sets forth a technique for providing a L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces enabling the L1 cache may replace dedicated buffers, caches, and FIFOs in previous architectures. A “direct mapped” storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may used as a global register file. A “local and global cache” storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.

Type: Application

Filed: September 25, 2009

Publication date: March 31, 2011

Inventors: Alexander L. Minkin, Steven James Heinrich, RaJeshwaren Selvanesan, Brett W. Coon, Charles McCarver, Anjana Rajendran, Stewart G. Carlton
COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS

Publication number: 20110078417

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

Type: Application

Filed: September 24, 2010

Publication date: March 31, 2011

Inventors: Brian FAHS, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
Cache Operations and Policies For A Multi-Threaded Client

Publication number: 20110078381

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

Type: Application

Filed: September 24, 2010

Publication date: March 31, 2011

Inventors: Steven James HEINRICH, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
Credit-Based Streaming Multiprocessor Warp Scheduling

Publication number: 20110072244

Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

Type: Application

Filed: September 17, 2010

Publication date: March 24, 2011

Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
INSTRUCTIONS FOR MANAGING A PARALLEL CACHE HIERARCHY

Publication number: 20110072213

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Type: Application

Filed: September 22, 2010

Publication date: March 24, 2011

Inventors: John R. NICKOLLS, Brett W. Coon, Michael C. Shebanow

prev 1 2 3 4 5 next