Patents by Inventor Brett W. Coon

Brett W. Coon has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7877585
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: January 25, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
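    Illustrative sketch: The divergence handling described above is what lets ordinary, data-dependent control flow run correctly on SIMD hardware. The minimal CUDA kernel below shows the kind of code that produces divergent conditional breaks and returns within a warp; the kernel and its names are illustrative only, not taken from the patent.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Each thread iterates a data-dependent number of times and then breaks.
      // Threads in the same warp therefore diverge at the conditional break, and
      // the hardware must keep finished threads disabled until the warp reconverges.
      __global__ void divergentBreak(const int *limits, int *counts, int n)
      {
          int tid = blockIdx.x * blockDim.x + threadIdx.x;
          if (tid >= n) return;                 // conditional return: another divergence point

          int steps = 0;
          for (int i = 0; i < 1000; ++i) {
              if (i >= limits[tid]) break;      // conditional break: per-thread exit point
              ++steps;
          }
          counts[tid] = steps;
      }

      int main()
      {
          const int n = 64;
          int h_limits[n], h_counts[n];
          for (int i = 0; i < n; ++i) h_limits[i] = i % 7;   // different limit per thread

          int *d_limits, *d_counts;
          cudaMalloc(&d_limits, n * sizeof(int));
          cudaMalloc(&d_counts, n * sizeof(int));
          cudaMemcpy(d_limits, h_limits, n * sizeof(int), cudaMemcpyHostToDevice);

          divergentBreak<<<1, n>>>(d_limits, d_counts, n);
          cudaMemcpy(h_counts, d_counts, n * sizeof(int), cudaMemcpyDeviceToHost);
          printf("thread 5 ran %d iterations\n", h_counts[5]);

          cudaFree(d_limits);
          cudaFree(d_counts);
          return 0;
      }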
  • Patent number: 7864185
    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
    Type: Grant
    Filed: October 23, 2008
    Date of Patent: January 4, 2011
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
  • Patent number: 7836276
    Abstract: A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.
    Type: Grant
    Filed: December 2, 2005
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
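    Illustrative sketch: To make the clock-ratio idea concrete, the short host-side model below replays one fetched instruction across several data clocks and several pipelines. The specific numbers (two pipelines, eight data paths each, a 2:1 data-to-instruction clock ratio) are assumptions for illustration, not figures from the patent.

      #include <cstdio>

      // One instruction issue is reused over the faster data clock and over every
      // pipeline, so the effective SIMD width exceeds the data paths in any one pipeline.
      int main()
      {
          const int pipelines    = 2;   // assumed number of execution pipelines
          const int lanesPerPipe = 8;   // assumed data paths per pipeline
          const int dataClocks   = 2;   // assumed data clocks per instruction clock

          int threadsPerIssue = 0;
          for (int c = 0; c < dataClocks; ++c)        // each data clock reuses the same instruction
              for (int p = 0; p < pipelines; ++p)
                  threadsPerIssue += lanesPerPipe;    // every lane processes one thread
          printf("one fetched instruction covers %d threads\n", threadsPerIssue);   // 32 with these numbers
          return 0;
      }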
  • Patent number: 7834881
    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Samuel Liu, John Erik Lindholm, Ming Y Siu, Brett W. Coon, Stuart F. Oberman
  • Patent number: 7809928
    Abstract: One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: October 5, 2010
    Assignee: NVIDIA Corporation
    Inventors: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls
  • Patent number: 7805573
    Abstract: Systems and methods for storing stack data for multi-threaded processing in a specialized cache reduce on-chip memory requirements while maintaining low access latency. An on-chip stack cache is used to store a predetermined number of stack entries for a thread. When additional entries are needed for the thread, entries stored in the stack cache are spilled, i.e., moved, to remote memory. As entries are popped off the on-chip stack cache, spilled entries are restored from the remote memory. The spilling and restoring processes may be performed while the on-chip stack cache is accessed. Therefore, a large stack size is supported using a smaller amount of die area than that needed to store the entire large stack on-chip. The large stack may be accessed without incurring the latency of reading and writing to remote memory since the stack cache is preemptively spilled and restored.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: September 28, 2010
    Assignee: NVIDIA Corporation
    Inventor: Brett W. Coon
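    Illustrative sketch: A rough software analogue of the spill/restore behavior described above keeps only the newest few entries in a small window and moves the oldest entries to backing memory when the window overflows, refilling on underflow. The host-side model below uses made-up sizes and is not the patented hardware design.

      #include <cassert>
      #include <cstdio>

      constexpr int WINDOW = 4;    // assumed "on-chip" stack cache capacity
      constexpr int MAX    = 64;   // assumed total stack capacity

      // onchip[i % WINDOW] holds logical stack entry i for i in [spilled, depth);
      // spill[] stands in for remote memory holding entries that were pushed out.
      struct SpillStack {
          int onchip[WINDOW];
          int spill[MAX];
          int spilled = 0;   // number of entries moved to the spill area
          int depth   = 0;   // total entries on the logical stack

          void push(int v) {
              if (depth - spilled == WINDOW) {              // window full: spill the oldest entry
                  spill[spilled] = onchip[spilled % WINDOW];
                  ++spilled;
              }
              onchip[depth % WINDOW] = v;
              ++depth;
          }
          int pop() {
              assert(depth > 0);
              if (depth == spilled) {                       // window empty: restore the newest spilled entry
                  --spilled;
                  onchip[spilled % WINDOW] = spill[spilled];
              }
              --depth;
              return onchip[depth % WINDOW];
          }
      };

      int main() {
          SpillStack s;
          for (int i = 0; i < 10; ++i) s.push(i);               // overflows the 4-entry window, spilling 0..5
          for (int i = 0; i < 10; ++i) printf("%d ", s.pop());  // prints 9 8 7 ... 0
          printf("\n");
          return 0;
      }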
  • Patent number: 7788468
    Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: August 31, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Stephen D. Lew, Brett W. Coon, Peter C. Mills
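    Illustrative sketch: In today's CUDA programming model a thread block plays the role of the cooperative thread array: threadIdx selects each thread's slice of the data, and __syncthreads() is the barrier that keeps phases of the computation in step. The kernel below is a minimal illustration, not code from the patent.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Each thread writes one element of shared data, the block waits at a barrier,
      // and then each thread reads an element written by a different thread. The read
      // is safe only because every thread has passed the barrier before any thread reads.
      __global__ void rotateWithinBlock(const float *in, float *out, int n)
      {
          extern __shared__ float tile[];
          int tid = threadIdx.x;
          int gid = blockIdx.x * blockDim.x + tid;
          if (gid < n) tile[tid] = in[gid];         // phase 1: write my portion

          __syncthreads();                          // barrier: wait for the whole block

          int neighbor = (tid + 1) % blockDim.x;
          if (gid < n) out[gid] = tile[neighbor];   // phase 2: read another thread's result
      }

      int main()
      {
          const int n = 256, block = 128;
          float h_in[n], h_out[n];
          for (int i = 0; i < n; ++i) h_in[i] = float(i);

          float *d_in, *d_out;
          cudaMalloc(&d_in, n * sizeof(float));
          cudaMalloc(&d_out, n * sizeof(float));
          cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

          rotateWithinBlock<<<n / block, block, block * sizeof(float)>>>(d_in, d_out, n);
          cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
          printf("out[0] = %.0f (thread 0 read thread 1's element)\n", h_out[0]);

          cudaFree(d_in);
          cudaFree(d_out);
          return 0;
      }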
  • Patent number: 7761697
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: July 20, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Peter C. Mills, John R. Nickolls
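    Illustrative sketch: One way to picture "a sequence of two-way branches" is: repeatedly pick one target address held by an active thread, split the group into threads that match that target and threads that do not, run the matching subset, and continue with the rest. The host-side model below illustrates that idea only; it is not the patented mechanism.

      #include <cstdio>

      // A "warp" of 8 threads, each holding its own branch target. The indirect branch
      // is resolved as a series of two-way splits, one unique target at a time, until
      // every thread has been dispatched.
      int main()
      {
          const int WARP = 8;
          int target[WARP] = {100, 200, 100, 300, 200, 100, 300, 100};   // per-thread addresses
          unsigned active = 0xFFu;                                       // threads still waiting to branch

          while (active) {
              int first = 0;                                  // lowest-numbered active thread
              while (!(active & (1u << first))) ++first;
              int chosen = target[first];                     // its target is handled next

              unsigned taken = 0;                             // two-way split on "target == chosen"
              for (int t = 0; t < WARP; ++t)
                  if ((active & (1u << t)) && target[t] == chosen) taken |= 1u << t;

              printf("branch to %d executes with thread mask 0x%02X\n", chosen, taken);
              active &= ~taken;                               // those threads are done branching
          }
          return 0;
      }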
  • Patent number: 7711990
    Abstract: A system includes a graphics processing unit with a processor responsive to a debug instruction that initiates the storage of execution state information. A memory stores the execution state information. A central processing unit executes a debugging program to analyze the execution state information.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: May 4, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller
  • Patent number: 7680988
    Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: March 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
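    Illustrative sketch: The per-block shared memory exposed in CUDA is the software-visible counterpart of the memory this abstract describes: any thread in the block can read or write any addressable location in it. The small reduction kernel below is illustrative only.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Tree reduction in shared memory: every thread writes its own slot, then pairs of
      // slots are combined. At each step threads read locations written by other threads,
      // which depends on the shared memory being readable and writeable by all of them.
      __global__ void blockSum(const float *in, float *blockSums)
      {
          __shared__ float vals[256];              // assumes blockDim.x == 256
          int tid = threadIdx.x;
          vals[tid] = in[blockIdx.x * blockDim.x + tid];
          __syncthreads();

          for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
              if (tid < stride)
                  vals[tid] += vals[tid + stride]; // read a slot another thread wrote
              __syncthreads();
          }
          if (tid == 0) blockSums[blockIdx.x] = vals[0];
      }

      int main()
      {
          const int n = 256;
          float h_in[n];
          for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

          float *d_in, *d_sum, h_sum = 0.0f;
          cudaMalloc(&d_in, n * sizeof(float));
          cudaMalloc(&d_sum, sizeof(float));
          cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

          blockSum<<<1, 256>>>(d_in, d_sum);
          cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);
          printf("block sum = %.0f (expected 256)\n", h_sum);

          cudaFree(d_in);
          cudaFree(d_sum);
          return 0;
      }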
  • Patent number: 7634621
    Abstract: Circuits, methods, and apparatus that provide the die area and power savings of a single-ported memory with the performance advantages of a multiported memory. One example provides register allocation methods for storing data in a multiple-bank register file. In a thin register allocation method, data for a process is stored in a single bank. In this way, different processes use different banks to avoid conflicts. In a fat register allocation method, processes store data in each bank. In this way, if one process uses a large number of registers, those registers are spread among the banks, avoiding a situation where one bank is filled and other processes are forced to share a reduced number of banks. In a hybrid register allocation method, processes store data in more than one bank, but fewer than all the banks. Each of these methods may be combined in varying ways.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: December 15, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Gary Tarolli, Svetoslav D. Tzvetkov, John R. Nickolls, Ming Y. Siu
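    Illustrative sketch: The thin, fat, and hybrid policies can be pictured as different rules for mapping a process's registers onto banks. The host-side sketch below shows one possible mapping for a four-bank register file; the sizes and layout are assumptions for illustration, not the patented allocation logic.

      #include <cstdio>

      constexpr int NUM_BANKS = 4;

      // Thin: all of a process's registers live in a single bank, so different
      // processes naturally use different banks.
      int thinBank(int process, int /*reg*/)  { return process % NUM_BANKS; }

      // Fat: every process spreads its registers across all banks, so a process
      // with many registers cannot fill up one bank on its own.
      int fatBank(int /*process*/, int reg)   { return reg % NUM_BANKS; }

      // Hybrid: each process spreads its registers across a subset of banks
      // (pairs of banks in this toy version).
      int hybridBank(int process, int reg)
      {
          int base = (process % 2) * 2;       // processes alternate between banks {0,1} and {2,3}
          return base + (reg % 2);
      }

      int main()
      {
          for (int reg = 0; reg < 4; ++reg)
              printf("process 1, r%d -> thin bank %d, fat bank %d, hybrid bank %d\n",
                     reg, thinBank(1, reg), fatBank(1, reg), hybridBank(1, reg));
          return 0;
      }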
  • Patent number: 7617384
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: November 10, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Svetoslav D. Tzvetkov
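    Illustrative sketch: The disable mask can be thought of as a per-group bit vector recording which threads have exited, or are parked at a break or return, and must be skipped until execution reaches the point where they may be re-enabled. The host-side bitmask model below is illustrative only.

      #include <cstdio>

      // An 8-thread SIMD group: threads whose data triggers an early return are disabled
      // for the rest of the program; threads that only took a break are re-enabled once
      // execution moves past the loop.
      int main()
      {
          const int WARP = 8;
          int data[WARP] = {3, -1, 7, -1, 2, 9, -1, 4};
          unsigned exited = 0, atBreak = 0;              // components of the disable mask

          for (int t = 0; t < WARP; ++t)                 // "return if data < 0"
              if (data[t] < 0) exited |= 1u << t;

          for (int t = 0; t < WARP; ++t)                 // "break once data > 5"
              if (!(exited & (1u << t)) && data[t] > 5) atBreak |= 1u << t;

          unsigned activeInLoop = 0xFFu & ~(exited | atBreak);   // still running the loop body
          unsigned activeAfter  = 0xFFu & ~exited;               // break threads re-enabled afterwards
          printf("active in loop body: 0x%02X\n", activeInLoop);
          printf("active after loop:   0x%02X\n", activeAfter);
          return 0;
      }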
  • Patent number: 7600155
    Abstract: A system has a graphics processing unit with a processor to monitor selected criteria and circuitry to initiate the storage of execution state information when the selected criteria reach a specified state. A memory stores execution state information. A central processing unit executes a debugging program to analyze the execution state information.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: October 6, 2009
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller
  • Publication number: 20090240931
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
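    Illustrative sketch: At the CUDA source level this capability shows up as ordinary function-pointer calls in device code, which the compiler lowers to an indirect branch; different threads in a warp may call different targets. The kernel and function names below are illustrative, not from the application.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Two possible call targets; which one a thread calls is decided at run time.
      __device__ int addOne(int x)   { return x + 1; }
      __device__ int timesTwo(int x) { return x * 2; }

      __global__ void indirectCall(const int *sel, const int *in, int *out, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;

          // Per-thread function pointer: threads in one warp can hold different targets,
          // so the call below is an indirect branch with multiple destinations.
          int (*fn)(int) = (sel[i] & 1) ? addOne : timesTwo;
          out[i] = fn(in[i]);
      }

      int main()
      {
          const int n = 8;
          int h_sel[n] = {0, 1, 0, 1, 0, 1, 0, 1}, h_in[n], h_out[n];
          for (int i = 0; i < n; ++i) h_in[i] = i;

          int *d_sel, *d_in, *d_out;
          cudaMalloc(&d_sel, n * sizeof(int));
          cudaMalloc(&d_in,  n * sizeof(int));
          cudaMalloc(&d_out, n * sizeof(int));
          cudaMemcpy(d_sel, h_sel, n * sizeof(int), cudaMemcpyHostToDevice);
          cudaMemcpy(d_in,  h_in,  n * sizeof(int), cudaMemcpyHostToDevice);

          indirectCall<<<1, n>>>(d_sel, d_in, d_out, n);
          cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
          for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);   // 0 2 4 4 8 6 12 8
          printf("\n");

          cudaFree(d_sel); cudaFree(d_in); cudaFree(d_out);
          return 0;
      }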
  • Publication number: 20090240860
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
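    Illustrative sketch: CUDA's software route to a similar effect is an atomic compare-and-swap whose return value tells the thread immediately whether it obtained the lock, so no separate status-check transaction is needed. The sketch below uses a global-memory lock and limits contention to one thread per block; it illustrates that pattern only, not the patented lock-on-read hardware, and per-thread spinlocks within a warp need extra care on older architectures.

      #include <cstdio>
      #include <cuda_runtime.h>

      // One thread per block contends for a global lock around a read-modify-write.
      // atomicCAS returns the previous lock value, so a thread knows at once whether
      // its lock attempt succeeded.
      __global__ void lockedIncrement(int *lock, int *value)
      {
          if (threadIdx.x != 0) return;             // only thread 0 of each block takes the lock

          bool done = false;
          while (!done) {
              if (atomicCAS(lock, 0, 1) == 0) {     // 0 returned: we acquired the lock
                  volatile int *v = value;          // bypass caching inside the critical section
                  *v = *v + 1;                      // plain read-modify-write under the lock
                  __threadfence();                  // make the update visible before releasing
                  atomicExch(lock, 0);              // unlock
                  done = true;
              }
          }
      }

      int main()
      {
          int *d_lock, *d_value, h_value = 0;
          cudaMalloc(&d_lock, sizeof(int));
          cudaMalloc(&d_value, sizeof(int));
          cudaMemset(d_lock, 0, sizeof(int));
          cudaMemset(d_value, 0, sizeof(int));

          lockedIncrement<<<64, 32>>>(d_lock, d_value);
          cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
          printf("value = %d (expected 64)\n", h_value);

          cudaFree(d_lock);
          cudaFree(d_value);
          return 0;
      }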
  • Patent number: 7543136
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: June 2, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
  • Patent number: 7542043
    Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
    Type: Grant
    Filed: May 23, 2005
    Date of Patent: June 2, 2009
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
  • Patent number: 7456835
    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: November 25, 2008
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
  • Patent number: 7434032
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: October 7, 2008
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
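    Illustrative sketch: The mechanism can be modeled as a per-thread bit vector of registers with pending writes: each buffered instruction is stamped with the subset it must wait for, those bits clear as writes complete, and the instruction may issue once its stamp is all zeros. The host-side model below uses an assumed register count and structure; it is not the patented scoreboard.

      #include <bitset>
      #include <cstdio>
      #include <vector>

      constexpr int NUM_REGS = 16;   // assumed register count per thread

      struct Instr {
          std::vector<int> srcRegs;
          int dstReg;
          std::bitset<NUM_REGS> waitingOn;   // registers this instruction must wait for
      };

      struct Scoreboard {
          std::bitset<NUM_REGS> pendingWrite;          // one bit per register, for one thread

          void onBuffer(Instr &i) {                    // instruction enters the instruction buffer
              for (int r : i.srcRegs)
                  if (pendingWrite[r]) i.waitingOn.set(r);
              if (pendingWrite[i.dstReg]) i.waitingOn.set(i.dstReg);
              pendingWrite.set(i.dstReg);              // its own result is now pending
          }
          void onWriteback(int reg, std::vector<Instr> &buffered) {
              pendingWrite.reset(reg);                 // the write completed
              for (auto &i : buffered) i.waitingOn.reset(reg);
          }
          static bool canIssue(const Instr &i) { return i.waitingOn.none(); }
      };

      int main() {
          Scoreboard sb;
          std::vector<Instr> buf;
          buf.push_back({{1, 2}, 3, {}});    // r3 = f(r1, r2)
          sb.onBuffer(buf[0]);
          buf.push_back({{3}, 4, {}});       // r4 = g(r3): depends on the pending write to r3
          sb.onBuffer(buf[1]);
          printf("instr0 ready: %d, instr1 ready: %d\n",
                 (int)Scoreboard::canIssue(buf[0]), (int)Scoreboard::canIssue(buf[1]));
          sb.onWriteback(3, buf);            // the write to r3 completes
          printf("after writeback, instr1 ready: %d\n", (int)Scoreboard::canIssue(buf[1]));
          return 0;
      }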
  • Patent number: 7418576
    Abstract: A graphics processor buffers vertex threads and pixel threads. The different types of threads issue instructions corresponding to different sets of operations. A plurality of different types of execution units are provided, each type of execution unit servicing a different class of operations, such as an execution unit supporting texture operations, an execution unit supporting blending operations, and an execution unit supporting mathematical operations. Current instructions of the threads are buffered and prioritized in a common instruction buffer. A set of high priority instructions is issued per cycle to the plurality of different types of execution units.
    Type: Grant
    Filed: November 17, 2004
    Date of Patent: August 26, 2008
    Assignee: NVIDIA Corporation
    Inventors: John E. Lindholm, Brett W. Coon