Patents by Inventor Brett W. Coon

Brett W. Coon has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7877585
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: January 25, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov
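    Illustrative sketch: The divergence handling described above is what lets ordinary, data-dependent control flow run correctly on SIMD hardware. The minimal CUDA kernel below shows the kind of code that produces divergent conditional breaks and returns within a warp; the kernel and its names are illustrative only, not taken from the patent.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Each thread iterates a data-dependent number of times and then breaks.
      // Threads in the same warp therefore diverge at the conditional break, and
      // the hardware must keep finished threads disabled until the warp reconverges.
      __global__ void divergentBreak(const int *limits, int *counts, int n)
      {
          int tid = blockIdx.x * blockDim.x + threadIdx.x;
          if (tid >= n) return;                 // conditional return: another divergence point

          int steps = 0;
          for (int i = 0; i < 1000; ++i) {
              if (i >= limits[tid]) break;      // conditional break: per-thread exit point
              ++steps;
          }
          counts[tid] = steps;
      }

      int main()
      {
          const int n = 64;
          int h_limits[n], h_counts[n];
          for (int i = 0; i < n; ++i) h_limits[i] = i % 7;   // different limit per thread

          int *d_limits, *d_counts;
          cudaMalloc(&d_limits, n * sizeof(int));
          cudaMalloc(&d_counts, n * sizeof(int));
          cudaMemcpy(d_limits, h_limits, n * sizeof(int), cudaMemcpyHostToDevice);

          divergentBreak<<<1, n>>>(d_limits, d_counts, n);
          cudaMemcpy(h_counts, d_counts, n * sizeof(int), cudaMemcpyDeviceToHost);
          printf("thread 5 ran %d iterations\n", h_counts[5]);

          cudaFree(d_limits);
          cudaFree(d_counts);
          return 0;
      }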
  • Patent number: 7864185
    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
    Type: Grant
    Filed: October 23, 2008
    Date of Patent: January 4, 2011
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
  • Patent number: 7836276
    Abstract: A SIMD processor efficiently utilizes its hardware resources to achieve higher data processing throughput. The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processor at a fraction of the rate of the data processing side and by providing multiple execution pipelines, each with multiple data paths. As a result, higher data processing throughput is achieved while an instruction is fetched and issued once per clock. This configuration also allows a large group of threads to be clustered and executed together through the SIMD processor so that greater memory efficiency can be achieved for certain types of operations like texture memory accesses performed in connection with graphics processing.
    Type: Grant
    Filed: December 2, 2005
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
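    Illustrative sketch: To make the clock-ratio idea concrete, the short host-side model below replays one fetched instruction across several data clocks and several pipelines. The specific numbers (two pipelines, eight data paths each, a 2:1 data-to-instruction clock ratio) are assumptions for illustration, not figures from the patent.

      #include <cstdio>

      // One instruction issue is reused over the faster data clock and over every
      // pipeline, so the effective SIMD width exceeds the data paths in any one pipeline.
      int main()
      {
          const int pipelines    = 2;   // assumed number of execution pipelines
          const int lanesPerPipe = 8;   // assumed data paths per pipeline
          const int dataClocks   = 2;   // assumed data clocks per instruction clock

          int threadsPerIssue = 0;
          for (int c = 0; c < dataClocks; ++c)        // each data clock reuses the same instruction
              for (int p = 0; p < pipelines; ++p)
                  threadsPerIssue += lanesPerPipe;    // every lane processes one thread
          printf("one fetched instruction covers %d threads\n", threadsPerIssue);   // 32 with these numbers
          return 0;
      }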
  • Patent number: 7834881
    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Samuel Liu, John Erik Lindholm, Ming Y Siu, Brett W. Coon, Stuart F. Oberman
  • Patent number: 7809928
    Abstract: One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: October 5, 2010
    Assignee: NVIDIA Corporation
    Inventors: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls
  • Patent number: 7805573
    Abstract: Systems and methods for storing stack data for multi-threaded processing in a specialized cache reduce on-chip memory requirements while maintaining low access latency. An on-chip stack cache is used to store a predetermined number of stack entries for a thread. When additional entries are needed for the thread, entries stored in the stack cache are spilled, i.e., moved, to remote memory. As entries are popped off the on-chip stack cache, spilled entries are restored from the remote memory. The spilling and restoring processes may be performed while the on-chip stack cache is accessed. Therefore, a large stack size is supported using a smaller amount of die area than that needed to store the entire large stack on-chip. The large stack may be accessed without incurring the latency of reading and writing to remote memory since the stack cache is preemptively spilled and restored.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: September 28, 2010
    Assignee: NVIDIA Corporation
    Inventor: Brett W. Coon
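    Illustrative sketch: A rough software analogue of the spill/restore behavior described above keeps only the newest few entries in a small window and moves the oldest entries to backing memory when the window overflows, refilling on underflow. The host-side model below uses made-up sizes and is not the patented hardware design.

      #include <cassert>
      #include <cstdio>

      constexpr int WINDOW = 4;    // assumed "on-chip" stack cache capacity
      constexpr int MAX    = 64;   // assumed total stack capacity

      // onchip[i % WINDOW] holds logical stack entry i for i in [spilled, depth);
      // spill[] stands in for remote memory holding entries that were pushed out.
      struct SpillStack {
          int onchip[WINDOW];
          int spill[MAX];
          int spilled = 0;   // number of entries moved to the spill area
          int depth   = 0;   // total entries on the logical stack

          void push(int v) {
              if (depth - spilled == WINDOW) {              // window full: spill the oldest entry
                  spill[spilled] = onchip[spilled % WINDOW];
                  ++spilled;
              }
              onchip[depth % WINDOW] = v;
              ++depth;
          }
          int pop() {
              assert(depth > 0);
              if (depth == spilled) {                       // window empty: restore the newest spilled entry
                  --spilled;
                  onchip[spilled % WINDOW] = spill[spilled];
              }
              --depth;
              return onchip[depth % WINDOW];
          }
      };

      int main() {
          SpillStack s;
          for (int i = 0; i < 10; ++i) s.push(i);               // overflows the 4-entry window, spilling 0..5
          for (int i = 0; i < 10; ++i) printf("%d ", s.pop());  // prints 9 8 7 ... 0
          printf("\n");
          return 0;
      }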
  • Patent number: 7788468
    Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: August 31, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Stephen D. Lew, Brett W. Coon, Peter C. Mills
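    Illustrative sketch: In today's CUDA programming model a thread block plays the role of the cooperative thread array: threadIdx selects each thread's slice of the data, and __syncthreads() is the barrier that keeps phases of the computation in step. The kernel below is a minimal illustration, not code from the patent.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Each thread writes one element of shared data, the block waits at a barrier,
      // and then each thread reads an element written by a different thread. The read
      // is safe only because every thread has passed the barrier before any thread reads.
      __global__ void rotateWithinBlock(const float *in, float *out, int n)
      {
          extern __shared__ float tile[];
          int tid = threadIdx.x;
          int gid = blockIdx.x * blockDim.x + tid;
          if (gid < n) tile[tid] = in[gid];         // phase 1: write my portion

          __syncthreads();                          // barrier: wait for the whole block

          int neighbor = (tid + 1) % blockDim.x;
          if (gid < n) out[gid] = tile[neighbor];   // phase 2: read another thread's result
      }

      int main()
      {
          const int n = 256, block = 128;
          float h_in[n], h_out[n];
          for (int i = 0; i < n; ++i) h_in[i] = float(i);

          float *d_in, *d_out;
          cudaMalloc(&d_in, n * sizeof(float));
          cudaMalloc(&d_out, n * sizeof(float));
          cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

          rotateWithinBlock<<<n / block, block, block * sizeof(float)>>>(d_in, d_out, n);
          cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
          printf("out[0] = %.0f (thread 0 read thread 1's element)\n", h_out[0]);

          cudaFree(d_in);
          cudaFree(d_out);
          return 0;
      }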
  • Patent number: 7761697
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: July 20, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Peter C. Mills, John R. Nickolls
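    Illustrative sketch: One way to picture "a sequence of two-way branches" is: repeatedly pick one target address held by an active thread, split the group into threads that match that target and threads that do not, run the matching subset, and continue with the rest. The host-side model below illustrates that idea only; it is not the patented mechanism.

      #include <cstdio>

      // A "warp" of 8 threads, each holding its own branch target. The indirect branch
      // is resolved as a series of two-way splits, one unique target at a time, until
      // every thread has been dispatched.
      int main()
      {
          const int WARP = 8;
          int target[WARP] = {100, 200, 100, 300, 200, 100, 300, 100};   // per-thread addresses
          unsigned active = 0xFFu;                                       // threads still waiting to branch

          while (active) {
              int first = 0;                                  // lowest-numbered active thread
              while (!(active & (1u << first))) ++first;
              int chosen = target[first];                     // its target is handled next

              unsigned taken = 0;                             // two-way split on "target == chosen"
              for (int t = 0; t < WARP; ++t)
                  if ((active & (1u << t)) && target[t] == chosen) taken |= 1u << t;

              printf("branch to %d executes with thread mask 0x%02X\n", chosen, taken);
              active &= ~taken;                               // those threads are done branching
          }
          return 0;
      }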
  • Patent number: 7711990
    Abstract: A system includes a graphics processing unit with a processor responsive to a debug instruction that initiates the storage of execution state information. A memory stores the execution state information. A central processing unit executes a debugging program to analyze the execution state information.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: May 4, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller
  • Patent number: 7680988
    Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: March 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
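    Illustrative sketch: The per-block shared memory exposed in CUDA is the software-visible counterpart of the memory this abstract describes: any thread in the block can read or write any addressable location in it. The small reduction kernel below is illustrative only.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Tree reduction in shared memory: every thread writes its own slot, then pairs of
      // slots are combined. At each step threads read locations written by other threads,
      // which depends on the shared memory being readable and writeable by all of them.
      __global__ void blockSum(const float *in, float *blockSums)
      {
          __shared__ float vals[256];              // assumes blockDim.x == 256
          int tid = threadIdx.x;
          vals[tid] = in[blockIdx.x * blockDim.x + tid];
          __syncthreads();

          for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
              if (tid < stride)
                  vals[tid] += vals[tid + stride]; // read a slot another thread wrote
              __syncthreads();
          }
          if (tid == 0) blockSums[blockIdx.x] = vals[0];
      }

      int main()
      {
          const int n = 256;
          float h_in[n];
          for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

          float *d_in, *d_sum, h_sum = 0.0f;
          cudaMalloc(&d_in, n * sizeof(float));
          cudaMalloc(&d_sum, sizeof(float));
          cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

          blockSum<<<1, 256>>>(d_in, d_sum);
          cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);
          printf("block sum = %.0f (expected 256)\n", h_sum);

          cudaFree(d_in);
          cudaFree(d_sum);
          return 0;
      }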
  • Patent number: 7634621
    Abstract: Circuits, methods, and apparatus that provide the die area and power savings of a single-ported memory with the performance advantages of a multiported memory. One example provides register allocation methods for storing data in a multiple-bank register file. In a thin register allocation method, data for a process is stored in a single bank. In this way, different processes use different banks to avoid conflicts. In a fat register allocation method, processes store data in each bank. In this way, if one process uses a large number of registers, those registers are spread among the banks, avoiding a situation where one bank is filled and other processes are forced to share a reduced number of banks. In a hybrid register allocation method, processes store data in more than one bank, but fewer than all the banks. Each of these methods may be combined in varying ways.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: December 15, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Gary Tarolli, Svetoslav D. Tzvetkov, John R. Nickolls, Ming Y. Siu
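    Illustrative sketch: The thin, fat, and hybrid policies can be pictured as different rules for mapping a process's registers onto banks. The host-side sketch below shows one possible mapping for a four-bank register file; the sizes and layout are assumptions for illustration, not the patented allocation logic.

      #include <cstdio>

      constexpr int NUM_BANKS = 4;

      // Thin: all of a process's registers live in a single bank, so different
      // processes naturally use different banks.
      int thinBank(int process, int /*reg*/)  { return process % NUM_BANKS; }

      // Fat: every process spreads its registers across all banks, so a process
      // with many registers cannot fill up one bank on its own.
      int fatBank(int /*process*/, int reg)   { return reg % NUM_BANKS; }

      // Hybrid: each process spreads its registers across a subset of banks
      // (pairs of banks in this toy version).
      int hybridBank(int process, int reg)
      {
          int base = (process % 2) * 2;       // processes alternate between banks {0,1} and {2,3}
          return base + (reg % 2);
      }

      int main()
      {
          for (int reg = 0; reg < 4; ++reg)
              printf("process 1, r%d -> thin bank %d, fat bank %d, hybrid bank %d\n",
                     reg, thinBank(1, reg), fatBank(1, reg), hybridBank(1, reg));
          return 0;
      }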
  • Patent number: 7617384
    Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: November 10, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Svetoslav D. Tzvetkov
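    Illustrative sketch: The disable mask can be thought of as a per-group bit vector recording which threads have exited, or are parked at a break or return, and must be skipped until execution reaches the point where they may be re-enabled. The host-side bitmask model below is illustrative only.

      #include <cstdio>

      // An 8-thread SIMD group: threads whose data triggers an early return are disabled
      // for the rest of the program; threads that only took a break are re-enabled once
      // execution moves past the loop.
      int main()
      {
          const int WARP = 8;
          int data[WARP] = {3, -1, 7, -1, 2, 9, -1, 4};
          unsigned exited = 0, atBreak = 0;              // components of the disable mask

          for (int t = 0; t < WARP; ++t)                 // "return if data < 0"
              if (data[t] < 0) exited |= 1u << t;

          for (int t = 0; t < WARP; ++t)                 // "break once data > 5"
              if (!(exited & (1u << t)) && data[t] > 5) atBreak |= 1u << t;

          unsigned activeInLoop = 0xFFu & ~(exited | atBreak);   // still running the loop body
          unsigned activeAfter  = 0xFFu & ~exited;               // break threads re-enabled afterwards
          printf("active in loop body: 0x%02X\n", activeInLoop);
          printf("active after loop:   0x%02X\n", activeAfter);
          return 0;
      }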
  • Patent number: 7600155
    Abstract: A system has a graphics processing unit with a processor to monitor selected criteria and circuitry to initiate the storage of execution state information when the selected criteria reach a specified state. A memory stores execution state information. A central processing unit executes a debugging program to analyze the execution state information.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: October 6, 2009
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller
  • Publication number: 20090240931
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
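    Illustrative sketch: At the CUDA source level this capability shows up as ordinary function-pointer calls in device code, which the compiler lowers to an indirect branch; different threads in a warp may call different targets. The kernel and function names below are illustrative, not from the application.

      #include <cstdio>
      #include <cuda_runtime.h>

      // Two possible call targets; which one a thread calls is decided at run time.
      __device__ int addOne(int x)   { return x + 1; }
      __device__ int timesTwo(int x) { return x * 2; }

      __global__ void indirectCall(const int *sel, const int *in, int *out, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;

          // Per-thread function pointer: threads in one warp can hold different targets,
          // so the call below is an indirect branch with multiple destinations.
          int (*fn)(int) = (sel[i] & 1) ? addOne : timesTwo;
          out[i] = fn(in[i]);
      }

      int main()
      {
          const int n = 8;
          int h_sel[n] = {0, 1, 0, 1, 0, 1, 0, 1}, h_in[n], h_out[n];
          for (int i = 0; i < n; ++i) h_in[i] = i;

          int *d_sel, *d_in, *d_out;
          cudaMalloc(&d_sel, n * sizeof(int));
          cudaMalloc(&d_in,  n * sizeof(int));
          cudaMalloc(&d_out, n * sizeof(int));
          cudaMemcpy(d_sel, h_sel, n * sizeof(int), cudaMemcpyHostToDevice);
          cudaMemcpy(d_in,  h_in,  n * sizeof(int), cudaMemcpyHostToDevice);

          indirectCall<<<1, n>>>(d_sel, d_in, d_out, n);
          cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
          for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);   // 0 2 4 4 8 6 12 8
          printf("\n");

          cudaFree(d_sel); cudaFree(d_in); cudaFree(d_out);
          return 0;
      }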
  • Publication number: 20090240860
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
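    Illustrative sketch: CUDA's software route to a similar effect is an atomic compare-and-swap whose return value tells the thread immediately whether it obtained the lock, so no separate status-check transaction is needed. The sketch below uses a global-memory lock and limits contention to one thread per block; it illustrates that pattern only, not the patented lock-on-read hardware, and per-thread spinlocks within a warp need extra care on older architectures.

      #include <cstdio>
      #include <cuda_runtime.h>

      // One thread per block contends for a global lock around a read-modify-write.
      // atomicCAS returns the previous lock value, so a thread knows at once whether
      // its lock attempt succeeded.
      __global__ void lockedIncrement(int *lock, int *value)
      {
          if (threadIdx.x != 0) return;             // only thread 0 of each block takes the lock

          bool done = false;
          while (!done) {
              if (atomicCAS(lock, 0, 1) == 0) {     // 0 returned: we acquired the lock
                  volatile int *v = value;          // bypass caching inside the critical section
                  *v = *v + 1;                      // plain read-modify-write under the lock
                  __threadfence();                  // make the update visible before releasing
                  atomicExch(lock, 0);              // unlock
                  done = true;
              }
          }
      }

      int main()
      {
          int *d_lock, *d_value, h_value = 0;
          cudaMalloc(&d_lock, sizeof(int));
          cudaMalloc(&d_value, sizeof(int));
          cudaMemset(d_lock, 0, sizeof(int));
          cudaMemset(d_value, 0, sizeof(int));

          lockedIncrement<<<64, 32>>>(d_lock, d_value);
          cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
          printf("value = %d (expected 64)\n", h_value);

          cudaFree(d_lock);
          cudaFree(d_value);
          return 0;
      }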
  • Patent number: 7543136
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: June 2, 2009
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm
  • Patent number: 7542043
    Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
    Type: Grant
    Filed: May 23, 2005
    Date of Patent: June 2, 2009
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
  • Patent number: 7456835
    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: November 25, 2008
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
  • Patent number: 7434032
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: October 7, 2008
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
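    Illustrative sketch: The mechanism can be modeled as a per-thread bit vector of registers with pending writes: each buffered instruction is stamped with the subset it must wait for, those bits clear as writes complete, and the instruction may issue once its stamp is all zeros. The host-side model below uses an assumed register count and structure; it is not the patented scoreboard.

      #include <bitset>
      #include <cstdio>
      #include <vector>

      constexpr int NUM_REGS = 16;   // assumed register count per thread

      struct Instr {
          std::vector<int> srcRegs;
          int dstReg;
          std::bitset<NUM_REGS> waitingOn;   // registers this instruction must wait for
      };

      struct Scoreboard {
          std::bitset<NUM_REGS> pendingWrite;          // one bit per register, for one thread

          void onBuffer(Instr &i) {                    // instruction enters the instruction buffer
              for (int r : i.srcRegs)
                  if (pendingWrite[r]) i.waitingOn.set(r);
              if (pendingWrite[i.dstReg]) i.waitingOn.set(i.dstReg);
              pendingWrite.set(i.dstReg);              // its own result is now pending
          }
          void onWriteback(int reg, std::vector<Instr> &buffered) {
              pendingWrite.reset(reg);                 // the write completed
              for (auto &i : buffered) i.waitingOn.reset(reg);
          }
          static bool canIssue(const Instr &i) { return i.waitingOn.none(); }
      };

      int main() {
          Scoreboard sb;
          std::vector<Instr> buf;
          buf.push_back({{1, 2}, 3, {}});    // r3 = f(r1, r2)
          sb.onBuffer(buf[0]);
          buf.push_back({{3}, 4, {}});       // r4 = g(r3): depends on the pending write to r3
          sb.onBuffer(buf[1]);
          printf("instr0 ready: %d, instr1 ready: %d\n",
                 (int)Scoreboard::canIssue(buf[0]), (int)Scoreboard::canIssue(buf[1]));
          sb.onWriteback(3, buf);            // the write to r3 completes
          printf("after writeback, instr1 ready: %d\n", (int)Scoreboard::canIssue(buf[1]));
          return 0;
      }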
  • Patent number: 7418576
    Abstract: A graphics processor buffers vertex threads and pixel threads. The different types of threads issue instructions corresponding to different sets of operations. A plurality of different types of execution units are provided, each type of execution unit servicing a different class of operations, such as an execution unit supporting texture operations, an execution unit supporting blending operations, and an execution unit supporting mathematical operations. Current instructions of the threads are buffered and prioritized in a common instruction buffer. A set of high priority instructions is issued per cycle to the plurality of different types of execution units.
    Type: Grant
    Filed: November 17, 2004
    Date of Patent: August 26, 2008
    Assignee: NVIDIA Corporation
    Inventors: John E. Lindholm, Brett W. Coon