Patents by Inventor Peter C. Mills

Peter C. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8176265
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Publication number: 20120036329
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Application
    Filed: October 18, 2011
    Publication date: February 9, 2012
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
  • Patent number: 8108625
    Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: January 31, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8055856
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: November 8, 2011
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
  • Publication number: 20110252204
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: June 21, 2011
    Publication date: October 13, 2011
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 7949855
    Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Patent number: 7788468
    Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: August 31, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Stephen D. Lew, Brett W. Coon, Peter C. Mills
  • Patent number: 7761697
    Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: July 20, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Peter C. Mills, John R. Nickolls
  • Publication number: 20090240931
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
  • Publication number: 20090240860
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
  • Patent number: 7434032
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: October 7, 2008
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
  • Patent number: 7366878
    Abstract: A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prioritized, with texture cache access operations in a subsequent phase not qualified until all of the texture cache access operations in a current phase have completed. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the instructions. The instructions may also be qualified based on an age of each instruction, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute current instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 13, 2006
    Date of Patent: April 29, 2008
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Patent number: 6233675
    Abstract: Improvements are made in how microprocessors execute AND, OR, and TEST instructions when the operands of this instruction are equal. AND/OR/TEST instructions with equal operands are used to set flags based on the contents of the single operand without explicitly performing the actual AND/OR/TEST command. By resetting these flags directly, this mechanism allows these instructions to be paired with preceding dependent instructions simply by using the flags set by the AND/OR/TEST for the previous instruction. An architecture that hardwires the implementation into the microprocessor through logic gates is preferred. This will result in increased speed while reducing power consumption. Further, a full-sized ALU is not needed in order to execute the AND/OR/TEST instruction with equal operands. As this is a more direct procedure, a pipeline with a reduced capability ALU can be utilized.
    Type: Grant
    Filed: March 25, 1999
    Date of Patent: May 15, 2001
    Assignee: Rise Technology Company
    Inventors: Kenneth K. Munson, Peter C. Mills