Patents by Inventor Robert Steven Glanville

Robert Steven Glanville has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10360039
    Abstract: A mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes is disclosed. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.
    Type: Grant
    Filed: September 27, 2010
    Date of Patent: July 23, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
  • Patent number: 9952977
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: April 24, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
  • Patent number: 9195460
    Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value may be encoded, not a number (NaN) representing a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: November 24, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Robert Steven Glanville, John Erik Lindholm, Ming Y. Siu
  • Patent number: 9142005
    Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: September 22, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
  • Patent number: 8850436
    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point are each pushed onto a stack.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: September 30, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
  • Patent number: 8677106
    Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing on a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.
    Type: Grant
    Filed: June 14, 2010
    Date of Patent: March 18, 2014
    Assignee: Nvidia Corporation
    Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
  • Publication number: 20140049549
    Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.
    Type: Application
    Filed: August 20, 2012
    Publication date: February 20, 2014
    Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
  • Patent number: 8615646
    Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing on a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.
    Type: Grant
    Filed: June 14, 2010
    Date of Patent: December 24, 2013
    Assignee: Nvidia Corporation
    Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
  • Publication number: 20130166882
    Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux, Michael Fetterman, Shirish Gadre, Robert Steven Glanville, Alexandre Joly
  • Patent number: 8381203
    Abstract: A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of certain instructions that require synchronous execution as defined in some graphics APIs and when synchronous execution improves performance based on a SIMD architecture.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: February 19, 2013
    Assignee: NVIDIA Corporation
    Inventors: Boris Beylin, Robert Steven Glanville
  • Patent number: 8271763
    Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.
    Type: Grant
    Filed: September 25, 2009
    Date of Patent: September 18, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
  • Patent number: 8171461
    Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.
    Type: Grant
    Filed: February 24, 2006
    Date of Patent: May 1, 2012
    Assignee: NVIDIA Corporation
    Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
  • Patent number: 8006236
    Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.
    Type: Grant
    Filed: February 24, 2006
    Date of Patent: August 23, 2011
    Assignee: NVIDIA Corporation
    Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
  • Publication number: 20110078415
    Abstract: The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.
    Type: Application
    Filed: September 27, 2010
    Publication date: March 31, 2011
    Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
  • Publication number: 20110078381
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
  • Publication number: 20110078406
    Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.
    Type: Application
    Filed: September 25, 2009
    Publication date: March 31, 2011
    Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
  • Publication number: 20110078690
    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point are each pushed onto a stack.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 31, 2011
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
  • Publication number: 20110072249
    Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing on a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.
    Type: Application
    Filed: June 14, 2010
    Publication date: March 24, 2011
    Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
  • Publication number: 20110072248
    Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing on a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.
    Type: Application
    Filed: June 14, 2010
    Publication date: March 24, 2011
    Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
  • Patent number: 7825933
    Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.
    Type: Grant
    Filed: February 24, 2006
    Date of Patent: November 2, 2010
    Assignee: NVIDIA Corporation
    Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
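
Illustrative sketch: the thread-divergence abstracts above (for example patents 8677106 and 8615646) describe a unanimous branch, taken only when every thread agrees, and a branch-any, taken when at least one thread agrees. The Python below is a minimal model of that voting rule, not the patented hardware mechanism. The function names, the use of boolean lists for the active mask and the per-thread votes, the choice to count votes from active threads only, and the rule that an empty active set never branches are all assumptions made for illustration.

```python
from typing import List, Sequence


def _active_votes(active_mask: Sequence[bool], wants_branch: Sequence[bool]) -> List[bool]:
    """Collect branch votes from active threads only (an assumption of this sketch)."""
    if len(active_mask) != len(wants_branch):
        raise ValueError("active_mask and wants_branch must cover the same threads")
    return [vote for active, vote in zip(active_mask, wants_branch) if active]


def unanimous_branch(active_mask: Sequence[bool], wants_branch: Sequence[bool]) -> bool:
    """Return True only if every active thread votes to take the branch."""
    votes = _active_votes(active_mask, wants_branch)
    # Assumption: with no active threads, the group does not branch.
    return len(votes) > 0 and all(votes)


def branch_any(active_mask: Sequence[bool], wants_branch: Sequence[bool]) -> bool:
    """Return True if at least one active thread votes to take the branch."""
    return any(_active_votes(active_mask, wants_branch))


if __name__ == "__main__":
    active = [True, True, False, True]   # thread 2 is disabled
    votes = [True, False, True, True]    # thread 1 does not want to branch
    print(unanimous_branch(active, votes))  # False: one active thread disagrees
    print(branch_any(active, votes))        # True: at least one active thread agrees
```

In both cases the whole group acts on a single decision, so the active threads stay together whichever way the vote goes; that is the sense in which the abstracts say divergence is avoided.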