Patents by Inventor Robert Steven Glanville
Robert Steven Glanville has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10360039Abstract: A mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes is disclosed. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.Type: GrantFiled: September 27, 2010Date of Patent: July 23, 2019Assignee: NVIDIA CORPORATIONInventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
-
Patent number: 9952977Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.Type: GrantFiled: September 24, 2010Date of Patent: April 24, 2018Assignee: NVIDIA CORPORATIONInventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Patent number: 9195460Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value may be encoded, not a number (NaN) representing a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.Type: GrantFiled: May 2, 2006Date of Patent: November 24, 2015Assignee: NVIDIA CORPORATIONInventors: Robert Steven Glanville, John Erik Lindholm, Ming Y. Siu
-
Patent number: 9142005Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.Type: GrantFiled: August 20, 2012Date of Patent: September 22, 2015Assignee: NVIDIA CORPORATIONInventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
-
Patent number: 8850436Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.Type: GrantFiled: September 28, 2010Date of Patent: September 30, 2014Assignee: NVIDIA CorporationInventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
-
Patent number: 8677106Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.Type: GrantFiled: June 14, 2010Date of Patent: March 18, 2014Assignee: Nvidia CorporationInventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
-
Publication number: 20140049549Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.Type: ApplicationFiled: August 20, 2012Publication date: February 20, 2014Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
-
Patent number: 8615646Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.Type: GrantFiled: June 14, 2010Date of Patent: December 24, 2013Assignee: Nvidia CorporationInventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
-
Publication number: 20130166882Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.Type: ApplicationFiled: December 22, 2011Publication date: June 27, 2013Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
-
Patent number: 8381203Abstract: A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of the certain instructions that require synchronous execution as defined in some graphics APIs and when synchronous execution improves performance based on a SIMD architecture.Type: GrantFiled: November 3, 2006Date of Patent: February 19, 2013Assignee: NVIDIA CorporationInventors: Boris Beylin, Robert Steven Glanville
-
Patent number: 8271763Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.Type: GrantFiled: September 25, 2009Date of Patent: September 18, 2012Assignee: NVIDIA CorporationInventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
-
Patent number: 8171461Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.Type: GrantFiled: February 24, 2006Date of Patent: May 1, 2012Assignee: NVIDIA CoporationInventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
-
Patent number: 8006236Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.Type: GrantFiled: February 24, 2006Date of Patent: August 23, 2011Assignee: NVIDIA CorporationInventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
-
Publication number: 20110078415Abstract: The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.Type: ApplicationFiled: September 27, 2010Publication date: March 31, 2011Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
-
Publication number: 20110078381Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.Type: ApplicationFiled: September 24, 2010Publication date: March 31, 2011Inventors: Steven James HEINRICH, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
-
Publication number: 20110078406Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.Type: ApplicationFiled: September 25, 2009Publication date: March 31, 2011Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
-
Publication number: 20110078690Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.Type: ApplicationFiled: September 28, 2010Publication date: March 31, 2011Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
-
Publication number: 20110072249Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.Type: ApplicationFiled: June 14, 2010Publication date: March 24, 2011Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
-
Publication number: 20110072248Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.Type: ApplicationFiled: June 14, 2010Publication date: March 24, 2011Inventors: John R. NICKOLLS, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
-
Patent number: 7825933Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.Type: GrantFiled: February 24, 2006Date of Patent: November 2, 2010Assignee: NVIDIA CorporationInventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville