Patents by Inventor Robert Steven Glanville

Robert Steven Glanville has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Predicated instruction execution in parallel processors with reduced per-thread state information including choosing a minimum or maximum of two operands based on a predicate value

Patent number: 10360039

Abstract: A mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes is disclosed. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.

Type: Grant

Filed: September 27, 2010

Date of Patent: July 23, 2019

Assignee: NVIDIA CORPORATION

Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
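Illustrative sketch (not taken from the patent): the abstract above describes per-lane 1-bit predicates that are written by predicate-setting instructions and then guard either whether or how a later instruction executes. The Python model below is one minimal way to picture that. The `Lane` class, the register names, the four-predicate register size, and the three helper functions are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

NUM_PREDICATES = 4  # assumed size of each lane's predicate register


@dataclass
class Lane:
    """One thread or data lane: named registers plus 1-bit predicates."""
    regs: dict = field(default_factory=dict)
    preds: list = field(default_factory=lambda: [False] * NUM_PREDICATES)


def set_pred_lt(lanes, dst, a, b):
    """Predicate-setting instruction: P[dst] = (R[a] < R[b]) in every lane."""
    for lane in lanes:
        lane.preds[dst] = lane.regs[a] < lane.regs[b]


def guarded_add(lanes, guard, dst, a, b):
    """The guard decides WHETHER the add executes in a given lane."""
    for lane in lanes:
        if lane.preds[guard]:
            lane.regs[dst] = lane.regs[a] + lane.regs[b]


def min_max(lanes, pred, dst, a, b):
    """The predicate decides HOW the instruction executes: min if set, else max."""
    for lane in lanes:
        op = min if lane.preds[pred] else max
        lane.regs[dst] = op(lane.regs[a], lane.regs[b])


# Three lanes run the same instruction stream on different data.
lanes = [Lane(regs={"x": x, "y": 5, "z": 0}) for x in (1, 5, 9)]
set_pred_lt(lanes, 0, "x", "y")        # P0 = x < y
guarded_add(lanes, 0, "z", "x", "y")   # z = x + y only where P0 is set
min_max(lanes, 0, "w", "x", "y")       # w = min(x, y) where P0, else max(x, y)

print([lane.preds[0] for lane in lanes])   # [True, False, False]
print([lane.regs["z"] for lane in lanes])  # [6, 0, 0]
print([lane.regs["w"] for lane in lanes])  # [1, 5, 9]
```

All lanes execute the same three instructions; only the predicate values differ, so no lane needs its own branch state.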
Cache operations and policies for a multi-threaded client

Patent number: 9952977

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

Type: Grant

Filed: September 24, 2010

Date of Patent: April 24, 2018

Assignee: NVIDIA CORPORATION

Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
Using condition codes in the presence of non-numeric values

Patent number: 9195460

Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value may be encoded, not a number (NaN) representing a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.

Type: Grant

Filed: May 2, 2006

Date of Patent: November 24, 2015

Assignee: NVIDIA CORPORATION

Inventors: Robert Steven Glanville, John Erik Lindholm, Ming Y. Siu
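Illustrative sketch (not taken from the patent): the abstract above adds a fourth condition code value, NaN, and condition tests that account for it so that if-else code behaves deterministically. The Python model below shows why this matters under one assumption: the else branch of `if (x < 0)` is compiled as a separate condition test. The names `CC`, `lt`, `ge`, `not_lt`, and `if_else` are hypothetical and do not come from the patent.

```python
import math
from enum import Enum


class CC(Enum):
    """Four-valued condition code: the three conventional values plus NaN."""
    NEGATIVE = 0
    ZERO = 1
    POSITIVE = 2
    NAN = 3


def condition_code(value):
    """Classify a floating-point result into a condition code."""
    if math.isnan(value):
        return CC.NAN
    if value < 0:
        return CC.NEGATIVE
    if value > 0:
        return CC.POSITIVE
    return CC.ZERO


def lt(cc):
    """Conventional test: true only for NEGATIVE."""
    return cc is CC.NEGATIVE


def ge(cc):
    """Conventional complement: ZERO or POSITIVE, so NaN is excluded."""
    return cc in (CC.ZERO, CC.POSITIVE)


def not_lt(cc):
    """NaN-aware complement: everything except NEGATIVE, including NaN."""
    return cc is not CC.NEGATIVE


def if_else(cc, then_test, else_test):
    """Return the branches that would execute for this condition code."""
    taken = []
    if then_test(cc):
        taken.append("then")
    if else_test(cc):
        taken.append("else")
    return taken


nan_cc = condition_code(float("nan"))
conventional = if_else(nan_cc, lt, ge)     # neither branch runs
nan_aware = if_else(nan_cc, lt, not_lt)    # exactly one branch runs

print(conventional)  # []
print(nan_aware)     # ['else']
```

With the conventional test pair a NaN input falls through both branches; with the NaN-aware complement exactly one branch executes for every input.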
Efficient placement of texture barrier instructions

Patent number: 9142005

Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.

Type: Grant

Filed: August 20, 2012

Date of Patent: September 22, 2015

Assignee: NVIDIA CORPORATION

Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
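Illustrative sketch (not taken from the patent): the abstract above describes a barrier that blocks until a pending request count condition is satisfied, with the count chosen statically by the compiler. The Python model below assumes that texture requests complete in issue order and that the condition is "at most N requests still pending"; both are assumptions made for this example, and `TextureUnit` and `texture_barrier` are hypothetical names.

```python
from collections import deque


class TextureUnit:
    """Toy texture unit whose requests complete in issue order (assumed)."""

    def __init__(self):
        self.pending = deque()
        self.completed = []

    def request(self, tag):
        self.pending.append(tag)

    def complete_one(self):
        # Stand-in for one cycle in which the oldest request returns.
        self.completed.append(self.pending.popleft())


def texture_barrier(unit, max_pending):
    """Stall until at most max_pending requests are still outstanding.

    Returns the number of stall steps, purely for illustration.
    """
    stalls = 0
    while len(unit.pending) > max_pending:
        unit.complete_one()
        stalls += 1
    return stalls


unit = TextureUnit()
for tag in ("t0", "t1", "t2"):
    unit.request(tag)

# The next instruction depends only on t0. Two younger requests may stay
# in flight, so a compiler could place a barrier with a count of 2 here.
stalls = texture_barrier(unit, max_pending=2)

print(stalls)              # 1
print(unit.completed)      # ['t0']
print(list(unit.pending))  # ['t1', 't2']
```

Because the count is fixed at compile time, the check needs only a per-thread counter of outstanding requests rather than a scoreboard entry per destination register.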
Opcode-specified predicatable warp post-synchronization

Patent number: 8850436

Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.

Type: Grant

Filed: September 28, 2010

Date of Patent: September 30, 2014

Assignee: NVIDIA Corporation

Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
Unanimous branch instructions in a parallel thread processor

Patent number: 8677106

Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.

Type: Grant

Filed: June 14, 2010

Date of Patent: March 18, 2014

Assignee: Nvidia Corporation

Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
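Illustrative sketch (not taken from the patent): the abstract above describes two group-wide branch forms, one taken only when every thread agrees and one taken when at least one thread agrees. The Python model below treats each as a vote over the active threads of a thread group. Restricting the vote to active threads is an assumption, and the function names are hypothetical.

```python
def branch_unanimous(active, wants):
    """Taken only if every active thread wants to branch."""
    return all(w for a, w in zip(active, wants) if a)


def branch_any(active, wants):
    """Taken if at least one active thread wants to branch."""
    return any(w for a, w in zip(active, wants) if a)


def next_pcs(pc, target, active, wants, vote):
    """Apply one group-wide decision to all active threads.

    Inactive threads are shown as None because they do not advance.
    """
    taken = vote(active, wants)
    new_pc = target if taken else pc + 1
    return [new_pc if a else None for a in active]


active = [True, True, False, True]   # thread 2 is disabled
wants = [True, False, True, True]    # per-thread branch condition

unanimous_pcs = next_pcs(10, 40, active, wants, branch_unanimous)
any_pcs = next_pcs(10, 40, active, wants, branch_any)

print(unanimous_pcs)  # [11, 11, None, 11]
print(any_pcs)        # [40, 40, None, 40]
```

In both cases every active thread ends up at the same program counter, which is how the group avoids diverging.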
Efficient placement of texture barrier instructions

Publication number: 20140049549

Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.

Type: Application

Filed: August 20, 2012

Publication date: February 20, 2014

Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
Unanimous branch instructions in a parallel thread processor

Patent number: 8615646

Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.

Type: Grant

Filed: June 14, 2010

Date of Patent: December 24, 2013

Assignee: Nvidia Corporation

Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
Methods and apparatus for scheduling instructions without instruction decode

Publication number: 20130166882

Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux, Michael Fetterman, Shirish Gadre, Robert Steven Glanville, Alexandre Joly
Insertion of multithreaded execution synchronization points in a software program

Patent number: 8381203

Abstract: A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of the certain instructions that require synchronous execution as defined in some graphics APIs and when synchronous execution improves performance based on a SIMD architecture.

Type: Grant

Filed: November 3, 2006

Date of Patent: February 19, 2013

Assignee: NVIDIA Corporation

Inventors: Boris Beylin, Robert Steven Glanville
Unified addressing and instructions for accessing parallel memory spaces

Patent number: 8271763

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

Type: Grant

Filed: September 25, 2009

Date of Patent: September 18, 2012

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
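Illustrative sketch (not taken from the patent): the abstract above describes converting a unified address into an access to one of several parallel memory spaces, so that a single load or store instruction type covers all of them. The Python model below does this with address windows. The window bases and sizes, the three space names, and the `resolve` and `ThreadMemory` names are hypothetical values chosen for this example.

```python
# Assumed layout: two windows carved out of the unified address space.
SHARED_BASE, SHARED_SIZE = 0x0100_0000, 0x0001_0000
LOCAL_BASE, LOCAL_SIZE = 0x0200_0000, 0x0001_0000


def resolve(addr):
    """Map a unified address to (memory space, offset within that space)."""
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        return ("shared", addr - SHARED_BASE)
    if LOCAL_BASE <= addr < LOCAL_BASE + LOCAL_SIZE:
        return ("local", addr - LOCAL_BASE)
    return ("global", addr)  # everything outside the windows


class ThreadMemory:
    """One thread's view: private local space, shared and global passed in."""

    def __init__(self, shared, global_mem):
        self.spaces = {"shared": shared, "local": {}, "global": global_mem}

    def load(self, addr):
        space, offset = resolve(addr)
        return self.spaces[space].get(offset, 0)

    def store(self, addr, value):
        space, offset = resolve(addr)
        self.spaces[space][offset] = value


shared, global_mem = {}, {}
t0 = ThreadMemory(shared, global_mem)
t1 = ThreadMemory(shared, global_mem)

# The same store/load calls reach three different memory spaces.
t0.store(LOCAL_BASE + 4, 7)    # private to t0
t0.store(SHARED_BASE + 8, 9)   # visible to t1 through shared memory
t0.store(0x10, 3)              # visible to t1 through global memory

print(t1.load(LOCAL_BASE + 4))   # 0
print(t1.load(SHARED_BASE + 8))  # 9
print(t1.load(0x10))             # 3
```

The caller never names a memory space; the address alone selects it, which is what lets one instruction type replace a separate load or store per space.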
Primitive program compilation for flat attributes with provoking vertex independence

Patent number: 8171461

Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.

Type: Grant

Filed: February 24, 2006

Date of Patent: May 1, 2012

Assignee: NVIDIA Corporation

Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
System and method for compiling high-level primitive programs into primitive program micro-code

Patent number: 8006236

Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.

Type: Grant

Filed: February 24, 2006

Date of Patent: August 23, 2011

Assignee: NVIDIA Corporation

Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
Efficient Predicated Execution For Parallel Processors

Publication number: 20110078415

Abstract: The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.

Type: Application

Filed: September 27, 2010

Publication date: March 31, 2011

Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
Cache Operations and Policies For A Multi-Threaded Client

Publication number: 20110078381

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

Type: Application

Filed: September 24, 2010

Publication date: March 31, 2011

Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
Unified Addressing and Instructions for Accessing Parallel Memory Spaces

Publication number: 20110078406

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

Type: Application

Filed: September 25, 2009

Publication date: March 31, 2011

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville
Opcode-Specified Predicatable Warp Post-Synchronization

Publication number: 20110078690

Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.

Type: Application

Filed: September 28, 2010

Publication date: March 31, 2011

Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
Unanimous branch instructions in a parallel thread processor

Publication number: 20110072249

Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.

Type: Application

Filed: June 14, 2010

Publication date: March 24, 2011

Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
Unanimous branch instructions in a parallel thread processor

Publication number: 20110072248

Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.

Type: Application

Filed: June 14, 2010

Publication date: March 24, 2011

Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas
Managing primitive program vertex attributes as per-attribute arrays

Patent number: 7825933

Abstract: Systems and methods for compiling high-level primitive programs are used to generate primitive program micro-code for execution by a primitive processor. A compiler is configured to produce micro-code for a specific target primitive processor based on the target primitive processor's capabilities. The compiler supports features of the high-level primitive program by providing conversions for different applications programming interface conventions, determining output primitive types, initializing attribute arrays based on primitive input profile modifiers, and determining vertex set lengths from specified primitive input types.

Type: Grant

Filed: February 24, 2006

Date of Patent: November 2, 2010

Assignee: NVIDIA Corporation

Inventors: Mark J. Kilgard, Cass W. Everitt, Christopher T. Dodd, Robert Steven Glanville
