Patents by Inventor Michael Alan Fetterman
Michael Alan Fetterman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10699427
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Grant
Filed: August 12, 2019
Date of Patent: June 30, 2020
Assignee: NVIDIA Corporation
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
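As a rough sketch of the footprint idea in the abstract above: given the region of a texture a pixel will sample, enumerate only the texels inside that region so that only those need to be generated and stored. The axis-aligned rectangular footprint, normalized coordinates, and the `footprint_texels` name are illustrative assumptions, not the patented method.

```python
def footprint_texels(u0, v0, u1, v1, tex_w, tex_h):
    """Return the set of integer texel coordinates covered by an
    axis-aligned footprint [u0,u1] x [v0,v1] in normalized coordinates.
    (Hypothetical simplification: real footprints need not be rectangles.)"""
    x0 = max(0, int(u0 * tex_w))
    x1 = min(tex_w - 1, int(u1 * tex_w))
    y0 = max(0, int(v0 * tex_h))
    y1 = min(tex_h - 1, int(v1 * tex_h))
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}

# Only the texels in the reported footprint need to be shaded and stored;
# everything outside the set can be skipped.
needed = footprint_texels(0.25, 0.25, 0.5, 0.5, 8, 8)
```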
-
Publication number: 20200013174
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Application
Filed: August 12, 2019
Publication date: January 9, 2020
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
-
Patent number: 10503513
Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
Type: Grant
Filed: October 23, 2013
Date of Patent: December 10, 2019
Assignee: NVIDIA Corporation
Inventors: David Conrad Tannenbaum, Srinivasan (Vasu) Iyer, Stuart F. Oberman, Ming Y. Siu, Michael Alan Fetterman, John Matthew Burgess, Shirish Gadre
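A minimal sketch of the pipeline-selection logic the abstract describes: frequent instructions go to the primary pipeline, infrequent ones to the secondary, and common instructions (including FFMA) go to whichever pipeline is available, preferring the more power-efficient one. The opcode sets and the `dispatch` function are invented for illustration.

```python
# Hypothetical opcode classes; the real instruction partitioning is not
# specified in the abstract.
COMMON = {"FFMA", "FADD", "FMUL"}       # supported by both pipelines
PRIMARY_ONLY = {"IADD", "MOV"}          # frequently issued instructions
SECONDARY_ONLY = {"POPC", "RCP"}        # infrequently issued instructions

def dispatch(op, primary_free, secondary_free):
    """Select an execution pipeline for `op`, preferring the primary
    pipeline (assumed more power-efficient here) when both could run it."""
    if op in PRIMARY_ONLY:
        return "primary"
    if op in SECONDARY_ONLY:
        return "secondary"
    if op in COMMON:
        if primary_free:
            return "primary"
        if secondary_free:
            return "secondary"
        return "stall"                  # neither pipeline has a free unit
    raise ValueError(f"unknown op {op}")
```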
-
Patent number: 10489200
Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
Type: Grant
Filed: October 23, 2013
Date of Patent: November 26, 2019
Assignee: NVIDIA Corporation
Inventors: Olivier Giroux, Jack Hilaire Choquette, Robert J. Stoll, Xiaogang Qiu, Michael Alan Fetterman
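The two-set scheme above can be sketched as follows: groups sit in a pending set, are moved to a ready set only when they are actually issuable (allocating one resource slot per transfer), and execution picks from the ready set. The function signature and the single-slot resource model are assumptions for illustration.

```python
def schedule(pending, ready, free_slots, issuable):
    """Transfer issuable thread groups from the first set (`pending`) to
    the second set (`ready`), allocating one hardware-resource slot per
    transferred group, then select the next group to execute."""
    for g in list(pending):
        if free_slots > 0 and issuable(g):
            pending.remove(g)           # leave non-issuable groups behind
            ready.append(g)             # resources allocated on transfer
            free_slots -= 1
    nxt = ready.pop(0) if ready else None
    return nxt, pending, ready, free_slots
```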
-
Patent number: 10424074
Abstract: Methods and apparatuses are disclosed for reporting texture footprint information. A texture footprint identifies the portion of a texture that will be utilized in rendering a pixel in a scene. The disclosed methods and apparatuses advantageously improve system efficiency in decoupled shading systems by first identifying which texels in a given texture map are needed for subsequently rendering a scene. Therefore, the number of texels that are generated and stored may be reduced to include the identified texels. Texels that are not identified need not be rendered and/or stored.
Type: Grant
Filed: July 3, 2018
Date of Patent: September 24, 2019
Assignee: NVIDIA Corporation
Inventors: Yury Uralsky, Henry Packard Moreton, Eric Brian Lum, Jonathan J. Dunaisky, Steven James Heinrich, Stefano Pescador, Shirish Gadre, Michael Alan Fetterman
-
Patent number: 10067768
Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
Type: Grant
Filed: July 13, 2015
Date of Patent: September 4, 2018
Assignee: NVIDIA Corporation
Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
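A toy simulation of the convergence-barrier flow described above: each thread takes one of two divergent paths, blocks at the barrier, and the barrier clears once every participating thread has arrived. The `converge` function and its return shape are invented for illustration; the real mechanism is a hardware/scheduler protocol, not a sequential loop.

```python
def converge(threads, predicate):
    """Simulate a convergence barrier: threads diverge into two portions
    based on `predicate`, each blocks after executing its path, and the
    barrier is cleared once all participants have arrived."""
    participants = set(threads)
    blocked = set()
    log = []
    for t in threads:
        log.append((t, "A" if predicate(t) else "B"))  # divergent path taken
        blocked.add(t)                                 # block at the barrier
    cleared = blocked == participants                  # scheduler's check
    return cleared, log
```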
-
Patent number: 9612836
Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
Type: Grant
Filed: February 3, 2014
Date of Patent: April 4, 2017
Assignee: NVIDIA Corporation
Inventors: Robert Ohannessian, Jr., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
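A minimal sketch of the dependency-barrier comparison described above, assuming the register holds a count of outstanding operations and the barrier releases when the count has dropped to the immediate value or below (the `<=` semantics, register naming, and `depbar` function are assumptions; the abstract only says "a comparison"):

```python
def depbar(registers, reg, immediate, instruction_queue):
    """Software-scoreboard dependency barrier: dispatch the next
    instruction only once the counter in `reg` satisfies the comparison
    against the immediate; otherwise stall."""
    if registers[reg] <= immediate:         # assumed comparison semantics
        return instruction_queue.pop(0)     # dependency satisfied: dispatch
    return None                             # stall until outstanding ops retire
```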
-
Patent number: 9600235
Abstract: One embodiment of the present invention includes a method for performing arithmetic operations on arbitrary width integers using fixed width elements. The method includes receiving a plurality of input operands, segmenting each input operand into multiple sectors, performing a plurality of multiply-add operations based on the multiple sectors to generate a plurality of multiply-add operation results, and combining the multiply-add operation results to generate a final result. One advantage of the disclosed embodiments is that, by using a common fused floating point multiply-add unit to perform arithmetic operations on integers of arbitrary width, the method avoids the area and power penalty of having additional dedicated integer units.
Type: Grant
Filed: September 13, 2013
Date of Patent: March 21, 2017
Assignee: NVIDIA Corporation
Inventors: Srinivasan Iyer, Michael Alan Fetterman, David Conrad Tannenbaum
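The segment-and-combine scheme above is essentially schoolbook multiplication on fixed-width "sectors" (limbs): split each operand into sectors, form every sector-by-sector partial product with a multiply-add step, and sum the shifted results. A sketch, with the 16-bit sector width and function names chosen for illustration (the patent uses a fused floating-point multiply-add unit, which this integer model does not capture):

```python
SECTOR_BITS = 16
MASK = (1 << SECTOR_BITS) - 1

def split(x, sectors):
    """Segment an integer into fixed-width sectors, least significant first."""
    return [(x >> (SECTOR_BITS * i)) & MASK for i in range(sectors)]

def wide_mul(a, b, sectors=4):
    """Multiply wide integers using only sector-wide multiply-add steps,
    then combine the shifted partial results into the final product."""
    A, B = split(a, sectors), split(b, sectors)
    acc = 0
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            acc += (ai * bj) << (SECTOR_BITS * (i + j))  # one multiply-add
    return acc
```

With `sectors=4` this covers 64-bit operands using only 16x16-bit multiplies.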
-
Patent number: 9477480
Abstract: A system, method, and computer program product are provided for scheduling interruptible batches of instructions for execution by one or more functional units of a processor. The method includes the steps of receiving a batch of instructions that includes a plurality of instructions and dispatching at least one instruction from the batch of instructions to one or more functional units for execution. The method further includes the step of receiving an interrupt request that causes an interrupt routine to be dispatched to the one or more functional units prior to all instructions in the batch of instructions being dispatched to the one or more functional units. When the interrupt request is received, the method further includes the step of storing batch-level resources in a memory to resume execution of the batch of instructions once the interrupt routine has finished execution.
Type: Grant
Filed: January 30, 2014
Date of Patent: October 25, 2016
Assignee: NVIDIA Corporation
Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, Michael Alan Fetterman
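A toy model of the interruptible-batch flow above: dispatch instructions one at a time, and when an interrupt arrives mid-batch, save the batch-level state (here, just a resume index) so the batch can continue after the interrupt routine finishes. The function names and the state saved are illustrative assumptions.

```python
def run_batch(batch, functional_unit, interrupt_at=None):
    """Dispatch a batch of instructions; on interrupt, store batch-level
    state so execution can resume later. Returns the saved state or None."""
    for i, instr in enumerate(batch):
        if interrupt_at == i:
            return {"resume_index": i}   # save batch-level resources
        functional_unit(instr)
    return None

def resume(batch, functional_unit, saved):
    """Continue the batch from the saved state after the interrupt routine."""
    for instr in batch[saved["resume_index"]:]:
        functional_unit(instr)
```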
-
Patent number: 9477482
Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.
Type: Grant
Filed: September 26, 2013
Date of Patent: October 25, 2016
Assignee: NVIDIA Corporation
Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
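The per-thread combination described above can be sketched as a bitwise AND: a thread's bypass path is enabled only when the instruction's control bit is set and the producing instruction was actually executed (valid) for that thread. Representing the bits as lists and combining with AND is an assumption; the abstract says only "combining".

```python
def bypass_enable(control_bits, valid_bits):
    """Per-thread bypass enable: a thread may take the bypass path only if
    its control bit is set AND the previously issued producing instruction
    was enabled (valid) for that thread."""
    return [c & v for c, v in zip(control_bits, valid_bits)]
```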
-
Patent number: 9471307
Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.
Type: Grant
Filed: January 3, 2014
Date of Patent: October 18, 2016
Assignee: NVIDIA Corporation
Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, Jr., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
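A minimal sketch of the decoupling described above: a fixed-depth queue sits between the scheduler and the functional unit, so the scheduler can keep issuing while the unit is busy, stalling only when the queue fills. The class and its interface are invented for illustration.

```python
from collections import deque

class DecoupledUnit:
    """Functional unit fronted by a fixed-depth issue queue: the scheduler
    issues into the queue and the unit drains it when available."""
    def __init__(self, depth):
        self.queue = deque()
        self.depth = depth          # number of slots in the queue

    def issue(self, instr):
        if len(self.queue) >= self.depth:
            return False            # queue full: scheduler must stall
        self.queue.append(instr)
        return True

    def drain_one(self):
        """Called when the functional unit becomes available."""
        return self.queue.popleft() if self.queue else None
```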
-
Publication number: 20160019066
Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
Type: Application
Filed: July 13, 2015
Publication date: January 21, 2016
Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
-
Publication number: 20150220341
Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
Type: Application
Filed: February 3, 2014
Publication date: August 6, 2015
Applicant: NVIDIA Corporation
Inventors: Robert Ohannessian, Jr., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
-
Publication number: 20150212819
Abstract: A system, method, and computer program product are provided for scheduling interruptible batches of instructions for execution by one or more functional units of a processor. The method includes the steps of receiving a batch of instructions that includes a plurality of instructions and dispatching at least one instruction from the batch of instructions to one or more functional units for execution. The method further includes the step of receiving an interrupt request that causes an interrupt routine to be dispatched to the one or more functional units prior to all instructions in the batch of instructions being dispatched to the one or more functional units. When the interrupt request is received, the method further includes the step of storing batch-level resources in a memory to resume execution of the batch of instructions once the interrupt routine has finished execution.
Type: Application
Filed: January 30, 2014
Publication date: July 30, 2015
Applicant: NVIDIA Corporation
Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, Michael Alan Fetterman
-
Publication number: 20150193272
Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.
Type: Application
Filed: January 3, 2014
Publication date: July 9, 2015
Applicant: NVIDIA Corporation
Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, Jr., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
-
Publication number: 20150113538
Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
Type: Application
Filed: October 23, 2013
Publication date: April 23, 2015
Applicant: NVIDIA Corporation
Inventors: Olivier Giroux, Jack Hilaire Choquette, Robert J. Stoll, Xiaogang Qiu, Michael Alan Fetterman
-
Publication number: 20150113254
Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
Type: Application
Filed: October 23, 2013
Publication date: April 23, 2015
Applicant: NVIDIA Corporation
Inventors: David Conrad Tannenbaum, Srinivasan (Vasu) Iyer, Stuart F. Oberman, Ming Y. Siu, Michael Alan Fetterman, John Matthew Burgess, Shirish Gadre
-
Publication number: 20150089202
Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.
Type: Application
Filed: September 26, 2013
Publication date: March 26, 2015
Applicant: NVIDIA Corporation
Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
-
Publication number: 20150081753
Abstract: One embodiment of the present invention includes a method for performing arithmetic operations on arbitrary width integers using fixed width elements. The method includes receiving a plurality of input operands, segmenting each input operand into multiple sectors, performing a plurality of multiply-add operations based on the multiple sectors to generate a plurality of multiply-add operation results, and combining the multiply-add operation results to generate a final result. One advantage of the disclosed embodiments is that, by using a common fused floating point multiply-add unit to perform arithmetic operations on integers of arbitrary width, the method avoids the area and power penalty of having additional dedicated integer units.
Type: Application
Filed: September 13, 2013
Publication date: March 19, 2015
Applicant: NVIDIA Corporation
Inventors: Srinivasan (Vasu) Iyer, Michael Alan Fetterman, David Conrad Tannenbaum
-
Patent number: 5944817
Abstract: A Branch Target Buffer Circuit in a computer processor that predicts branch instructions within a stream of computer instructions is disclosed. The Branch Target Buffer Circuit uses a Branch Target Buffer Cache that stores branch information about previously executed branch instructions. The branch information stored in the Branch Target Buffer Cache is addressed by the last byte of each branch instruction. When an Instruction Fetch Unit in the computer processor fetches a block of instructions, it sends the Branch Target Buffer Circuit an instruction pointer. Based on the instruction pointer, the Branch Target Buffer Circuit looks in the Branch Target Buffer Cache to see if any of the instructions in the block being fetched is a branch instruction. When the Branch Target Buffer Circuit finds an upcoming branch instruction in the Branch Target Buffer Cache, the Branch Target Buffer Circuit informs the Instruction Fetch Unit about the upcoming branch instruction.
Type: Grant
Filed: October 7, 1998
Date of Patent: August 31, 1999
Assignee: Intel Corporation
Inventors: Bradley D. Hoyt, Glenn J. Hinton, David B. Papworth, Ashwani Kumar Gupta, Michael Alan Fetterman, Subramanian Natarajan, Sunil Shenoy, Reynold V. D'Sa
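A minimal sketch of the lookup the abstract describes: the BTB is keyed by the last byte of each previously seen branch, and on every fetch the circuit scans the addresses covered by the fetched block for a hit. Modeling the BTB as a dict and the linear scan are simplifications for illustration; real BTBs use set-associative indexing.

```python
def btb_lookup(btb, fetch_ip, block_size):
    """Check whether the fetched block [fetch_ip, fetch_ip + block_size)
    contains a known branch. `btb` maps a branch's last-byte address to
    its predicted target. Returns (branch end address, target) or None."""
    for addr in range(fetch_ip, fetch_ip + block_size):
        if addr in btb:
            return addr, btb[addr]   # inform the fetch unit of the branch
    return None                      # no upcoming branch in this block
```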