Patents by Inventor Eric Lyell Hill

Eric Lyell Hill has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Pre-scheduled replays of divergent operations

Patent number: 10152329

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.

Type: Grant

Filed: February 9, 2012

Date of Patent: December 11, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
Mechanism for waking common resource requests within a resource management subsystem

Patent number: 10095548

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Grant

Filed: May 21, 2012

Date of Patent: October 9, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Resource management subsystem that maintains fairness and order

Patent number: 9836325

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Grant

Filed: May 21, 2012

Date of Patent: December 5, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Batched replays of divergent operations

Patent number: 9817668

Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.

Type: Grant

Filed: December 16, 2011

Date of Patent: November 14, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
Mechanism for tracking age of common resource requests within a resource management subsystem

Patent number: 9755994

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Grant

Filed: May 21, 2012

Date of Patent: September 5, 2017

Assignee: NVIDIA Corporation

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Shaped register file reads

Patent number: 9626191

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Grant

Filed: December 22, 2011

Date of Patent: April 18, 2017

Assignee: NVIDIA Corporation

Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
Hardware WCK2CK training engine using meta-EDC sweeping and adjustably accurate voting algorithm for clock phase detection

Patent number: 8812892

Abstract: One embodiment of the present invention sets forth a technique for performing high-performance clock training. One clock training sweep operation is performed to determine phase relationships for two write clocks with respect to a command clock. The phase relationships are generated to satisfy timing requirements for two different client devices, such as GDDR5 DRAM components. A second clock training sweep operation is performed to better align local clocks operating on the client devices. A voting tally is maintained during the second clock training sweep to record phase agreement at each step in the clock training sweep. The voting tally then determines whether one of the local clocks should be inverted to better align the two local clocks.

Type: Grant

Filed: December 30, 2009

Date of Patent: August 19, 2014

Assignee: NVIDIA Corporation

Inventors: Eric Lyell Hill, Russell R. Newcomb, Shu-Yi Yu
Cyclic redundancy check generation via distributed time multiplexed linear feedback shift registers

Patent number: 8738990

Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.

Type: Grant

Filed: July 19, 2012

Date of Patent: May 27, 2014

Assignee: NVIDIA Corporation

Inventors: Eric Lyell Hill, Richard L. Schober, Jr., Hungse Cha
Cyclic redundancy check generation via distributed time multiplexed linear feedback shift registers

Patent number: 8726124

Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.

Type: Grant

Filed: July 19, 2012

Date of Patent: May 13, 2014

Assignee: NVIDIA Corporation

Inventors: Eric Lyell Hill, Richard L. Schober, Jr., Hungse Cha
CYCLIC REDUNDANCY CHECK GENERATION VIA DISTRIBUTED TIME MULTIPLEXED LINEAR FEEDBACK SHIFT REGISTERS

Publication number: 20140026021

Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.

Type: Application

Filed: July 19, 2012

Publication date: January 23, 2014

Inventors: Eric Lyell Hill, Richard L. Schober, JR., Hungse Cha
CYCLIC REDUNDANCY CHECK GENERATION VIA DISTRIBUTED TIME MULTIPLEXED LINEAR FEEDBACK SHIFT REGISTERS

Publication number: 20140026022

Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.

Type: Application

Filed: July 19, 2012

Publication date: January 23, 2014

Inventors: Eric Lyell HILL, Richard L. SCHOBER, JR., Hungse CHA
MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311996

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM

Publication number: 20130311686

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER

Publication number: 20130311999

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

Type: Application

Filed: May 21, 2012

Publication date: November 21, 2013

Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130212364

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
Hardware WCK2CK training engine using meta-EDC sweeping and adjustably accurate voting algorithm for clock phase detection

Patent number: 8489911

Abstract: One embodiment of the present invention sets forth a technique for performing high-performance clock training. One clock training sweep operation is performed to determine phase relationships for two write clocks with respect to a command clock. The phase relationships are generated to satisfy timing requirements for two different client devices, such as GDDR5 DRAM components. A second clock training sweep operation is performed to better align local clocks operating on the client devices. A voting tally is maintained during the second clock training sweep to record phase agreement at each step in the clock training sweep. The voting tally then determines whether one of the local clocks should be inverted to better align the two local clocks.

Type: Grant

Filed: December 30, 2009

Date of Patent: July 16, 2013

Assignee: NVIDIA Corporation

Inventors: Eric Lyell Hill, Russell R. Newcomb, Shu-Yi Yu
SHAPED REGISTER FILE READS

Publication number: 20130166877

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
BATCHED REPLAYS OF DIVERGENT OPERATIONS

Publication number: 20130159684

Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.

Type: Application

Filed: December 16, 2011

Publication date: June 20, 2013

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich