Patents by Inventor Eric Lyell Hill

Eric Lyell Hill has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10152329
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
  • Patent number: 10095548
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: October 9, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9836325
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: December 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9817668
    Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: November 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
  • Patent number: 9755994
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: September 5, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Patent number: 9626191
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: April 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
  • Patent number: 8812892
    Abstract: One embodiment of the present invention sets forth a technique for performing high-performance clock training. One clock training sweep operation is performed to determine phase relationships for two write clocks with respect to a command clock. The phase relationships are generated to satisfy timing requirements for two different client devices, such as GDDR5 DRAM components. A second clock training sweep operation is performed to better align local clocks operating on the client devices. A voting tally is maintained during the second clock training sweep to record phase agreement at each step in the clock training sweep. The voting tally then determines whether one of the local clocks should be inverted to better align the two local clocks.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: August 19, 2014
    Assignee: NVIDIA Corporation
    Inventors: Eric Lyell Hill, Russell R. Newcomb, Shu-Yi Yu
  • Patent number: 8738990
    Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: May 27, 2014
    Assignee: NVIDIA Corporation
    Inventors: Eric Lyell Hill, Richard L. Schober, Jr., Hungse Cha
  • Patent number: 8726124
    Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: May 13, 2014
    Assignee: NVIDIA Corporation
    Inventors: Eric Lyell Hill, Richard L. Schober, Jr., Hungse Cha
  • Publication number: 20140026021
    Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.
    Type: Application
    Filed: July 19, 2012
    Publication date: January 23, 2014
    Inventors: Eric Lyell Hill, Richard L. Schober, JR., Hungse Cha
  • Publication number: 20140026022
    Abstract: Cyclic redundancy check (CRC) values are efficiently calculated using an improved linear feedback shift register (LFSR) circuit. CRC value generation is separated into two sub-calculations, which are then combined to form a final CRC value. A programmable XOR engine performs logic functions via a table lookup rather than via a random logic circuit. LCRC and ECRC calculations are performed using a single shared LFSR circuit. Multiple links share the same CRC value generator. One advantage of the present invention is that CRC values are generated using smaller and fewer LFSR circuits relative to conventional circuit designs. As a result, a CRC value generator utilizing the disclosed techniques consumes less surface area of an integrated circuit and consumes less power, resulting in cooler operation.
    Type: Application
    Filed: July 19, 2012
    Publication date: January 23, 2014
    Inventors: Eric Lyell HILL, Richard L. SCHOBER, JR., Hungse CHA
  • Publication number: 20130311996
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH
  • Publication number: 20130311686
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130311999
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Application
    Filed: May 21, 2012
    Publication date: November 21, 2013
    Inventors: Michael FETTERMAN, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
  • Publication number: 20130212364
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
  • Patent number: 8489911
    Abstract: One embodiment of the present invention sets forth a technique for performing high-performance clock training. One clock training sweep operation is performed to determine phase relationships for two write clocks with respect to a command clock. The phase relationships are generated to satisfy timing requirements for two different client devices, such as GDDR5 DRAM components. A second clock training sweep operation is performed to better align local clocks operating on the client devices. A voting tally is maintained during the second clock training sweep to record phase agreement at each step in the clock training sweep. The voting tally then determines whether one of the local clocks should be inverted to better align the two local clocks.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: July 16, 2013
    Assignee: NVIDIA Corporation
    Inventors: Eric Lyell Hill, Russell R. Newcomb, Shu-Yi Yu
  • Publication number: 20130166877
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
  • Publication number: 20130159684
    Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich