Patents by Inventor Jack Hilaire Choquette

Jack Hilaire Choquette has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230297426
    Abstract: Various embodiments include techniques for utilizing resources on a processing unit. Thread groups executing on a processor begin execution with specified resources, such as a number of registers and an amount of shared memory. During execution, one or more thread groups may determine that they hold more resources than they need to execute their current functions. Such thread groups can deallocate the excess resources to a free pool. Similarly, during execution, one or more thread groups may determine that they hold fewer resources than they need to execute their current functions. Such thread groups can allocate the needed resources from the free pool. Further, producer thread groups that generate data for consumer thread groups can deallocate excess resources prior to completion. The consumer thread groups can allocate the excess resources and initiate execution while the producer thread groups complete execution, thereby decreasing latency between producer and consumer thread groups.
    Type: Application
    Filed: March 18, 2022
    Publication date: September 21, 2023
    Inventors: Rajballav DASH, Stephen JONES, Jack Hilaire CHOQUETTE, Manan PATEL, Ronny M. KRASHINSKY, Shirish GADRE, Lixia QIN
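    A minimal C++ sketch of the free-pool idea in this abstract: a producer releases registers it no longer needs so a consumer can launch before the producer exits. The names (ResourcePool, ThreadGroup, trim_to) and the register counts are illustrative assumptions, not the patented design.

    ```cpp
    #include <cstdio>

    // Hypothetical free pool of registers shared by thread groups.
    struct ResourcePool {
        int free_regs = 0;
        void release(int n) { free_regs += n; }   // deallocate excess to pool
        bool try_acquire(int n) {                 // allocate from pool if available
            if (free_regs < n) return false;
            free_regs -= n;
            return true;
        }
    };

    struct ThreadGroup {
        const char* name;
        int regs;   // registers currently held
        // Give back registers beyond what the remaining work needs.
        void trim_to(int needed, ResourcePool& pool) {
            if (regs > needed) { pool.release(regs - needed); regs = needed; }
        }
    };

    int main() {
        ResourcePool pool;
        ThreadGroup producer{"producer", 128};
        ThreadGroup consumer{"consumer", 0};

        // Producer has generated its data; only 32 registers are still needed
        // to finish, so it releases the rest before it exits.
        producer.trim_to(32, pool);

        // Consumer can start as soon as the pool covers its requirement,
        // overlapping with the producer's remaining work.
        if (pool.try_acquire(96)) {
            consumer.regs = 96;
            std::printf("%s launched with %d regs while %s still runs\n",
                        consumer.name, consumer.regs, producer.name);
        }
    }
    ```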
  • Publication number: 20230236878
    Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.
    Type: Application
    Filed: January 25, 2022
    Publication date: July 27, 2023
    Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU
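    A rough C++ model of decoupling the scheduling dependency from the data dependency: the successor is launched as soon as the predecessor fires a pre-exit trigger, before it actually completes. The names (Task, fire_pre_exit, try_launch) are invented for the sketch.

    ```cpp
    #include <cstdio>

    struct Task {
        const char* name;
        bool sched_released = false;  // scheduling dependency on this task resolved
        bool completed      = false;  // data fully produced

        void fire_pre_exit() { sched_released = true; }  // e.g. final phase entered
        void finish()        { completed = true; }
    };

    void try_launch(const Task& pred, const Task& succ) {
        // The launch decision consults only the scheduling dependency; data
        // dependencies are enforced separately once the successor is running.
        if (pred.sched_released)
            std::printf("launching %s while %s is still executing\n",
                        succ.name, pred.name);
    }

    int main() {
        Task a{"task_A"}, b{"task_B"};
        a.fire_pre_exit();   // A signals early: B's launch prerequisites are met
        try_launch(a, b);    // B begins execution before A completes
        a.finish();
    }
    ```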
  • Patent number: 10489200
    Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: November 26, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Olivier Giroux, Jack Hilaire Choquette, Robert J. Stoll, Xiaogang Qiu, Michael Alan Fetterman
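    A compact C++ sketch of the two-set scheme in this abstract, under the assumption that groups in the pending set hold no hardware resources and allocation happens only on promotion to the ready set; the Scheduler and ThreadGroup types are illustrative.

    ```cpp
    #include <deque>
    #include <cstdio>

    struct ThreadGroup { int id; bool resources_allocated = false; };

    struct Scheduler {
        std::deque<ThreadGroup> pending;  // eligible, but holding no resources
        std::deque<ThreadGroup> ready;    // resources allocated, issuable

        void promote() {                  // transfer one group and allocate for it
            if (pending.empty()) return;
            ThreadGroup tg = pending.front();
            pending.pop_front();
            tg.resources_allocated = true;   // allocate only when truly ready
            ready.push_back(tg);
        }
        void issue() {
            if (ready.empty()) return;
            std::printf("issuing thread group %d\n", ready.front().id);
            ready.pop_front();
        }
    };

    int main() {
        Scheduler s;
        s.pending = {{0}, {1}, {2}};
        s.promote();   // group 0 gets resources
        s.issue();     // group 0 executes; groups 1 and 2 still consume nothing
    }
    ```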
  • Patent number: 10346212
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the SM determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases; otherwise, the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Grant
    Filed: February 3, 2015
    Date of Patent: July 9, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
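    An illustrative C++ rendering of the decision rule in this abstract (same phase and higher priority: skew the phase; otherwise: raise the priority). The phase/priority encoding is an assumption made for the sketch.

    ```cpp
    #include <cstdio>

    struct ThreadGroup { int phase; int priority; };

    void schedule(ThreadGroup& tg, ThreadGroup& other, int num_phases) {
        if (tg.phase == other.phase) {
            if (tg.priority > other.priority)
                tg.phase = (tg.phase + 1) % num_phases;  // skew out of shared phase
            else
                ++tg.priority;                           // raise priority instead
        } else {
            ++tg.priority;   // different phases: just raise priority
        }
    }

    int main() {
        ThreadGroup a{0, 5}, b{0, 3};   // same phase, a has higher priority
        schedule(a, b, 4);              // a is skewed into a different phase
        std::printf("a: phase=%d prio=%d, b: phase=%d prio=%d\n",
                    a.phase, a.priority, b.phase, b.priority);
    }
    ```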
  • Patent number: 10255228
    Abstract: One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: April 9, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Xiaogang Qiu, Jack Hilaire Choquette, Manuel Olivier Gautho, Ming Y. (Michael) Siu
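    A toy C++ model of an operand collector forming a shaped access: operands whose registers map to distinct banks are serviced in one register-file access, and bank conflicts wait for the next one. The bank count and the register-to-bank mapping are assumptions.

    ```cpp
    #include <vector>
    #include <cstdio>

    constexpr int kNumBanks = 4;

    struct Operand { int reg; };
    int bank_of(const Operand& op) { return op.reg % kNumBanks; }

    // Returns the operands serviced this access; conflicting ones stay queued.
    std::vector<Operand> shaped_access(std::vector<Operand>& queue) {
        bool bank_busy[kNumBanks] = {};
        std::vector<Operand> served, rest;
        for (const Operand& op : queue) {
            int b = bank_of(op);
            if (!bank_busy[b]) { bank_busy[b] = true; served.push_back(op); }
            else rest.push_back(op);   // bank conflict: retry next access
        }
        queue = rest;
        return served;
    }

    int main() {
        std::vector<Operand> pending = {{0}, {1}, {4}, {2}};  // regs 0 and 4 share bank 0
        int accesses = 0;
        while (!pending.empty()) {
            auto got = shaped_access(pending);
            std::printf("access %d served %zu operands\n", ++accesses, got.size());
        }
    }
    ```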
  • Patent number: 10152329
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline into which pre-scheduled replay operations can be inserted. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop until all threads are serviced.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
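    A small C++ sketch of the replay accounting this abstract implies: count the distinct cache lines a divergent access touches, pre-schedule a fixed number of replays behind the instruction, and leave the remainder to the replay loop. The line size and the pre-scheduled limit of two are invented for illustration.

    ```cpp
    #include <set>
    #include <vector>
    #include <cstdio>

    constexpr unsigned kLineBytes = 128;

    int main() {
        // Per-thread byte addresses of one divergent load.
        std::vector<unsigned> addrs = {0, 64, 128, 131, 256, 300, 512, 600};

        std::set<unsigned> lines;                 // distinct cache lines touched
        for (unsigned a : addrs) lines.insert(a / kLineBytes);

        // One line is handled by the instruction itself; pre-schedule up to
        // two replays behind it, and use the replay loop for the rest.
        int needed = static_cast<int>(lines.size()) - 1;
        int prescheduled = needed < 2 ? needed : 2;
        int loop_replays = needed - prescheduled;

        std::printf("lines=%zu prescheduled=%d replay-loop=%d\n",
                    lines.size(), prescheduled, loop_replays);
    }
    ```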
  • Patent number: 9830158
    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach the multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units.
    Type: Grant
    Filed: November 4, 2011
    Date of Patent: November 28, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu
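    A minimal C++ illustration of speculative issue with per-thread-group rollback: an instruction whose group hits a rollback condition at the dispatch point is squashed, while instructions from other groups proceed. The rollback predicate and warp IDs are placeholders.

    ```cpp
    #include <vector>
    #include <cstdio>

    struct Issued { int warp_id; const char* op; };

    // Placeholder: e.g. an operand turned out not to be ready for this group.
    bool rollback_condition(int warp_id) { return warp_id == 1; }

    int main() {
        // Instructions issued speculatively several cycles ago, now at dispatch.
        std::vector<Issued> in_flight = {{0, "FADD"}, {1, "LDG"}, {2, "IMUL"}};
        for (const Issued& i : in_flight) {
            if (rollback_condition(i.warp_id))
                std::printf("warp %d: %s rolled back, not dispatched\n",
                            i.warp_id, i.op);
            else
                std::printf("warp %d: %s dispatched to execution units\n",
                            i.warp_id, i.op);
        }
    }
    ```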
  • Patent number: 9817668
    Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: November 14, 2017
    Assignee: NVIDIA Corporation
    Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
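    A tiny C++ sketch contrasting this batched replay loop with one replay per trip: up to kBatch replay operations are inserted back-to-back per traversal, so fewer trips around the loop are needed. kBatch and the line count are illustrative.

    ```cpp
    #include <algorithm>
    #include <cstdio>

    constexpr int kBatch = 2;   // replay operations inserted back-to-back

    int main() {
        int lines_remaining = 5;   // cache lines still to service after first pass
        int trips = 0;
        while (lines_remaining > 0) {
            int batched = std::min(kBatch, lines_remaining);
            lines_remaining -= batched;   // batched replays handled in one trip
            ++trips;
            std::printf("trip %d: %d replay ops, %d lines left\n",
                        trips, batched, lines_remaining);
        }
    }
    ```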
  • Patent number: 9798548
    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: October 24, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Robert J. Stoll, Olivier Giroux
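    An illustrative C++ encoding of pre-decode data riding alongside an undecoded instruction word: a wait count and a scheduling priority readable without a full decode. The bit layout is invented for the sketch, not an actual ISA encoding.

    ```cpp
    #include <cstdint>
    #include <cstdio>

    struct Fetched { uint64_t word; };   // buffered, not yet decoded

    // Compiler-provided pre-decode bits, extracted without a full decode.
    int predecode_wait(const Fetched& f)     { return  f.word       & 0xF; } // bits 0-3
    int predecode_priority(const Fetched& f) { return (f.word >> 4) & 0x3; } // bits 4-5

    int main() {
        Fetched f{0b10'0011};   // wait = 3 cycles, priority = 2
        std::printf("wait %d cycles, priority %d; full decode deferred to issue\n",
                    predecode_wait(f), predecode_priority(f));
    }
    ```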
  • Patent number: 9798544
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: October 24, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Olivier Giroux, Jack Hilaire Choquette, Xiaogang Qiu, Robert J. Stoll
  • Publication number: 20170192822
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the SM determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases; otherwise, the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: February 3, 2015
    Publication date: July 6, 2017
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Patent number: 9626191
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: April 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
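    A toy C++ picture of the shaped access described here: a crossbar-like gather pulls the same-sized slice from each register in a set of N registers in a single request. N, the register width, and the 4-byte slice are invented for illustration.

    ```cpp
    #include <array>
    #include <cstdio>

    constexpr int N = 4;                              // registers in the set
    using Register = std::array<unsigned char, 16>;   // assumed register width

    int main() {
        std::array<Register, N> regs{};
        for (int r = 0; r < N; ++r) regs[r].fill(static_cast<unsigned char>(r));

        // "Crossbar" gathers the first 4 bytes of each register for one thread.
        unsigned char slice[N][4];
        for (int r = 0; r < N; ++r)
            for (int b = 0; b < 4; ++b) slice[r][b] = regs[r][b];

        std::printf("one access gathered %d bytes; slice[3][0]=%u\n",
                    N * 4, static_cast<unsigned>(slice[3][0]));
    }
    ```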
  • Patent number: 9430242
    Abstract: Systems and methods for throttling GPU execution performance to avoid surges in di/dt (the rate of change of current draw). A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
    Type: Grant
    Filed: April 2, 2012
    Date of Patent: August 30, 2016
    Assignee: NVIDIA Corporation
    Inventors: Peter Michael Nelson, Jack Hilaire Choquette, Olivier Giroux
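    A short C++ simulation of the throttling rule stated in this abstract: the per-period issue cap equals the moving average plus an offset, clamped to a minimum floor, which forces a gradual ramp after an idle stretch. The smoothing weight and constants are assumptions.

    ```cpp
    #include <algorithm>
    #include <cstdio>

    int main() {
        const double alpha = 0.125;   // moving-average weight (assumed)
        const int offset = 2, floor_rate = 1, demand[] = {8, 8, 8, 0, 0, 8, 8};

        double avg = 0.0;
        for (int want : demand) {
            int cap = std::max(static_cast<int>(avg) + offset, floor_rate);
            int issued = std::min(want, cap);         // throttled issue this period
            avg = (1 - alpha) * avg + alpha * issued; // update moving average
            std::printf("want=%d cap=%d issued=%d avg=%.2f\n",
                        want, cap, issued, avg);
        }
    }
    ```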
  • Publication number: 20160224386
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the SM determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases; otherwise, the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: February 3, 2015
    Publication date: August 4, 2016
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Patent number: 9110810
    Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even when the processor consumes instructions in the instruction L1 cache at a fast rate.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: August 18, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Nicholas Wang, Jack Hilaire Choquette
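    A heavily simplified C++ sketch of the prefetch policy: the target is chosen pseudorandomly from candidate lines ahead, and a fetch in the first sector of an L1.5 line steers the prefetch into the next L1.5 line. All sizes and the candidate set are invented for the sketch.

    ```cpp
    #include <cstdio>
    #include <cstdlib>

    constexpr unsigned kL1Line = 128, kSectorsPerL15 = 4;
    constexpr unsigned kL15Line = kL1Line * kSectorsPerL15;

    unsigned prefetch_target(unsigned pc) {
        unsigned sector = (pc % kL15Line) / kL1Line;   // sector within L1.5 line
        unsigned base   = pc - (pc % kL1Line);
        if (sector == 0)                      // first sector: reach into next L1.5 line
            base = (pc / kL15Line + 1) * kL15Line;
        // Pseudorandomly choose among a couple of candidate lines ahead.
        return base + (std::rand() % 2 + 1) * kL1Line;
    }

    int main() {
        std::srand(42);
        std::printf("prefetch for pc 0x0000 -> 0x%04x\n", prefetch_target(0x0000));
        std::printf("prefetch for pc 0x0180 -> 0x%04x\n", prefetch_target(0x0180));
    }
    ```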
  • Publication number: 20150113538
    Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
    Type: Application
    Filed: October 23, 2013
    Publication date: April 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Robert J. STOLL, Xiaogang QIU, Michael Alan FETTERMAN
  • Patent number: 8949841
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the SM determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases; otherwise, the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
  • Publication number: 20140189698
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the SM determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases; otherwise, the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA Corporation
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Publication number: 20140164743
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
  • Patent number: 8732713
    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
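    A minimal C++ rendering of the selection rule in this abstract: among available thread groups, prefer the CTA with the greatest seniority, then the group with the greatest credit within it. The seniority and credit values are placeholders.

    ```cpp
    #include <vector>
    #include <cstdio>

    struct Group { int cta; int credit; };

    int main() {
        std::vector<int>   cta_seniority = {3, 7};   // CTA 1 is most senior
        std::vector<Group> avail = {{0, 5}, {1, 2}, {1, 9}};

        // Pick by seniority first, then by credit within the senior CTA.
        const Group* best = nullptr;
        for (const Group& g : avail) {
            if (!best ||
                cta_seniority[g.cta] >  cta_seniority[best->cta] ||
                (cta_seniority[g.cta] == cta_seniority[best->cta] &&
                 g.credit > best->credit))
                best = &g;
        }
        std::printf("issue group from CTA %d with credit %d\n",
                    best->cta, best->credit);
    }
    ```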