Patents by Inventor Olivier Giroux

Olivier Giroux has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150220341
    Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
    Type: Application
    Filed: February 3, 2014
    Publication date: August 6, 2015
    Applicant: NVIDIA Corporation
    Inventors: Robert Ohannessian, JR., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
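    For illustration, a minimal Python sketch of the dependency-barrier idea described above; the class, counter layout, and thresholds are invented for the example, not taken from the patent:

      class Scoreboard:
          """Software scoreboard: counters track outstanding long-latency ops."""
          def __init__(self, num_counters=6):
              self.counters = [0] * num_counters

          def issue_long_latency(self, reg):
              self.counters[reg] += 1      # an outstanding memory/texture op

          def complete(self, reg):
              self.counters[reg] -= 1      # decremented on writeback

          def depbar_satisfied(self, reg, immediate):
              # The barrier compares the immediate to the counter register:
              # dispatch proceeds once at most `immediate` ops remain pending.
              return self.counters[reg] <= immediate

      sb = Scoreboard()
      sb.issue_long_latency(0)
      sb.issue_long_latency(0)
      print(sb.depbar_satisfied(0, 0))     # False: two ops still in flight
      sb.complete(0); sb.complete(0)
      print(sb.depbar_satisfied(0, 0))     # True: dependent instruction may dispatch
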
  • Publication number: 20150212819
    Abstract: A system, method, and computer program product are provided for scheduling interruptible batches of instructions for execution by one or more functional units of a processor. The method includes the steps of receiving a batch of instructions that includes a plurality of instructions and dispatching at least one instruction from the batch of instructions to one or more functional units for execution. The method further includes the step of receiving an interrupt request that causes an interrupt routine to be dispatched to the one or more functional units prior to all instructions in the batch of instructions being dispatched to the one or more functional units. When the interrupt request is received, the method further includes the step of storing batch-level resources in a memory to resume execution of the batch of instructions once the interrupt routine has finished execution.
    Type: Application
    Filed: January 30, 2014
    Publication date: July 30, 2015
    Applicant: NVIDIA Corporation
    Inventors: Olivier Giroux, Robert Ohannessian, JR., Jack H. Choquette, Michael Alan Fetterman
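    A toy Python model of the interruptible-batch flow (the structure and names are assumptions for illustration): the dispatch position stands in for the batch-level state saved to memory so the batch can resume after the interrupt routine.

      class BatchScheduler:
          def __init__(self, batch):
              self.batch = batch           # instructions in the current batch
              self.pos = 0                 # next instruction to dispatch
              self.saved = None

          def dispatch_one(self, units):
              units.append(self.batch[self.pos]); self.pos += 1

          def interrupt(self, isr, units):
              self.saved = self.pos        # store batch-level state in memory
              units.extend(isr)            # interrupt routine dispatches first
              self.pos = self.saved        # restore: resume the batch afterward

      units = []
      sched = BatchScheduler(["I0", "I1", "I2", "I3"])
      sched.dispatch_one(units); sched.dispatch_one(units)
      sched.interrupt(["ISR0", "ISR1"], units)     # arrives mid-batch
      while sched.pos < len(sched.batch):
          sched.dispatch_one(units)
      print(units)     # ['I0', 'I1', 'ISR0', 'ISR1', 'I2', 'I3']
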
  • Publication number: 20150193272
    Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.
    Type: Application
    Filed: January 3, 2014
    Publication date: July 9, 2015
    Applicant: NVIDIA Corporation
    Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, JR., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
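    A small Python sketch of the decoupling (slot count and latency invented): the scheduler parks issued instructions in the unit's queue and only stalls when every slot is full.

      from collections import deque

      class FunctionalUnit:
          def __init__(self, slots):
              self.queue = deque()
              self.slots = slots           # fixed number of queue slots
              self.busy = 0

          def try_issue(self, instr):
              if len(self.queue) < self.slots:
                  self.queue.append(instr)
                  return True
              return False                 # back-pressure to the scheduler

          def tick(self):
              if self.busy:
                  self.busy -= 1
              elif self.queue:
                  print("executing", self.queue.popleft())
                  self.busy = 1            # unit stays occupied one extra cycle

      fu = FunctionalUnit(slots=2)
      for i in ("A", "B", "C"):
          print("issued" if fu.try_issue(i) else "stalled", i)
      for _ in range(5):
          fu.tick()
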
  • Publication number: 20150113538
    Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
    Type: Application
    Filed: October 23, 2013
    Publication date: April 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Robert J. STOLL, Xiaogang QIU, Michael Alan FETTERMAN
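    A minimal Python model of the two-set policy (names invented): thread groups move from the pending set to the eligible set only when ready, and the scarce hardware resources are allocated at that transfer.

      pending = {"tg0": False, "tg1": True, "tg2": True}   # group -> ready?
      eligible, free_slots = [], 2                         # limited hardware slots

      for tg, ready in list(pending.items()):
          if ready and free_slots > 0:
              del pending[tg]
              free_slots -= 1        # allocate resources only to ready groups
              eligible.append(tg)    # second set: candidates for execution

      print("run:", eligible.pop(0))                       # tg1
      print("eligible:", eligible, "pending:", list(pending))
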
  • Patent number: 8949841
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Olivier Giroux, Robert J. Stoll, Gary M. Tarolli, John Erik Lindholm
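    As a toy Python illustration of the phase test (the phase encoding and priority values are invented): two groups in the same phase are skewed apart if the first has higher priority; otherwise its priority is raised.

      def schedule(tg, other):
          if tg["phase"] == other["phase"]:
              if tg["priority"] > other["priority"]:
                  tg["phase"] ^= 1       # skew: push into the other phase
              else:
                  tg["priority"] += 1    # same phase, lower priority: boost
          else:
              tg["priority"] += 1        # different phases: just boost

      a = {"phase": 0, "priority": 5}
      b = {"phase": 0, "priority": 3}
      schedule(a, b)
      print(a)   # {'phase': 1, 'priority': 5}: a now alternates with b
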
  • Publication number: 20150026442
    Abstract: A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.
    Type: Application
    Filed: July 18, 2013
    Publication date: January 22, 2015
    Applicant: NVIDIA Corporation
    Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, William Parsons Newhall, Jr.
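    One plausible reading, sketched in Python (the hazard rule is an assumption; the abstract does not spell out the selection test): instructions whose inputs are not produced by earlier, still-pending instructions are safe to execute out of order.

      instrs = [("I0", {"r1"}, "r2"),    # (name, reads, writes)
                ("I1", {"r2"}, "r3"),    # reads I0's result: must stay in order
                ("I2", {"r4"}, "r5")]    # independent: out-of-order candidate

      def ooo_subset(instrs):
          written, subset = set(), []
          for name, reads, write in instrs:
              if not (reads & written):  # no RAW hazard vs. earlier writes
                  subset.append(name)
              written.add(write)
          return subset

      print(ooo_subset(instrs))          # ['I0', 'I2']
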
  • Publication number: 20150026438
    Abstract: A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token.
    Type: Application
    Filed: July 18, 2013
    Publication date: January 22, 2015
    Inventors: Olivier Giroux, Gregory Frederick Diamos
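    A compact Python sketch of the yield step (mask widths and values invented): yielding pushes the current activation mask onto one end of the queue and resumes the token popped from the other end, so a starved divergent path gets to run.

      from collections import deque

      tokens = deque([0b1100])    # pending token for the other divergent path
      active = 0b0011             # mask of currently active threads

      # The current instruction specifies a yield operation:
      tokens.append(active)       # push a token onto the second side (back)
      active = 0                  # disable the active threads
      token = tokens.popleft()    # pop the next pending token from the first side
      active = token              # activate the threads in the token's mask
      print(bin(active))          # 0b1100: the previously starved path runs
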
  • Patent number: 8930636
    Abstract: One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second cache are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.
    Type: Grant
    Filed: July 20, 2012
    Date of Patent: January 6, 2015
    Assignee: NVIDIA Corporation
    Inventors: Joel James McCormack, Rajesh Kota, Olivier Giroux, Emmett M. Kilgariff
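    A toy Python model of relaxed coherency between two caches (write-back behavior simplified; invalidate-on-barrier is an assumption made for the example):

      memory = {"x": 0}
      cache_a, cache_b = {}, {}

      def write(cache, addr, val):
          cache[addr] = val        # private until a barrier: not propagated

      def read(cache, addr):
          return cache.get(addr, memory[addr])

      def membar():
          for c in (cache_a, cache_b):
              memory.update(c)     # flush dirty lines to memory
              c.clear()            # invalidate so the next read refetches

      write(cache_a, "x", 42)
      print(read(cache_b, "x"))    # 0: the caches legitimately disagree
      membar()
      print(read(cache_b, "x"))    # 42: coherent after the barrier
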
  • Publication number: 20140310484
    Abstract: A system and method for efficient memory access. The method includes receiving a request to access a portion of memory. The request comprises a first address. The method further includes determining whether the first address corresponds to a thread local portion of memory and in response to the first address corresponding to the thread local portion of memory, translating the first address to a second address. The method further includes accessing the thread local portion of memory based on the second address. The second address corresponds to an offset in a region of memory reserved for storing thread local data and allocations into the region are contiguous for a plurality of threads at each thread local offset.
    Type: Application
    Filed: April 16, 2013
    Publication date: October 16, 2014
    Applicant: NVIDIA Corporation
    Inventor: Olivier GIROUX
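    A small Python sketch of the translation (base address, window, and word size invented): interleaving by thread makes the slots for one thread-local offset contiguous across threads, which is what makes the accesses coalesce.

      BASE, NUM_THREADS, WORD = 0x10000, 32, 4

      def is_thread_local(addr):
          return addr < 0x1000     # assumed thread-local address window

      def translate(addr, tid):
          offset = addr // WORD    # word offset within thread-local space
          return BASE + (offset * NUM_THREADS + tid) * WORD

      # The same offset maps to adjacent words for adjacent threads:
      print([hex(translate(0x8, tid)) for tid in range(3)])
      # ['0x10100', '0x10104', '0x10108']
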
  • Publication number: 20140281679
    Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a selective fault-stalling pipeline. Upon detecting a memory access fault associated with an operation executing on a particular SM, a replay unit in the selective fault-stalling pipeline considers the operation as a faulting operation. Subsequently, instead of notifying the SM of the memory access fault, the replay unit recirculates the operation, reinserting the operation into the selective fault-stalling pipeline. Recirculating faulting operations in such a fashion enables the SM to execute other operations while the replay unit stalls the faulting request until the associated access fault is resolved. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a memory access fault, cancel the associated operation and subsequent operations.
    Type: Application
    Filed: December 17, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Shirish GADRE
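    A toy Python replay loop (the pipeline structure and fault-resolution timing are invented): the faulting load is recirculated rather than cancelled, and independent work retires in the meantime.

      from collections import deque

      pipeline = deque([("ld", "pageA"), ("add", None), ("ld", "pageB")])
      resident = {"pageB"}                 # pageA faults until it is migrated in
      cycle = 0

      while pipeline:
          op, page = pipeline.popleft()
          if page and page not in resident:
              pipeline.append((op, page))  # recirculate the faulting operation
              if cycle == 3:
                  resident.add("pageA")    # the access fault is resolved
          else:
              print(cycle, "retired", op, page)
          cycle += 1
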
  • Publication number: 20140189698
    Abstract: A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. If the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA Corporation
    Inventors: Jack Hilaire CHOQUETTE, Olivier GIROUX, Robert J. STOLL, Gary M. TAROLLI, John Erik LINDHOLM
  • Publication number: 20140164743
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
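    A toy Python scheduler showing the batch discipline (the "B:" marker stands in for the compiler's batch delineation; the encoding is invented): while a batch is draining, only threads still inside it are eligible, and other threads run in between batches so they are not starved.

      threads = {0: ["B:ld", "B:ld", "mul"], 1: ["B:ld", "B:ld", "add"]}

      def pick(threads, in_batch):
          for tid, instrs in threads.items():
              if instrs and (not in_batch or instrs[0].startswith("B:")):
                  return tid
          return None

      in_batch = False
      while any(threads.values()):
          tid = pick(threads, in_batch)
          if tid is None:
              in_batch = False           # batch drained: release other threads
              continue
          instr = threads[tid].pop(0)
          in_batch = instr.startswith("B:")
          print(f"t{tid} {instr}")       # batched loads run back-to-back
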
  • Publication number: 20140047213
    Abstract: A system and method for implementing memory overlays for portable pointer variables. The method includes providing a program executable by a heterogeneous processing system comprising a plurality of processors running a plurality of instruction set architectures (ISAs). The method also includes providing a plurality of processor specific functions associated with a function pointer in the program. The method includes executing the program by a first processor. The method includes dereferencing the function pointer by mapping the function pointer to a corresponding processor specific function based on which processor in the plurality of processors is executing the program.
    Type: Application
    Filed: August 8, 2012
    Publication date: February 13, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Olivier Giroux
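    A minimal Python analogue of the overlay (tables and names invented): the portable "pointer" is a key resolved against whichever processor's table is active at dereference time.

      def saxpy_cpu(a, x, y): return [a * xi + yi for xi, yi in zip(x, y)]
      def saxpy_gpu(a, x, y): return saxpy_cpu(a, x, y)   # stand-in for a GPU kernel

      OVERLAY = {"cpu": {"saxpy": saxpy_cpu},             # one table per ISA
                 "gpu": {"saxpy": saxpy_gpu}}

      def deref(fn_ptr, executing_isa):
          return OVERLAY[executing_isa][fn_ptr]   # pointer -> ISA-specific function

      fn = deref("saxpy", "cpu")       # the executing processor picks its version
      print(fn(2.0, [1, 2], [3, 4]))   # [5.0, 8.0]
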
  • Publication number: 20140025891
    Abstract: One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second cache are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Inventors: Joel James MCCORMACK, Rajesh KOTA, Olivier GIROUX, Emmett M. KILGARIFF
  • Publication number: 20130262831
    Abstract: Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
    Type: Application
    Filed: April 2, 2012
    Publication date: October 3, 2013
    Inventors: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux
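    A toy Python version of the throttle (all constants invented): the per-period issue cap follows the moving average plus an offset and never falls below the minimum rate, so a sudden demand spike ramps up gradually instead of producing a di/dt surge.

      ALPHA, OFFSET, MIN_RATE = 0.5, 2, 1
      avg, throttle = 0.0, MIN_RATE

      for demand in [0, 0, 16, 16, 16, 16]:         # instructions ready per period
          issued = min(demand, throttle)            # issue rate capped by throttle
          avg = (1 - ALPHA) * avg + ALPHA * issued  # moving average issue rate
          throttle = max(MIN_RATE, int(avg + OFFSET))   # cap ramps, never jumps
          print(f"issued={issued} throttle={throttle}")
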
  • Publication number: 20130212364
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Inventors: Michael FETTERMAN, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
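    A small Python sketch of the expansion step (instruction encoding and cache-line size invented): counting the distinct cache lines touched by the threads tells the unit how many replays to pre-schedule behind the instruction.

      def expand(instr, addrs, line=128):
          lines = {a // line for a in addrs}    # distinct cache lines touched
          replays = [f"{instr}.replay{i}" for i in range(1, len(lines))]
          return [instr] + replays              # instruction + pre-scheduled replays

      # Four threads touching two cache lines need one pre-scheduled replay:
      print(expand("LDS.64", [0x00, 0x08, 0x100, 0x104]))
      # ['LDS.64', 'LDS.64.replay1']
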
  • Publication number: 20130179662
    Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
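    As a Python illustration (the majority-address rule here is an assumption, not necessarily the patented policy): threads agreeing on an address issue now, and the divergent remainder is re-fetched and re-issued.

      from collections import Counter

      addrs = {0: 0x100, 1: 0x100, 2: 0x200, 3: 0x100}    # tid -> address
      target, _ = Counter(addrs.values()).most_common(1)[0]

      uniform   = [t for t, a in addrs.items() if a == target]
      divergent = [t for t, a in addrs.items() if a != target]
      print("issue now:", uniform)      # [0, 1, 3]
      print("re-fetch:", divergent)     # [2]
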
  • Publication number: 20130166881
    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux
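    A toy Python scheduler driven by pre-decode data (the tuple layout is invented): each fetched-but-undecoded instruction carries a compiler-supplied wait count and priority, and becomes eligible only after its wait cycles elapse.

      buffer = [  # (thread, wait_cycles, priority, raw_bits)
          ("t0", 2, 1, "<undecoded>"),
          ("t1", 0, 5, "<undecoded>"),
      ]

      for cycle in range(3):
          ready = [e for e in buffer if e[1] <= cycle]    # wait cycles elapsed?
          if ready:
              pick = max(ready, key=lambda e: e[2])       # highest priority wins
              buffer.remove(pick)
              print(f"cycle {cycle}: issue {pick[0]}; full decode happens now")
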
  • Publication number: 20130166882
    Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
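    A compact Python sketch of the two-level pick (state layout invented): the macro-scheduler priority-sorts the threads, and the micro-scheduler arbiter dispatches the highest-order thread that is actually ready.

      threads = [  # (name, priority, ready) from pre-decode data, no full decode
          ("t2", 9, False),    # highest priority but blocked
          ("t0", 7, True),
          ("t1", 3, True),
      ]

      ordered = sorted(threads, key=lambda t: -t[1])       # macro: priority sort
      winner = next((t for t in ordered if t[2]), None)    # micro: first ready
      print("dispatch:", winner and winner[0])             # t0; decode follows
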
  • Publication number: 20130117541
    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units.
    Type: Application
    Filed: November 4, 2011
    Publication date: May 9, 2013
    Inventors: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu
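    A toy Python model of the selective dispatch (warp names and the rollback condition are invented): a warp hitting a rollback condition is simply not dispatched, while the other warps issued in the same window still execute.

      issued = [("w0", "add"), ("w1", "ld"), ("w2", "mul")]
      rollback = {"w1"}                    # e.g. operand data not ready in time

      executed, replays = [], []
      for warp, op in issued:
          if warp in rollback:
              replays.append((warp, op))   # held back: reissued after rollback
          else:
              executed.append((warp, op))  # unaffected warps keep the pipe busy
      print("executed:", executed)
      print("replay:", replays)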