Patents by Inventor Xiaogang Qiu

Xiaogang Qiu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Shaped register file reads

Patent number: 9626191

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Grant

Filed: December 22, 2011

Date of Patent: April 18, 2017

Assignee: NVIDIA Corporation

Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
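A shaped read can be modeled in a few lines of Python. This is a simplified sketch, not NVIDIA's design. It assumes byte-addressable 4-byte registers and treats the crossbar as a routing table from (register, byte offset) to output lanes. `configure_crossbar`, `shaped_read`, and the `offset` parameter are hypothetical names.

```python
from typing import List, Tuple

# (register index, byte offset within that register)
Route = Tuple[int, int]

def configure_crossbar(n_regs: int, width: int, offset: int = 0) -> List[Route]:
    """Route `width` bytes starting at `offset` from each of `n_regs` registers
    onto consecutive output lanes (hypothetical byte-level model)."""
    if n_regs < 2:
        raise ValueError("a shaped access spans N >= 2 registers")
    return [(r, offset + b) for r in range(n_regs) for b in range(width)]

def shaped_read(registers: List[bytes], routes: List[Route]) -> bytes:
    """Perform one thread's read through the configured crossbar."""
    return bytes(registers[r][b] for r, b in routes)

# One thread's registers, 4 bytes each (assumed width).
regs = [bytes([0, 1, 2, 3]), bytes([4, 5, 6, 7]), bytes([8, 9, 10, 11])]
print(list(shaped_read(regs, configure_crossbar(3, 2))))  # [0, 1, 4, 5, 8, 9]
```

In this model, a single crossbar configuration gathers the same-sized slice from every register in the set with one request.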
System, method, and computer program product for implementing software-based scoreboarding

Patent number: 9612836

Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.

Type: Grant

Filed: February 3, 2014

Date of Patent: April 4, 2017

Assignee: NVIDIA Corporation

Inventors: Robert Ohannessian, Jr., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
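A software model of the dependency-barrier check might look like the following. The abstract says only that the immediate value is compared with the value stored in the named register. This sketch assumes that each scoreboard register counts outstanding producer operations and that dispatch is allowed once at most `immediate` of them remain. The class name, the six-register count, and the `<=` comparison are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Scoreboard:
    # Hypothetical: each scoreboard register counts in-flight producer operations.
    counters: List[int] = field(default_factory=lambda: [0] * 6)

    def issue_producer(self, reg_id: int) -> None:
        """A long-latency producer tracked by `reg_id` has been issued."""
        self.counters[reg_id] += 1

    def complete_producer(self, reg_id: int) -> None:
        """That producer has written back its result."""
        if self.counters[reg_id] == 0:
            raise RuntimeError("no outstanding operation on this register")
        self.counters[reg_id] -= 1

    def barrier_satisfied(self, reg_id: int, immediate: int) -> bool:
        """Assumed rule: dispatch the next instruction once at most
        `immediate` operations tracked by `reg_id` are still pending."""
        return self.counters[reg_id] <= immediate

sb = Scoreboard()
sb.issue_producer(2)
sb.issue_producer(2)
print(sb.barrier_satisfied(2, 0), sb.barrier_satisfied(2, 1))  # False False
sb.complete_producer(2)
print(sb.barrier_satisfied(2, 0), sb.barrier_satisfied(2, 1))  # False True
```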
Method and system for resolving thread divergences

Patent number: 9606808

Abstract: A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.

Type: Grant

Filed: January 11, 2012

Date of Patent: March 28, 2017

Assignee: NVIDIA Corporation

Inventors: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
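The identify, queue, issue, and repeat loop described above can be sketched in Python. The abstract does not define "non-divergent." This sketch assumes that threads are non-divergent when their addresses fall in the same 128-byte memory line, and it uses a `deque` to stand in for the multi-issue queue. `split_divergent` and `LINE_BYTES` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List

LINE_BYTES = 128  # assumed granularity: same 128-byte line => non-divergent

def split_divergent(addresses: Dict[int, int]) -> List[List[int]]:
    """Return the thread subsets issued on successive passes.

    `addresses` maps thread id -> address accessed by one memory instruction.
    """
    remaining = dict(addresses)
    queue: Deque[List[int]] = deque()  # stands in for the multi-issue queue
    while remaining:
        # Identify the subset that shares a line with the lowest pending thread.
        leader = min(remaining)
        line = remaining[leader] // LINE_BYTES
        subset = [t for t, a in sorted(remaining.items()) if a // LINE_BYTES == line]
        queue.append(subset)
        # Repeat for the threads not covered by this subset.
        for t in subset:
            del remaining[t]
    # Drain the queue in order, as if the processing unit were free each cycle.
    return [queue.popleft() for _ in range(len(queue))]
```

For example, `split_divergent({0: 0, 1: 4, 2: 256, 3: 8, 4: 260})` yields two passes, `[[0, 1, 3], [2, 4]]`.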
System, method, and computer program product for implementing multi-cycle register file bypass

Patent number: 9477482

Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.

Type: Grant

Filed: September 26, 2013

Date of Patent: October 25, 2016

Assignee: NVIDIA Corporation

Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
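Per-thread bypass enables can be modeled as bitmask arithmetic. The encoding is an assumption. Control bit `i` requests the result of the instruction issued `i + 1` cycles earlier, and `valid_masks[i]` holds that instruction's per-thread execution bits. `bypass_enables` is a hypothetical name.

```python
from typing import List

def bypass_enables(control: int, valid_masks: List[int], n_threads: int) -> List[int]:
    """For each thread, return a mask of earlier instructions it may bypass from.

    control:        bit i set => bypass from the instruction issued i+1 cycles ago
                    (assumed encoding).
    valid_masks[i]: bit t set => that earlier instruction executed for thread t.
    """
    enables = []
    for t in range(n_threads):
        mask = 0
        for i, valid in enumerate(valid_masks):
            # Enable the path only if requested AND the producer ran for this thread.
            if (control >> i) & 1 and (valid >> t) & 1:
                mask |= 1 << i
        enables.append(mask)
    return enables

print(bypass_enables(0b01, [0b1010, 0b1111], 4))  # [0, 1, 0, 1]
```

Here only threads 1 and 3 take the bypass. In this model, threads 0 and 2 did not execute the earlier instruction, so they would read the register file instead.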
System and processor that include an implementation of decoupled pipelines

Patent number: 9471307

Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.

Type: Grant

Filed: January 3, 2014

Date of Patent: October 18, 2016

Assignee: NVIDIA Corporation

Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, Jr., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
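A decoupled pipeline can be sketched as a functional unit fronted by a fixed number of instruction slots. The scheduler only needs a free slot to issue, not an idle unit. The fixed per-instruction latency, the class name, and its methods are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional

class QueuedFunctionalUnit:
    """Functional unit with `slots` queue entries (hypothetical model)."""

    def __init__(self, slots: int, latency: int) -> None:
        self.slots = slots
        self.latency = latency            # assumed: cycles the unit is busy per instruction
        self.queue: Deque[str] = deque()
        self.current: Optional[str] = None
        self.busy_cycles = 0
        self.completed: List[str] = []

    def can_accept(self) -> bool:
        return len(self.queue) < self.slots

    def issue(self, instr: str) -> None:
        # The scheduler stalls only when every slot is occupied.
        if not self.can_accept():
            raise RuntimeError("queue full; scheduler must stall")
        self.queue.append(instr)

    def tick(self) -> None:
        # Finish the instruction in flight, then pull the next one from the queue.
        if self.busy_cycles:
            self.busy_cycles -= 1
            if self.busy_cycles == 0:
                self.completed.append(self.current)
                self.current = None
        if self.current is None and self.queue:
            self.current = self.queue.popleft()
            self.busy_cycles = self.latency

fu = QueuedFunctionalUnit(slots=2, latency=2)
fu.issue("a")
fu.issue("b")  # accepted even though the unit has not started "a" yet
for _ in range(5):
    fu.tick()
print(fu.completed)  # ['a', 'b']
```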
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IMPLEMENTING SOFTWARE-BASED SCOREBOARDING

Publication number: 20150220341

Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.

Type: Application

Filed: February 3, 2014

Publication date: August 6, 2015

Applicant: NVIDIA Corporation

Inventors: Robert Ohannessian, JR., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
SYSTEM AND PROCESSOR THAT INCLUDE AN IMPLEMENTATION OF DECOUPLED PIPELINES

Publication number: 20150193272

Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.

Type: Application

Filed: January 3, 2014

Publication date: July 9, 2015

Applicant: NVIDIA Corporation

Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, JR., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
HIERARCHICAL STAGING AREAS FOR SCHEDULING THREADS FOR EXECUTION

Publication number: 20150113538

Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.

Type: Application

Filed: October 23, 2013

Publication date: April 23, 2015

Applicant: NVIDIA CORPORATION

Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Robert J. STOLL, Xiaogang QIU, Michael Alan FETTERMAN
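The two-level staging can be sketched as follows. The readiness flag and the single register-count resource are simplifying assumptions, because the abstract speaks only of "hardware resources." `ThreadGroup` and `StagedScheduler` are hypothetical names, and releasing resources after execution is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ThreadGroup:
    name: str
    ready: bool        # assumed readiness test, e.g. dependencies resolved
    regs_needed: int   # hypothetical hardware-resource demand

class StagedScheduler:
    def __init__(self, free_regs: int) -> None:
        self.free_regs = free_regs
        self.pending: List[ThreadGroup] = []  # first set: holds no resources yet
        self.staged: List[ThreadGroup] = []   # second set: resources allocated

    def promote(self) -> None:
        """Move ready groups to the staging set, allocating resources only then."""
        for tg in list(self.pending):
            if tg.ready and tg.regs_needed <= self.free_regs:
                self.pending.remove(tg)
                self.free_regs -= tg.regs_needed
                self.staged.append(tg)

    def select(self) -> Optional[ThreadGroup]:
        """Pick the oldest staged group for execution on the processing engine."""
        return self.staged.pop(0) if self.staged else None

s = StagedScheduler(free_regs=64)
s.pending = [ThreadGroup("A", False, 32), ThreadGroup("B", True, 32), ThreadGroup("C", True, 48)]
s.promote()
print([t.name for t in s.staged], s.free_regs)  # ['B'] 32
```

Group A holds no registers while it is not ready, and group C waits in the first set until enough resources are free.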
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IMPLEMENTING MULTI-CYCLE REGISTER FILE BYPASS

Publication number: 20150089202

Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.

Type: Application

Filed: September 26, 2013

Publication date: March 26, 2015

Applicant: NVIDIA Corporation

Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
REORDERING BUFFER FOR MEMORY ACCESS LOCALITY

Publication number: 20140164743

Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.

Type: Application

Filed: December 10, 2012

Publication date: June 12, 2014

Applicant: NVIDIA CORPORATION

Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
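The batching policy can be approximated with a small issue-order simulation. This is an assumption, not the claimed scheduler. Non-batched instructions are issued greedily from the lowest-numbered runnable thread. A thread that reaches a batch is parked, and once every unfinished thread is parked, the batch is issued across the waiting threads on consecutive cycles before non-batched work resumes. `schedule` and the `(opcode, batch id)` encoding are hypothetical.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # (opcode, batch id or None for non-batched)

def schedule(programs: List[List[Instr]]) -> List[Tuple[int, str]]:
    """Return the (thread, opcode) issue order for per-thread instruction lists."""
    n = len(programs)
    pcs = [0] * n
    order: List[Tuple[int, str]] = []

    def done(t: int) -> bool:
        return pcs[t] >= len(programs[t])

    def batch_of(t: int) -> Optional[int]:
        return None if done(t) else programs[t][pcs[t]][1]

    def issue(t: int) -> None:
        order.append((t, programs[t][pcs[t]][0]))
        pcs[t] += 1

    while not all(done(t) for t in range(n)):
        runnable = [t for t in range(n) if not done(t) and batch_of(t) is None]
        if runnable:
            # Non-batched work: greedily issue from the lowest-numbered runnable thread.
            issue(runnable[0])
            continue
        # Every unfinished thread is parked at a batch: drain one batch across the
        # threads waiting on it, interleaving threads on consecutive cycles.
        first = next(t for t in range(n) if not done(t))
        batch = batch_of(first)
        members = [t for t in range(n) if batch_of(t) == batch]
        while any(batch_of(t) == batch for t in members):
            for t in members:
                if batch_of(t) == batch:
                    issue(t)
    return order

programs = [
    [("add", None), ("ld.a", 1), ("ld.b", 1), ("mul", None)],
    [("sub", None), ("ld.a", 1), ("ld.b", 1)],
]
print(schedule(programs))
```

The result is `[(0, 'add'), (1, 'sub'), (0, 'ld.a'), (1, 'ld.a'), (0, 'ld.b'), (1, 'ld.b'), (0, 'mul')]`. Both threads' `ld.a` instructions issue back to back, followed by both `ld.b` instructions, before thread 0's non-batched `mul`.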
Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict

Patent number: 8533435

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

Type: Grant

Filed: September 3, 2010

Date of Patent: September 10, 2013

Assignee: NVIDIA Corporation

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
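The port assignment and conflict-free bank selection can be modeled as below. The bank mapping (`register % N_BANKS`), the use of four banks, and the policy of simply deferring a port whose head operand conflicts are assumptions. `collect` is a hypothetical name.

```python
from collections import deque
from typing import Deque, List, Set, Tuple

N_BANKS = 4  # assumed: register r lives in bank r % N_BANKS

def collect(instructions: List[List[int]], n_ports: int) -> List[List[Tuple[int, int]]]:
    """Return, per cycle, the (instruction index, register) pairs read.

    Operand k of each instruction is queued on port k, so the operands of a
    single instruction never compete for the same port.
    """
    ports: List[Deque[Tuple[int, int]]] = [deque() for _ in range(n_ports)]
    for i, regs in enumerate(instructions):
        if len(regs) > n_ports:
            raise ValueError("instruction has more operands than ports")
        for k, r in enumerate(regs):
            ports[k].append((i, r))
    cycles = []
    while any(ports):
        banks_used: Set[int] = set()
        read = []
        # Take at most one operand per port, skipping any whose bank is already busy.
        for port in ports:
            if port and port[0][1] % N_BANKS not in banks_used:
                i, r = port.popleft()
                banks_used.add(r % N_BANKS)
                read.append((i, r))
        cycles.append(read)
    return cycles

print(collect([[0, 4]], 2))  # [[(0, 0)], [(0, 4)]]
```

Registers 0 and 4 share bank 0 in this model, so the two-operand instruction above is collected over two cycles.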
Method and System for Resolving Thread Divergences

Publication number: 20130179662

Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.

Type: Application

Filed: January 11, 2012

Publication date: July 11, 2013

Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
SHAPED REGISTER FILE READS

Publication number: 20130166877

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
SYSTEM AND METHOD FOR PERFORMING SHAPED MEMORY ACCESS OPERATIONS

Publication number: 20130145124

Abstract: One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.

Type: Application

Filed: December 6, 2011

Publication date: June 6, 2013

Inventors: Xiaogang Qiu, Jack Hilaire Choquette, Manuel Olivier Gautho, Ming Y. (Michael) Siu
SPECULATIVE EXECUTION AND ROLLBACK

Publication number: 20130117541

Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units.

Type: Application

Filed: November 4, 2011

Publication date: May 9, 2013

Inventors: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu
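The dispatch-time check can be sketched as a filter over speculatively issued instructions. The rollback predicate stands in for whatever condition applies (unresolved dependency, unavailable operand or resource). Re-issuing the withheld instructions is left to the scheduler and is not modeled. `dispatch_stage` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Issued = Tuple[str, str]  # (thread group, instruction)

def dispatch_stage(issued: List[Issued],
                   rollback: Callable[[str], bool]) -> Tuple[List[Issued], List[Issued]]:
    """Split speculatively issued instructions at the point of execution.

    An instruction is withheld only if its own thread group has a rollback
    condition; instructions from other thread groups still execute.
    """
    executed: List[Issued] = []
    rolled_back: List[Issued] = []
    for group, instr in issued:
        if rollback(group):
            rolled_back.append((group, instr))
        else:
            executed.append((group, instr))
    return executed, rolled_back

issued = [("w0", "add"), ("w1", "ld"), ("w0", "mul"), ("w2", "sub"), ("w1", "st")]
stalled = {"w1"}  # hypothetical: w1's operand data turned out not to be ready
executed, rolled_back = dispatch_stage(issued, lambda g: g in stalled)
print(executed)     # [('w0', 'add'), ('w0', 'mul'), ('w2', 'sub')]
print(rolled_back)  # [('w1', 'ld'), ('w1', 'st')]
```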
ECC bits used as additional register file storage

Patent number: 8321761

Abstract: A memory module includes a plurality of register files. Each register file is associated with a set of error-correcting code (ECC) bits and ECC check/correct logic that can provide error-correcting functionality, if required. When error-correcting functionality is not required, ECC bits are grouped together to form additional register files, thereby providing additional storage space.

Type: Grant

Filed: September 28, 2009

Date of Patent: November 27, 2012

Assignee: NVIDIA Corporation

Inventors: Fred Gruner, Xiaogang Qiu, Yan Yan Tang
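A back-of-the-envelope capacity estimate can be sketched as follows. The abstract gives no code widths. This assumes 32-bit registers protected by a SECDED Hamming code, with 6 Hamming check bits plus one overall parity bit per 32-bit word. `hamming_check_bits` and `extra_registers` are hypothetical helpers.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

def extra_registers(n_regs: int, data_bits: int = 32, secded: bool = True) -> int:
    """Whole registers formed by regrouping the ECC bits of `n_regs` registers."""
    # SECDED adds one overall parity bit to the Hamming check bits.
    ecc_bits = hamming_check_bits(data_bits) + (1 if secded else 0)
    return (n_regs * ecc_bits) // data_bits

print(hamming_check_bits(32) + 1)  # 7 check bits per 32-bit word (SECDED)
print(extra_registers(1024))       # 224
```

Under these assumptions, 1,024 protected registers carry 7,168 ECC bits, which regroup into 224 additional 32-bit registers when ECC is disabled.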
ECC bits used as additional register file storage

Patent number: 8250439

Abstract: A memory module includes a plurality of register files. Each register file is associated with a set of error-correcting code (ECC) bits and ECC check/correct logic that can provide error-correcting functionality, if required. When error-correcting functionality is not required, ECC bits are grouped together to form additional register files, thereby providing additional storage space.

Type: Grant

Filed: September 28, 2009

Date of Patent: August 21, 2012

Assignee: NVIDIA Corporation

Inventors: Fred Gruner, Xiaogang Qiu
Unified Collector Structure for Multi-Bank Register File

Publication number: 20110072243

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

Type: Application

Filed: September 3, 2010

Publication date: March 24, 2011

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
Method and apparatus for supporting asymmetric multi-threading in a computer system

Patent number: 7509643

Abstract: One embodiment of the present invention facilitates favoring the performance of a single-threaded application in a computer system that supports simultaneous multi-threading (SMT), wherein multiple threads of execution simultaneously execute in an interleaved manner on functional units within a processor. During operation, the system maintains a priority for each simultaneously executing thread. The system uses these priorities in allocating a shared computational resource between the simultaneously executing threads, so that a thread with a higher priority is given preferential access to the shared computational resource. This asymmetric treatment of the threads enables the system to favor the performance of a single-threaded application while performing simultaneous multi-threading.

Type: Grant

Filed: March 24, 2003

Date of Patent: March 24, 2009

Assignee: Sun Microsystems, Inc.

Inventors: Xiaogang Qiu, Si-En Chang
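Priority-based sharing can be sketched as a per-cycle arbiter. Strict priority with alphabetical tie-breaking is an assumption chosen for clarity, because the abstract says only that higher-priority threads get preferential access. `arbitrate` is a hypothetical name.

```python
from typing import Dict, List, Set

def arbitrate(priorities: Dict[str, int], requests: List[Set[str]]) -> List[str]:
    """Grant one shared-resource slot per cycle.

    requests[c] is the set of threads wanting the resource in cycle c. The grant
    goes to the requester with the highest priority (ties: alphabetical, assumed).
    Returns the winner per cycle, or "" when nobody asked.
    """
    grants = []
    for wanting in requests:
        if not wanting:
            grants.append("")
            continue
        grants.append(min(wanting, key=lambda t: (-priorities[t], t)))
    return grants

prio = {"main": 2, "helper": 1}  # favor the single-threaded application
print(arbitrate(prio, [{"main", "helper"}, {"helper"}, {"main"}, set()]))
# ['main', 'helper', 'main', '']
```

The favored thread wins every contested cycle, while the helper still uses slots that the favored thread leaves idle. A strict-priority arbiter like this one can starve low-priority threads, so a practical design might add some form of aging.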
Method and apparatus for implementing a grid storage system

Patent number: 7484039

Abstract: Embodiments of the present invention facilitate implementing external storage systems using commodity computer components to achieve high performance and reliability. An exemplary method facilitates dynamic repair of disk failures for RAID1 storage coherently across a plurality of loosely coupled storage controller computers via message communications through network interfaces. An exemplary method facilitates a snapshot function coherently across a plurality of loosely coupled storage controller nodes via message communications through network interfaces. An exemplary method facilitates detecting, tolerating, and repairing temporary target device failures in a networked storage system. An exemplary target device may contain a plurality of disk devices, and a temporary target device failure may be due to many reasons, such as a network or software glitch.

Type: Grant

Filed: May 22, 2006

Date of Patent: January 27, 2009

Inventors: Xiaogang Qiu, Ningchuan Shen
