Patents by Inventor Xiaogang Qiu

Xiaogang Qiu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9626191
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: April 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
  • Patent number: 9612836
    Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: April 4, 2017
    Assignee: NVIDIA Corporation
    Inventors: Robert Ohannessian, Jr., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
  • Patent number: 9606808
    Abstract: A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
    Type: Grant
    Filed: January 11, 2012
    Date of Patent: March 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jack Choquette, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
  • Patent number: 9477482
    Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: October 25, 2016
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
  • Patent number: 9471307
    Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.
    Type: Grant
    Filed: January 3, 2014
    Date of Patent: October 18, 2016
    Assignee: NVIDIA Corporation
    Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, Jr., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
  • Publication number: 20150220341
    Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
    Type: Application
    Filed: February 3, 2014
    Publication date: August 6, 2015
    Applicant: NVIDIA Corporation
    Inventors: Robert Ohannessian, JR., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
  • Publication number: 20150193272
    Abstract: A system and apparatus are provided that include an implementation for decoupled pipelines. The apparatus includes a scheduler configured to issue instructions to one or more functional units and a functional unit coupled to a queue having a number of slots for storing instructions. The instructions issued to the functional unit are stored in the queue until the functional unit is available to process the instructions.
    Type: Application
    Filed: January 3, 2014
    Publication date: July 9, 2015
    Applicant: NVIDIA Corporation
    Inventors: Olivier Giroux, Michael Alan Fetterman, Robert Ohannessian, JR., Shirish Gadre, Jack H. Choquette, Xiaogang Qiu, Jeffrey Scott Tuckey, Robert James Stoll
  • Publication number: 20150113538
    Abstract: One embodiment of the present invention is a computer-implemented method for scheduling a thread group for execution on a processing engine that includes identifying a first thread group included in a first set of thread groups that can be issued for execution on the processing engine, where the first thread group includes one or more threads. The method also includes transferring the first thread group from the first set of thread groups to a second set of thread groups, allocating hardware resources to the first thread group, and selecting the first thread group from the second set of thread groups for execution on the processing engine. One advantage of the disclosed technique is that a scheduler only allocates limited hardware resources to thread groups that are, in fact, ready to be issued for execution, thereby conserving those resources in a manner that is generally more efficient than conventional techniques.
    Type: Application
    Filed: October 23, 2013
    Publication date: April 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Robert J. STOLL, Xiaogang QIU, Michael Alan FETTERMAN
  • Publication number: 20150089202
    Abstract: A system, method, and computer program product are provided for implementing a multi-cycle register file bypass mechanism. The method includes the steps of receiving a set of control bits, combining the set of control bits with a set of valid bits associated with previously issued instructions, and enabling a bypass path for each thread based on the set of control bits and the set of valid bits. Each valid bit in the set of valid bits indicates whether execution of an instruction of the previously issued instructions was enabled for a thread in a thread block.
    Type: Application
    Filed: September 26, 2013
    Publication date: March 26, 2015
    Applicant: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ian Chi Yan Kwong, Ming Yiu Siu, Jack H. Choquette, Michael Alan Fetterman
  • Publication number: 20140164743
    Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Publication number: 20130179662
    Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
  • Publication number: 20130166877
    Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN
  • Publication number: 20130145124
    Abstract: One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Inventors: Xiaogang Qiu, Jack Hilaire Choquette, Manuel Olivier Gautho, Ming Y. (Michael) Siu
  • Publication number: 20130117541
    Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, that dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units.
    Type: Application
    Filed: November 4, 2011
    Publication date: May 9, 2013
    Inventors: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu
  • Patent number: 8321761
    Abstract: A memory module includes a plurality of register files. Each register file is associated with a set of error-correcting code (ECC) bits and ECC check/correct logic that can provide error-correcting functionality, if required. When error-correcting functionality is not required, ECC bits are grouped together to form additional register files, thereby providing additional storage space.
    Type: Grant
    Filed: September 28, 2009
    Date of Patent: November 27, 2012
    Assignee: NVIDIA Corporation
    Inventors: Fred Gruner, Xiaogang Qiu, Yan Yan Tang
  • Patent number: 8250439
    Abstract: A memory module includes a plurality of register files. Each register file is associated with a set of error-correcting code (ECC) bits and ECC check/correct logic that can provide error-correcting functionality, if required. When error-correcting functionality is not required, ECC bits are grouped together to form additional register files, thereby providing additional storage space.
    Type: Grant
    Filed: September 28, 2009
    Date of Patent: August 21, 2012
    Assignee: NVIDIA Corporation
    Inventors: Fred Gruner, Xiaogang Qiu
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 7509643
    Abstract: One embodiment of the present invention facilitates favoring the performance of a single-threaded application in a computer system that supports simultaneous multi-threading (SMT), wherein multiple threads of execution simultaneously execute in an interleaved manner on functional units within a processor. During operation, the system maintains a priority for each simultaneously executing thread. The system uses these priorities in allocating a shared computational resource between the simultaneously executing threads, so that a thread with a higher priority is given preferential access to the shared computational resource. This asymmetric treatment of the threads enables+ the system to favor the performance of a single-threaded application while performing simultaneous multi-threading.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: March 24, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Xiaogang Qiu, Si-En Chang
  • Patent number: 7484039
    Abstract: Embodiments of the present invention facilitate implementing external storage systems using commodity computer components to achieve high performance and reliability. An exemplary method facilitates dynamic repairing of disk failures for RAID1 storage coherently across a plurality of loosely coupled storage controller computers via message communications through network interfaces. An exemplary method facilitates snapshot function coherently across a plurality of loosely coupled storage controller nodes via message communications through network interfaces. An exemplary method facilitates to detect, tolerate, and repair temporary target device failures in a networked storage system. An exemplary target device may contain a plurality of disk devices, and a temporary target device failure may due to many reasons such as a network or software glitch.
    Type: Grant
    Filed: May 22, 2006
    Date of Patent: January 27, 2009
    Inventors: Xiaogang Qiu, Ningchuan Shen