Patents by Inventor Liangxiao Hu

Liangxiao Hu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11204765
    Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: December 21, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Fei Wei, Gang Zhong, Minjie Huang, Jian Jiang, Zilin Ying, Baoguang Yang, Yang Xia, Jing Han, Liangxiao Hu, Chihong Zhang, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Patent number: 9361078
    Abstract: A compiler method for exploiting data value locality for computation reuse. When a code region having single entry and exit points and in which a potential computation reuse opportunity exists is identified during runtime, a helper thread is created separate from the master thread. One of the helper thread and master thread performs a computation specified in the code region, and the other of the helper thread and master thread looks up a value of the computation previously executed and stored in a lookup table. If the value of the computation previously executed is located in the lookup table, the other thread retrieves the value from the table, and ignores the computation performed by the thread. If the value of the computation is not located, the other thread obtains a result of the computation performed by the thread and stores the result in the lookup table for future computation reuse.
    Type: Grant
    Filed: March 19, 2007
    Date of Patent: June 7, 2016
    Assignee: International Business Machines Corporation
    Inventors: Yaoqing Gao, Liangxiao Hu, Guansong Zhang, Peng Zhao
  • Patent number: 9038045
    Abstract: Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure using the control flow information and the data flow information is created. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in a hash entry having a longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that an unprocessed upc_forall loop does not exist, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop is rebuilt.
    Type: Grant
    Filed: November 15, 2011
    Date of Patent: May 19, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yaoqing Gao, Liangxiao Hu, Raul Esteban Silvera, Ettore Tiotto
  • Publication number: 20130125105
    Abstract: Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure using the control flow information and the data flow information is created. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in a hash entry having a longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that an unprocessed upc_forall loop does not exist, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop is rebuilt.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 16, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yaoqing Gao, Liangxiao Hu, Raul Esteban Silvera, Ettore Tiotto
  • Patent number: 8191057
    Abstract: Systems, methods and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including building a control flow graph, identifying both countable and non-countable loops, gathering a set of candidate loops for load speculation, and for each candidate loop in the set of candidate loops gathered for load speculation, computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determining an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives, indicating which loads can safely be executed speculatively and performing low-level instruction scheduling on the generated unrolled main loop.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: May 29, 2012
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Geoffrey O. Blandy, Roland Froese, Yaoqing Gao, Liangxiao Hu, James L. McInnes, Raul E. Silvera
  • Publication number: 20090064119
    Abstract: Systems, methods and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including building a control flow graph, identifying both countable and non-countable loops, gathering a set of candidate loops for load speculation, for each candidate loop in the set of candidate loops gathered for load speculation performing computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determine an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives, indicating which loads can safely be executed speculatively and performing low-level instruction on the generated unrolled main loop.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Roch G. Archambault, Geoffrey O. Blandy, Roland Froese, Yaoqing Gao, Liangxiao Hu, James L. McInnes, Raul E. Silvera
  • Publication number: 20080235674
    Abstract: A compiler method for exploiting data value locality for computation reuse. When a code region having single entry and exit points and in which a potential computation reuse opportunity exists is identified during runtime, a helper thread is created separate from the master thread. One of the helper thread and master thread performs a computation specified in the code region, and the other of the helper thread and master thread looks up a value of the computation previously executed and stored in a lookup table. If the value of the computation previously executed is located in the lookup table, the other thread retrieves the value from the table, and ignores the computation performed by the thread. If the value of the computation is not located, the other thread obtains a result of the computation performed by the thread and stores the result in the lookup table for future computation reuse.
    Type: Application
    Filed: March 19, 2007
    Publication date: September 25, 2008
    Inventors: Yaoqing Gao, Liangxiao Hu, Guansong Zhang, Peng Zhao
  • Publication number: 20070089097
    Abstract: An optimization mechanism in a compiler performs region-based code straightening to line up frequently executed basic blocks together based on profile directed feedback. The region-based code straightening lines up the basic blocks in order of execution sequence, but does not move infrequently executed basic blocks too far away from predecessors. As such, code locality is maintained, thus reducing instruction cache misses.
    Type: Application
    Filed: October 13, 2005
    Publication date: April 19, 2007
    Inventors: Liangxiao Hu, Mark Mendell, Raul Silvera