Patents by Inventor Hongjiang Shang

Hongjiang Shang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240037183
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: October 16, 2023
    Publication date: February 1, 2024
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Patent number: 11829439
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: November 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Patent number: 11094103
    Abstract: Example techniques are described for generating graphics content by obtaining texture operation instructions corresponding to a texture operation, in response to determining at least one of insufficient general purpose register space is available for the texture operation or insufficient wave slots are available for the texture operation, generating an indication that the texture operation corresponds to a deferred wave, executing the texture operation, sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed, and receiving texture mapped data corresponding to the initial texture sample instructions.
    Type: Grant
    Filed: March 26, 2019
    Date of Patent: August 17, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Hongjiang Shang, Zilin Ying, Fei Wei
  • Publication number: 20210200836
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: December 29, 2020
    Publication date: July 1, 2021
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Publication number: 20200312006
    Abstract: Example techniques are described for generating graphics content by obtaining texture operation instructions corresponding to a texture operation, in response to determining at least one of insufficient general purpose register space is available for the texture operation or insufficient wave slots are available for the texture operation, generating an indication that the texture operation corresponds to a deferred wave, executing the texture operation, sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed, and receiving texture mapped data corresponding to the initial texture sample instructions.
    Type: Application
    Filed: March 26, 2019
    Publication date: October 1, 2020
    Inventors: Yun DU, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Hongjiang SHANG, Zilin YING, Fei WEI
  • Publication number: 20200311859
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, multiple processing units can be in a graphics processing pipeline of a GPU. The apparatus can also group the multiple processing units into one or more processing unit clusters. In some aspects, each of the one or more processing unit clusters can correspond to one or more context registers. Additionally, the apparatus can determine one or more context states of the one or more context registers in each of the one or more processing unit clusters. Also, the apparatus can implement one or more execution counters corresponding to at least one of the one or more processing unit clusters in the graphics processing pipeline, where each of the one or more execution counters includes an execution value.
    Type: Application
    Filed: March 28, 2019
    Publication date: October 1, 2020
    Inventors: Yun DU, Nigel POOLE, Zilin YING, Ling Feng HUANG, Donghyun KIM, Chun YU, Tzun-Wei LEE, Xuefeng TANG, Shambhoo KHANDELWAL, Hongjiang SHANG, Elina KAMENETSKAYA, Zhu LIANG, Cary ROBINS
  • Patent number: 10558460
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, with the allocation based on execution latencies of instructions included in the threads.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Publication number: 20180165092
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, with the allocation based on execution latencies of instructions included in the threads.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 9632783
    Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.
    Type: Grant
    Filed: October 3, 2014
    Date of Patent: April 25, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Hongjiang Shang, Haikun Zhu
  • Publication number: 20160098276
    Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.
    Type: Application
    Filed: October 3, 2014
    Publication date: April 7, 2016
    Inventors: Yun Du, Hongjiang Shang, Haikun Zhu
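
Illustrative note on the conflict-queue abstract (patent 9632783 / publication 20160098276): the idea of staging operands that exceed a memory cell's read-port limit can be sketched as a small software model. This is only a toy illustration, not the patented hardware design. The function names `plan_reads` and `execute`, the register-to-cell mapping `cell_of`, and the assumption of one read port per cell are all hypothetical choices made for the example.

```python
from collections import Counter, deque

def plan_reads(operands, cell_of, ports=1):
    """For each operand register, decide where it will be read from.

    operands: register names used by one instruction, in operand order.
    cell_of:  hypothetical mapping of register name -> memory cell id.
    ports:    assumed number of read ports per memory cell.
    Returns a list parallel to `operands` containing "cell" or "queue".
    """
    used = Counter()
    plan = []
    for reg in operands:
        cell = cell_of[reg]
        if used[cell] < ports:
            used[cell] += 1          # this read fits within the port limit
            plan.append("cell")
        else:
            plan.append("queue")     # over the limit: stage it beforehand
    return plan

def execute(op, operands, gpr, cell_of, ports=1):
    """Run `op` on the operand values, using a conflict queue when needed."""
    plan = plan_reads(operands, cell_of, ports)
    # Prior to execution: copy the conflicting values into the conflict queue.
    conflict_queue = deque(
        gpr[reg] for reg, src in zip(operands, plan) if src == "queue"
    )
    # During execution: one operand comes from the memory cell, the
    # conflicting one comes from the queue, in operand order.
    values = [
        gpr[reg] if src == "cell" else conflict_queue.popleft()
        for reg, src in zip(operands, plan)
    ]
    return op(*values)

# Toy register file: r0 and r1 share memory cell 0, r2 lives in cell 1.
gpr = {"r0": 3, "r1": 4, "r2": 5}
cell_of = {"r0": 0, "r1": 0, "r2": 1}
product = execute(lambda a, b: a * b, ["r0", "r1"], gpr, cell_of)
```

In this model, `r0` and `r1` collide on the same cell, so `r1` is read from the queue while `r0` is read directly; an instruction using `r0` and `r2` would need no staging at all.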