Patents by Inventor Hongjiang Shang

Hongjiang Shang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Performing matrix multiplication in a streaming processor

Patent number: 12229215

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: October 16, 2023

Date of Patent: February 18, 2025

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
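
Illustrative sketch (not part of the patent record): the abstract above describes two loads from a first memory to a second memory, followed by a matrix multiplication whose result is kept in a general purpose register. The Python below is a minimal software model of that sequence, assuming plain nested lists for the matrices. The names `load`, `matmul`, `run`, and the dictionary keys are hypothetical and do not come from the patent, which describes hardware rather than code.

```python
# Illustrative model only; every name here is hypothetical.

def load(first_memory, key):
    """Model one load instruction: copy a matrix out of the first memory."""
    return [row[:] for row in first_memory[key]]


def matmul(inputs, weights):
    """Plain triple-loop product standing in for the ALU component."""
    rows, inner, cols = len(inputs), len(weights), len(weights[0])
    return [
        [sum(inputs[i][k] * weights[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def run(first_memory):
    second_memory = {
        "input": load(first_memory, "input"),    # first load instruction
        "weight": load(first_memory, "weight"),  # second load instruction
    }
    # The result plays the role of the output matrix held in a general purpose register.
    gpr = matmul(second_memory["input"], second_memory["weight"])
    return gpr
```

For example, `run({"input": [[1, 2], [3, 4]], "weight": [[5, 6], [7, 8]]})` returns `[[19, 22], [43, 50]]`.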
PERFORMING MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20240037183

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: October 16, 2023

Publication date: February 1, 2024

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
Methods and apparatus to perform matrix multiplication in a streaming processor

Patent number: 11829439

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: December 29, 2020

Date of Patent: November 28, 2023

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
General purpose register and wave slot allocation in graphics processing

Patent number: 11094103

Abstract: Example techniques are described for generating graphics content by: obtaining texture operation instructions corresponding to a texture operation; in response to determining that general purpose register space, wave slots, or both are insufficient for the texture operation, generating an indication that the texture operation corresponds to a deferred wave; executing the texture operation; sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed; and receiving texture mapped data corresponding to the initial texture sample instructions.

Type: Grant

Filed: March 26, 2019

Date of Patent: August 17, 2021

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Hongjiang Shang, Zilin Ying, Fei Wei
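
Illustrative sketch (not part of the patent record): the abstract above marks a texture operation as a deferred wave when general purpose register space or wave slots are insufficient. The Python below models only that decision. The `Resources` type, the `classify_wave` function, the unit of "GPR space", and the `"normal"` label for the non-deferred case are all assumptions made for illustration; the abstract does not describe what happens when resources are sufficient.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    free_gpr: int          # hypothetical count of free GPR entries
    free_wave_slots: int   # hypothetical count of free wave slots


def classify_wave(resources, gpr_needed, slots_needed=1):
    """Return "deferred" if GPR space or wave slots are insufficient, else "normal"."""
    short_on_gpr = resources.free_gpr < gpr_needed
    short_on_slots = resources.free_wave_slots < slots_needed
    if short_on_gpr or short_on_slots:
        return "deferred"
    return "normal"
```

Under these assumptions, `classify_wave(Resources(free_gpr=8, free_wave_slots=0), gpr_needed=4)` yields `"deferred"` because no wave slot is free, even though enough register space exists.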
METHODS AND APPARATUS TO PERFORM MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20210200836

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: December 29, 2020

Publication date: July 1, 2021

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
GENERAL PURPOSE REGISTER AND WAVE SLOT ALLOCATION IN GRAPHICS PROCESSING

Publication number: 20200312006

Abstract: Example techniques are described for generating graphics content by: obtaining texture operation instructions corresponding to a texture operation; in response to determining that general purpose register space, wave slots, or both are insufficient for the texture operation, generating an indication that the texture operation corresponds to a deferred wave; executing the texture operation; sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed; and receiving texture mapped data corresponding to the initial texture sample instructions.

Type: Application

Filed: March 26, 2019

Publication date: October 1, 2020

Inventors: Yun DU, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Hongjiang SHANG, Zilin YING, Fei WEI
METHODS AND APPARATUS FOR IMPROVING GPU PIPELINE UTILIZATION

Publication number: 20200311859

Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, multiple processing units can be in a graphics processing pipeline of a GPU. The apparatus can also group the multiple processing units into one or more processing unit clusters. In some aspects, each of the one or more processing unit clusters can correspond to one or more context registers. Additionally, the apparatus can determine one or more context states of the one or more context registers in each of the one or more processing unit clusters. Also, the apparatus can implement one or more execution counters corresponding to at least one of the one or more processing unit clusters in the graphics processing pipeline, where each of the one or more execution counters includes an execution value.

Type: Application

Filed: March 28, 2019

Publication date: October 1, 2020

Inventors: Yun DU, Nigel POOLE, Zilin YING, Ling Feng HUANG, Donghyun KIM, Chun YU, Tzun-Wei LEE, Xuefeng TANG, Shambhoo KHANDELWAL, Hongjiang SHANG, Elina KAMENETSKAYA, Zhu LIANG, Cary ROBINS
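
Illustrative sketch (not part of the patent record): the abstract above groups processing units into clusters, associates each cluster with context registers, and attaches an execution counter holding an execution value. The Python below shows one possible data layout for that description. Grouping by a fixed cluster size, the unit names, and the initial counter value of 0 are assumptions; the abstract does not say how clusters are formed or how the counter changes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    units: list                                            # processing units in this cluster
    context_registers: dict = field(default_factory=dict)  # register name -> context state
    execution_value: int = 0                               # value held by the execution counter


def group_units(units, cluster_size):
    """Split the pipeline's processing units into consecutive clusters (assumed policy)."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    return [
        Cluster(units=units[i:i + cluster_size])
        for i in range(0, len(units), cluster_size)
    ]
```

With five hypothetical units and a cluster size of two, `group_units` produces three clusters, the last of which holds a single unit.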
General purpose register allocation in streaming processor

Patent number: 10558460

Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.

Type: Grant

Filed: December 14, 2016

Date of Patent: February 11, 2020

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
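
Illustrative sketch (not part of the patent record): the abstract above has a thread scheduler receive allocation information that says which general purpose registers become pGPRs and which become vGPRs. The Python below models only the partitioning step, assuming the allocation information is a single count of persistent registers and that pGPRs occupy the lowest indices. The function name `allocate_gprs` and both of those conventions are hypothetical; how the count is derived from instruction latencies is not modeled here.

```python
def allocate_gprs(total_registers, persistent_count):
    """Partition register indices into persistent (pGPR) and volatile (vGPR) sets."""
    if not 0 <= persistent_count <= total_registers:
        raise ValueError("persistent_count must be between 0 and total_registers")
    return {
        "pGPR": list(range(persistent_count)),                   # assumed: lowest indices
        "vGPR": list(range(persistent_count, total_registers)),  # assumed: the remainder
    }
```

For instance, `allocate_gprs(8, 5)` returns `{"pGPR": [0, 1, 2, 3, 4], "vGPR": [5, 6, 7]}`.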
GENERAL PURPOSE REGISTER ALLOCATION IN STREAMING PROCESSOR

Publication number: 20180165092

Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.

Type: Application

Filed: December 14, 2016

Publication date: June 14, 2018

Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
Operand conflict resolution for reduced port general purpose register

Patent number: 9632783

Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.

Type: Grant

Filed: October 3, 2014

Date of Patent: April 25, 2017

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Hongjiang Shang, Haikun Zhu
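
Illustrative sketch (not part of the patent record): the abstract above detects when an instruction needs more values from one GPR memory cell than its read port allows, and stages the extra values in a conflict queue before execution. The Python below models that planning step, assuming each operand is identified by the memory cell it lives in and that each cell serves one read per instruction by default. The function `plan_reads`, the `ports_per_cell` parameter, and the choice to send later operands (rather than earlier ones) to the queue are hypothetical.

```python
from collections import defaultdict


def plan_reads(operand_cells, ports_per_cell=1):
    """Decide, per operand, whether it is read from its cell or from the conflict queue.

    operand_cells: list of memory-cell ids, one entry per operand of the instruction.
    Returns a list of "cell" / "queue" labels in operand order.
    """
    reads_so_far = defaultdict(int)
    plan = []
    for cell in operand_cells:
        reads_so_far[cell] += 1
        if reads_so_far[cell] <= ports_per_cell:
            plan.append("cell")    # the read port can serve this operand directly
        else:
            plan.append("queue")   # staged in the conflict queue before execution
    return plan
```

For an instruction whose first two operands sit in the same cell, `plan_reads([0, 0, 1])` returns `["cell", "queue", "cell"]`: one operand comes from the memory cell and the conflicting one from the queue.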
OPERAND CONFLICT RESOLUTION FOR REDUCED PORT GENERAL PURPOSE REGISTER

Publication number: 20160098276

Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.

Type: Application

Filed: October 3, 2014

Publication date: April 7, 2016

Inventors: Yun Du, Hongjiang Shang, Haikun Zhu