Patents by Inventor Yun Du

Yun Du has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11829439
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: November 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20230377240
    Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
    Type: Application
    Filed: May 18, 2022
    Publication date: November 23, 2023
    Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Baoguang YANG, Yuehai DU, Gang ZHONG, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR
  • Publication number: 20230297349
    Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units, including sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit: assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations, for that section; generating configuration data for the and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
    Type: Application
    Filed: March 15, 2023
    Publication date: September 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Gao DENG, Weihang FAN, Fei WANG, Yun DU
  • Patent number: 11763419
    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
    Type: Grant
    Filed: October 14, 2022
    Date of Patent: September 19, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Yun Du
  • Publication number: 20230290034
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
    Type: Application
    Filed: May 15, 2023
    Publication date: September 14, 2023
    Inventors: Thomas Edwin FRISINGER, Richard HAMMERSTONE, Andrew Evan GRUBER, Gang ZHONG, Yun DU, Jonnala Gadda NAGENDRA KUMAR
  • Publication number: 20230267567
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
    Type: Application
    Filed: February 24, 2022
    Publication date: August 24, 2023
    Inventors: Yun DU, Andrew Evan GRUBER, Zilin YING, Chunling HU, Baoguang YANG, Yang XIA, Gang ZHONG, Chun YU, Eric DEMERS
  • Patent number: 11694384
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: July 4, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Thomas Edwin Frisinger, Richard Hammerstone, Andrew Evan Gruber, Gang Zhong, Yun Du, Jonnala Gadda Nagendra Kumar
  • Patent number: 11657471
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may generate a table including a plurality of entries to store data associated with at least one of a constant value or an immediate value. The apparatus may also process, upon generating the table, first data including at least one of a constant value or an immediate value. Further, the apparatus may store, in the generated table, at least one of the constant value or the immediate value of the first data. The apparatus may also transmit, upon storing at least one of the constant value or the immediate value in the table, the table including the stored at least one of the constant value or the immediate value of the first data.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: May 23, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Jian Jiang, Gang Zhong, Baoguang Yang, Yang Xia, Chun Yu, Eric Demers
  • Publication number: 20230113415
    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
    Type: Application
    Filed: October 14, 2022
    Publication date: April 13, 2023
    Inventors: Andrew Evan GRUBER, Yun DU
  • Publication number: 20230019763
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. For example, disclosed techniques facilitate improving bindless state processing at a graphics processor. Aspects of the present disclosure can receive, at a graphics processor, a shader program including a preamble section and a main instructions section. Aspects of the present disclosure can also execute, with a scalar processor dedicated to processing preamble sections, instructions of the preamble section to implement a bindless mechanism for loading constant data associated with the shader program. Additionally, aspects of the present disclosure can distribute the main instructions section and the constant data to a streaming processor for executing the shader program.
    Type: Application
    Filed: January 31, 2020
    Publication date: January 19, 2023
    Inventors: Yun DU, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Thomas Edwin FRISINGER, Richard HAMMERSTONE, Zilin YING, Heng QI, Quanquan XU, Sheng GU
  • Publication number: 20220414814
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may generate a table including a plurality of entries to store data associated with at least one of a constant value or an immediate value. The apparatus may also process, upon generating the table, first data including at least one of a constant value or an immediate value. Further, the apparatus may store, in the generated table, at least one of the constant value or the immediate value of the first data. The apparatus may also transmit, upon storing at least one of the constant value or the immediate value in the table, the table including the stored at least one of the constant value or the immediate value of the first data.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Yun DU, Andrew Evan GRUBER, Chihong ZHANG, Jian JIANG, Gang ZHONG, Baoguang YANG, Yang XIA, Chun YU, Eric DEMERS
  • Publication number: 20220357983
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of workloads based on a workload order, each of the plurality of workloads being received in the workload order including at least a first workload and a second workload. The apparatus may also allocate one or more workloads of the plurality of workloads to one or more wave slots. Additionally, the apparatus may execute the one or more allocated workloads at the one or more wave slots, such that at least the first workload is executed at the first wave slot and the second workload is executed at the second wave slot. The apparatus may also allocate at least one other workload of the plurality of workloads to at least one previously-allocated wave slot of the one or more wave slots.
    Type: Application
    Filed: May 7, 2021
    Publication date: November 10, 2022
    Inventors: Yun DU, Andrew Evan GRUBER, Zilin YING, Gang ZHONG, Baoguang YANG, Yang YU, Yang XIA, Ravindra KUMAR, Chun YU, Eric DEMERS
  • Patent number: 11475533
    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
    Type: Grant
    Filed: May 18, 2020
    Date of Patent: October 18, 2022
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Yun Du
  • Publication number: 20220139021
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
    Type: Application
    Filed: October 30, 2020
    Publication date: May 5, 2022
    Inventors: Thomas Edwin FRISINGER, Richard HAMMERSTONE, Andrew Evan GRUBER, Gang ZHONG, Yun DU, Jonnala Gadda NAGENDRA KUMAR
  • Patent number: 11204765
    Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: December 21, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Fei Wei, Gang Zhong, Minjie Huang, Jian Jiang, Zilin Ying, Baoguang Yang, Yang Xia, Jing Han, Liangxiao Hu, Chihong Zhang, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20210358076
    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
    Type: Application
    Filed: May 18, 2020
    Publication date: November 18, 2021
    Inventors: Andrew Evan GRUBER, Yun DU
  • Patent number: 11132760
    Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 28, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Gang Zhong, Jian Jiang, Fei Wei, Minjie Huang, Zilin Ying, Yang Xia, Jing Han, Chun Yu, Eric Demers
  • Patent number: 11094103
    Abstract: Example techniques are described for generating graphics content by obtaining texture operation instructions corresponding to a texture operation, in response to determining at least one of insufficient general purpose register space is available for the texture operation or insufficient wave slots are available for the texture operation, generating an indication that the texture operation corresponds to a deferred wave, executing the texture operation, sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed, and receiving texture mapped data corresponding to the initial texture sample instructions.
    Type: Grant
    Filed: March 26, 2019
    Date of Patent: August 17, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Hongjiang Shang, Zilin Ying, Fei Wei
  • Patent number: 11094032
    Abstract: Methods, systems, and devices for image processing are described. A device may determine, based on a test operation, to terminate a first wave associated with a first slot of a set of slots. The device may update a terminated wave bit associated with the first slot based on the determination to terminate the first wave. In some aspects, the device may update a number of invocations field associated with the first wave based on the determination to terminate the first wave. The device may release the first slot based on updating the terminated wave bit and the number of invocations field. In some examples, the device may output the number of invocations field to a rendering backend of the device based on the terminated wave bit.
    Type: Grant
    Filed: January 3, 2020
    Date of Patent: August 17, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Chun Yu, Andrew Evan Gruber, Zilin Ying, Baoguang Yang
  • Publication number: 20210209717
    Abstract: Methods, systems, and devices for image processing are described. A device may determine, based on a test operation, to terminate a first wave associated with a first slot of a set of slots. The device may update a terminated wave bit associated with the first slot based on the determination to terminate the first wave. In some aspects, the device may update a number of invocations field associated with the first wave based on the determination to terminate the first wave. The device may release the first slot based on updating the terminated wave bit and the number of invocations field. In some examples, the device may output the number of invocations field to a rendering backend of the device based on the terminated wave bit.
    Type: Application
    Filed: January 3, 2020
    Publication date: July 8, 2021
    Inventors: Yun DU, Chun YU, Andrew Evan GRUBER, Zilin YING, Baoguang YANG