Patents by Inventor Eric Demers

Eric Demers has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240046543
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
    Type: Application
    Filed: August 5, 2022
    Publication date: February 8, 2024
    Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Baoguang YANG, Chihong ZHANG, Yuehai DU, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR, Gang ZHONG, Zilin YING, Fei WEI
  • Publication number: 20240037183
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: October 16, 2023
    Publication date: February 1, 2024
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Patent number: 11829439
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: November 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20230377240
    Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
    Type: Application
    Filed: May 18, 2022
    Publication date: November 23, 2023
    Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Baoguang YANG, Yuehai DU, Gang ZHONG, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR
  • Publication number: 20230267567
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
    Type: Application
    Filed: February 24, 2022
    Publication date: August 24, 2023
    Inventors: Yun DU, Andrew Evan GRUBER, Zilin YING, Chunling HU, Baoguang YANG, Yang XIA, Gang ZHONG, Chun YU, Eric DEMERS
  • Patent number: 11657471
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may generate a table including a plurality of entries to store data associated with at least one of a constant value or an immediate value. The apparatus may also process, upon generating the table, first data including at least one of a constant value or an immediate value. Further, the apparatus may store, in the generated table, at least one of the constant value or the immediate value of the first data. The apparatus may also transmit, upon storing at least one of the constant value or the immediate value in the table, the table including the stored at least one of the constant value or the immediate value of the first data.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: May 23, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Jian Jiang, Gang Zhong, Baoguang Yang, Yang Xia, Chun Yu, Eric Demers
  • Publication number: 20220414814
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may generate a table including a plurality of entries to store data associated with at least one of a constant value or an immediate value. The apparatus may also process, upon generating the table, first data including at least one of a constant value or an immediate value. Further, the apparatus may store, in the generated table, at least one of the constant value or the immediate value of the first data. The apparatus may also transmit, upon storing at least one of the constant value or the immediate value in the table, the table including the stored at least one of the constant value or the immediate value of the first data.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Yun DU, Andrew Evan GRUBER, Chihong ZHANG, Jian JIANG, Gang ZHONG, Baoguang YANG, Yang XIA, Chun YU, Eric DEMERS
  • Publication number: 20220357983
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of workloads based on a workload order, each of the plurality of workloads being received in the workload order including at least a first workload and a second workload. The apparatus may also allocate one or more workloads of the plurality of workloads to one or more wave slots. Additionally, the apparatus may execute the one or more allocated workloads at the one or more wave slots, such that at least the first workload is executed at the first wave slot and the second workload is executed at the second wave slot. The apparatus may also allocate at least one other workload of the plurality of workloads to at least one previously-allocated wave slot of the one or more wave slots.
    Type: Application
    Filed: May 7, 2021
    Publication date: November 10, 2022
    Inventors: Yun DU, Andrew Evan GRUBER, Zilin YING, Gang ZHONG, Baoguang YANG, Yang YU, Yang XIA, Ravindra KUMAR, Chun YU, Eric DEMERS
  • Patent number: 11204765
    Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: December 21, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Fei Wei, Gang Zhong, Minjie Huang, Jian Jiang, Zilin Ying, Baoguang Yang, Yang Xia, Jing Han, Liangxiao Hu, Chihong Zhang, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Patent number: 11132760
    Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 28, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Gang Zhong, Jian Jiang, Fei Wei, Minjie Huang, Zilin Ying, Yang Xia, Jing Han, Chun Yu, Eric Demers
  • Publication number: 20210200836
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: December 29, 2020
    Publication date: July 1, 2021
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Publication number: 20210183005
    Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
    Type: Application
    Filed: December 13, 2019
    Publication date: June 17, 2021
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Gang Zhong, Jian Jiang, Fei Wei, Minjie Huang, Zilin Ying, Yang Xia, Jing Han, Chun Yu, Eric Demers
  • Publication number: 20200263405
    Abstract: A universal canister flush valve having a valve body configured to be fixed relative to a toilet tank and having a hollow wall defining an internal flow passage; a guide post coupled to and extending away from the valve body; a float fitted about and configured to slide relative to the guide post between a closed position and an open position, the float having an open top; and an extender that selectively couples to the open top in a first position, in which a first end of the extender is received in and coupled to the open top, and in a second position, in which a second end of the extender is received in and coupled to the open top, wherein the extender and float define a first overflow height in the first position and define a second overflow height in the second position.
    Type: Application
    Filed: February 14, 2020
    Publication date: August 20, 2020
    Inventors: Andrew L. Smith, Billy Jack Ahola, Donald G. Bogenschuetz, Lawrence E. Duwell, Peter W. Swart, Jeffrey T. Laundre, Bradley Strasser, Matthew Krebs, Douglas E. Bogard, Daniel N. Halloran, Scott R. Krebs, Edward F. Malls, JR., Randy O. Mesun, Eric Demer, Stewart Anthony Schaal
  • Patent number: 10558460
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include a plurality of general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 10509588
    Abstract: Systems, methods, and computer programs are disclosed for controlling memory frequency. One method comprises a first memory client generating a compressed data buffer and compression statistics related to the compressed data buffer. The compressed data buffer and the compression statistics are stored in a memory device. Based on the stored compression statistics, a frequency or voltage setting of the memory device is adjusted for enabling a second memory client to read the compressed data buffer.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: December 17, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Serag Gadelrab, Sudeep Ravi Kottilingal, Meghal Varia, Pooja Sinha, Ujwal Patel, Ruo Long Liu, Jeffrey Chu, Sina Gholamian, Hyukjune Chung, David Strasser, Raghavendra Nagaraj, Eric Demers
  • Publication number: 20180165092
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include a plurality of general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Publication number: 20180040095
    Abstract: This disclosure describes techniques for compressing a graphical state object. In one example, a central processing unit may be configured to receive, for output to the GPU, a set of instructions to render a scene. Responsive to receiving the set of instructions to render the scene, the central processing unit may be further configured to determine whether the set of instructions includes a state object that is registered as corresponding to an identifier. Responsive to determining that the set of instructions includes the state object that is registered as corresponding to the identifier, the central processing unit may be further configured to output, to the GPU, the identifier that is registered as corresponding to the state object.
    Type: Application
    Filed: August 2, 2016
    Publication date: February 8, 2018
    Inventors: Avinash Seetharamaiah, Christopher Paul Frascati, Jonnala Gadda Nagendra Kumar, Andrew Evan Gruber, Colin Christopher Sharp, Eric Demers
  • Patent number: 9824458
    Abstract: A graphics processing unit (GPU) may determine a workload of a fragment shader program that executes on the GPU. The GPU may compare the workload of the fragment shader program to a threshold. In response to determining that the workload of the fragment shader program is lower than a specified threshold, the fragment shader program may process one or more fragments without the GPU performing early depth testing of the one or more fragments before the processing by the fragment shader program. The GPU may perform, after processing by the fragment shader program, late depth testing of the one or more fragments to result in one or more non-occluded fragments. The GPU may write pixel values for the one or more non-occluded fragments into a frame buffer.
    Type: Grant
    Filed: September 23, 2015
    Date of Patent: November 21, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Shambhoo Khandelwal, Yang Xia, Xuefeng Tang, Jian Liang, Tao Wang, Andrew Evan Gruber, Eric Demers
  • Publication number: 20170083262
    Abstract: Systems, methods, and computer programs are disclosed for controlling memory frequency. One method comprises a first memory client generating a compressed data buffer and compression statistics related to the compressed data buffer. The compressed data buffer and the compression statistics are stored in a memory device. Based on the stored compression statistics, a frequency or voltage setting of the memory device is adjusted for enabling a second memory client to read the compressed data buffer.
    Type: Application
    Filed: January 13, 2016
    Publication date: March 23, 2017
    Inventors: Serag Gadelrab, Sudeep Ravi Kottilingal, Meghal Varia, Pooja Sinha, Ujwal Patel, Ruolong Liu, Jeffrey Chu, Sina Gholamian, Hyukjune Chung, David Strasser, Raghavendra Nagaraj, Eric Demers
  • Publication number: 20170084043
    Abstract: A graphics processing unit (GPU) may determine a workload of a fragment shader program that executes on the GPU. The GPU may compare the workload of the fragment shader program to a threshold. In response to determining that the workload of the fragment shader program is lower than a specified threshold, the fragment shader program may process one or more fragments without the GPU performing early depth testing of the one or more fragments before the processing by the fragment shader program. The GPU may perform, after processing by the fragment shader program, late depth testing of the one or more fragments to result in one or more non-occluded fragments. The GPU may write pixel values for the one or more non-occluded fragments into a frame buffer.
    Type: Application
    Filed: September 23, 2015
    Publication date: March 23, 2017
    Inventors: Shambhoo Khandelwal, Yang Xia, Xuefeng Tang, Jian Liang, Tao Wang, Andrew Evan Gruber, Eric Demers
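
The early versus late depth-testing decision described in the abstract of publication number 20170084043 (and patent number 9824458) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Fragment`, `render`, `shade`, `workload`, and `threshold` are hypothetical, a smaller depth value is assumed to mean "closer to the viewer", and the behavior when the workload is at or above the threshold (early depth testing before shading) is an assumption that the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float


def passes_depth_test(frag, depth_buffer):
    # Assumed convention: a smaller depth value is closer to the viewer.
    return frag.depth < depth_buffer.get((frag.x, frag.y), float("inf"))


def render(fragments, shade, workload, threshold, depth_buffer, frame_buffer):
    """Shade fragments and write the non-occluded ones to frame_buffer.

    Returns the number of shader invocations, purely for illustration.
    """
    # Assumption: only shaders at or above the threshold get early depth testing.
    use_early_test = workload >= threshold
    invocations = 0
    for frag in fragments:
        if use_early_test and not passes_depth_test(frag, depth_buffer):
            continue  # occluded fragment rejected before the shader runs
        color = shade(frag)
        invocations += 1
        # Depth test before the write. On the light-workload path this is the
        # late depth test from the abstract; on the early path it is a re-check.
        if passes_depth_test(frag, depth_buffer):
            depth_buffer[(frag.x, frag.y)] = frag.depth
            frame_buffer[(frag.x, frag.y)] = color
    return invocations
```

In this sketch, a light shader (workload below the threshold) runs on every fragment and relies on the late test alone, while a heavy shader skips fragments that are already occluded. Both paths write the same pixel values to the frame buffer; only the number of shader invocations differs.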