Patents by Inventor Andrew Evan Gruber

Andrew Evan Gruber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954758
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
    Type: Grant
    Filed: February 24, 2022
    Date of Patent: April 9, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Andrew Evan Gruber, Zilin Ying, Chunling Hu, Baoguang Yang, Yang Xia, Gang Zhong, Chun Yu, Eric Demers
  • Publication number: 20240104824
    Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 28, 2024
    Inventors: Piyush GUPTA, Pavan Kumar AKKARAJU, Alexei Vladimirovich BOURD, Andrew Evan GRUBER
  • Publication number: 20240104837
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
    Type: Application
    Filed: August 9, 2023
    Publication date: March 28, 2024
    Inventors: Vineet GOEL, Andrew Evan GRUBER, Donghyun KIM
  • Publication number: 20240104684
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for improving visibility generation in tile-based GPU architectures. A graphics processor may perform a first binning pass associated with visibility information for each of a plurality of primitives in at least one frame. The visibility information for each of the plurality of primitives may correspond to a visible indication or an invisible indication. The graphics processor may update a depth buffer based on the visibility information for all of the plurality of primitives in the at least one frame. The graphics processor may perform a second binning pass for each of the visible set of primitives based on the updated depth buffer. The graphics processor may store at least one of the updated visibility information or updated position data for all primitives in the visible set of primitives from the second binning pass.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 28, 2024
    Inventors: Kalyan Kumar BHIRAVABHATLA, Andrew Evan GRUBER, Rahul Sunil KUKREJA, Vishwanath Shashikant NIKAM, Tao WANG, Jian LIANG
  • Patent number: 11928754
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for GPU wave-to-wave optimization. A graphics processor may execute a shader program for a first wave associated with a draw call or a compute kernel. The graphics processor may identify at least one first indication for the first wave associated with the draw call or the compute kernel. The graphics processor may store the at least one first indication for the first wave to a memory location. The graphics processor may execute the shader program for at least one second wave associated with the draw call or the compute kernel. The execution of the shader program for the at least one second wave may be based on the shader program for the at least one second wave reading the memory location to retrieve the at least one first indication.
    Type: Grant
    Filed: April 7, 2022
    Date of Patent: March 12, 2024
    Assignee: QUALCOMM Incorporated
    Inventor: Andrew Evan Gruber
  • Publication number: 20240078735
    Abstract: A sliced graphics processing unit (GPU) architecture in processor-based devices is disclosed. In some aspects, a GPU based on a sliced GPU architecture includes multiple hardware slices. The GPU further includes a command processor (CP) circuit and an unslice primitive controller (PC_US). Upon receiving a graphics instruction from a central processing unit (CPU), the CP circuit determines a graphics workload, and transmits the graphics workload to the PC_US. The PC_US then partitions the graphics workload into multiple subbatches and distributes each subbatch to a PC_S of a hardware slice for processing.
    Type: Application
    Filed: December 19, 2022
    Publication date: March 7, 2024
    Inventors: Jian Liang, Andrew Evan Gruber, Tao Wang, Xuefeng Tang, Vishwanath Shashikant Nikam, Nigel Poole, Kalyan Kumar Bhiravabhatla, Fei Xu, Zilin Ying
  • Publication number: 20240078737
    Abstract: A sliced graphics processing unit (GPU) architecture in processor-based devices is disclosed. In some aspects, a GPU based on a sliced GPU architecture includes multiple hardware slices. The GPU further includes a command processor (CP) circuit and an unslice primitive controller (PC_US). Upon receiving a graphics instruction from a central processing unit (CPU), the CP circuit determines a graphics workload, and transmits the graphics workload to the PC_US. The PC_US then partitions the graphics workload into multiple subbatches and distributes each subbatch to a PC_S of a hardware slice for processing.
    Type: Application
    Filed: May 19, 2023
    Publication date: March 7, 2024
    Inventors: Jian LIANG, Andrew Evan GRUBER, Tao WANG, Xuefeng TANG, Vishwanath Shashikant NIKAM, Nigel POOLE, Kalyan Kumar BHIRAVABHATLA, Fei XU, Zilin YING
  • Publication number: 20240046543
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
    Type: Application
    Filed: August 5, 2022
    Publication date: February 8, 2024
    Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Baoguang YANG, Chihong ZHANG, Yuehai DU, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR, Gang ZHONG, Zilin YING, Fei WEI
  • Patent number: 11893654
    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may configure a portion of a GPU to include at least one depth processing block, the at least one depth processing block being associated with at least one depth buffer. The apparatus may also identify one or more depth passes of each of a plurality of graphics workloads, the plurality of graphics workloads being associated with a plurality of frames. Further, the apparatus may process each of the one or more depth passes in the portion of the GPU including the at least one depth processing block, each of the one or more depth passes being processed by the at least one depth processing block, the one or more depth passes being associated with the at least one depth buffer.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: February 6, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Sreyas Kurumanghat, Kalyan Kumar Bhiravabhatla, Andrew Evan Gruber, Tao Wang, Baoguang Yang, Pavan Kumar Akkaraju
  • Publication number: 20240037183
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: October 16, 2023
    Publication date: February 1, 2024
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Publication number: 20240013336
    Abstract: The present disclosure relates to graphics processing. An apparatus of the present disclosure may determine visibility streams corresponding to a target and a set of bins into which the target is divided. The apparatus may select one of a first rendering mode or a second rendering mode for the target based on the first visibility stream and based on the set of second visibility streams. When the first rendering mode is select, the apparatus may configure each of the set of bins into a first subset associated with a first type of rendering pass or a second subset associated with a second type of rendering pass. The apparatus may then render the target based on the selected one of the first rendering mode or the second rendering mode and, if applicable, based on the first rendering pass type or the second rendering pass type.
    Type: Application
    Filed: November 19, 2020
    Publication date: January 11, 2024
    Inventors: Bo DU, Andrew Evan GRUBER, Yongjun XU
  • Publication number: 20230394738
    Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.
    Type: Application
    Filed: November 9, 2020
    Publication date: December 7, 2023
    Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG
  • Patent number: 11829439
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: November 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20230377240
    Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
    Type: Application
    Filed: May 18, 2022
    Publication date: November 23, 2023
    Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Baoguang YANG, Yuehai DU, Gang ZHONG, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR
  • Publication number: 20230343016
    Abstract: The present disclosure relates to graphics processing. An apparatus of the present disclosure may determine visibility streams corresponding to a target and a set of bins into which the target is divided. The apparatus may select one of a first rendering mode or a second rendering mode for the target based on the first visibility stream and based on the set of second visibility streams. When the first rendering mode is select, the apparatus may configure each of the set of bins into a first subset associated with a first type of rendering pass or a second subset associated with a second type of rendering pass. The apparatus may then render the target based on the selected one of the first rendering mode or the second rendering mode and, if applicable, based on the first rendering pass type or the second rendering pass type.
    Type: Application
    Filed: November 18, 2020
    Publication date: October 26, 2023
    Inventors: Srihari Babu ALLA, Jonnala Gadda NAGENDRA KUMAR, Avinash SEETHARAMAIAH, Andrew Evan GRUBER, Thomas Edwin FRISINGER, Richard HAMMERSTONE, Bo DU, Yongjun XU
  • Patent number: 11790478
    Abstract: The present disclosure relates to methods and apparatus for mapping a source location of input data for processing by a graphics processing unit. The apparatus can configure a processing element of the graphics processing unit with a predefined rule for decoding a data source parameter for executing a task by the graphics processing unit. Moreover, the apparatus can store the parameter in local storage of the processing element and configure the processing element to decode the parameter according to the at least one predefined rule to determine a source location of the input data and at least one relationship between invocations of the task. The apparatus can also load, to the local storage of the processing element, the input data from a plurality of memory addresses of the source location determined by the parameter. A one logic unit can then execute the task on the loaded input data.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: October 17, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Liang Li, Elina Kamenetskaya, Andrew Evan Gruber
  • Publication number: 20230325962
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for GPU wave-to-wave optimization. A graphics processor may execute a shader program for a first wave associated with a draw call or a compute kernel. The graphics processor may identify at least one first indication for the first wave associated with the draw call or the compute kernel. The graphics processor may store the at least one first indication for the first wave to a memory location. The graphics processor may execute the shader program for at least one second wave associated with the draw call or the compute kernel. The execution of the shader program for the at least one second wave may be based on the shader program for the at least one second wave reading the memory location to retrieve the at least one first indication.
    Type: Application
    Filed: April 7, 2022
    Publication date: October 12, 2023
    Inventor: Andrew Evan GRUBER
  • Patent number: 11769294
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: September 26, 2023
    Assignee: QUALCOMM INCORPORATED
    Inventors: Vineet Goel, Andrew Evan Gruber, Donghyun Kim
  • Publication number: 20230298123
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for compatible compression for different types of image views. A graphics processor may select a first common format of a plurality of common formats for at least one image based on at least one of application data or first metadata associated with the at least one image. The graphics processor may encode the at least one image based on the selected first common format for the at least one image. The graphics processor may select a second common format for the at least one image based on second metadata of the at least one image. The second common format may be identical to the first common format. The graphics processor may decode the at least one image based on the selected second common format for the at least one image.
    Type: Application
    Filed: March 17, 2022
    Publication date: September 21, 2023
    Inventors: Srihari Babu ALLA, Tao WANG, Andrew Evan GRUBER, Matthew NETSCH, Richard HAMMERSTONE, Thomas Edwin FRISINGER
  • Patent number: 11763419
    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
    Type: Grant
    Filed: October 14, 2022
    Date of Patent: September 19, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Yun Du