Patents by Inventor Andrew Evan Gruber

Andrew Evan Gruber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamic wave pairing

Patent number: 11954758

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.

Type: Grant

Filed: February 24, 2022

Date of Patent: April 9, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Andrew Evan Gruber, Zilin Ying, Chunling Hu, Baoguang Yang, Yang Xia, Gang Zhong, Chun Yu, Eric Demers

ACCELERATED BOUNDING VOLUME HIERARCHY (BVH) TRAVERSAL FOR RAY TRACING

Publication number: 20240104824

Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
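The sorted traversal described in this abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the `Node` class, the choice of "largest information value in the sub-tree" as the sorting parameter, and the descending visit order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical acceleration-structure node."""
    name: str
    # Information values of primitives stored directly in this node.
    info_values: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def sorting_parameter(node: Node) -> float:
    """Assumed parameter: the largest information value in the node's sub-tree."""
    best = max(node.info_values, default=float("-inf"))
    for child in node.children:
        best = max(best, sorting_parameter(child))
    return best


def traverse(node: Node) -> List[str]:
    """Depth-first traversal that visits same-level nodes in sorted order."""
    visited = [node.name]
    for child in sorted(node.children, key=sorting_parameter, reverse=True):
        visited.extend(traverse(child))
    return visited


root = Node("root", children=[
    Node("a", info_values=[0.2]),
    Node("b", children=[
        Node("b1", info_values=[0.9]),
        Node("b2", info_values=[0.1]),
    ]),
    Node("c", info_values=[0.5]),
])
order = traverse(root)
```

In this toy tree, sub-tree `b` is visited first because it contains the primitive with the largest assumed information value.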

Type: Application

Filed: September 23, 2022

Publication date: March 28, 2024

Inventors: Piyush GUPTA, Pavan Kumar AKKARAJU, Alexei Vladimirovich BOURD, Andrew Evan GRUBER

PATCHED SHADING IN GRAPHICS PROCESSING

Publication number: 20240104837

Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.

Type: Application

Filed: August 9, 2023

Publication date: March 28, 2024

Inventors: Vineet GOEL, Andrew Evan GRUBER, Donghyun KIM

VISIBILITY GENERATION IN TILE BASED GPU ARCHITECTURES

Publication number: 20240104684

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for improving visibility generation in tile-based GPU architectures. A graphics processor may perform a first binning pass associated with visibility information for each of a plurality of primitives in at least one frame. The visibility information for each of the plurality of primitives may correspond to a visible indication or an invisible indication. The graphics processor may update a depth buffer based on the visibility information for all of the plurality of primitives in the at least one frame. The graphics processor may perform a second binning pass for each of the visible set of primitives based on the updated depth buffer. The graphics processor may store at least one of the updated visibility information or updated position data for all primitives in the visible set of primitives from the second binning pass.

Type: Application

Filed: September 23, 2022

Publication date: March 28, 2024

Inventors: Kalyan Kumar BHIRAVABHATLA, Andrew Evan GRUBER, Rahul Sunil KUKREJA, Vishwanath Shashikant NIKAM, Tao WANG, Jian LIANG

GPU wave-to-wave optimization

Patent number: 11928754

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for GPU wave-to-wave optimization. A graphics processor may execute a shader program for a first wave associated with a draw call or a compute kernel. The graphics processor may identify at least one first indication for the first wave associated with the draw call or the compute kernel. The graphics processor may store the at least one first indication for the first wave to a memory location. The graphics processor may execute the shader program for at least one second wave associated with the draw call or the compute kernel. The execution of the shader program for the at least one second wave may be based on the shader program for the at least one second wave reading the memory location to retrieve the at least one first indication.

Type: Grant

Filed: April 7, 2022

Date of Patent: March 12, 2024

Assignee: QUALCOMM Incorporated

Inventor: Andrew Evan Gruber

SLICED GRAPHICS PROCESSING UNIT (GPU) ARCHITECTURE IN PROCESSOR-BASED DEVICES

Publication number: 20240078735

Abstract: A sliced graphics processing unit (GPU) architecture in processor-based devices is disclosed. In some aspects, a GPU based on a sliced GPU architecture includes multiple hardware slices. The GPU further includes a command processor (CP) circuit and an unslice primitive controller (PC_US). Upon receiving a graphics instruction from a central processing unit (CPU), the CP circuit determines a graphics workload, and transmits the graphics workload to the PC_US. The PC_US then partitions the graphics workload into multiple subbatches and distributes each subbatch to a PC_S of a hardware slice for processing.
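The partition-and-distribute step in this abstract can be sketched in a few lines. This is an illustrative model only: the fixed subbatch size, the round-robin assignment, and the function names `partition` and `distribute` are assumptions, since the abstract does not say how subbatches are sized or mapped to slices.

```python
from typing import Dict, List


def partition(workload: List[int], subbatch_size: int) -> List[List[int]]:
    """Split a workload (here, a list of primitive IDs) into subbatches."""
    return [workload[i:i + subbatch_size]
            for i in range(0, len(workload), subbatch_size)]


def distribute(subbatches: List[List[int]],
               num_slices: int) -> Dict[int, List[List[int]]]:
    """Assign subbatches to hardware slices round-robin (assumed policy)."""
    assignment: Dict[int, List[List[int]]] = {s: [] for s in range(num_slices)}
    for index, subbatch in enumerate(subbatches):
        assignment[index % num_slices].append(subbatch)
    return assignment


primitives = list(range(10))
subbatches = partition(primitives, 4)
assignment = distribute(subbatches, 2)
```

Here the stand-in for the PC_US produces three subbatches, and slice 0 receives the first and third while slice 1 receives the second.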

Type: Application

Filed: December 19, 2022

Publication date: March 7, 2024

Inventors: Jian Liang, Andrew Evan Gruber, Tao Wang, Xuefeng Tang, Vishwanath Shashikant Nikam, Nigel Poole, Kalyan Kumar Bhiravabhatla, Fei Xu, Zilin Ying

SLICED GRAPHICS PROCESSING UNIT (GPU) ARCHITECTURE IN PROCESSOR-BASED DEVICES

Publication number: 20240078737

Abstract: A sliced graphics processing unit (GPU) architecture in processor-based devices is disclosed. In some aspects, a GPU based on a sliced GPU architecture includes multiple hardware slices. The GPU further includes a command processor (CP) circuit and an unslice primitive controller (PC_US). Upon receiving a graphics instruction from a central processing unit (CPU), the CP circuit determines a graphics workload, and transmits the graphics workload to the PC_US. The PC_US then partitions the graphics workload into multiple subbatches and distributes each subbatch to a PC_S of a hardware slice for processing.

Type: Application

Filed: May 19, 2023

Publication date: March 7, 2024

Inventors: Jian LIANG, Andrew Evan GRUBER, Tao WANG, Xuefeng TANG, Vishwanath Shashikant NIKAM, Nigel POOLE, Kalyan Kumar BHIRAVABHATLA, Fei XU, Zilin YING

RUNTIME MECHANISM TO OPTIMIZE SHADER EXECUTION FLOW

Publication number: 20240046543

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

Type: Application

Filed: August 5, 2022

Publication date: February 8, 2024

Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Baoguang YANG, Chihong ZHANG, Yuehai DU, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR, Gang ZHONG, Zilin YING, Fei WEI

Optimization of depth and shadow pass rendering in tile based architectures

Patent number: 11893654

Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may configure a portion of a GPU to include at least one depth processing block, the at least one depth processing block being associated with at least one depth buffer. The apparatus may also identify one or more depth passes of each of a plurality of graphics workloads, the plurality of graphics workloads being associated with a plurality of frames. Further, the apparatus may process each of the one or more depth passes in the portion of the GPU including the at least one depth processing block, each of the one or more depth passes being processed by the at least one depth processing block, the one or more depth passes being associated with the at least one depth buffer.

Type: Grant

Filed: July 12, 2021

Date of Patent: February 6, 2024

Assignee: QUALCOMM Incorporated

Inventors: Sreyas Kurumanghat, Kalyan Kumar Bhiravabhatla, Andrew Evan Gruber, Tao Wang, Baoguang Yang, Pavan Kumar Akkaraju

PERFORMING MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20240037183

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: October 16, 2023

Publication date: February 1, 2024

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS

METHODS AND APPARATUS FOR SELECTION OF RENDERING MODES

Publication number: 20240013336

Abstract: The present disclosure relates to graphics processing. An apparatus of the present disclosure may determine visibility streams corresponding to a target and a set of bins into which the target is divided. The apparatus may select one of a first rendering mode or a second rendering mode for the target based on the first visibility stream and based on the set of second visibility streams. When the first rendering mode is selected, the apparatus may configure each of the set of bins into a first subset associated with a first type of rendering pass or a second subset associated with a second type of rendering pass. The apparatus may then render the target based on the selected one of the first rendering mode or the second rendering mode and, if applicable, based on the first rendering pass type or the second rendering pass type.

Type: Application

Filed: November 19, 2020

Publication date: January 11, 2024

Inventors: Bo DU, Andrew Evan GRUBER, Yongjun XU

RASTERIZATION OF COMPUTE WORKLOADS

Publication number: 20230394738

Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.
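The "common multiple" idea in this abstract can be made concrete with a small one-dimensional sketch. Everything here is an assumption for illustration: sizes are treated as widths in pixels, "misaligned" is taken to mean that the tile width is not a whole multiple of the workgroup width, and the helper names are hypothetical. The application itself may define these terms differently.

```python
from math import gcd
from typing import Optional, Tuple


def is_misaligned(workgroup_width: int, tile_width: int) -> bool:
    """Assumed test: tile width is not a whole multiple of workgroup width."""
    return tile_width % workgroup_width != 0


def common_multiple(workgroup_width: int, tile_width: int) -> int:
    """Least common multiple of the two widths, in pixels."""
    return workgroup_width * tile_width // gcd(workgroup_width, tile_width)


def conversion_span(workgroup_width: int,
                    tile_width: int) -> Optional[Tuple[int, int]]:
    """Return (workgroups, tiles) covered by one common-multiple span,
    or None when the sizes are already aligned."""
    if not is_misaligned(workgroup_width, tile_width):
        return None
    span = common_multiple(workgroup_width, tile_width)
    return span // workgroup_width, span // tile_width


misaligned_case = conversion_span(6, 4)
aligned_case = conversion_span(2, 4)
```

With 6-pixel workgroups and 4-pixel tiles, boundaries coincide every 12 pixels, so one span covers two workgroups and three tiles.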

Type: Application

Filed: November 9, 2020

Publication date: December 7, 2023

Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG

Methods and apparatus to perform matrix multiplication in a streaming processor

Patent number: 11829439

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: December 29, 2020

Date of Patent: November 28, 2023

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers

RUN-TIME MECHANISM FOR OPTIMAL SHADER

Publication number: 20230377240

Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
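The run-time choice between shader variants described in this abstract reduces to a size comparison, sketched below. The function name, the string labels, and the direction of the choice (use the constant-memory variant when the parameter fits) are assumptions; the abstract states only that the decision depends on whether the run-time parameter is less than or equal to the constant memory size.

```python
def select_shader(run_time_parameter: int, constant_memory_size: int) -> str:
    """Pick a shader variant from a run-time size check.

    Assumed policy: when the run-time parameter fits in constant memory,
    run the variant that keeps its data there; otherwise fall back to the
    variant that stores data in system memory.
    """
    if run_time_parameter <= constant_memory_size:
        return "constant_memory_shader"
    return "system_memory_shader"


fits = select_shader(run_time_parameter=256, constant_memory_size=1024)
spills = select_shader(run_time_parameter=4096, constant_memory_size=1024)
```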

Type: Application

Filed: May 18, 2022

Publication date: November 23, 2023

Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Chihong ZHANG, Baoguang YANG, Yuehai DU, Gang ZHONG, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR

METHODS AND APPARATUS FOR SELECTION OF RENDERING MODES

Publication number: 20230343016

Abstract: The present disclosure relates to graphics processing. An apparatus of the present disclosure may determine visibility streams corresponding to a target and a set of bins into which the target is divided. The apparatus may select one of a first rendering mode or a second rendering mode for the target based on the first visibility stream and based on the set of second visibility streams. When the first rendering mode is selected, the apparatus may configure each of the set of bins into a first subset associated with a first type of rendering pass or a second subset associated with a second type of rendering pass. The apparatus may then render the target based on the selected one of the first rendering mode or the second rendering mode and, if applicable, based on the first rendering pass type or the second rendering pass type.

Type: Application

Filed: November 18, 2020

Publication date: October 26, 2023

Inventors: Srihari Babu ALLA, Jonnala Gadda NAGENDRA KUMAR, Avinash SEETHARAMAIAH, Andrew Evan GRUBER, Thomas Edwin FRISINGER, Richard HAMMERSTONE, Bo DU, Yongjun XU

Methods and apparatus for mapping source location for input data to a graphics processing unit

Patent number: 11790478

Abstract: The present disclosure relates to methods and apparatus for mapping a source location of input data for processing by a graphics processing unit. The apparatus can configure a processing element of the graphics processing unit with a predefined rule for decoding a data source parameter for executing a task by the graphics processing unit. Moreover, the apparatus can store the parameter in local storage of the processing element and configure the processing element to decode the parameter according to the at least one predefined rule to determine a source location of the input data and at least one relationship between invocations of the task. The apparatus can also load, to the local storage of the processing element, the input data from a plurality of memory addresses of the source location determined by the parameter. A logic unit can then execute the task on the loaded input data.

Type: Grant

Filed: August 3, 2020

Date of Patent: October 17, 2023

Assignee: QUALCOMM Incorporated

Inventors: Liang Li, Elina Kamenetskaya, Andrew Evan Gruber

GPU WAVE-TO-WAVE OPTIMIZATION

Publication number: 20230325962

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for GPU wave-to-wave optimization. A graphics processor may execute a shader program for a first wave associated with a draw call or a compute kernel. The graphics processor may identify at least one first indication for the first wave associated with the draw call or the compute kernel. The graphics processor may store the at least one first indication for the first wave to a memory location. The graphics processor may execute the shader program for at least one second wave associated with the draw call or the compute kernel. The execution of the shader program for the at least one second wave may be based on the shader program for the at least one second wave reading the memory location to retrieve the at least one first indication.

Type: Application

Filed: April 7, 2022

Publication date: October 12, 2023

Inventor: Andrew Evan GRUBER

Patched shading in graphics processing

Patent number: 11769294

Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.

Type: Grant

Filed: November 9, 2021

Date of Patent: September 26, 2023

Assignee: QUALCOMM Incorporated

Inventors: Vineet Goel, Andrew Evan Gruber, Donghyun Kim

COMPATIBLE COMPRESSION FOR DIFFERENT TYPES OF IMAGE VIEWS

Publication number: 20230298123

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for compatible compression for different types of image views. A graphics processor may select a first common format of a plurality of common formats for at least one image based on at least one of application data or first metadata associated with the at least one image. The graphics processor may encode the at least one image based on the selected first common format for the at least one image. The graphics processor may select a second common format for the at least one image based on second metadata of the at least one image. The second common format may be identical to the first common format. The graphics processor may decode the at least one image based on the selected second common format for the at least one image.

Type: Application

Filed: March 17, 2022

Publication date: September 21, 2023

Inventors: Srihari Babu ALLA, Tao WANG, Andrew Evan GRUBER, Matthew NETSCH, Richard HAMMERSTONE, Thomas Edwin FRISINGER

GPR optimization in a GPU based on a GPR release mechanism

Patent number: 11763419

Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
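The GPR release calculation in this abstract can be sketched as simple bookkeeping. This is a hedged model, not the patented mechanism: the `Branch` record, the rule that a branch is live only when its controlling constant is true, and the assumption that branches are mutually exclusive (so peak demand is the base count plus the largest live branch) are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Branch:
    """Hypothetical shader branch guarded by a boolean constant."""
    condition_constant: str
    gprs_needed: int


def releasable_gprs(allocated: int, base_gprs: int,
                    branches: List[Branch],
                    constants: Dict[str, bool]) -> int:
    """Count GPRs that can be released once unutilized branches are known.

    Assumes branches are mutually exclusive, so the shader needs the base
    registers plus those of the most demanding branch that is still live.
    """
    live = [b for b in branches if constants.get(b.condition_constant, False)]
    needed = base_gprs + max((b.gprs_needed for b in live), default=0)
    return max(allocated - needed, 0)


branches = [Branch("use_fog", 24), Branch("use_shadow", 10)]
constants = {"use_fog": False, "use_shadow": True}
released = releasable_gprs(allocated=32, base_gprs=8,
                           branches=branches, constants=constants)
```

With the fog branch disabled by its constant, only 18 of the 32 allocated registers are still needed in this toy case, leaving 14 that could be released for subsequent threads.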

Type: Grant

Filed: October 14, 2022

Date of Patent: September 19, 2023

Assignee: QUALCOMM Incorporated

Inventors: Andrew Evan Gruber, Yun Du
