Patents by Inventor Andrew Evan Gruber

Andrew Evan Gruber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210200836
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: December 29, 2020
    Publication date: July 1, 2021
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20210183005
    Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
    Type: Application
    Filed: December 13, 2019
    Publication date: June 17, 2021
    Inventors: Yun Du, Andrew Evan Gruber, Chihong Zhang, Gang Zhong, Jian Jiang, Fei Wei, Minjie Huang, Zilin Ying, Yang Xia, Jing Han, Chun Yu, Eric Demers
  • Publication number: 20210182140
    Abstract: The present disclosure relates to methods and apparatus for display processing. For example, disclosed techniques facilitate speculative page fault handling in a GPU. Aspects of the present disclosure can perform a graphics operation associated with using a set of constants within a flow control. Aspects of the present disclosure can also query a first memory to determine whether memory addresses associated with the set of constants are allocated at a constant buffer of the first memory. Further, aspects of the present disclosure can set a page fault indicator to a true value when the query indicates that at least one memory address associated with the set of constants is unallocated at the constant buffer, and set the page fault indicator to a false value otherwise.
    Type: Application
    Filed: December 16, 2019
    Publication date: June 17, 2021
    Inventor: Andrew Evan Gruber
  • Publication number: 20210133912
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. Aspects of the present disclosure can determine a portion of a display area, where the portion of the display area is determined based on display content of the display area. Further, aspects of the present disclosure can communicate display information corresponding to the determined portion of the display area. Additionally, aspects of the present disclosure can update the display information corresponding to the determined portion of the display area. Aspects of the present disclosure can also communicate the updated display information corresponding to the determined portion of the display area. Aspects of the present disclosure can also render at least some display content of the display area corresponding to the determined portion of the display area. In some aspects, the updated display information can be based on the rendered display content of the display area.
    Type: Application
    Filed: November 4, 2019
    Publication date: May 6, 2021
    Inventors: Tao Wang, Shambhoo Khandelwal, Andrew Evan Gruber, Shangmei Yu, Jing Gao, Junmei Shao, Thomas Edwin Frisinger, Rick Hammerstone
  • Publication number: 20210103467
    Abstract: A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) processing priority for that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed while avoiding interference with lagging waves, while avoiding taking resources from lagging waves, etc. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to be lowest on the list. Such may result in pending loops from other waves being processed prior to the processing returning to a second loop of the first wave.
    Type: Application
    Filed: October 2, 2019
    Publication date: April 8, 2021
    Inventors: Elina Kamenetskaya, Andrew Evan Gruber, Alexei Vladimirovich Bourd
  • Publication number: 20210104009
    Abstract: Methods, systems, and devices for image processing are described. A device may identify a target pixel having a texel coordinate in an image. The device may select, based on the texel coordinate, a first texel sample of a first set of texel samples and a second texel sample of a second set of texel samples. In some examples, the device may group the first texel sample and the second texel sample into a third set of texel samples. The device may generate an instruction including the third set of texel samples and a weighted sum associated with the first texel sample and the second texel sample, and process the third set of texel samples based on the instruction. In some examples, the instruction may be a macro instruction.
    Type: Application
    Filed: October 2, 2019
    Publication date: April 8, 2021
    Inventors: Liang Li, Andrew Evan Gruber
  • Publication number: 20210103852
    Abstract: Methods, systems, and devices for workload balancing for machine learning are described. Generally, a device may determine a size of a level one cache of a texture processor, identify a portion of input activation data for an iterative machine-learning process, and load the portion of input activation data into the level one cache. The device may allocate, based at least in part on a texture processor to shading processor arithmetic logic unit (ALU) resource ratio, a first set of one or more weight batches and a second set of one or more weight batches associated with the loaded portion of input activation data to the shading processor, and process the portion of input activation data based at least in part on the first set of one or more weight batches and the second set of one or more weight batches using the texture processor and the shading processor in parallel.
    Type: Application
    Filed: October 2, 2019
    Publication date: April 8, 2021
    Inventors: Elina Kamenetskaya, Andrew Evan Gruber, Amir Momeni
  • Publication number: 20200410743
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, the apparatus selects a first mip-map layer with a first texture size and a second mip-map layer with a second texture size based on a third texture size of an image. The apparatus also determines a relative distance associated with the texture sizes. Additionally, the apparatus determines a first quantity of samples to select from the first mip-map layer, and determines a second quantity of samples to select from the second mip-map layer, the second quantity of samples being less than the first quantity of samples, and a second quantity of filter taps being less than a first quantity of filter taps. Also, the apparatus generates the image at the third texture size through filtering based on the first quantity of samples and the second quantity of samples.
    Type: Application
    Filed: June 25, 2020
    Publication date: December 31, 2020
    Inventors: Liang Li, Andrew Evan Gruber, Yunshan Kong
  • Publication number: 20200410626
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, the apparatus can determine one or more context states of at least one context register in each of multiple wave slots. The apparatus can also send information corresponding to the one or more context states in one of the multiple wave slots to a context queue. Further, the apparatus can convert the information corresponding to the one or more context states to context information compatible with the context queue. The apparatus can also store the context information compatible with the context queue in the context queue. In some aspects, the apparatus can send the context information compatible with the context queue to one of the multiple wave slots. Additionally, the apparatus can convert the context information compatible with the context queue to the information corresponding to the one or more context states.
    Type: Application
    Filed: June 27, 2019
    Publication date: December 31, 2020
    Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Zilin Ying
  • Patent number: 10796478
    Abstract: A method, a computer-readable medium, and an apparatus are provided. The apparatus may be a GPU. The GPU generates first visibility information during a visibility pass associated with an application requested depth pre-pass. In addition, the GPU renders an application requested color pass based on the first visibility information generated during the visibility pass associated with the application requested depth pre-pass.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: October 6, 2020
    Assignee: QUALCOMM Incorporated
    Inventor: Andrew Evan Gruber
  • Publication number: 20200312006
    Abstract: Example techniques are described for generating graphics content by obtaining texture operation instructions corresponding to a texture operation, in response to determining at least one of insufficient general purpose register space is available for the texture operation or insufficient wave slots are available for the texture operation, generating an indication that the texture operation corresponds to a deferred wave, executing the texture operation, sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed, and receiving texture mapped data corresponding to the initial texture sample instructions.
    Type: Application
    Filed: March 26, 2019
    Publication date: October 1, 2020
    Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Hongjiang Shang, Zilin Ying, Fei Wei
  • Publication number: 20200273142
    Abstract: The described techniques provide for bin-based rendering where the scene geometry in a frame is subdivided into bins or tiles, and bins are resolved concurrently with the rendering of a next bin. For example, a graphics processing unit (GPU) may process an entire image and sort transactions (e.g., rasterized primitives, such as triangles) into bins. For the rendering of each transaction, a device may identify a memory address of a memory block (e.g., a unit or portion of internal GPU memory (GMEM)) the transaction will be written (i.e., rendered) to. The device may thus prepare the memory block for rendering (e.g., by performing a resolve operation, a clear operation, or an unresolve operation on the memory block), such that the memory block is prepared prior to rendering of the particular transaction. As such, transactions of a bin may be resolved concurrently with rendering of transactions of a next bin.
    Type: Application
    Filed: February 21, 2019
    Publication date: August 27, 2020
    Inventors: Shambhoo Khandelwal, Tao Wang, Shangmei Yu, Jing Gao, Jian Liang, Andrew Evan Gruber, Chun Yu
  • Patent number: 10706494
    Abstract: A method for processing data in a graphics processing unit including receiving an indication that all threads of a warp in a graphics processing unit (GPU) are to execute a same branch in a first set of instructions, storing one or more predicate bits in a memory as a single set of predicate bits, wherein the single set of predicate bits applies to all of the threads in the warp, and executing a portion of the first set of instructions in accordance with the single set of predicate bits. Executing the first set of instructions may include executing the first set of instructions in accordance with the single set of predicate bits using a single instruction, multiple data (SIMD) processing core and/or executing the first set of instructions in accordance with the single set of predicate bits using a scalar processing unit.
    Type: Grant
    Filed: August 14, 2018
    Date of Patent: July 7, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Pramod Vasant Argade, Jing Wu
  • Publication number: 20200210299
    Abstract: The disclosure describes techniques for a self-test of a graphics processing unit (GPU) independent of instructions from another processing device. The GPU may perform the self-test in response to a determination that the GPU enters an idle mode. The self-test may be based on information indicating a safety level, where the safety level indicates how many faults in circuits or memory blocks of the GPU need to be detected.
    Type: Application
    Filed: March 11, 2020
    Publication date: July 2, 2020
    Inventors: Rahul Gulati, Andrew Evan Gruber, Brendon Lewis Johnson, Jay Chunsup Yun, Donghyun Kim, Alex Kwang Ho Jong, Anshuman Saxena
  • Patent number: 10628274
    Abstract: The disclosure describes techniques for a self-test of a graphics processing unit (GPU) independent of instructions from another processing device. The GPU may perform the self-test in response to a determination that the GPU enters an idle mode. The self-test may be based on information indicating a safety level, where the safety level indicates how many faults in circuits or memory blocks of the GPU need to be detected.
    Type: Grant
    Filed: December 5, 2017
    Date of Patent: April 21, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Rahul Gulati, Andrew Evan Gruber, Brendon Lewis Johnson, Jay Chunsup Yun, Donghyun Kim, Alex Kwang Ho Jong, Anshuman Saxena
  • Publication number: 20200118328
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
    Type: Application
    Filed: December 11, 2019
    Publication date: April 16, 2020
    Inventors: Vineet Goel, Andrew Evan Gruber, Donghyun Kim
  • Patent number: 10621690
    Abstract: A computing device may allocate a plurality of blocks in the memory, wherein each of the plurality of blocks is of a uniform fixed size in the memory. The computing device may further store a plurality of bandwidth-compressed graphics data into the respective plurality of blocks in the memory, wherein one or more of the plurality of bandwidth-compressed graphics data each has a size that is smaller than the fixed size. The computing device may further store data associated with the plurality of bandwidth-compressed graphics data into unused space of one or more of the plurality of blocks that contains the respective one or more of the plurality of bandwidth-compressed graphics data.
    Type: Grant
    Filed: September 17, 2015
    Date of Patent: April 14, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Rexford Alan Hill, Shambhoo Khandelwal
  • Publication number: 20200098165
    Abstract: A method, a computer-readable medium, and an apparatus are provided. The apparatus may be a GPU. The GPU generates first visibility information during a visibility pass associated with an application requested depth pre-pass. In addition, the GPU renders an application requested color pass based on the first visibility information generated during the visibility pass associated with the application requested depth pre-pass.
    Type: Application
    Filed: September 26, 2018
    Publication date: March 26, 2020
    Inventor: Andrew Evan Gruber
  • Patent number: 10559123
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes designating a hardware shading unit of a graphics processing unit (GPU) to perform first shading operations associated with a first shader stage of a rendering pipeline. The process also includes switching operational modes of the hardware shading unit upon completion of the first shading operations. The process also includes performing, with the hardware shading unit of the GPU designated to perform the first shading operations, second shading operations associated with a second, different shader stage of the rendering pipeline.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Vineet Goel, Andrew Evan Gruber
  • Publication number: 20200020067
    Abstract: A method, an apparatus, and a computer-readable medium may be configured to perform a binning pass for a first frame. The apparatus may be configured to perform a rendering pass for the first frame in parallel with the binning pass. The apparatus may be configured to enhance efficiency in performing a binning pass and a rendering pass for tile-based rendering, such that the binning pass and rendering pass are performed concurrently. The apparatus may be configured to perform the binning pass using a first hardware pipeline, and may be configured to perform the rendering pass using a second hardware pipeline.
    Type: Application
    Filed: July 13, 2018
    Publication date: January 16, 2020
    Inventors: Jian Liang, Tao Wang, Chun Yu, Andrew Evan Gruber, Donghyun Kim, Nigel Poole, Tzun-Wei Lee, Shambhoo Khandelwal