Patents by Inventor Vineet Goel

Vineet Goel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210374607
    Abstract: A device is disclosed. The device includes a machine learning die including a memory and one or more machine learning accelerators; and a processing core die stacked with the machine learning die, the processing core die being configured to execute shader programs for controlling operations on the machine learning die, wherein the memory is configurable as either or both of a cache and a directly accessible memory.
    Type: Application
    Filed: December 21, 2020
    Publication date: December 2, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Maxim V. Kazakov, Swapnil P. Sakharshete, Milind N. Nemlekar, Vineet Goel
  • Patent number: 11158106
    Abstract: Techniques for performing shader operations are provided. The techniques include, performing pixel shading at a shading rate defined by pixel shader variable rate shading (“VRS”) data, updating the pixel VRS data that indicates one or more shading rates for one or more tiles based on whether the tiles of the one or more tiles include triangle edges or do not include triangle edges, to generate updated VRS data, and writing a VRS rate feedback buffer based on the updated VRS data.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: October 26, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Skyler Jonathon Saleh, Vineet Goel, Pazhani Pillai, Ruijin Wu, Christopher J. Brennan, Andrew S. Pomianowski
  • Publication number: 20210225060
    Abstract: A processing device and a method of tiled rendering of an image for display is provided. The processing device includes memory and a processor. The processor is configured to receive the image comprising one or more three dimensional (3D) objects, divide the image into tiles, execute coarse level tiling for the tiles of the image and execute fine level tiling for the tiles of the image. The processing device also includes same fixed function hardware used to execute the coarse level tiling and the fine level tiling. The processor is also configured to determine visibility information for a first one of the tiles. The visibility information is divided into draw call visibility information and triangle visibility information for each remaining tile of the image.
    Type: Application
    Filed: September 25, 2020
    Publication date: July 22, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mika Tuomi, Kiia Kallio, Ruijin Wu, Anirudh R. Acharya, Vineet Goel
  • Publication number: 20210192827
    Abstract: Techniques for performing shader operations are provided. The techniques include, performing pixel shading at a shading rate defined by pixel shader variable rate shading (“VRS”) data, updating the pixel VRS data that indicates one or more shading rates for one or more tiles based on whether the tiles of the one or more tiles include triangle edges or do not include triangle edges, to generate updated VRS data, and writing a VRS rate feedback buffer based on the updated VRS data.
    Type: Application
    Filed: December 20, 2019
    Publication date: June 24, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Skyler Jonathon Saleh, Vineet Goel, Pazhani Pillai, Ruijin Wu, Christopher J. Brennan, Andrew S. Pomianowski
  • Publication number: 20210150669
    Abstract: A processing device is provided which includes memory and a processor. The processor is configured to receive an input image having a first resolution, generate linear down-sampled versions of the input image by down-sampling the input image via a linear upscaling network and generate non-linear down-sampled versions of the input image by down-sampling the input image via a non-linear upscaling network.
    Type: Application
    Filed: November 18, 2019
    Publication date: May 20, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexander M. Potapov, Skyler Jonathon Saleh, Swapnil P. Sakharshete, Vineet Goel
  • Publication number: 20210089423
    Abstract: A technique for operating a processor that includes multiple cores is provided. The technique includes determining a number of active applications, selecting a processor configuration for the processor based on the number of active applications, configuring the processor according to the selected processor configuration, and executing the active applications with the configured processor.
    Type: Application
    Filed: June 26, 2020
    Publication date: March 25, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ruijin Wu, Skyler Jonathon Saleh, Vineet Goel
  • Publication number: 20210026686
    Abstract: Techniques for performing machine learning operations are provided. The techniques include configuring a first portion of a first chiplet as a cache; performing caching operations via the first portion; configuring at least a first sub-portion of the first portion of the chiplet as directly-accessible memory; and performing machine learning operations with the first sub-portion by a machine learning accelerator within the first chiplet.
    Type: Application
    Filed: July 20, 2020
    Publication date: January 28, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Swapnil P. Sakharshete, Andrew S. Pomianowski, Maxim V. Kazakov, Vineet Goel, Milind N. Nemlekar, Skyler Jonathon Saleh
  • Publication number: 20200184002
    Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.
    Type: Application
    Filed: August 30, 2019
    Publication date: June 11, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel
  • Publication number: 20200118328
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
    Type: Application
    Filed: December 11, 2019
    Publication date: April 16, 2020
    Inventors: Vineet Goel, Andrew Evan Gruber, Donghyun Kim
  • Publication number: 20200098169
    Abstract: Described herein are techniques for improving the effectiveness of depth culling. In a first technique, a binner is used to sort primitives into depth bins. Each depth bin covers a range of depths. The binner transmits the depth bins to the screen space pipeline for processing in near-to-far order. Processing the near bins first results in the depth buffer being updated, allowing fragments for the primitives in the farther bins to be culled more aggressively than if the depth binning did not occur. In a second technique, a buffer is used to initiate two-pass processing through the screen space pipeline. In the first pass, primitives are sent down to update the depth block and are then culled. The fragments are processed normally in the second pass, with the benefit of the updated depth values.
    Type: Application
    Filed: September 21, 2018
    Publication date: March 26, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ruijin Wu, Young In Yeo, Sagar S. Bhandare, Vineet Goel, Martin G. Sarov, Christopher J. Brennan
  • Patent number: 10600142
    Abstract: A compute unit accesses a chunk of bits that represent indices of vertices of a graphics primitive. The compute unit sets values of a first bit to indicate whether the chunk is monotonic or ordinary, second bits to define an offset that is determined based on values of indices in the chunk, and sets of third bits that determine values of the indices in the chunk based on the offset defined by the second bits. The compute unit writes a compressed chunk represented by the first bit, the second bits, and the sets of third bits to a memory. The compressed chunk is decompressed and the decompressed indices are written to an index buffer. In some embodiments, the indices are decompressed based on metadata that includes offsets that are determined based on values of the indices and bitfields that indicate characteristics of the indices.
    Type: Grant
    Filed: December 5, 2017
    Date of Patent: March 24, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Usame Ceylan, Young In Yeo, Todd Martin, Vineet Goel
  • Patent number: 10559123
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes designating a hardware shading unit of a graphics processing unit (GPU) to perform first shading operations associated with a first shader stage of a rendering pipeline. The process also includes switching operational modes of the hardware shading unit upon completion of the first shading operations. The process also includes performing, with the hardware shading unit of the GPU designated to perform the first shading operations, second shading operations associated with a second, different shader stage of the rendering pipeline.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Vineet Goel, Andrew Evan Gruber
  • Patent number: 10535185
    Abstract: Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: January 14, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Vineet Goel, Andrew Evan Gruber, Donghyun Kim
  • Patent number: 10453243
    Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: October 22, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel
  • Patent number: 10417791
    Abstract: Techniques are described for using a texture unit to perform operations of a shader processor. Some operations of a shader processor are repeatedly executed until a condition is satisfied, and in each execution iteration, the shader processor accesses the texture unit. Techniques are described for the texture unit to perform such operations until the condition is satisfied.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: September 17, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Usame Ceylan, Vineet Goel, Juraj Obert, Liang Li
  • Publication number: 20190197761
    Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Skyler Jonathon Saleh, Maxim V. Kazakov, Vineet Goel
  • Publication number: 20190172173
    Abstract: A compute unit accesses a chunk of bits that represent indices of vertices of a graphics primitive. The compute unit sets values of a first bit to indicate whether the chunk is monotonic or ordinary, second bits to define an offset that is determined based on values of indices in the chunk, and sets of third bits that determine values of the indices in the chunk based on the offset defined by the second bits. The compute unit writes a compressed chunk represented by the first bit, the second bits, and the sets of third bits to a memory. The compressed chunk is decompressed and the decompressed indices are written to an index buffer. In some embodiments, the indices are decompressed based on metadata that includes offsets that are determined based on values of the indices and bitfields that indicate characteristics of the indices.
    Type: Application
    Filed: December 5, 2017
    Publication date: June 6, 2019
    Inventors: Usame Ceylan, Young In Yeo, Todd Martin, Vineet Goel
  • Publication number: 20190164328
    Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
    Type: Application
    Filed: January 3, 2019
    Publication date: May 30, 2019
    Inventors: Anirudh R. ACHARYA, Swapnil SAKHARSHETE, Michael MANTOR, Mangesh P. NIJASURE, Todd MARTIN, Vineet GOEL
  • Patent number: 10210593
    Abstract: A graphics processing unit (GPU) may dispatch a first set of commands for execution on one or more processing units of the GPU. The GPU may receive notification from a host device indicating that a second set of commands are ready to execute on the GPU. In response, the GPU may issue a first preemption command at a first preemption granularity to the one or more processing units. In response to the GPU failing to preempt execution of the first set of commands within an elapsed time period after issuing the first preemption command, the GPU may issue a second preemption command at a second preemption granularity to the one or more processing units, where the second preemption granularity is finer-grained than the first preemption granularity.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: February 19, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Anirudh Rajendra Acharya, Alexei Vladimirovich Bourd, David Rigel Garcia Garcia, Milind Nilkanth Nemlekar, Vineet Goel
  • Patent number: 10210650
    Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 19, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel