Patents by Inventor Anirudh R. ACHARYA

Anirudh R. ACHARYA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processing system with selective priority-based two-level binning

Patent number: 12266030

Abstract: Systems and methods related to priority-based and performance-based selection of a render mode, such as a two-level binning mode, in which to execute workloads with a graphics processing unit (GPU) of a system are provided. A user mode driver (UMD) or kernel mode driver (KMD) executed at a central processing unit (CPU) configures low and medium priority workloads to be executed in a two-level binning mode and selects a binning mode for high priority workloads based on whether performance heuristics indicate that one or more binning conditions or override conditions have been met. High priority workloads are maintained in a high priority queue, while low and medium priority workloads are maintained in a low/medium priority queue, such that execution of low and medium priority workloads at the GPU can be preempted in favor of executing high priority workloads.

Type: Grant

Filed: April 15, 2021

Date of Patent: April 1, 2025

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Ruijin Wu, Young In Yeo
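
The queueing and mode selection described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Workload`, `select_mode`, and `Scheduler` are hypothetical, the way an override condition interacts with a binning condition is an assumption, and preemption is modeled only as "the high priority queue is always served first".

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: str  # "low", "medium", or "high"


def select_mode(workload, binning_condition_met, override_condition_met):
    """Pick a render mode for a workload (hypothetical policy)."""
    # Low and medium priority workloads always use two-level binning.
    if workload.priority in ("low", "medium"):
        return "two_level_binning"
    # Assumption: for high priority work, an override condition wins over
    # a binning condition. The abstract does not spell this out.
    if binning_condition_met and not override_condition_met:
        return "two_level_binning"
    return "non_two_level_binning"


class Scheduler:
    """Two queues: one for high priority, one shared by low and medium."""

    def __init__(self):
        self.high = deque()
        self.low_medium = deque()

    def submit(self, workload):
        queue = self.high if workload.priority == "high" else self.low_medium
        queue.append(workload)

    def next_workload(self):
        # Serving the high queue first stands in for preemption here.
        if self.high:
            return self.high.popleft()
        if self.low_medium:
            return self.low_medium.popleft()
        return None
```

In this sketch, a high priority workload submitted after a low priority one is still returned first by `next_workload`.
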
VMID as a GPU task container for virtualization

Patent number: 12153958

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Grant

Filed: October 7, 2022

Date of Patent: November 26, 2024

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Fine grained replay control in binning hardware

Patent number: 12154224

Abstract: Some implementations provide systems, devices, and methods for rendering a plurality of primitives of a frame, the plurality of primitives being divided into a plurality of batches of primitives and the frame being divided into a plurality of bins. For at least one batch of the plurality of batches the rendering includes, for each of the plurality of bins, rendering primitives of a first sub-batch rasterizing to that bin, and for each of the plurality of bins, rendering primitives of a second sub-batch rasterizing to that bin.

Type: Grant

Filed: September 25, 2020

Date of Patent: November 26, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Jan H. Achrenius, Kiia Kallio, Miikka Kangasluoma, Ruijin Wu, Anirudh R. Acharya
SEPARATE CLOCKING FOR COMPONENTS OF A GRAPHICS PROCESSING UNIT

Publication number: 20240345617

Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.

Type: Application

Filed: March 13, 2024

Publication date: October 17, 2024

Inventors: Ranjith Kumar SAJJA, Sreekanth GODEY, Anirudh R. ACHARYA
SECURE MEMORY ACCESS IN A VIRTUALIZED COMPUTING ENVIRONMENT

Publication number: 20240330199

Abstract: A processor supports secure memory access in a virtualized computing environment by employing requestor identifiers at bus devices (such as a graphics processing unit) to identify the virtual machine associated with each memory access request. The virtualized computing environment uses the requestor identifiers to control access to different regions of system memory, ensuring that each VM accesses only those regions of memory that the VM is allowed to access. The virtualized computing environment thereby supports efficient memory access by the bus devices while ensuring that the different regions of memory are protected from unauthorized access.

Type: Application

Filed: November 22, 2023

Publication date: October 3, 2024

Inventors: Anthony ASARO, Jeffrey G. CHENG, Anirudh R. ACHARYA
Graphics processing unit with selective two-level binning

Patent number: 12086899

Abstract: Systems and methods related to run-time selection of a render mode in which to execute command buffers with a graphics processing unit (GPU) of a device based on performance data corresponding to the device are provided. A user mode driver (UMD) or kernel mode driver (KMD) executed at a central processing unit (CPU) selects a binning mode based on whether performance data that includes sensor data or performance counter data indicates that an associated binning condition or override condition has been met. The UMD or the KMD causes pending command buffers to be patched to execute in the selected binning mode based on whether the binning mode is enabled or disabled.

Type: Grant

Filed: September 25, 2020

Date of Patent: September 10, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Ruijin Wu, Paul E. Ruggieri
HYBRID BINNING

Publication number: 20240257435

Abstract: A processing device and a method of tiled rendering of an image for display is provided. The processing device includes memory and a processor. The processor is configured to receive the image comprising one or more three dimensional (3D) objects, divide the image into tiles, execute coarse level tiling for the tiles of the image and execute fine level tiling for the tiles of the image. The processing device also includes same fixed function hardware used to execute the coarse level tiling and the fine level tiling. The processor is also configured to determine visibility information for a first one of the tiles. The visibility information is divided into draw call visibility information and triangle visibility information for each remaining tile of the image.

Type: Application

Filed: April 11, 2024

Publication date: August 1, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Mika Tuomi, Kiia Kallio, Ruijin Wu, Anirudh R. Acharya, Vineet Goel
Systems and methods for distributed rendering using two-level binning

Patent number: 12051154

Abstract: Systems and methods for distributed rendering using two-level binning include processing primitives of a frame to be rendered at a first graphics processing unit (GPU) chiplet in a set of GPU chiplets to generate visibility information of primitives for each coarse bin and providing the visibility information to the other GPU chiplets in the set of GPU chiplets. Each coarse bin is then assigned to one of the GPU chiplets of the set of GPU chiplets and rendered at the assigned GPU chiplet based on the corresponding visibility information.

Type: Grant

Filed: December 27, 2021

Date of Patent: July 30, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Ruijin Wu
Delta triplet index compression

Patent number: 12014527

Abstract: Methods, devices, and systems for compressing and decompressing a stream of indices associated with graphics primitives. A group of delta values is determined based on a group of indices of the stream of indices. The group of delta values is compared to delta values in a lookup table. The group of indices is compressed based on an entry in the lookup table if the group of delta values matches all delta values in the entry, otherwise, the group of indices is compressed based on variable-length encoding.

Type: Grant

Filed: February 26, 2021

Date of Patent: June 18, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Kiia Kallio, Mika Tuomi, Ruijin Wu, Anirudh R. Acharya
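
The table-or-fallback scheme in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the contents of `LOOKUP`, the choice of computing each delta against the previous index in the stream, and the use of zigzag plus 7-bit varint coding as the variable-length fallback. The patent abstract does not specify any of these.

```python
# Hypothetical lookup table of common delta triplets.
LOOKUP = [(1, 1, 1), (-1, 1, 1), (0, 1, 1)]


def zigzag(n):
    """Map signed integers to unsigned ones (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compress(indices, table=LOOKUP):
    """Compress full triplets of indices; trailing leftovers are ignored."""
    tokens = []
    prev = 0
    for start in range(0, len(indices) - len(indices) % 3, 3):
        deltas = []
        for idx in indices[start:start + 3]:
            deltas.append(idx - prev)
            prev = idx
        deltas = tuple(deltas)
        if deltas in table:
            # All three deltas match one entry: emit the entry number.
            tokens.append(("table", table.index(deltas)))
        else:
            # Otherwise fall back to variable-length encoding.
            payload = b"".join(varint(zigzag(d)) for d in deltas)
            tokens.append(("varlen", payload))
    return tokens
```

For example, `compress([1, 2, 3, 2, 3, 4])` yields two table references, because the delta triplets `(1, 1, 1)` and `(-1, 1, 1)` both appear in the assumed table.
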
Hybrid binning

Patent number: 11972518

Abstract: A processing device and a method of tiled rendering of an image for display is provided. The processing device includes memory and a processor. The processor is configured to receive the image comprising one or more three dimensional (3D) objects, divide the image into tiles, execute coarse level tiling for the tiles of the image and execute fine level tiling for the tiles of the image. The processing device also includes same fixed function hardware used to execute the coarse level tiling and the fine level tiling. The processor is also configured to determine visibility information for a first one of the tiles. The visibility information is divided into draw call visibility information and triangle visibility information for each remaining tile of the image.

Type: Grant

Filed: September 25, 2020

Date of Patent: April 30, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Mika Tuomi, Kiia Kallio, Ruijin Wu, Anirudh R. Acharya, Vineet Goel
Separate clocking for components of a graphics processing unit

Patent number: 11947380

Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.

Type: Grant

Filed: August 18, 2022

Date of Patent: April 2, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Ranjith Kumar Sajja, Sreekanth Godey, Anirudh R. Acharya
Synchronization free cross pass binning through subpass interleaving

Patent number: 11880924

Abstract: A method of tiled rendering is provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles and interleaving execution of same subpasses of multiple tiles of the frame. Interleaving execution of same subpasses of multiple tiles comprises executing a previously ordered first subpass of a second tile between execution of the previously ordered first subpass of a first tile and execution of a subsequently ordered second subpass of the first tile. The interleaving is performed, for example, by executing the plurality of subpasses in an order different from the order in which the commands to execute the plurality of subpasses are stored and issued. Alternatively, interleaving is performed by executing one or more subpasses as skip operations such that the plurality of subpasses are executed in the same order.

Type: Grant

Filed: December 29, 2021

Date of Patent: January 23, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Ruijin Wu, Mika Tuomi, Paavo Sampo Ilmari Pessi, Anirudh R. Acharya
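
The reordering described in the abstract above can be shown with a toy schedule. This is only one ordering that satisfies the stated property (the first subpass of a second tile runs between the first and second subpasses of the first tile); the function names and the tile and subpass labels are hypothetical, and the sketch does not model the skip-operation alternative.

```python
def stored_order(tiles, subpasses):
    """Tile-major order: every subpass of a tile before the next tile."""
    return [(tile, sp) for tile in tiles for sp in subpasses]


def interleaved_order(tiles, subpasses):
    """Subpass-major order: the same subpass of every tile runs back to back."""
    return [(tile, sp) for sp in subpasses for tile in tiles]
```

With tiles `["T0", "T1"]` and subpasses `["S0", "S1"]`, `interleaved_order` runs `("T1", "S0")` between `("T0", "S0")` and `("T0", "S1")`, whereas `stored_order` finishes both subpasses of `T0` first.
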
Secure memory access in a virtualized computing environment

Patent number: 11836091

Abstract: A processor supports secure memory access in a virtualized computing environment by employing requestor identifiers at bus devices (such as a graphics processing unit) to identify the virtual machine associated with each memory access request. The virtualized computing environment uses the requestor identifiers to control access to different regions of system memory, ensuring that each VM accesses only those regions of memory that the VM is allowed to access. The virtualized computing environment thereby supports efficient memory access by the bus devices while ensuring that the different regions of memory are protected from unauthorized access.

Type: Grant

Filed: October 31, 2018

Date of Patent: December 5, 2023

Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC

Inventors: Anthony Asaro, Jeffrey G. Cheng, Anirudh R. Acharya
Command processor prefetch techniques

Patent number: 11782838

Abstract: Techniques for prefetching are provided. The techniques include receiving a first prefetch command; in response to determining that a history buffer indicates that first information associated with the first prefetch command has not already been prefetched, prefetching the first information into a memory; receiving a second prefetch command; and in response to determining that the history buffer indicates that second information associated with the second prefetch command has already been prefetched, avoiding prefetching the second information into the memory.

Type: Grant

Filed: March 31, 2021

Date of Patent: October 10, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Alexander Fuad Ashkar
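
The history-buffer check in the abstract above can be sketched as a small filter in front of a prefetcher. The class name `PrefetchFilter`, the dictionary standing in for memory, the fixed capacity, and the evict-oldest policy are all assumptions for illustration; the abstract only states that a prefetch is skipped when the history buffer indicates the information was already prefetched.

```python
from collections import OrderedDict


class PrefetchFilter:
    """Hypothetical history buffer that suppresses duplicate prefetches."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.history = OrderedDict()  # addresses seen, oldest first
        self.memory = {}              # stand-in for the prefetch target

    def prefetch(self, address, backing_store):
        """Return True if data was fetched, False if the command was skipped."""
        if address in self.history:
            # Already prefetched: avoid fetching the same data again.
            self.history.move_to_end(address)
            return False
        self.memory[address] = backing_store[address]
        self.history[address] = True
        if len(self.history) > self.capacity:
            # Assumed policy: forget the least recently used address.
            self.history.popitem(last=False)
        return True
```

Issuing the same prefetch command twice in a row fetches the data once and skips the second request.
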
Overlapping visibility and render passes for same frame

Patent number: 11741653

Abstract: A method of tiled rendering of an image for display is provided which comprises receiving an image comprising one or more three dimensional (3D) objects and executing a visibility pass for determining locations of primitives of the image. The method also comprises executing, concurrently with the executing of the visibility pass, front end geometry processing of one of the primitives determined, from the visibility pass, to be in a first one of a plurality of tiles of the image and executing, concurrently with the executing of the visibility pass, back end processing of the one primitive in the first tile.

Type: Grant

Filed: July 28, 2020

Date of Patent: August 29, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Mika Tuomi, Ruijin Wu, Anirudh R. Acharya, Kiia Kallio
SEPARATE CLOCKING FOR COMPONENTS OF A GRAPHICS PROCESSING UNIT

Publication number: 20230096002

Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.

Type: Application

Filed: August 18, 2022

Publication date: March 30, 2023

Inventors: Ranjith Kumar SAJJA, Sreekanth GODEY, Anirudh R. ACHARYA
Precise suspend and resume of workloads in a processing unit

Patent number: 11609791

Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with executing the first and second workloads. In some cases, a fourth workload is executed in the first and second pipelines after suspending the first and second workloads. The first and second pipelines are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.

Type: Grant

Filed: November 30, 2017

Date of Patent: March 21, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Michael Mantor
VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

Publication number: 20230055695

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Application

Filed: October 7, 2022

Publication date: February 23, 2023

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Reducing save restore latency for power control based on write signals

Patent number: 11579876

Abstract: A method of save-restore operations includes monitoring, by a power controller of a parallel processor (such as a graphics processing unit), of a register bus for one or more register write signals. The power controller determines that a register write signal is addressed to a state register that is designated to be saved prior to changing a power state of the parallel processor from a first state to a second state having a lower level of energy usage. The power controller instructs a copy of data corresponding to the state register to be written to a local memory module of the parallel processor. Subsequently, the parallel processor receives a power state change signal and writes state register data saved at the local memory module to an off-chip memory prior to changing the power state of the parallel processor.

Type: Grant

Filed: August 31, 2020

Date of Patent: February 14, 2023

Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC

Inventors: Anirudh R. Acharya, Alexander Fuad Ashkar, Ashkan Hosseinzadeh Namin
Selectively writing back dirty cache lines concurrently with processing

Patent number: 11562459

Abstract: A graphics pipeline includes a cache having cache lines that are configured to store data used to process frames in a graphics pipeline. The graphics pipeline is implemented using a processor that processes frames for the graphics pipeline using data stored in the cache. The processor processes a first frame and writes back a dirty cache line from the cache to a memory concurrently with processing of the first frame. The dirty cache line is retained in the cache and marked as clean subsequent to being written back to the memory. In some cases, the processor generates a hint that indicates a priority for writing back the dirty cache line based on a read command occupancy at a system memory controller.

Type: Grant

Filed: December 21, 2020

Date of Patent: January 24, 2023

Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC

Inventors: Noor Mohammed Saleem Bijapur, Ashish Khandelwal, Laurent Lefebvre, Anirudh R. Acharya
