Patents by Inventor Anirudh R. ACHARYA
Anirudh R. ACHARYA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11972518
Abstract: A processing device and a method of tiled rendering of an image for display is provided. The processing device includes memory and a processor. The processor is configured to receive the image comprising one or more three dimensional (3D) objects, divide the image into tiles, execute coarse level tiling for the tiles of the image and execute fine level tiling for the tiles of the image. The processing device also includes same fixed function hardware used to execute the coarse level tiling and the fine level tiling. The processor is also configured to determine visibility information for a first one of the tiles. The visibility information is divided into draw call visibility information and triangle visibility information for each remaining tile of the image.
Type: Grant
Filed: September 25, 2020
Date of Patent: April 30, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Mika Tuomi, Kiia Kallio, Ruijin Wu, Anirudh R. Acharya, Vineet Goel
-
Patent number: 11947380
Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.
Type: Grant
Filed: August 18, 2022
Date of Patent: April 2, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Ranjith Kumar Sajja, Sreekanth Godey, Anirudh R. Acharya
-
Patent number: 11880924
Abstract: A method of tiled rendering is provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles and interleaving execution of same subpasses of multiple tiles of the frame. Interleaving execution of same subpasses of multiple tiles comprises executing a previously ordered first subpass of a second tile between execution of the previously ordered first subpass of a first tile and execution of a subsequently ordered second subpass of the first tile. The interleaving is performed, for example, by executing the plurality of subpasses in an order different from the order in which the commands to execute the plurality of subpasses are stored and issued. Alternatively, interleaving is performed by executing one or more subpasses as skip operations such that the plurality of subpasses are executed in the same order.
Type: Grant
Filed: December 29, 2021
Date of Patent: January 23, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Ruijin Wu, Mika Tuomi, Paavo Sampo Ilmari Pessi, Anirudh R. Acharya
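The reordering variant described in this abstract can be illustrated with a minimal sketch. It assumes commands are stored tile by tile as hypothetical `(tile, subpass)` pairs; the function name `interleave_subpasses` and that representation are illustrative only and are not taken from the patent.

```python
def interleave_subpasses(commands):
    """Reorder (tile, subpass) commands so the same subpass of every tile
    runs before any tile moves on to its next subpass.

    The sort is stable, so tiles keep their stored order within a subpass.
    """
    return sorted(commands, key=lambda command: command[1])


# Commands as stored and issued: all subpasses of tile 0, then tile 1.
stored_order = [(0, 0), (0, 1), (1, 0), (1, 1)]
executed_order = interleave_subpasses(stored_order)
print(executed_order)
```

In the reordered list, subpass 0 of tile 1 falls between subpass 0 and subpass 1 of tile 0, which is the interleaving the abstract describes. The skip-operation alternative is not modeled here.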
-
Patent number: 11836091
Abstract: A processor supports secure memory access in a virtualized computing environment by employing requestor identifiers at bus devices (such as a graphics processing unit) to identify the virtual machine associated with each memory access request. The virtualized computing environment uses the requestor identifiers to control access to different regions of system memory, ensuring that each VM accesses only those regions of memory that the VM is allowed to access. The virtualized computing environment thereby supports efficient memory access by the bus devices while ensuring that the different regions of memory are protected from unauthorized access.
Type: Grant
Filed: October 31, 2018
Date of Patent: December 5, 2023
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Anthony Asaro, Jeffrey G. Cheng, Anirudh R. Acharya
-
Patent number: 11782838
Abstract: Techniques for prefetching are provided. The techniques include receiving a first prefetch command; in response to determining that a history buffer indicates that first information associated with the first prefetch command has not already been prefetched, prefetching the first information into a memory; receiving a second prefetch command; and in response to determining that the history buffer indicates that second information associated with the second prefetch command has already been prefetched, avoiding prefetching the second information into the memory.
Type: Grant
Filed: March 31, 2021
Date of Patent: October 10, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Anirudh R. Acharya, Alexander Fuad Ashkar
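As a rough illustration of the history-buffer check in this abstract, the sketch below skips a prefetch when the requested address is already recorded. The class name `HistoryPrefetcher`, the fixed-capacity buffer that forgets its oldest entry, and the dictionary standing in for backing storage are all assumptions made for the example, not details from the patent.

```python
from collections import deque


class HistoryPrefetcher:
    """Toy prefetcher that consults a bounded history buffer (hypothetical)."""

    def __init__(self, capacity=4):
        # Recently prefetched addresses; the oldest entry drops out when full.
        self.history = deque(maxlen=capacity)
        # Stand-in for the memory that prefetched information is loaded into.
        self.memory = {}

    def prefetch(self, address, backing_store):
        """Return True if a prefetch was issued, False if it was avoided."""
        if address in self.history:
            return False  # already prefetched: avoid the redundant fetch
        self.memory[address] = backing_store[address]
        self.history.append(address)
        return True


store = {0x10: "texture A", 0x20: "texture B"}
prefetcher = HistoryPrefetcher()
first = prefetcher.prefetch(0x10, store)   # not in history: fetched
second = prefetcher.prefetch(0x10, store)  # in history: skipped
```

Here `first` is `True` and `second` is `False`, mirroring the first and second prefetch commands in the abstract.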
-
Patent number: 11741653
Abstract: A method of tiled rendering of an image for display is provided which comprises receiving an image comprising one or more three dimensional (3D) objects and executing a visibility pass for determining locations of primitives of the image. The method also comprises executing, concurrently with the executing of the visibility pass, front end geometry processing of one of the primitives determined, from the visibility pass, to be in a first one of a plurality of tiles of the image and executing, concurrently with the executing of the visibility pass, back end processing of the one primitive in the first tile.
Type: Grant
Filed: July 28, 2020
Date of Patent: August 29, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Mika Tuomi, Ruijin Wu, Anirudh R. Acharya, Kiia Kallio
-
Publication number: 20230096002
Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.
Type: Application
Filed: August 18, 2022
Publication date: March 30, 2023
Inventors: Ranjith Kumar SAJJA, Sreekanth GODEY, Anirudh R. ACHARYA
-
Patent number: 11609791
Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with executing the first and second workloads. In some cases, a fourth workload is executed in the first and second pipelines after suspending the first and second workloads. The first and second pipelines are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.
Type: Grant
Filed: November 30, 2017
Date of Patent: March 21, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Anirudh R. Acharya, Michael Mantor
-
Publication number: 20230055695
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
Type: Application
Filed: October 7, 2022
Publication date: February 23, 2023
Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
-
Patent number: 11579876
Abstract: A method of save-restore operations includes monitoring, by a power controller of a parallel processor (such as a graphics processing unit), of a register bus for one or more register write signals. The power controller determines that a register write signal is addressed to a state register that is designated to be saved prior to changing a power state of the parallel processor from a first state to a second state having a lower level of energy usage. The power controller instructs a copy of data corresponding to the state register to be written to a local memory module of the parallel processor. Subsequently, the parallel processor receives a power state change signal and writes state register data saved at the local memory module to an off-chip memory prior to changing the power state of the parallel processor.
Type: Grant
Filed: August 31, 2020
Date of Patent: February 14, 2023
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Anirudh R. Acharya, Alexander Fuad Ashkar, Ashkan Hosseinzadeh Namin
-
Patent number: 11562459
Abstract: A graphics pipeline includes a cache having cache lines that are configured to store data used to process frames in a graphics pipeline. The graphics pipeline is implemented using a processor that processes frames for the graphics pipeline using data stored in the cache. The processor processes a first frame and writes back a dirty cache line from the cache to a memory concurrently with processing of the first frame. The dirty cache line is retained in the cache and marked as clean subsequent to being written back to the memory. In some cases, the processor generates a hint that indicates a priority for writing back the dirty cache line based on a read command occupancy at a system memory controller.
Type: Grant
Filed: December 21, 2020
Date of Patent: January 24, 2023
Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC
Inventors: Noor Mohammed Saleem Bijapur, Ashish Khandelwal, Laurent Lefebvre, Anirudh R. Acharya
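The write-back behavior in this abstract (a dirty line is copied to memory, stays in the cache, and is marked clean) can be sketched as follows. The `Cache` and `CacheLine` classes, the dictionary used as memory, and the method names are hypothetical; the concurrency with frame processing and the priority hint are not modeled.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: str
    data: int
    dirty: bool = False


class Cache:
    """Toy cache that writes dirty lines back early but keeps them resident."""

    def __init__(self):
        self.lines = {}

    def write(self, tag, data):
        # A write updates the cached copy and marks the line dirty.
        self.lines[tag] = CacheLine(tag, data, dirty=True)

    def early_write_back(self, memory):
        """Copy every dirty line to memory, mark it clean, and keep it cached."""
        written = []
        for line in self.lines.values():
            if line.dirty:
                memory[line.tag] = line.data
                line.dirty = False
                written.append(line.tag)
        return written


cache = Cache()
memory = {}
cache.write("color_tile_0", 42)
written_tags = cache.early_write_back(memory)
```

After the call, the line is still present in `cache.lines` with `dirty` set to `False`, so later reads still hit in the cache and the line no longer needs a write-back before it can be evicted.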
-
Patent number: 11467870
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
Type: Grant
Filed: July 24, 2020
Date of Patent: October 11, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
-
Publication number: 20220319091
Abstract: A method and apparatus of tile rendering of an image for a display in a computer system includes receiving the image in a graphics pipeline of the computer system, the image comprising one or more three dimensional (3D) objects. The image is divided into one or more tiles. A depth test is performed on the one or more tiles, and based upon the depth test, visibility information of the one or more tiles is binned.
Type: Application
Filed: December 27, 2021
Publication date: October 6, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Mika Tuomi, Ruijin Wu, Anirudh R. Acharya
-
Publication number: 20220309729
Abstract: A method of tiled rendering is provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles and interleaving execution of same subpasses of multiple tiles of the frame. Interleaving execution of same subpasses of multiple tiles comprises executing a previously ordered first subpass of a second tile between execution of the previously ordered first subpass of a first tile and execution of a subsequently ordered second subpass of the first tile. The interleaving is performed, for example, by executing the plurality of subpasses in an order different from the order in which the commands to execute the plurality of subpasses are stored and issued. Alternatively, interleaving is performed by executing one or more subpasses as skip operations such that the plurality of subpasses are executed in the same order.
Type: Application
Filed: December 29, 2021
Publication date: September 29, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Ruijin Wu, Mika Tuomi, Paavo Sampo Ilmari Pessi, Anirudh R. Acharya
-
Patent number: 11442495
Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.
Type: Grant
Filed: September 25, 2020
Date of Patent: September 13, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Ranjith Kumar Sajja, Sreekanth Godey, Anirudh R. Acharya
-
Publication number: 20220207827
Abstract: Systems and methods for distributed rendering using two-level binning include processing primitives of a frame to be rendered at a first graphics processing unit (GPU) chiplet in a set of GPU chiplets to generate visibility information of primitives for each coarse bin and providing the visibility information to the other GPU chiplets in the set of GPU chiplets. Each coarse bin is then assigned to one of the GPU chiplets of the set of GPU chiplets and rendered at the assigned GPU chiplet based on the corresponding visibility information.
Type: Application
Filed: December 27, 2021
Publication date: June 30, 2022
Inventors: Anirudh R. Acharya, Ruijin Wu
-
Publication number: 20220156874
Abstract: Systems and methods related to priority-based and performance-based selection of a render mode, such as a two-level binning mode, in which to execute workloads with a graphics processing unit (GPU) of a system are provided. A user mode driver (UMD) or kernel mode driver (KMD) executed at a central processing unit (CPU) configures low and medium priority workloads to be executed in a two-level binning mode and selects a binning mode for high priority workloads based on whether performance heuristics indicate that one or more binning conditions or override conditions have been met. High priority workloads are maintained in a high priority queue, while low and medium priority workloads are maintained in a low/medium priority queue, such that execution of low and medium priority workloads at the GPU can be preempted in favor of executing high priority workloads.
Type: Application
Filed: April 15, 2021
Publication date: May 19, 2022
Inventors: Anirudh R. ACHARYA, Ruijin WU, Young In YEO
-
Publication number: 20220050781
Abstract: Techniques for prefetching are provided. The techniques include receiving a first prefetch command; in response to determining that a history buffer indicates that first information associated with the first prefetch command has not already been prefetched, prefetching the first information into a memory; receiving a second prefetch command; and in response to determining that the history buffer indicates that second information associated with the second prefetch command has already been prefetched, avoiding prefetching the second information into the memory.
Type: Application
Filed: March 31, 2021
Publication date: February 17, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Anirudh R. Acharya, Alexander Fuad Ashkar
-
Publication number: 20220044350
Abstract: Systems and methods related to run-time selection of a render mode in which to execute command buffers with a graphics processing unit (GPU) of a device based on performance data corresponding to the device are provided. A user mode driver (UMD) or kernel mode driver (KMD) executed at a central processing unit (CPU) selects a binning mode based on whether performance data that includes sensor data or performance counter data indicates that an associated binning condition or override condition has been met. The UMD or the KMD causes pending command buffers to be patched to execute in the selected binning mode based on whether the binning mode is enabled or disabled.
Type: Application
Filed: September 25, 2020
Publication date: February 10, 2022
Inventors: Anirudh R. ACHARYA, Ruijin WU, Paul E. RUGGIERI
-
Publication number: 20220043653
Abstract: A method of save-restore operations includes monitoring, by a power controller of a parallel processor (such as a graphics processing unit), of a register bus for one or more register write signals. The power controller determines that a register write signal is addressed to a state register that is designated to be saved prior to changing a power state of the parallel processor from a first state to a second state having a lower level of energy usage. The power controller instructs a copy of data corresponding to the state register to be written to a local memory module of the parallel processor. Subsequently, the parallel processor receives a power state change signal and writes state register data saved at the local memory module to an off-chip memory prior to changing the power state of the parallel processor.
Type: Application
Filed: August 31, 2020
Publication date: February 10, 2022
Inventors: Anirudh R. ACHARYA, Alexander Fuad ASHKAR, Ashkan HOSSEINZADEH NAMIN