Patents by Inventor Bradford Michael Beckmann

Bradford Michael Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Executing Kernel Workgroups Across Multiple Compute Unit Types

Publication number: 20240111591

Abstract: Portions of programs, oftentimes referred to as kernels, are written by programmers to target a particular type of compute unit, such as a central processing unit (CPU) core or a graphics processing unit (GPU) core. When executing a kernel, the kernel is separated into multiple parts referred to as workgroups, and each workgroup is provided to a compute unit for execution. Usage of one type of compute unit is monitored and, in response to the one type of compute unit being idle, one or more workgroups targeting another type of compute unit are executed on the one type of compute unit. For example, usage of CPU cores is monitored, and in response to the CPU cores being idle, one or more workgroups targeting GPU cores are executed on the CPU cores.

Type: Application

Filed: September 30, 2022

Publication date: April 4, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Sooraj Puthoor
Management of thrashing in a GPU

Patent number: 11875197

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

Type: Grant

Filed: December 29, 2020

Date of Patent: January 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
DYNAMIC REGISTER RENAMING IN HARDWARE TO REDUCE BANK CONFLICTS IN PARALLEL PROCESSOR ARCHITECTURES

Publication number: 20230315536

Abstract: To reduce inter- and intra-instruction register bank access conflicts in parallel processors, a processing system includes a remapping circuit to dynamically remap virtual registers to physical registers of a parallel processor during execution of a wavefront. The remapping circuit remaps virtual registers to physical registers at a register mapping table that holds the current set of virtual to physical register mappings based on a list of available registers indicating which physical registers are available for a new mapping and a register mapping policy.

Type: Application

Filed: March 30, 2022

Publication date: October 5, 2023

Inventors: Mark Wyse, Bradford Michael Beckmann, John Kalamatianos, Anthony Thomas Gutierrez
DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

Publication number: 20230153149

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Application

Filed: January 12, 2023

Publication date: May 18, 2023

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
RESOURCE-AWARE COMPRESSION

Publication number: 20230110376

Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

Type: Application

Filed: November 23, 2022

Publication date: April 13, 2023

Inventors: SeyedMohammad SeyedzadehDelcheh, Shomit N. Das, Bradford Michael Beckmann
Dynamic graphical processing unit register allocation

Patent number: 11579922

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Grant

Filed: December 29, 2020

Date of Patent: February 14, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
Continuation analysis tasks for GPU task scheduling

Patent number: 11544106

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Grant

Filed: April 13, 2020

Date of Patent: January 3, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
Resource-aware compression

Patent number: 11544196

Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

Type: Grant

Filed: December 23, 2019

Date of Patent: January 3, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: SeyedMohammad SeyedzadehDelcheh, Shomit N. Das, Bradford Michael Beckmann
Memory request priority assignment techniques for parallel processors

Patent number: 11507522

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

Type: Grant

Filed: December 6, 2019

Date of Patent: November 22, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
Management of Thrashing in a GPU

Publication number: 20220206876

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold.

Type: Application

Filed: December 29, 2020

Publication date: June 30, 2022

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

Publication number: 20220206841

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Application

Filed: December 29, 2020

Publication date: June 30, 2022

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
RESOURCE-AWARE COMPRESSION

Publication number: 20210191869

Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

Type: Application

Filed: December 23, 2019

Publication date: June 24, 2021

Inventors: SeyedMohammad SeyedzadehDelcheh, Shomit N. Das, Bradford Michael Beckmann
MEMORY REQUEST PRIORITY ASSIGNMENT TECHNIQUES FOR PARALLEL PROCESSORS

Publication number: 20210173796

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

Type: Application

Filed: December 6, 2019

Publication date: June 10, 2021

Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

Publication number: 20200379802

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Application

Filed: April 13, 2020

Publication date: December 3, 2020

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
Continuation analysis tasks for GPU task scheduling

Patent number: 10620994

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Grant

Filed: May 30, 2017

Date of Patent: April 14, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
MULTI-KERNEL WAVEFRONT SCHEDULER

Publication number: 20190370059

Abstract: Systems, apparatuses, and methods for implementing a multi-kernel wavefront scheduler are disclosed. A system includes at least a parallel processor coupled to one or more memories, wherein the parallel processor includes a command processor and a plurality of compute units. The command processor launches multiple kernels for execution on the compute units. Each compute unit includes a multi-level scheduler for scheduling wavefronts from multiple kernels for execution on its execution units. A first level scheduler creates scheduling groups by grouping together wavefronts based on the priority of their kernels. Accordingly, wavefronts from kernels with the same priority are grouped together in the same scheduling group by the first level scheduler. Next, the first level scheduler selects, from a plurality of scheduling groups, the highest priority scheduling group for execution. Then, a second level scheduler schedules wavefronts for execution from the scheduling group selected by the first level scheduler.

Type: Application

Filed: May 30, 2018

Publication date: December 5, 2019

Inventors: Sooraj Puthoor, Joseph Gross, Xulong Tang, Bradford Michael Beckmann
FEEDBACK GUIDED SPLIT WORKGROUP DISPATCH FOR GPUS

Publication number: 20190332420

Abstract: Systems, apparatuses, and methods for performing split-workgroup dispatch to multiple compute units are disclosed. A system includes at least a plurality of compute units, control logic, and a dispatch unit. The control logic monitors resource contention among the plurality of compute units and calculates a load-rating for each compute unit based on the resource contention. The dispatch unit receives workgroups for dispatch and determines how to dispatch workgroups to the plurality of compute units based on the calculated load-ratings. If a workgroup is unable to fit in a single compute unit based on the currently available resources of the compute units, the dispatch unit divides the workgroup into its individual wavefronts and dispatches wavefronts of the workgroup to different compute units. The dispatch unit determines how to dispatch the wavefronts to specific ones of the compute units based on the calculated load-ratings.

Type: Application

Filed: April 27, 2018

Publication date: October 31, 2019

Inventors: Yash Sanjeev Ukidave, John Kalamatianos, Bradford Michael Beckmann
Method and Apparatus for Compiler Driven Bank Conflict Avoidance

Publication number: 20190187964

Abstract: Systems, apparatuses, and methods for converting computer program source code from a first high level language to a functionally equivalent executable program code. Source code in a first high level language is analyzed by a code compilation tool. In response to identifying a potential bank conflict in a multi-bank register file, operands of one or more instructions are remapped such that they map to different physical banks of the multi-bank register file. Identifying a potential bank conflict comprises one or more of identifying an intra-instruction bank conflict, an inter-instruction bank conflict, and identifying a multi-word operand with a potential bank conflict.

Type: Application

Filed: December 20, 2017

Publication date: June 20, 2019

Inventors: Mark U. Wyse, Bradford Michael Beckmann, John Kalamatianos, Anthony Thomas Gutierrez
CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

Publication number: 20180349145

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Application

Filed: May 30, 2017

Publication date: December 6, 2018

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor