Patents by Inventor Nicolai Hähnle

Nicolai Hähnle has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HIERARCHICAL WORK SCHEDULING

Publication number: 20250068464

Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.

Type: Application

Filed: November 8, 2024

Publication date: February 27, 2025

Inventors: Matthaeus G. Chajdas, Christopher J. Brennan, Michael Mantor, Robert W. Martin, Nicolai Haehnle
Hierarchical work scheduling

Patent number: 12153957

Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.

Type: Grant

Filed: September 30, 2022

Date of Patent: November 26, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Matthaeus G. Chajdas, Christopher J. Brennan, Michael Mantor, Robert W. Martin, Nicolai Haehnle
SOFTWARE-DEFINED COMPUTE UNIT RESOURCE ALLOCATION MODE

Publication number: 20240311199

Abstract: A program code executing on a processing system includes one or more instructions each identifying a workload that includes a plurality of waves and each identifying resource allocations for the plurality of waves of the workgroup. In response to receiving an instruction identifying a workload and resource allocations for the plurality of waves of the workgroup, a processor allocates a first set of processing resources to a compute unit of the processor based on the resource allocations for the plurality of waves. The compute unit then performs operations for the workgroup using the allocated set of processing resources.

Type: Application

Filed: March 13, 2023

Publication date: September 19, 2024

Inventors: Nicolai Haehnle, Mark Leather, Brian Emberling, Michael John Bedy, Daniel Schneider
HIERARCHICAL WORK SCHEDULING

Publication number: 20240111578

Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.

Type: Application

Filed: September 30, 2022

Publication date: April 4, 2024

Inventors: Matthaeus G. Chajdas, Christopher J. Brennan, Michael Mantor, Robert W. Martin, Nicolai Haehnle
Management of thrashing in a GPU

Patent number: 11875197

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

Type: Grant

Filed: December 29, 2020

Date of Patent: January 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

Publication number: 20230153149

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Application

Filed: January 12, 2023

Publication date: May 18, 2023

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
Dynamic graphical processing unit register allocation

Patent number: 11579922

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Grant

Filed: December 29, 2020

Date of Patent: February 14, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
Dynamic instances semantics

Patent number: 11455153

Abstract: A computing system includes a processor and a memory storing instructions for a compiler that, when executed by the processor, cause the processor to generate a control flow graph of program source code by receiving the program source code in the compiler, in the compiler, generating a structure point representation based on the received program source code by inserting into the program source code a set of structure points including an anchor structure point and a join structure point associated with the anchor structure point, and based on the structure point representation, generating the control flow graph including a plurality of blocks each representing a portion of the program source code. In the control flow graph, a block A between the anchor structure point and the join structure point post-dominates each of the one or more divergent branches between the anchor structure point and the join structure point.

Type: Grant

Filed: August 19, 2019

Date of Patent: September 27, 2022

Assignee: Advanced Micro Devices, Inc.

Inventor: Nicolai Haehnle
DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

Publication number: 20220206841

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Application

Filed: December 29, 2020

Publication date: June 30, 2022

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
Management of Thrashing in a GPU

Publication number: 20220206876

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold.

Type: Application

Filed: December 29, 2020

Publication date: June 30, 2022

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
Generating vectorized control flow using reconverging control flow graphs

Patent number: 10802806

Abstract: A reconverging control flow graph is generated by receiving an input control flow graph including a plurality of basic code blocks, determining an order of the basic code blocks, and traversing the input control flow graph. The input control flow graph is traversed by, for each basic code block B of the plurality of basic code blocks, according to the determined order of the basic code blocks, visiting the basic code block B prior to visiting a subsequent block C of the plurality of basic code blocks, and based on determining that the basic code block B has a prior block A and that the prior block A has an open edge AC to the subsequent block C, in the reconverging control flow graph, creating an edge AF between the prior block A and a flow block F1, and creating an edge FC between the flow block F1 and the subsequent block C.

Type: Grant

Filed: March 29, 2019

Date of Patent: October 13, 2020

Assignee: Advanced Micro Devices, Inc.

Inventor: Nicolai Haehnle
GENERATING VECTORIZED CONTROL FLOW USING RECONVERGING CONTROL FLOW GRAPHS

Publication number: 20200310766

Abstract: A reconverging control flow graph is generated by receiving an input control flow graph including a plurality of basic code blocks, determining an order of the basic code blocks, and traversing the input control flow graph. The input control flow graph is traversed by, for each basic code block B of the plurality of basic code blocks, according to the determined order of the basic code blocks, visiting the basic code block B prior to visiting a subsequent block C of the plurality of basic code blocks, and based on determining that the basic code block B has a prior block A and that the prior block A has an open edge AC to the subsequent block C, in the reconverging control flow graph, creating an edge AF between the prior block A and a flow block F1, and creating an edge FC between the flow block F1 and the subsequent block C.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Inventor: Nicolai Haehnle
DYNAMIC INSTANCES SEMANTICS

Publication number: 20200301681

Abstract: A computing system includes a processor and a memory storing instructions for a compiler that, when executed by the processor, cause the processor to generate a control flow graph of program source code by receiving the program source code in the compiler, in the compiler, generating a structure point representation based on the received program source code by inserting into the program source code a set of structure points including an anchor structure point and a join structure point associated with the anchor structure point, and based on the structure point representation, generating the control flow graph including a plurality of blocks each representing a portion of the program source code. In the control flow graph, a block A between the anchor structure point and the join structure point post-dominates each of the one or more divergent branches between the anchor structure point and the join structure point.

Type: Application

Filed: August 19, 2019

Publication date: September 24, 2020

Inventor: Nicolai Haehnle
Division using the Newton-Raphson method

Patent number: 10146504

Abstract: Systems, apparatuses, and methods for performing a division operation are disclosed. In one embodiment, a processor includes at least one arithmetic logic unit and a register file. In response to detecting a request to perform a division operation between a dividend and a divisor, the processor generates an initial approximation of the reciprocal of the divisor. Then, the processor converts the initial approximation of the reciprocal of the divisor into a fractional fixed point representation. The processor also introduces a small error into the initial approximation of the reciprocal of the divisor. Then, the processor implements one or more Newton-Raphson iterations for refining the approximation of the reciprocal and then multiplies the final reciprocal value by the dividend to generate the quotient.

Type: Grant

Filed: February 24, 2017

Date of Patent: December 4, 2018

Assignee: Advanced Micro Devices, Inc.

Inventor: Nicolai Hähnle
DIVISION USING THE NEWTON-RAPHSON METHOD

Publication number: 20180246700

Abstract: Systems, apparatuses, and methods for performing a division operation are disclosed. In one embodiment, a processor includes at least one arithmetic logic unit and a register file. In response to detecting a request to perform a division operation between a dividend and a divisor, the processor generates an initial approximation of the reciprocal of the divisor. Then, the processor converts the initial approximation of the reciprocal of the divisor into a fractional fixed point representation. The processor also introduces a small error into the initial approximation of the reciprocal of the divisor. Then, the processor implements one or more Newton-Raphson iterations for refining the approximation of the reciprocal and then multiplies the final reciprocal value by the dividend to generate the quotient.

Type: Application

Filed: February 24, 2017

Publication date: August 30, 2018

Inventor: Nicolai Hähnle