Patents by Inventor Nicolai Haehnle

Nicolai Haehnle has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240111578
    Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Inventors: Matthaeus G. Chajdas, Christopher J. Brennan, Michael Mantor, Robert W. Martin, Nicolai Haehnle
  • Patent number: 11875197
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Publication number: 20230153149
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Application
    Filed: January 12, 2023
    Publication date: May 18, 2023
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11579922
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: February 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11455153
    Abstract: A computing system includes a processor and a memory storing instructions for a compiler that, when executed by the processor, cause the processor to generate a control flow graph of program source code by receiving the program source code in the compiler, in the compiler, generating a structure point representation based on the received program source code by inserting into the program source code a set of structure points including an anchor structure point and a join structure point associated with the anchor structure point, and based on the structure point representation, generating the control flow graph including a plurality of blocks each representing a portion of the program source code. In the control flow graph, a block A between the anchor structure point and the join structure point post-dominates each of the one or more divergent branches between the anchor structure point and the join structure point.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: September 27, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Nicolai Haehnle
  • Publication number: 20220206841
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Publication number: 20220206876
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 10802806
    Abstract: A reconverging control flow graph is generated by receiving an input control flow graph including a plurality of basic code blocks, determining an order of the basic code blocks, and traversing the input control flow graph. The input control flow graph is traversed by, for each basic code block B of the plurality of basic code blocks, according to the determined order of the basic code blocks, visiting the basic code block B prior to visiting a subsequent block C of the plurality of basic code blocks, and based on determining that the basic code block B has a prior block A and that the prior block A has an open edge AC to the subsequent block C, in the reconverging control flow graph, creating an edge AF between the prior block A and a flow block F1, and creating an edge FC between the flow block F1 and the subsequent block C.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: October 13, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Nicolai Haehnle
  • Publication number: 20200310766
    Abstract: A reconverging control flow graph is generated by receiving an input control flow graph including a plurality of basic code blocks, determining an order of the basic code blocks, and traversing the input control flow graph. The input control flow graph is traversed by, for each basic code block B of the plurality of basic code blocks, according to the determined order of the basic code blocks, visiting the basic code block B prior to visiting a subsequent block C of the plurality of basic code blocks, and based on determining that the basic code block B has a prior block A and that the prior block A has an open edge AC to the subsequent block C, in the reconverging control flow graph, creating an edge AF between the prior block A and a flow block F1, and creating an edge FC between the flow block F1 and the subsequent block C.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Inventor: Nicolai Haehnle
  • Publication number: 20200301681
    Abstract: A computing system includes a processor and a memory storing instructions for a compiler that, when executed by the processor, cause the processor to generate a control flow graph of program source code by receiving the program source code in the compiler, in the compiler, generating a structure point representation based on the received program source code by inserting into the program source code a set of structure points including an anchor structure point and a join structure point associated with the anchor structure point, and based on the structure point representation, generating the control flow graph including a plurality of blocks each representing a portion of the program source code. In the control flow graph, a block A between the anchor structure point and the join structure point post-dominates each of the one or more divergent branches between the anchor structure point and the join structure point.
    Type: Application
    Filed: August 19, 2019
    Publication date: September 24, 2020
    Inventor: Nicolai Haehnle
  • Patent number: 10146504
    Abstract: Systems, apparatuses, and methods for performing a division operation are disclosed. In one embodiment, a processor includes at least one arithmetic logic unit and a register file. In response to detecting a request to perform a division operation between a dividend and a divisor, the processor generates an initial approximation of the reciprocal of the divisor. Then, the processor converts the initial approximation of the reciprocal of the divisor into a fractional fixed point representation. The processor also introduces a small error into the initial approximation of the reciprocal of the divisor. Then, the processor implements one or more Newton-Raphson iterations for refining the approximation of the reciprocal and then multiplies the final reciprocal value by the dividend to generate the quotient.
    Type: Grant
    Filed: February 24, 2017
    Date of Patent: December 4, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Nicolai Hähnle
  • Publication number: 20180246700
    Abstract: Systems, apparatuses, and methods for performing a division operation are disclosed. In one embodiment, a processor includes at least one arithmetic logic unit and a register file. In response to detecting a request to perform a division operation between a dividend and a divisor, the processor generates an initial approximation of the reciprocal of the divisor. Then, the processor converts the initial approximation of the reciprocal of the divisor into a fractional fixed point representation. The processor also introduces a small error into the initial approximation of the reciprocal of the divisor. Then, the processor implements one or more Newton-Raphson iterations for refining the approximation of the reciprocal and then multiplies the final reciprocal value by the dividend to generate the quotient.
    Type: Application
    Filed: February 24, 2017
    Publication date: August 30, 2018
    Inventor: Nicolai Hähnle