Patents by Inventor Steven Tony Tye

Steven Tony Tye has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11875197
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
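
As a rough illustration of the throttling policy described in the abstract for patent 11875197, the minimal Python sketch below caps concurrency at a first number of wavefronts and drops to a smaller second number when register spilling suggests the register file is thrashing. The class, parameter names, and the use of a spilled-register count as the thrashing signal are assumptions for illustration only, not AMD's implementation.

```python
# Hypothetical sketch of the wavefront-throttling policy; names are illustrative.

class WavefrontThrottle:
    def __init__(self, max_wavefronts, reduced_wavefronts, spill_threshold):
        assert reduced_wavefronts < max_wavefronts
        self.max_wavefronts = max_wavefronts          # first number of wavefronts
        self.reduced_wavefronts = reduced_wavefronts  # second, smaller number
        self.spill_threshold = spill_threshold        # thrashing threshold

    def concurrency_limit(self, registers_spilled_by_active_wavefronts):
        """Return how many wavefronts may execute concurrently.

        Thrashing is approximated here by the count of registers that executing
        wavefronts have spilled to memory; above the threshold, the limit drops
        from the first number to the second.
        """
        thrashing = registers_spilled_by_active_wavefronts > self.spill_threshold
        return self.reduced_wavefronts if thrashing else self.max_wavefronts


# Example: with 40 wavefronts normally allowed, heavy spilling drops the cap to 24.
throttle = WavefrontThrottle(max_wavefronts=40, reduced_wavefronts=24, spill_threshold=256)
print(throttle.concurrency_limit(registers_spilled_by_active_wavefronts=512))  # -> 24
```
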
  • Publication number: 20230153149
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack, since they are unlikely to be used in the near future. The control unit has complete flexibility in determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Application
    Filed: January 12, 2023
    Publication date: May 18, 2023
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
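
A rough, hypothetical model of the register-file-as-cache idea described in publication 20230153149 (and in the related grants listed elsewhere on this page) is sketched below: when a new frame needs more registers than are free, the registers of frames at the bottom of the stack are spilled first. The class, field names, and spill bookkeeping are assumptions for illustration.

```python
# Toy model: physical register file managed like a cache, spilling bottom-of-stack frames.
from collections import deque

class RegisterFileManager:
    def __init__(self, total_registers):
        self.free = total_registers
        # (frame_id, register_count); index 0 is the bottom of the stack,
        # i.e. the frames least likely to be needed soon.
        self.frames = deque()

    def allocate(self, frame_id, requested):
        """Allocate registers for a new wavefront's frame, spilling bottom
        frames to memory if the physical file lacks enough free slots."""
        spilled = []
        while self.free < requested and self.frames:
            victim_id, victim_regs = self.frames.popleft()  # bottom of the stack
            self.free += victim_regs
            spilled.append(victim_id)                       # would be written to memory
        if self.free < requested:
            raise RuntimeError("request exceeds physical register file")
        self.free -= requested
        self.frames.append((frame_id, requested))
        return spilled                                      # frames whose registers were spilled


rf = RegisterFileManager(total_registers=256)
rf.allocate("wg0.frame0", 128)
rf.allocate("wg1.frame0", 96)
print(rf.allocate("wg2.frame0", 100))   # spills wg0.frame0 to make room
```
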
  • Patent number: 11604737
    Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data of an atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and to output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.
    Type: Grant
    Filed: November 2, 2021
    Date of Patent: March 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Joseph L. Greathouse, Steven Tony Tye, Mark Fowler, Milind N. Nemlekar
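
The table-driven approach in patent 11604737's abstract can be pictured with the small sketch below: the requested scope and the hardware's sharing level, after a correction that narrows an over-wide scope, index a lookup table yielding a coherency action and a memory ordering. The scope names, sharing levels, and actions here are invented for illustration and do not reflect AMD's actual tables.

```python
# Hypothetical lookup table: (corrected scope, hardware sharing level) -> (action, ordering).
COHERENCY_TABLE = {
    ("workgroup", "same_cu"):      ("none",            "relaxed"),
    ("device",    "same_cu"):      ("flush_l1",        "release"),
    ("device",    "cross_cu"):     ("flush_l1",        "acquire_release"),
    ("system",    "cross_device"): ("flush_l1_and_l2", "sequentially_consistent"),
}

def resolve_atomic(scope, sharing_level):
    """Compare the requested scope to the hardware parameters and apply a
    correction (here: narrow a 'system' scope when no other device shares the
    data), then look up the coherency action and ordering to execute."""
    corrected = "device" if scope == "system" and sharing_level != "cross_device" else scope
    return COHERENCY_TABLE.get((corrected, sharing_level),
                               ("flush_l1_and_l2", "sequentially_consistent"))

print(resolve_atomic("system", "cross_cu"))   # -> ('flush_l1', 'acquire_release')
```
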
  • Patent number: 11579922
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack, since they are unlikely to be used in the near future. The control unit has complete flexibility in determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: February 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11544106
    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: January 3, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
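
The continuation-analysis-task (CAT) flow in the abstract for patent 11544106 is modeled, very loosely, in the sketch below: a completed task enqueues a continuation packet on a queue of its choosing, the agent servicing that queue runs an analysis phase, and any dependent task found ready is enqueued for execution. The queue names, packet layout, and dependency bookkeeping are assumptions; the patent describes hardware-accelerated dispatch that this plain-Python sketch does not attempt to reproduce.

```python
# Simplified software model of continuation packets, queues, and the analysis phase.
from queue import SimpleQueue

queues = {"q0": SimpleQueue(), "q1": SimpleQueue()}

def analysis_phase(completed_task, state):
    """Run after a task completes; decide which dependent tasks are now ready."""
    state["done"].add(completed_task)
    return [t for t, deps in state["deps"].items()
            if t not in state["done"] and deps <= state["done"]]

def run_task(name, state):
    print(f"running {name}")
    # On completion, the task enqueues a continuation packet on a queue of its choosing.
    queues["q1"].put({"kind": "continuation", "completed": name})

def agent(queue_name, state):
    """Agent servicing one queue: continuation packets trigger the analysis
    phase, which may enqueue dependent tasks; task packets are executed."""
    q = queues[queue_name]
    while not q.empty():
        packet = q.get()
        if packet["kind"] == "continuation":
            for dependent in analysis_phase(packet["completed"], state):
                queues["q0"].put({"kind": "task", "name": dependent})
        else:
            run_task(packet["name"], state)

state = {"deps": {"B": {"A"}}, "done": set()}   # task B depends on task A
queues["q0"].put({"kind": "task", "name": "A"})
agent("q0", state)    # runs A, which enqueues a continuation on q1
agent("q1", state)    # analysis finds B ready and enqueues it on q0
agent("q0", state)    # runs B
agent("q1", state)    # B's continuation: analysis finds nothing left to launch
```
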
  • Patent number: 11467812
    Abstract: Described herein are techniques for performing compilation operations for heterogeneous code objects. According to the techniques, a compiler identifies architectures targeted by a compilation unit, compiles the compilation unit into a heterogeneous code object that includes a different code object portion for each identified architecture, performs name mangling on functions of the compilation unit, links the heterogeneous code object with a second code object to form an executable, and generates relocation records for the executable.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: October 11, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
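
The compilation flow in patent 11467812's abstract is sketched below in miniature: one compilation unit is compiled into a heterogeneous code object holding a per-architecture portion, with functions name-mangled per target. The mangling scheme, the `compile_for` stand-in, and the object layout are assumptions for illustration; this is not AMD's toolchain.

```python
# Hypothetical sketch of building a heterogeneous code object from one compilation unit.

def mangle(function_name, architecture):
    # Toy mangling scheme: encode the target so identically named functions
    # for different architectures can coexist in one object.
    return f"_Z{len(function_name)}{function_name}${architecture}"

def compile_for(source, architecture):
    # Stand-in for per-target code generation; returns opaque "machine code".
    return f"<{architecture} code for {len(source)} bytes of source>".encode()

def build_heterogeneous_code_object(source, functions, architectures):
    """Compile one compilation unit into a code object with one portion per
    identified architecture, plus the mangled symbols each portion defines."""
    return {
        arch: {
            "code": compile_for(source, arch),
            "symbols": [mangle(f, arch) for f in functions],
        }
        for arch in architectures
    }

hco = build_heterogeneous_code_object(
    source="void kernel(float*);", functions=["kernel"],
    architectures=["x86_64", "gfx90a"])
for arch, portion in hco.items():
    print(arch, portion["symbols"])
```
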
  • Publication number: 20220206841
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack, since they are unlikely to be used in the near future. The control unit has complete flexibility in determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Publication number: 20220206876
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Publication number: 20220171635
    Abstract: Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture-specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable, wherein the second architecture-specific portion includes instructions for a second architecture that is different than the first architecture, loads the first architecture-specific portion into the first memory and the second architecture-specific portion into the second memory, and performs relocations on the first architecture-specific portion and on the second architecture-specific portion.
    Type: Application
    Filed: February 16, 2022
    Publication date: June 2, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
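
The loading flow described in publication 20220171635 (and in the related grant 11256522 below) is illustrated loosely in the sketch that follows: each architecture-specific portion of the heterogeneous executable is placed in a memory appropriate for its target architecture, and relocations are then applied. The memory names, relocation format, and `pick_memory` helper are assumptions for illustration.

```python
# Hypothetical model of loading a heterogeneous code object executable.

def pick_memory(architecture):
    """Identify a memory appropriate for a portion's target architecture."""
    return "host_dram" if architecture.startswith("x86") else "gpu_vram"

def load_heterogeneous_executable(portions, base_addresses):
    """portions: {arch: {"code": bytearray, "symbols": {name: offset},
    "relocations": [(offset, symbol)]}}. Returns {arch: (memory, base)}."""
    load_map, symbol_table = {}, {}
    # First pass: place every portion and record where its symbols land.
    for arch, portion in portions.items():
        memory = pick_memory(arch)
        base = base_addresses[memory]
        load_map[arch] = (memory, base)
        for name, offset in portion.get("symbols", {}).items():
            symbol_table[name] = base + offset
    # Second pass: patch relocation sites with the resolved absolute addresses.
    for arch, portion in portions.items():
        for offset, symbol in portion.get("relocations", []):
            portion["code"][offset:offset + 8] = symbol_table[symbol].to_bytes(8, "little")
    return load_map

portions = {
    "x86_64": {"code": bytearray(32), "symbols": {"main": 0},
               "relocations": [(8, "kernel")]},
    "gfx90a": {"code": bytearray(32), "symbols": {"kernel": 0}, "relocations": []},
}
print(load_heterogeneous_executable(
    portions, base_addresses={"host_dram": 0x400000, "gpu_vram": 0x7F0000000000}))
```
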
  • Patent number: 11256522
    Abstract: Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture-specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable, wherein the second architecture-specific portion includes instructions for a second architecture that is different than the first architecture, loads the first architecture-specific portion into the first memory and the second architecture-specific portion into the second memory, and performs relocations on the first architecture-specific portion and on the second architecture-specific portion.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: February 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
  • Publication number: 20210157559
    Abstract: Described herein are techniques for performing compilation operations for heterogeneous code objects. According to the techniques, a compiler identifies architectures targeted by a compilation unit, compiles the compilation unit into a heterogeneous code object that includes a different code object portion for each identified architecture, performs name mangling on functions of the compilation unit, links the heterogeneous code object with a second code object to form an executable, and generates relocation records for the executable.
    Type: Application
    Filed: November 22, 2019
    Publication date: May 27, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
  • Publication number: 20210157611
    Abstract: Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture-specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable, wherein the second architecture-specific portion includes instructions for a second architecture that is different than the first architecture, loads the first architecture-specific portion into the first memory and the second architecture-specific portion into the second memory, and performs relocations on the first architecture-specific portion and on the second architecture-specific portion.
    Type: Application
    Filed: November 22, 2019
    Publication date: May 27, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
  • Publication number: 20200379802
    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
    Type: Application
    Filed: April 13, 2020
    Publication date: December 3, 2020
    Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
  • Patent number: 10620994
    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
    Type: Grant
    Filed: May 30, 2017
    Date of Patent: April 14, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
  • Publication number: 20180349145
    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
    Type: Application
    Filed: May 30, 2017
    Publication date: December 6, 2018
    Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
  • Patent number: 6535903
    Abstract: A computer system for executing a binary image conversion system, which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system, includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instructions. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode that is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process.
    Type: Grant
    Filed: January 29, 1996
    Date of Patent: March 18, 2003
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: John S. Yates, Steven Tony Tye
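
The two-stage scheme in the abstract for patent 6535903 (shared with the related patents 6502237, 6226789, 5930509, and 5842017 listed below) is modeled, greatly simplified, in the sketch that follows: a run-time system executes non-native code while collecting profile data, and a background translator later uses that profile to produce translated native routines. The class names and the "hotness" heuristic are assumptions for illustration.

```python
# Hypothetical sketch: run-time emulation with profiling plus background translation.
from collections import Counter

class RuntimeSystem:
    def __init__(self):
        self.profile = Counter()          # execution counts per non-native block
        self.translated = {}              # block -> translated native routine

    def execute(self, block):
        if block in self.translated:
            return self.translated[block]()          # run translated native code
        self.profile[block] += 1                     # emulate and record profile data
        return f"emulated {block}"

class BackgroundTranslator:
    def translate_hot_blocks(self, runtime, threshold=3):
        """Consume the run-time profile and translate frequently executed
        blocks into native routines for subsequent runs."""
        for block, count in runtime.profile.items():
            if count >= threshold and block not in runtime.translated:
                runtime.translated[block] = lambda b=block: f"native {b}"

rt = RuntimeSystem()
for _ in range(5):
    rt.execute("loop_body")               # emulated; profile count rises
BackgroundTranslator().translate_hot_blocks(rt)
print(rt.execute("loop_body"))            # -> "native loop_body"
```
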
  • Patent number: 6502237
    Abstract: A computer system for executing a binary image conversion system, which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system, includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instructions. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode that is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process.
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: December 31, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: John S. Yates, Steven Tony Tye, Raymond J. Hookway
  • Patent number: 6226789
    Abstract: A computer system for executing a binary image conversion system, which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system, includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instructions. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode that is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process.
    Type: Grant
    Filed: January 29, 1996
    Date of Patent: May 1, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Steven Tony Tye, John S. Yates
  • Patent number: 5930509
    Abstract: A computer system for executing a binary image conversion system, which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system, includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instructions. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode that is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process.
    Type: Grant
    Filed: January 29, 1996
    Date of Patent: July 27, 1999
    Assignee: Digital Equipment Corporation
    Inventors: John S. Yates, Steven Tony Tye, Raymond J. Hookway
  • Patent number: 5842017
    Abstract: A computer system for executing a binary image conversion system, which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system, includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instructions. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode that is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process.
    Type: Grant
    Filed: January 29, 1996
    Date of Patent: November 24, 1998
    Assignee: Digital Equipment Corporation
    Inventors: Raymond J. Hookway, John S. Yates, Steven Tony Tye