Patents by Inventor Joseph Lee Greathouse

Joseph Lee Greathouse has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230205306
    Abstract: One or more components of a computing device are run by default in a boost mode state. The one or more components continue to run in the boost mode state until the boost mode state is no longer sustainable, e.g., due to power consumption of the one or more components or temperature of the one or more components. The one or more components are switched to a reduced power state (e.g., a non-boost mode state) in response to the boost mode state no longer being sustainable. When operating the one or more components in the boost mode state again becomes sustainable due to power consumption or temperature of the one or more components, the one or more components are returned to the default boost mode state.
    Type: Application
    Filed: December 24, 2021
    Publication date: June 29, 2023
    Inventors: Joseph Lee Greathouse, Adam Neil Calder Clark, Stephen Kushnir
  • Patent number: 11669473
    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed in parallel on the enhanced DMA engine to the compute units.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: June 6, 2023
    Inventors: Abhinav Vishnu, Joseph Lee Greathouse
  • Patent number: 11663001
    Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: May 30, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
  • Publication number: 20220269535
    Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
    Type: Application
    Filed: March 1, 2022
    Publication date: August 25, 2022
    Inventors: Arkaprava Basu, Joseph Lee Greathouse
  • Publication number: 20220092725
    Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Inventors: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez
  • Patent number: 11275613
    Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
    Type: Grant
    Filed: April 16, 2018
    Date of Patent: March 15, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph Lee Greathouse
  • Publication number: 20210406209
    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed in parallel on the enhanced DMA engine to the compute units.
    Type: Application
    Filed: September 25, 2020
    Publication date: December 30, 2021
    Inventors: Abhinav Vishnu, Joseph Lee Greathouse
  • Patent number: 10928789
    Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
    Type: Grant
    Filed: April 11, 2018
    Date of Patent: February 23, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Raghavendra Pradyumna Pothukuchi, Joseph Lee Greathouse, Leonardo De Paula Rosa Piga
  • Patent number: 10691772
    Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
    Type: Grant
    Filed: April 20, 2018
    Date of Patent: June 23, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Joseph Lee Greathouse
  • Publication number: 20200159529
    Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
    Type: Application
    Filed: November 19, 2018
    Publication date: May 21, 2020
    Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
  • Publication number: 20190325005
    Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
    Type: Application
    Filed: April 20, 2018
    Publication date: October 24, 2019
    Inventor: Joseph Lee Greathouse
  • Publication number: 20190317807
    Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
    Type: Application
    Filed: April 16, 2018
    Publication date: October 17, 2019
    Inventors: Arkaprava Basu, Joseph Lee Greathouse
  • Publication number: 20190317461
    Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
    Type: Application
    Filed: April 11, 2018
    Publication date: October 17, 2019
    Inventors: Raghavendra Pradyumna Pothukuchi, Joseph Lee Greathouse, Leonardo De Paula Rosa Piga
  • Patent number: 9372773
    Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.
    Type: Grant
    Filed: June 12, 2013
    Date of Patent: June 21, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Joseph Lee Greathouse, Anton Chernoff
  • Publication number: 20140372734
    Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.
    Type: Application
    Filed: June 12, 2013
    Publication date: December 18, 2014
    Inventors: Joseph Lee Greathouse, Anton Chernoff