Patents by Inventor Joseph Lee Greathouse
Joseph Lee Greathouse has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230205306
Abstract: One or more components of a computing device are run by default in a boost mode state. The one or more components continue to run in the boost mode state until the boost mode state is no longer sustainable, e.g., due to power consumption of the one or more components or temperature of the one or more components. The one or more components are switched to a reduced power state (e.g., a non-boost mode state) in response to the boost mode state no longer being sustainable. When operating the one or more components in the boost mode state again becomes sustainable due to power consumption or temperature of the one or more components, the one or more components are returned to the default boost mode state.
Type: Application
Filed: December 24, 2021
Publication date: June 29, 2023
Inventors: Joseph Lee Greathouse, Adam Neil Calder Clark, Stephen Kushnir
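The boost-by-default behavior in this abstract can be pictured as a two-state controller. The sketch below is only an illustration: the class name `BoostController`, the power and temperature limits, and the rule that "sustainable" means both readings are at or below their limits are assumptions, not details taken from the application.

```python
from dataclasses import dataclass


@dataclass
class BoostController:
    """Hypothetical model: boost is the default state, left only when unsustainable."""
    power_limit_w: float   # assumed power budget, in watts
    temp_limit_c: float    # assumed temperature ceiling, in degrees Celsius
    boosted: bool = True   # components start in the boost mode state

    def sustainable(self, power_w: float, temp_c: float) -> bool:
        # Assumption: boost is sustainable while both readings are within limits.
        return power_w <= self.power_limit_w and temp_c <= self.temp_limit_c

    def step(self, power_w: float, temp_c: float) -> str:
        # Drop to the reduced power state when boost is not sustainable,
        # and return to the default boost state as soon as it is again.
        self.boosted = self.sustainable(power_w, temp_c)
        return "boost" if self.boosted else "reduced"


ctrl = BoostController(power_limit_w=100.0, temp_limit_c=90.0)
states = [ctrl.step(80.0, 70.0), ctrl.step(120.0, 70.0), ctrl.step(80.0, 70.0)]
```

A real controller would likely add hysteresis so the state does not flip on every sample; that is left out here because the abstract does not describe it.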
-
Patent number: 11669473
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed in parallel on the enhanced DMA engine to the compute units.
Type: Grant
Filed: September 25, 2020
Date of Patent: June 6, 2023
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
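The overlap described here, one kernel computing while a separate engine carries out the allreduce, can be mimicked in software. In this sketch a single worker thread stands in for the enhanced DMA engine; the function names, the sum reduction, and the squaring "kernel" are all hypothetical, and Python threads only illustrate the scheduling, not real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_sum(buffers):
    """Element-wise sum across all buffers, written back to every buffer."""
    total = [sum(column) for column in zip(*buffers)]
    for buf in buffers:
        buf[:] = total


def first_kernel(activations):
    """Placeholder for the compute-unit kernel (here: square each value)."""
    return [v * v for v in activations]


def run_overlapped(activations, gradient_buffers):
    # The single worker thread plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = dma_engine.submit(allreduce_sum, gradient_buffers)
        out = first_kernel(activations)   # compute proceeds meanwhile
        pending.result()                  # wait for the collective to finish
    return out


grads = [[1, 2], [3, 4]]                  # one buffer per hypothetical device
result = run_overlapped([1, 2, 3], grads)
```

After the call, every buffer in `grads` holds the same element-wise sum, which is the defining property of an allreduce.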
-
Patent number: 11663001
Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
Type: Grant
Filed: November 19, 2018
Date of Patent: May 30, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
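The skip-or-execute decision in this abstract can be sketched in a few lines. The example assumes the processing instruction is a multiply-accumulate over two operand vectors and that non-zero values are counted across both operands; those choices, and the names `count_nonzero` and `lossy_sparse_mac`, are illustrative assumptions rather than the patented design.

```python
def count_nonzero(operands):
    """Count non-zero values across all input vector operands."""
    return sum(1 for vec in operands for v in vec if v != 0)


def lossy_sparse_mac(acc, a, b, threshold):
    """Multiply-accumulate that is skipped when the operands are too sparse.

    Returns (new_accumulator, executed_flag).
    """
    if count_nonzero([a, b]) < threshold:
        return acc, False                       # deemed redundant: skip
    return acc + sum(x * y for x, y in zip(a, b)), True


dense = lossy_sparse_mac(10, [1, 2, 0, 0], [3, 4, 0, 0], threshold=3)
sparse = lossy_sparse_mac(10, [0, 0, 0, 1], [0, 0, 0, 5], threshold=3)
```

The second call shows why the load is "lossy": the operands contain only two non-zero values, so the instruction is skipped even though the exact product would have added 5 to the accumulator.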
-
Publication number: 20220269535
Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
Type: Application
Filed: March 1, 2022
Publication date: August 25, 2022
Inventors: Arkaprava Basu, Joseph Lee Greathouse
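The per-interval delay policy can be sketched as a small state update. The abstract says only that the delay starts when the cycle count exceeds the threshold, grows if it was already non-zero, and is dropped otherwise; the starting delay, the additive increment, and the name `SsrThrottle` below are assumptions made for illustration.

```python
class SsrThrottle:
    """Hypothetical per-interval delay policy for servicing SSRs."""

    def __init__(self, cycle_threshold, initial_delay, increment):
        self.cycle_threshold = cycle_threshold
        self.initial_delay = initial_delay   # assumed starting delay
        self.increment = increment           # assumed additive growth step
        self.delay = 0                       # delay applied to subsequent SSRs

    def end_interval(self, cycles_spent):
        """Update the delay from the cycles spent servicing SSRs last interval."""
        if cycles_spent > self.cycle_threshold:
            if self.delay == 0:
                self.delay = self.initial_delay   # start delaying
            else:
                self.delay += self.increment      # previous delay was non-zero
        else:
            self.delay = 0                        # service without delay
        return self.delay


throttle = SsrThrottle(cycle_threshold=1000, initial_delay=10, increment=5)
delays = [throttle.end_interval(c) for c in (1500, 1500, 1200, 900, 1100)]
```

The feedback loop in the abstract comes from the second processor stalling as the delay grows, which lowers its SSR generation rate; that side is not modeled here.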
-
Publication number: 20220092725
Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Inventors: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez
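Early release can be illustrated with a toy register pool. This sketch models only the allocate and release steps described in the abstract; the class `RegisterPool`, the lowest-index-first allocation order, and the wavefront labels are hypothetical, and compaction itself is not modeled.

```python
class RegisterPool:
    """Hypothetical control-unit view of a register file shared by wavefronts."""

    def __init__(self, num_registers):
        self.free = set(range(num_registers))
        self.owned = {}                      # wavefront id -> set of registers

    def allocate(self, wavefront, count):
        """Allocate `count` registers at launch, or return None if unavailable."""
        if count > len(self.free):
            return None
        regs = sorted(self.free)[:count]     # assumed policy: lowest indices first
        self.free.difference_update(regs)
        self.owned.setdefault(wavefront, set()).update(regs)
        return regs

    def release(self, wavefront, regs):
        """Handle a release instruction from a wavefront that is still active."""
        owned = self.owned.get(wavefront, set())
        returned = set(regs) & owned         # ignore registers it does not own
        owned -= returned
        self.free |= returned


pool = RegisterPool(8)
w0 = pool.allocate("wave0", 6)               # wave0 holds registers 0..5
blocked = pool.allocate("wave1", 4)          # None: only 2 registers are free
pool.release("wave0", w0[:4])                # wave0 no longer needs 0..3
w1 = pool.allocate("wave1", 4)               # now succeeds while wave0 is active
```

Without the release step, `wave1` could not launch until `wave0` finished, which is the occupancy cost that early release is meant to avoid.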
-
Patent number: 11275613
Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
Type: Grant
Filed: April 16, 2018
Date of Patent: March 15, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Arkaprava Basu, Joseph Lee Greathouse
-
Publication number: 20210406209
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed in parallel on the enhanced DMA engine to the compute units.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Inventors: Abhinav Vishnu, Joseph Lee Greathouse
-
Patent number: 10928789
Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
Type: Grant
Filed: April 11, 2018
Date of Patent: February 23, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Raghavendra Pradyumna Pothukuchi, Joseph Lee Greathouse, Leonardo De Paula Rosa Piga
-
Patent number: 10691772
Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
Type: Grant
Filed: April 20, 2018
Date of Patent: June 23, 2020
Assignee: Advanced Micro Devices, Inc.
Inventor: Joseph Lee Greathouse
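The role of the completion array is easiest to see in a small solver for L x = b with a lower-triangular L stored in CSR form. The following is a sequential sketch under stated assumptions: a retry queue stands in for parallel threads that would wait on the completion flags, the rows are visited in reverse order on purpose to exercise the dependency check, and every row is assumed to have a non-zero diagonal entry. The function name `sptrsv_csr` and the queue-based scheduling are illustrative, not the patented method.

```python
from collections import deque


def sptrsv_csr(row_ptr, col_idx, vals, b):
    """Solve L x = b for lower-triangular L in CSR form using a completion array."""
    n = len(b)
    x = [0.0] * n
    done = [False] * n                        # completion array, one flag per factor
    pending = deque(reversed(range(n)))       # deliberately unfavorable order

    while pending:
        i = pending.popleft()
        row = range(row_ptr[i], row_ptr[i + 1])
        # Antecedent factors: off-diagonal columns of row i.
        antecedents = [k for k in row if col_idx[k] != i]
        if not all(done[col_idx[k]] for k in antecedents):
            pending.append(i)                 # not ready yet, retry later
            continue
        diag = next(vals[k] for k in row if col_idx[k] == i)
        partial = sum(vals[k] * x[col_idx[k]] for k in antecedents)
        x[i] = (b[i] - partial) / diag        # b[i] is the row's product value
        done[i] = True                        # assert the completion flag
    return x


# L = [[2, 0, 0],
#      [1, 3, 0],
#      [0, 4, 5]]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = sptrsv_csr(row_ptr, col_idx, vals, [2.0, 7.0, 23.0])   # x == [1.0, 2.0, 3.0]
```

Because readiness is read from the completion array rather than from a precomputed level schedule, rows can be attempted in any order and still produce the correct solution.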
-
Publication number: 20200159529
Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
Type: Application
Filed: November 19, 2018
Publication date: May 21, 2020
Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
-
Publication number: 20190325005
Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
Type: Application
Filed: April 20, 2018
Publication date: October 24, 2019
Inventor: Joseph Lee Greathouse
-
Publication number: 20190317807
Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
Type: Application
Filed: April 16, 2018
Publication date: October 17, 2019
Inventors: Arkaprava Basu, Joseph Lee Greathouse
-
Publication number: 20190317461
Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
Type: Application
Filed: April 11, 2018
Publication date: October 17, 2019
Inventors: Raghavendra Pradyumna Pothukuchi, Joseph Lee Greathouse, Leonardo De Paula Rosa Piga
-
Patent number: 9372773
Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.
Type: Grant
Filed: June 12, 2013
Date of Patent: June 21, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Joseph Lee Greathouse, Anton Chernoff
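A software analogy for the two pieces of circuitry is a small record buffer with a write path and a read path. The fixed depth, the overwrite-oldest policy, and the names `BranchRecorder`, `record`, and `read` are assumptions for illustration; the abstract states only that branch and target addresses are stored in hardware registers and can be retrieved by a (possibly user-level) instruction.

```python
from collections import deque


class BranchRecorder:
    """Hypothetical model of hardware registers that log taken branches."""

    def __init__(self, depth=4):
        # Assumption: a fixed number of register pairs, oldest entry overwritten.
        self.regs = deque(maxlen=depth)

    def record(self, branch_addr, target_addr):
        """First circuitry: store the branch address and where execution went."""
        self.regs.append((branch_addr, target_addr))

    def read(self, index=0):
        """Second circuitry: return a stored pair; index 0 is the most recent."""
        return self.regs[-1 - index]


recorder = BranchRecorder(depth=2)
recorder.record(0x400, 0x480)
recorder.record(0x500, 0x520)
recorder.record(0x600, 0x640)      # overwrites the oldest entry
latest = recorder.read(0)
previous = recorder.read(1)
```

Exposing `read` to unprivileged code mirrors the abstract's note that the retrieving instruction can be user-level, which would let tools such as profilers inspect recent control flow without a kernel transition.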
-
Publication number: 20140372734
Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.
Type: Application
Filed: June 12, 2013
Publication date: December 18, 2014
Inventors: Joseph Lee Greathouse, Anton Chernoff