Patents by Inventor Gregory W. Smaus

Gregory W. Smaus has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11868818
    Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction. including methods an devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
  • Patent number: 11768771
    Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: September 26, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John M. King, Gregory W. Smaus
  • Patent number: 11379234
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: July 5, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Publication number: 20220100662
    Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
    Type: Application
    Filed: December 9, 2021
    Publication date: March 31, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John M. King, Gregory W. Smaus
  • Patent number: 11216378
    Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: January 4, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John M. King, Gregory W. Smaus
  • Patent number: 11175916
    Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: November 16, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King
  • Publication number: 20210311737
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Application
    Filed: May 19, 2021
    Publication date: October 7, 2021
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Patent number: 11036505
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: June 15, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Publication number: 20190187990
    Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
    Type: Application
    Filed: December 19, 2017
    Publication date: June 20, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King
  • Patent number: 10095637
    Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: October 9, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
  • Publication number: 20180081544
    Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction. including methods an devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.
    Type: Application
    Filed: September 22, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
  • Publication number: 20180081810
    Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
    Type: Application
    Filed: September 19, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John M. King, Gregory W. Smaus
  • Publication number: 20180074977
    Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
    Type: Application
    Filed: September 15, 2016
    Publication date: March 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
  • Patent number: 9727340
    Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.
    Type: Grant
    Filed: July 17, 2013
    Date of Patent: August 8, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
  • Patent number: 9606806
    Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: March 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
  • Patent number: 9582286
    Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.
    Type: Grant
    Filed: November 9, 2012
    Date of Patent: February 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini
  • Publication number: 20150026436
    Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
  • Publication number: 20140380023
    Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.
    Type: Application
    Filed: June 25, 2013
    Publication date: December 25, 2014
    Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
  • Publication number: 20140181482
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Publication number: 20140136819
    Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.
    Type: Application
    Filed: November 9, 2012
    Publication date: May 15, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini