Patents by Inventor Gregory W. Smaus

Gregory W. Smaus has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Lock address contention predictor

Patent number: 11868818

Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction. including methods an devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.

Type: Grant

Filed: September 22, 2016

Date of Patent: January 9, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
Techniques for handling cache coherency traffic for contended semaphores

Patent number: 11768771

Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.

Type: Grant

Filed: December 9, 2021

Date of Patent: September 26, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: John M. King, Gregory W. Smaus
Store-to-load forwarding

Patent number: 11379234

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Grant

Filed: May 19, 2021

Date of Patent: July 5, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
TECHNIQUES FOR HANDLING CACHE COHERENCY TRAFFIC FOR CONTENDED SEMAPHORES

Publication number: 20220100662

Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.

Type: Application

Filed: December 9, 2021

Publication date: March 31, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: John M. King, Gregory W. Smaus
Techniques for handling cache coherency traffic for contended semaphores

Patent number: 11216378

Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.

Type: Grant

Filed: September 19, 2016

Date of Patent: January 4, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: John M. King, Gregory W. Smaus
System and method for a lightweight fencing operation

Patent number: 11175916

Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.

Type: Grant

Filed: December 19, 2017

Date of Patent: November 16, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King
STORE-TO-LOAD FORWARDING

Publication number: 20210311737

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Application

Filed: May 19, 2021

Publication date: October 7, 2021

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
Store-to-load forwarding

Patent number: 11036505

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Grant

Filed: December 20, 2012

Date of Patent: June 15, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
SYSTEM AND METHOD FOR A LIGHTWEIGHT FENCING OPERATION

Publication number: 20190187990

Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.

Type: Application

Filed: December 19, 2017

Publication date: June 20, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King
Speculative retirement of post-lock instructions

Patent number: 10095637

Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.

Type: Grant

Filed: September 15, 2016

Date of Patent: October 9, 2018

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
LOCK ADDRESS CONTENTION PREDICTOR

Publication number: 20180081544

Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction. including methods an devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.

Type: Application

Filed: September 22, 2016

Publication date: March 22, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
TECHNIQUES FOR HANDLING CACHE COHERENCY TRAFFIC FOR CONTENDED SEMAPHORES

Publication number: 20180081810

Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.

Type: Application

Filed: September 19, 2016

Publication date: March 22, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: John M. King, Gregory W. Smaus
SPECULATIVE RETIREMENT OF POST-LOCK INSTRUCTIONS

Publication number: 20180074977

Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.

Type: Application

Filed: September 15, 2016

Publication date: March 15, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
Hybrid tag scheduler to broadcast scheduler entry tags for picked instructions

Patent number: 9727340

Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.

Type: Grant

Filed: July 17, 2013

Date of Patent: August 8, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
Dependence-based replay suppression

Patent number: 9606806

Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.

Type: Grant

Filed: June 25, 2013

Date of Patent: March 28, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
Register file management for operations using a single physical register for both source and result

Patent number: 9582286

Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.

Type: Grant

Filed: November 9, 2012

Date of Patent: February 28, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini
HYBRID TAG SCHEDULER

Publication number: 20150026436

Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.

Type: Application

Filed: July 17, 2013

Publication date: January 22, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
DEPENDENCE-BASED REPLAY SUPPRESSION

Publication number: 20140380023

Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.

Type: Application

Filed: June 25, 2013

Publication date: December 25, 2014

Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
STORE-TO-LOAD FORWARDING

Publication number: 20140181482

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Application

Filed: December 20, 2012

Publication date: June 26, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
REGISTER FILE MANAGEMENT FOR OPERATIONS USING A SINGLE PHYSICAL REGISTER FOR BOTH SOURCE AND RESULT

Publication number: 20140136819

Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.

Type: Application

Filed: November 9, 2012

Publication date: May 15, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini

1 2 next