Patents by Inventor Francesco Spadini

Francesco Spadini has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Operation fusion for instructions bridging execution unit types

Patent number: 12288066

Abstract: Techniques are disclosed that relate to fusing operations for execution of certain instructions. A processor may include a first execution circuit, of a first type, coupled to a first register file, a second execution circuit, of a second type, coupled to a second register file and a load/store circuit coupled to the first and second register files. The load/store circuit includes an issue port configured to receive an instruction operation for execution, a memory execution circuit configured to execute memory access operations, and a register transfer execution circuit. The register transfer execution circuit is configured to execute instruction operations specifying data transfer from the first register file to the second register file and an operation to be performed using the data, and the load/store circuit is configured to direct a given instruction operation from the issue port to one of the memory execution circuit or the register transfer execution circuit.

Type: Grant

Filed: May 18, 2023

Date of Patent: April 29, 2025

Assignee: Apple Inc.

Inventors: Zhaoxiang Jin, Francesco Spadini, Skanda K. Srinivasa, Milos Becvar

Instruction fusion

Patent number: 12217060

Abstract: Techniques are disclosed that relate to executing pairs of instructions. A processor may include fusion detector circuitry configured to detect a pair of fetched instructions and fuse the pair of fetched instructions into a fused instruction operation, and execution circuitry coupled to the fusion detector circuitry and configured to execute the fused instruction operation. In some embodiments the pair of instructions is executable to generate a remainder of a division operation. In some embodiments the pair of instructions is executable to compare two operands and perform a write operation based on the comparison. In some embodiments the pair of instructions is executable to perform an operation and apply a mask bit sequence to the result. The fusion detector circuitry may also be configured to obtain first and second portions of a constant value from first and second instructions and store the first and second portions in a destination register.
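As a rough illustration of the constant-building case in the abstract above, the following Python sketch models a fusion detector that merges two instructions carrying the two portions of a constant into one operation. The mnemonics `movz` and `movk`, the 16-bit immediates, and the names `Instr`, `try_fuse_constant`, and `fuse_stream` are assumptions made for illustration; the abstract does not name an instruction set or an implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    op: str         # hypothetical mnemonic, e.g. "movz", "movk", "add"
    dst: str        # destination register name
    imm: int = 0    # immediate value (assumed 16 bits wide for movz/movk)
    shift: int = 0  # left shift applied to the immediate (assumed multiple of 16)


def try_fuse_constant(first: Instr, second: Instr) -> Optional[Instr]:
    """Fuse a pair that writes two portions of one constant to one register."""
    if (first.op == "movz" and second.op == "movk"
            and first.dst == second.dst
            and first.shift != second.shift):
        # The first instruction zeroes the other bits, so OR-ing the two
        # shifted portions yields the full constant.
        value = (first.imm << first.shift) | (second.imm << second.shift)
        return Instr("mov_const", first.dst, imm=value)
    return None


def fuse_stream(instrs: List[Instr]) -> List[Instr]:
    """Scan adjacent pairs in fetch order and replace fusible pairs."""
    out: List[Instr] = []
    i = 0
    while i < len(instrs):
        fused = None
        if i + 1 < len(instrs):
            fused = try_fuse_constant(instrs[i], instrs[i + 1])
        if fused is not None:
            out.append(fused)
            i += 2  # both original instructions are consumed
        else:
            out.append(instrs[i])
            i += 1
    return out
```

The other fusion cases named in the abstract (remainder of a division, compare-and-write, operate-and-mask) would need their own pair-matching rules; they are not modeled here.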

Type: Grant

Filed: February 28, 2023

Date of Patent: February 4, 2025

Assignee: Apple Inc.

Inventors: Francesco Spadini, Skanda K. Srinivasa, Reena Panda, Brian T. Mokrzycki, Haoyan Jia, Zhaoxiang Jin

Load Instruction Fusion

Publication number: 20240329988

Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
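The decode-then-execute flow in the abstract above can be sketched in Python as follows. This is a simplified model, not the patented design: the `Op` and `FusedLoadOp` types, the restriction to an `add` with an immediate, and the requirement that the dependent instruction overwrite the loaded register are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Op:
    kind: str                    # "load" or "add" (hypothetical op kinds)
    dst: str                     # destination register
    srcs: Tuple[str, ...] = ()   # source registers
    imm: int = 0                 # immediate operand for "add"


@dataclass(frozen=True)
class FusedLoadOp:
    addr_reg: str   # register holding the load address
    alu_kind: str   # operation to apply to the loaded value
    alu_imm: int    # immediate operand of that operation
    dst: str        # final destination register


def decode_fuse(load: Op, nxt: Op) -> Optional[FusedLoadOp]:
    """Decoder side: fuse a load with a dependent non-load instruction."""
    if (load.kind == "load" and nxt.kind == "add"
            and nxt.srcs == (load.dst,)   # depends on the loaded value
            and nxt.dst == load.dst):     # assumption: result replaces it
        return FusedLoadOp(load.srcs[0], nxt.kind, nxt.imm, nxt.dst)
    return None


def lsu_execute(op: FusedLoadOp, regs: Dict[str, int], mem: Dict[int, int]) -> None:
    """Load/store side: perform the load, then the fused operation."""
    value = mem[regs[op.addr_reg]]
    if op.alu_kind == "add":
        value += op.alu_imm
    regs[op.dst] = value
```

For example, a load into `x2` followed by an add of 2 to `x2` becomes one `FusedLoadOp`, and `lsu_execute` writes the final sum without a separate trip through another execution unit.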

Type: Application

Filed: June 10, 2024

Publication date: October 3, 2024

Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki

Decoupling Atomicity from Operation Size

Publication number: 20240248844

Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
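To make the distinction between operation size and atomicity size concrete, the Python sketch below splits one memory operation into the units that would each be accessed atomically. The function name `atomic_units` and the byte sizes in the usage note are assumptions for illustration; the abstract gives no specific sizes.

```python
from typing import List, Tuple


def atomic_units(addr: int, op_size: int, atom_size: int) -> List[Tuple[int, int]]:
    """Return (address, size) pairs, one per atomically accessed unit.

    op_size is the total number of bytes the instruction accesses;
    atom_size is the number of bytes guaranteed to be accessed atomically.
    """
    if atom_size <= 0 or op_size % atom_size != 0:
        raise ValueError("op_size must be a positive multiple of atom_size")
    return [(addr + off, atom_size) for off in range(0, op_size, atom_size)]
```

Under these assumptions, a load of two 8-byte registers from address 0x1000 yields two 8-byte units when the register size is the atomicity size, and a 16-byte vector load with 4-byte elements yields four units when the element size is the atomicity size. Passing `atom_size=8` for the same vector models the case where two contiguous elements form one atomic unit.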

Type: Application

Filed: February 26, 2024

Publication date: July 25, 2024

Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal

Load instruction fusion

Patent number: 12008369

Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.

Type: Grant

Filed: February 25, 2022

Date of Patent: June 11, 2024

Assignee: Apple Inc.

Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki

Decoupling atomicity from operation size

Patent number: 11914511

Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.

Type: Grant

Filed: June 22, 2020

Date of Patent: February 27, 2024

Assignee: Apple Inc.

Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal

Stack pointer instruction buffer for zero-cycle loads

Patent number: 11900118

Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.

Type: Grant

Filed: August 5, 2022

Date of Patent: February 13, 2024

Assignee: Apple Inc.

Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin

Store-to-load forwarding

Patent number: 11379234

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
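The register-mapping form of forwarding described in the abstract above can be sketched in Python as follows. The `ForwardingTable` class, the `rename_load` function, and the use of string names for registers are assumptions for illustration. The sketch also omits the later check that a real design would need to confirm the predicted dependency was correct.

```python
from typing import Callable, Dict, Optional


class ForwardingTable:
    """Stores awaiting movement to the load/store unit (hypothetical model)."""

    def __init__(self) -> None:
        # store id -> physical register that holds the store's data
        self.pending: Dict[int, str] = {}

    def record_store(self, store_id: int, data_preg: str) -> None:
        self.pending[store_id] = data_preg

    def release_store(self, store_id: int) -> None:
        self.pending.pop(store_id, None)


def rename_load(load_dst: str,
                predicted_store: Optional[int],
                table: ForwardingTable,
                rename_map: Dict[str, str],
                alloc_preg: Callable[[], str]) -> bool:
    """Map the load's destination; return True if data was forwarded."""
    src = None
    if predicted_store is not None:
        src = table.pending.get(predicted_store)
    if src is not None:
        # Forward by pointing the load's destination at the physical
        # register that already holds the store data.
        rename_map[load_dst] = src
        return True
    # No forwarding: allocate a fresh physical register for the load.
    rename_map[load_dst] = alloc_preg()
    return False
```

In this model forwarding costs no data movement at all: the load's architectural destination simply shares the physical register that already holds the store data.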

Type: Grant

Filed: May 19, 2021

Date of Patent: July 5, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum

Decoupling Atomicity from Operation Size

Publication number: 20210397555

Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.

Type: Application

Filed: June 22, 2020

Publication date: December 23, 2021

Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal

STORE-TO-LOAD FORWARDING

Publication number: 20210311737

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Application

Filed: May 19, 2021

Publication date: October 7, 2021

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum

Store-to-load forwarding

Patent number: 11036505

Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.

Type: Grant

Filed: December 20, 2012

Date of Patent: June 15, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum

Prefetch circuit with global quality factor to reduce aggressiveness in low power modes

Patent number: 10331567

Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.

Type: Grant

Filed: February 17, 2017

Date of Patent: June 25, 2019

Assignee: Apple Inc.

Inventors: Stephan G. Meier, Tyler J. Huberty, Nikhil Gupta, Francesco Spadini, Gideon Levinsky

Techniques for scheduling operations at an instruction pipeline

Patent number: 9817667

Abstract: A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.
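A minimal Python sketch of the temporary-queue idea in the abstract above follows. It assumes the missing resource is a load/store queue slot and that only memory operations are "designated"; the `Dispatcher` class and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class Dispatcher:
    def __init__(self, lsq_slots: int) -> None:
        self.free_lsq_slots = lsq_slots          # assumed execution resource
        self.temp_queue: Deque[str] = deque()    # parked designated operations
        self.scheduler_queue: List[str] = []     # operations that can be picked

    def dispatch(self, op: str, is_mem: bool) -> None:
        # A memory op is parked when no slot is free, or when older
        # memory ops are already parked ahead of it.
        if is_mem and (self.free_lsq_slots == 0 or self.temp_queue):
            self.temp_queue.append(op)
            return
        if is_mem:
            self.free_lsq_slots -= 1
        self.scheduler_queue.append(op)

    def release_slot(self) -> None:
        """A slot frees up; move the oldest parked op to the scheduler."""
        self.free_lsq_slots += 1
        if self.temp_queue:
            self.free_lsq_slots -= 1
            self.scheduler_queue.append(self.temp_queue.popleft())
```

With one slot, dispatching two loads and then an add leaves the second load parked while the add still reaches the scheduler queue, which is the throughput benefit the abstract describes.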

Type: Grant

Filed: May 23, 2013

Date of Patent: November 14, 2017

Assignee: Advanced Micro Devices, Inc.

Inventor: Francesco Spadini

Dependent instruction suppression

Patent number: 9715389

Abstract: A method includes suppressing execution of at least one dependent instruction of a load instruction by a processor using stored dependency information responsive to an invalid status of the load instruction. A processor includes an execution unit to execute instructions and a scheduler. The scheduler is to select for execution in the execution unit a load instruction having at least one dependent instruction and suppress execution of the at least one dependent instruction using stored dependency information responsive to an invalid status of the load instruction.
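The suppression step in the abstract above can be modeled with a small Python sketch. It assumes that dependents are marked ready speculatively when the load is picked, which the abstract does not state; the `Scheduler` class, its methods, and the string instruction names are likewise hypothetical.

```python
from typing import Dict, List, Set


class Scheduler:
    def __init__(self) -> None:
        # stored dependency information: load -> its dependent instructions
        self.dependents: Dict[str, List[str]] = {}
        self.ready: Set[str] = set()  # instructions eligible for execution

    def add_dependency(self, load: str, dependent: str) -> None:
        self.dependents.setdefault(load, []).append(dependent)

    def pick_load(self, load: str) -> None:
        # Assumption: dependents are woken as soon as the load is
        # selected, before its status is known.
        self.ready.update(self.dependents.get(load, []))

    def report_load_status(self, load: str, valid: bool) -> None:
        if not valid:
            # Use the stored dependency information to suppress execution
            # of every dependent of the invalid load.
            self.ready.difference_update(self.dependents.get(load, []))
```

Instructions that depend on other, valid loads stay in `ready`, so only the consumers of the invalid load are held back.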

Type: Grant

Filed: June 25, 2013

Date of Patent: July 25, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Francesco Spadini, Michael Achenbach

Dependence-based replay suppression

Patent number: 9606806

Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.

Type: Grant

Filed: June 25, 2013

Date of Patent: March 28, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini

Register file management for operations using a single physical register for both source and result

Patent number: 9582286

Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.

Type: Grant

Filed: November 9, 2012

Date of Patent: February 28, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini

Dependent instruction suppression

Patent number: 9489206

Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.

Type: Grant

Filed: July 16, 2013

Date of Patent: November 8, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan

Dependent instruction suppression in a load-operation instruction

Patent number: 9483273

Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.

Type: Grant

Filed: July 16, 2013

Date of Patent: November 1, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan

Method and apparatus for differential checkpointing

Patent number: 9471326

Abstract: A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information.
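One way to picture the recovery described in the abstract above is the Python sketch below: a full reference mapping is kept once, each renaming instruction adds only a small record, and an earlier state is rebuilt by applying those records to a copy of the reference mapping. The `DifferentialCheckpoint` class is hypothetical, and replaying records forward from the reference is an assumption; the abstract does not say in which direction the stored information is applied.

```python
from typing import Dict, List, Tuple


class DifferentialCheckpoint:
    def __init__(self, reference_map: Dict[str, str]) -> None:
        # architectural register -> physical register at the reference point
        self.reference = dict(reference_map)
        # one (architectural, physical) record per renaming instruction
        self.deltas: List[Tuple[str, str]] = []

    def record(self, arch: str, phys: str) -> int:
        """Store the mapping created by an instruction; return its index."""
        self.deltas.append((arch, phys))
        return len(self.deltas) - 1

    def recover_before(self, index: int) -> Dict[str, str]:
        """Rebuild the mapping as it was just before instruction `index`."""
        state = dict(self.reference)
        for arch, phys in self.deltas[:index]:
            state[arch] = phys
        return state
```

The storage cost per instruction is one record rather than a full copy of the register map, which is the point of checkpointing differences instead of whole states.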

Type: Grant

Filed: July 17, 2013

Date of Patent: October 18, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael Achenbach, Francesco Spadini

DEPENDENT INSTRUCTION SUPPRESSION IN A LOAD-OPERATION INSTRUCTION

Publication number: 20150026686

Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.

Type: Application

Filed: July 16, 2013

Publication date: January 22, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
