Patents by Inventor Francesco Spadini

Francesco Spadini has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240329988
    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
    Type: Application
    Filed: June 10, 2024
    Publication date: October 3, 2024
    Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
  • Publication number: 20240248844
    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
    Type: Application
    Filed: February 26, 2024
    Publication date: July 25, 2024
    Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
  • Patent number: 12008369
    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
    Type: Grant
    Filed: February 25, 2022
    Date of Patent: June 11, 2024
    Assignee: Apple Inc.
    Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
  • Patent number: 11914511
    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: February 27, 2024
    Assignee: Apple Inc.
    Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
  • Patent number: 11900118
    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: February 13, 2024
    Assignee: Apple Inc.
    Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
  • Patent number: 11379234
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: July 5, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Publication number: 20210397555
    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
    Type: Application
    Filed: June 22, 2020
    Publication date: December 23, 2021
    Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
  • Publication number: 20210311737
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Application
    Filed: May 19, 2021
    Publication date: October 7, 2021
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Patent number: 11036505
    Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: June 15, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
  • Patent number: 10331567
    Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.
    Type: Grant
    Filed: February 17, 2017
    Date of Patent: June 25, 2019
    Assignee: Apple Inc.
    Inventors: Stephan G. Meier, Tyler J. Huberty, Nikhil Gupta, Francesco Spadini, Gideon Levinsky
  • Patent number: 9817667
    Abstract: A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.
    Type: Grant
    Filed: May 23, 2013
    Date of Patent: November 14, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Francesco Spadini
  • Patent number: 9715389
    Abstract: A method includes suppressing execution of at least one dependent instruction of a load instruction by a processor using stored dependency information responsive to an invalid status of the load instruction. A processor includes an execution unit to execute instructions and a scheduler. The scheduler is to select for execution in the execution unit a load instruction having at least one dependent instruction and suppress execution of the at least one dependent instruction using stored dependency information responsive to an invalid status of the load instruction.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: July 25, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach
  • Patent number: 9606806
    Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: March 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
  • Patent number: 9582286
    Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.
    Type: Grant
    Filed: November 9, 2012
    Date of Patent: February 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini
  • Patent number: 9489206
    Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: November 8, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
  • Patent number: 9483273
    Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: November 1, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
  • Patent number: 9471326
    Abstract: A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information.
    Type: Grant
    Filed: July 17, 2013
    Date of Patent: October 18, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Achenbach, Francesco Spadini
  • Publication number: 20150026686
    Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.
    Type: Application
    Filed: July 16, 2013
    Publication date: January 22, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
  • Publication number: 20150026685
    Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.
    Type: Application
    Filed: July 16, 2013
    Publication date: January 22, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
  • Publication number: 20150026437
    Abstract: A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Inventors: Michael Achenbach, Francesco Spadini