Patents by Inventor Francesco Spadini
Francesco Spadini has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240329988Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.Type: ApplicationFiled: June 10, 2024Publication date: October 3, 2024Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
-
Publication number: 20240248844Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.Type: ApplicationFiled: February 26, 2024Publication date: July 25, 2024Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
-
Patent number: 12008369Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.Type: GrantFiled: February 25, 2022Date of Patent: June 11, 2024Assignee: Apple Inc.Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
-
Patent number: 11914511Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.Type: GrantFiled: June 22, 2020Date of Patent: February 27, 2024Assignee: Apple Inc.Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
-
Patent number: 11900118Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.Type: GrantFiled: August 5, 2022Date of Patent: February 13, 2024Assignee: Apple Inc.Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
-
Patent number: 11379234Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.Type: GrantFiled: May 19, 2021Date of Patent: July 5, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
-
Publication number: 20210397555Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.Type: ApplicationFiled: June 22, 2020Publication date: December 23, 2021Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
-
Publication number: 20210311737Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.Type: ApplicationFiled: May 19, 2021Publication date: October 7, 2021Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
-
Patent number: 11036505Abstract: An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.Type: GrantFiled: December 20, 2012Date of Patent: June 15, 2021Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Gregory W. Smaus, Francesco Spadini, Matthew A. Rafacz, Michael Achenbach, Christopher J. Burke, Emil Talpes, Matthew M. Crum
-
Patent number: 10331567Abstract: A prefetch circuit may include a memory, each entry of which may store an address and other prefetch data used to generate prefetch requests. For each entry, there may be at least one “quality factor” (QF) that may control prefetch request generation for that entry. A global quality factor (GQF) may control generation of prefetch requests across the plurality of entries. The prefetch circuit may include one or more additional prefetch mechanisms. For example, a stride-based prefetch circuit may be included that may generate prefetch requests for strided access patterns having strides larger than a certain stride size. Another example is a spatial memory streaming (SMS)-based mechanism in which prefetch data from multiple evictions from the memory in the prefetch circuit is captured and used for SMS prefetching based on how well the prefetch data appears to match a spatial memory streaming pattern.Type: GrantFiled: February 17, 2017Date of Patent: June 25, 2019Assignee: Apple Inc.Inventors: Stephan G. Meier, Tyler J. Huberty, Nikhil Gupta, Francesco Spadini, Gideon Levinsky
-
Patent number: 9817667Abstract: A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.Type: GrantFiled: May 23, 2013Date of Patent: November 14, 2017Assignee: Advanced Micro Devices, Inc.Inventor: Francesco Spadini
-
Patent number: 9715389Abstract: A method includes suppressing execution of at least one dependent instruction of a load instruction by a processor using stored dependency information responsive to an invalid status of the load instruction. A processor includes an execution unit to execute instructions and a scheduler. The scheduler is to select for execution in the execution unit a load instruction having at least one dependent instruction and suppress execution of the at least one dependent instruction using stored dependency information responsive to an invalid status of the load instruction.Type: GrantFiled: June 25, 2013Date of Patent: July 25, 2017Assignee: Advanced Micro Devices, Inc.Inventors: Francesco Spadini, Michael Achenbach
-
Patent number: 9606806Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.Type: GrantFiled: June 25, 2013Date of Patent: March 28, 2017Assignee: Advanced Micro Devices, Inc.Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
-
Patent number: 9582286Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.Type: GrantFiled: November 9, 2012Date of Patent: February 28, 2017Assignee: Advanced Micro Devices, Inc.Inventors: Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini
-
Patent number: 9489206Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.Type: GrantFiled: July 16, 2013Date of Patent: November 8, 2016Assignee: Advanced Micro Devices, Inc.Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
-
Patent number: 9483273Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.Type: GrantFiled: July 16, 2013Date of Patent: November 1, 2016Assignee: Advanced Micro Devices, Inc.Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
-
Patent number: 9471326Abstract: A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information.Type: GrantFiled: July 17, 2013Date of Patent: October 18, 2016Assignee: Advanced Micro Devices, Inc.Inventors: Michael Achenbach, Francesco Spadini
-
Publication number: 20150026686Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.Type: ApplicationFiled: July 16, 2013Publication date: January 22, 2015Applicant: Advanced Micro Devices, Inc.Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
-
Publication number: 20150026685Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.Type: ApplicationFiled: July 16, 2013Publication date: January 22, 2015Applicant: Advanced Micro Devices, Inc.Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
-
Publication number: 20150026437Abstract: A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information.Type: ApplicationFiled: July 17, 2013Publication date: January 22, 2015Inventors: Michael Achenbach, Francesco Spadini