Patents by Inventor Per H. Hammarlund

Per H. Hammarlund has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11544193
    Abstract: A scalable cache coherency protocol for a system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.
    Type: Grant
    Filed: May 10, 2021
    Date of Patent: January 3, 2023
    Assignee: Apple Inc.
    Inventors: James Vash, Gaurav Garg, Brian P. Lilly, Ramesh B. Gunna, Steven R. Hutsell, Lital Levy-Rubin, Per H. Hammarlund, Harshavardhan Kaushikkar
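To make the expected-state mechanism concrete, here is a minimal C sketch of a snoop handler that compares the directory's expected state against the local state to detect a race; the state names, the snoop structure, and the deferral policy are assumptions for illustration, not the claimed protocol:
```c
/* Hypothetical sketch of race detection via an expected-state field in a
 * snoop. All names and the deferral policy are assumptions. */
#include <stdio.h>
#include <stdbool.h>

typedef enum {
    INVALID, SECONDARY_SHARED, PRIMARY_SHARED, EXCLUSIVE, MODIFIED
} cache_state_t;

typedef enum { SNOOP_FORWARD, SNOOP_BACK } snoop_kind_t;

typedef struct {
    snoop_kind_t  kind;      /* forward data to requester, or back to memory */
    cache_state_t expected;  /* state the directory believed we were in */
    unsigned long block;     /* cache block address */
} snoop_t;

/* Returns true if the snoop was absorbed now; false if it must be deferred
 * because a racing transaction is still in flight at this agent. */
bool handle_snoop(cache_state_t *local, const snoop_t *s)
{
    if (*local != s->expected) {
        /* Directory state and local state disagree: a completion for an
         * outstanding request has not arrived yet. Defer the snoop. */
        printf("block %#lx: race detected (have %d, expected %d), deferring\n",
               s->block, (int)*local, (int)s->expected);
        return false;
    }
    if (s->kind == SNOOP_FORWARD && *local == PRIMARY_SHARED) {
        /* The primary-shared holder forwards its copy to the requester
         * and keeps a secondary-shared copy. */
        *local = SECONDARY_SHARED;
    } else if (s->kind == SNOOP_BACK) {
        /* Dirty data is written back toward the memory controller. */
        *local = INVALID;
    }
    return true;
}

int main(void)
{
    cache_state_t line = PRIMARY_SHARED;
    snoop_t s = { SNOOP_FORWARD, PRIMARY_SHARED, 0x1000 };
    handle_snoop(&line, &s);           /* states agree: absorbed */
    s.expected = EXCLUSIVE;
    handle_snoop(&line, &s);           /* mismatch: deferred */
    return 0;
}
```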
  • Patent number: 11513848
    Abstract: In an embodiment, a system includes rate limiter circuits corresponding to various agents that issue transactions in a virtual channel. At least one agent may be identified as a critical agent, and different rate limits may be selected for the other agents depending on whether the critical agent is on (e.g., lower limits) or off (e.g., higher limits).
    Type: Grant
    Filed: April 1, 2021
    Date of Patent: November 29, 2022
    Assignee: Apple Inc.
    Inventors: Per H. Hammarlund, Liran Fishel, Roman Gindin
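A minimal sketch of the mode-switched limiting described above, assuming a simple per-window credit scheme; the agent count, credit values, and window mechanism are invented for illustration:
```c
/* Mode-switched rate limiting: when the critical agent is active, the
 * other agents get lower transaction-rate limits. */
#include <stdio.h>
#include <stdbool.h>

#define NUM_AGENTS 3

/* Credits replenished per window: column 0 = critical agent off,
 * column 1 = critical agent on. Values are illustrative. */
static const int rate_limit[NUM_AGENTS][2] = {
    { 8, 8 },   /* the critical agent keeps its full rate */
    { 6, 2 },   /* others are throttled while the critical agent is on */
    { 6, 2 },
};

static int credits[NUM_AGENTS];

void new_window(bool critical_on)
{
    for (int a = 0; a < NUM_AGENTS; a++)
        credits[a] = rate_limit[a][critical_on ? 1 : 0];
}

bool may_issue(int agent)
{
    if (credits[agent] <= 0)
        return false;      /* over the limit: hold the transaction */
    credits[agent]--;
    return true;
}

int main(void)
{
    new_window(true);      /* critical agent is on: agent 1 gets 2 credits */
    for (int i = 0; i < 4; i++)
        printf("agent 1 issue %d: %s\n", i, may_issue(1) ? "ok" : "held");
    return 0;
}
```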
  • Publication number: 20220342806
    Abstract: In an embodiment, a system may support programmable hashing of address bits at a plurality of levels of granularity to map memory addresses to memory controllers and ultimately at least to memory devices. The hashing may be programmed to distribute pages of memory across the memory controllers, and consecutive blocks of the page may be mapped to physically distant memory controllers. In an embodiment, address bits may be dropped from each level of granularity, forming a compacted pipe address to save power within the memory controller. In an embodiment, a memory folding scheme may be employed to reduce the number of active memory devices and/or memory controllers in the system when the full complement of memory is not needed.
    Type: Application
    Filed: November 4, 2021
    Publication date: October 27, 2022
    Inventors: Steven Fishwick, Jeffry E. Gonion, Per H. Hammarlund, Eran Tamari, Lior Zimet, Gerard R. Williams, III
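A rough illustration of hashing address bits to pick a memory controller and then dropping the consumed bits to form a compacted pipe address; the masks, the drop positions, and the XOR-reduction are assumptions, since the real scheme is programmable per level of granularity:
```c
/* Speculative sketch of multi-level address-bit hashing. */
#include <stdio.h>
#include <stdint.h>

/* XOR-reduce the address bits selected by mask to a single hash bit. */
static unsigned hash_bit(uint64_t addr, uint64_t mask)
{
    return (unsigned)__builtin_parityll(addr & mask);
}

/* Drop bit `pos` from addr, shifting the higher bits down. Dropping the
 * consumed bits yields a compacted pipe address as in the abstract. */
static uint64_t drop_bit(uint64_t addr, unsigned pos)
{
    uint64_t low  = addr & ((1ULL << pos) - 1);
    uint64_t high = (addr >> (pos + 1)) << pos;
    return high | low;
}

int main(void)
{
    uint64_t addr = 0x12345680;

    /* Two levels of granularity -> a 2-bit memory-controller index.
     * Each mask spreads consecutive blocks across distant controllers. */
    uint64_t mask0 = 0x4040;  /* level 0: bits 6 and 14 */
    uint64_t mask1 = 0x8080;  /* level 1: bits 7 and 15 */
    unsigned mc = (hash_bit(addr, mask1) << 1) | hash_bit(addr, mask0);

    /* Drop one designated bit per level (higher position first so the
     * lower position stays valid). */
    uint64_t pipe = drop_bit(drop_bit(addr, 7), 6);

    printf("addr %#llx -> controller %u, pipe addr %#llx\n",
           (unsigned long long)addr, mc, (unsigned long long)pipe);
    return 0;
}
```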
  • Publication number: 20220334997
    Abstract: In an embodiment, a system on a chip (SOC) comprises a semiconductor die on which circuitry is formed, wherein the circuitry comprises a plurality of agents and a plurality of network switches coupled to the plurality of agents. The plurality of network switches are interconnected to form a plurality of physically and logically independent networks. A first network of the plurality of physically and logically independent networks is constructed according to a first topology and a second network of the plurality of physically and logically independent networks is constructed according to a second topology that is different from the first topology. For example, the first topology may be a ring topology and the second topology may be a mesh topology. In an embodiment, coherency may be enforced on the first network and the second network may be a relaxed order network.
    Type: Application
    Filed: June 3, 2021
    Publication date: October 20, 2022
    Inventors: Sergio Kolor, Sergio V. Tota, Tzach Zemer, Sagi Lahav, Jonathan M. Redshaw, Per H. Hammarlund, Eran Tamari, James Vash, Gaurav Garg
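For a sense of why distinct topologies suit distinct traffic, a toy next-hop comparison of a ring and a dimension-ordered mesh; both routing functions are textbook illustrations, not the SOC's actual networks:
```c
/* Toy next-hop routing for the two topologies named in the abstract. */
#include <stdio.h>

/* Ring of n switches: step toward the destination along the shorter arc. */
static int ring_next(int cur, int dst, int n)
{
    int fwd = (dst - cur + n) % n;          /* hops going "clockwise" */
    if (fwd == 0) return cur;
    return (fwd <= n - fwd) ? (cur + 1) % n : (cur + n - 1) % n;
}

/* w-wide mesh, switches numbered row-major: route X first, then Y. */
static int mesh_next(int cur, int dst, int w)
{
    int cx = cur % w, cy = cur / w, dx = dst % w, dy = dst / w;
    if (cx != dx) return cur + (dx > cx ? 1 : -1);
    if (cy != dy) return cur + (dy > cy ? w : -w);
    return cur;
}

int main(void)
{
    /* Trace a packet from switch 0 to switch 5 on each network. */
    for (int hop = 0; hop != 5; hop = ring_next(hop, 5, 8))
        printf("ring hop: %d\n", hop);
    for (int hop = 0; hop != 5; hop = mesh_next(hop, 5, 4))
        printf("mesh hop: %d\n", hop);
    return 0;
}
```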
  • Publication number: 20220318136
    Abstract: Techniques are disclosed relating to an I/O agent circuit of a computer system. The I/O agent circuit may receive, from a peripheral component, a set of transaction requests to perform a set of read transactions that are directed to one or more of a plurality of cache lines. The I/O agent circuit may issue, to a first memory controller circuit configured to manage access to a first one of the plurality of cache lines, a request for exclusive read ownership of the first cache line such that the data of the first cache line is not cached in a valid state outside of the memory and the I/O agent circuit. The I/O agent circuit may receive exclusive read ownership of the first cache line, including receiving the data of the first cache line. The I/O agent circuit may then perform the set of read transactions with respect to the data.
    Type: Application
    Filed: January 14, 2022
    Publication date: October 6, 2022
    Inventors: Gaurav Garg, Sagi Lahav, Lital Levy-Rubin, Gerard Williams, III, Samer Nassar, Per H. Hammarlund, Harshavardhan Kaushikkar, Srinivasa Rangan Sridharan, Jeff Gonion, James Vash
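A simplified sketch of the flow described above: the I/O agent acquires exclusive read ownership of a line and then satisfies a batch of peripheral reads from its local copy; the structures and the ownership request are hypothetical stand-ins:
```c
/* Hypothetical I/O-agent read flow with exclusive read ownership. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t line_addr;   /* cache-line address this copy shadows */
    bool     owned;       /* exclusive read ownership held? */
    uint8_t  data[64];    /* line payload received with ownership */
} io_agent_line_t;

/* Stand-in for a message to the memory controller managing this line.
 * Exclusive read ownership means no other valid copy exists outside the
 * memory and the I/O agent, so the reads below cannot see stale data. */
static void acquire_exclusive_read(io_agent_line_t *l, uint64_t addr)
{
    l->line_addr = addr;
    l->owned = true;
    for (int i = 0; i < 64; i++)
        l->data[i] = (uint8_t)i;    /* pretend the controller sent the line */
}

static int io_read(io_agent_line_t *l, uint64_t addr)
{
    if (!l->owned || (addr & ~63ULL) != l->line_addr)
        acquire_exclusive_read(l, addr & ~63ULL);
    return l->data[addr & 63];      /* serve the read locally */
}

int main(void)
{
    io_agent_line_t line = {0};
    /* A peripheral issues a set of reads hitting one cache line: only the
     * first read triggers the ownership request. */
    for (uint64_t a = 0x2000; a < 0x2004; a++)
        printf("read %#llx = %d\n", (unsigned long long)a, io_read(&line, a));
    return 0;
}
```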
  • Publication number: 20220107836
    Abstract: In an embodiment, a system includes rate limiter circuits corresponding to various agents that issue transactions in a virtual channel. At least one agent may be identified as a critical agent, and different rate limits may be selected for the other agents depending on whether the critical agent is on (e.g., lower limits) or off (e.g., higher limits).
    Type: Application
    Filed: April 1, 2021
    Publication date: April 7, 2022
    Inventors: Per H. Hammarlund, Liran Fishel, Roman Gindin
  • Publication number: 20220083472
    Abstract: A scalable cache coherency protocol for a system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.
    Type: Application
    Filed: May 10, 2021
    Publication date: March 17, 2022
    Inventors: James Vash, Gaurav Garg, Brian P. Lilly, Ramesh B. Gunna, Steven R. Hutsell, Lital Levy-Rubin, Per H. Hammarlund
  • Patent number: 10795818
    Abstract: Various systems and methods for ensuring real-time snoop latency are disclosed. A system includes a processor and a cache controller. The cache controller receives, via a channel, cache snoop requests from the processor, the snoop requests including latency-sensitive and non-latency-sensitive requests. Requests are not prioritized by type within the channel. The cache controller limits the number of non-latency-sensitive snoop requests that can be processed ahead of an incoming latency-sensitive snoop request. Limiting this number includes the cache controller determining that the number of received non-latency-sensitive snoop requests has reached a predetermined value and responsively prioritizing latency-sensitive requests over non-latency-sensitive requests.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: October 6, 2020
    Assignee: Apple Inc.
    Inventors: Harshavardhan Kaushikkar, Per H. Hammarlund, Brian P. Lilly, Michael Bekerman, James Vash, Manu Gulati, Benjamin K. Dodge
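A small sketch of the bounded-bypass policy described above, assuming a software FIFO and an invented threshold; once the counter of serviced non-latency-sensitive snoops reaches the limit, a pending latency-sensitive snoop is taken first:
```c
/* Bounded bypass: count non-latency-sensitive (NLS) snoops serviced, and
 * once the count hits a threshold, service a latency-sensitive (LS) snoop
 * first. Queue layout and threshold are assumptions. */
#include <stdio.h>
#include <stdbool.h>

#define NLS_LIMIT 2
#define QCAP 16

typedef struct { int id; bool latency_sensitive; } snoop_req_t;

static snoop_req_t q[QCAP];
static int qlen;
static int nls_processed;   /* NLS snoops serviced since the last LS snoop */

static void enqueue(int id, bool ls) { q[qlen++] = (snoop_req_t){ id, ls }; }

static snoop_req_t dequeue_at(int i)
{
    snoop_req_t r = q[i];
    for (; i + 1 < qlen; i++) q[i] = q[i + 1];
    qlen--;
    return r;
}

/* Within the channel requests are FIFO, but once NLS_LIMIT NLS snoops have
 * been processed, a pending LS snoop jumps ahead. */
static snoop_req_t next_snoop(void)
{
    if (nls_processed >= NLS_LIMIT)
        for (int i = 0; i < qlen; i++)
            if (q[i].latency_sensitive) { nls_processed = 0; return dequeue_at(i); }
    snoop_req_t r = dequeue_at(0);
    if (r.latency_sensitive) nls_processed = 0; else nls_processed++;
    return r;
}

int main(void)
{
    enqueue(0, false); enqueue(1, false); enqueue(2, false); enqueue(3, true);
    while (qlen) {
        snoop_req_t r = next_snoop();
        printf("snoop %d (%s)\n", r.id, r.latency_sensitive ? "LS" : "NLS");
    }
    return 0;   /* prints 0, 1, then LS snoop 3 bypasses 2 */
}
```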
  • Patent number: 7797683
    Abstract: Systems and methods of managing threads provide for supporting a plurality of logical threads with a plurality of simultaneous physical threads in which the number of logical threads may be greater than or less than the number of physical threads. In one approach, each of the plurality of logical threads is maintained in one of a wait state, an active state, a drain state, and a stall state. A state machine and hardware sequencer can be used to transition the logical threads between states based on triggering events and whether or not an interruptible point has been encountered in the logical threads. The logical threads are scheduled on the physical threads to meet, for example, priority, performance or fairness goals. It is also possible to specify the resources that are available to each logical thread in order to meet these and other goals. In one example, a single logical thread can speculatively use more than one physical thread, pending a selection of which physical thread should be committed.
    Type: Grant
    Filed: December 29, 2003
    Date of Patent: September 14, 2010
    Assignee: Intel Corporation
    Inventors: Per H. Hammarlund, Stephan J. Jourdan, Pierre Michaud, Alexandre J. Farcy, Morris Marden, Robert L. Hinton, Douglas M. Carmean
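A guessed-at rendering of the four-state machine named in the abstract; the events and transitions are plausible but illustrative, since the patent drives them from triggering events and interruptible points:
```c
/* Illustrative state machine over the four logical-thread states. */
#include <stdio.h>

typedef enum { WAIT, ACTIVE, DRAIN, STALL } lstate_t;
typedef enum {
    EV_SCHEDULED,            /* scheduler assigns a physical thread */
    EV_TRIGGER,              /* triggering event: start vacating the thread */
    EV_INTERRUPTIBLE_POINT,  /* safe point reached while draining */
    EV_RESOURCE_BLOCKED,     /* needed resource unavailable */
    EV_RESOURCE_READY
} event_t;

static lstate_t step(lstate_t s, event_t ev)
{
    switch (s) {
    case WAIT:   return ev == EV_SCHEDULED ? ACTIVE : WAIT;
    case ACTIVE:
        if (ev == EV_TRIGGER)          return DRAIN;  /* finish in-flight work */
        if (ev == EV_RESOURCE_BLOCKED) return STALL;
        return ACTIVE;
    case DRAIN:  return ev == EV_INTERRUPTIBLE_POINT ? WAIT : DRAIN;
    case STALL:  return ev == EV_RESOURCE_READY ? ACTIVE : STALL;
    }
    return s;
}

int main(void)
{
    static const char *name[] = { "wait", "active", "drain", "stall" };
    lstate_t s = WAIT;
    event_t script[] = { EV_SCHEDULED, EV_TRIGGER, EV_INTERRUPTIBLE_POINT };
    for (unsigned i = 0; i < sizeof script / sizeof script[0]; i++) {
        s = step(s, script[i]);
        printf("-> %s\n", name[s]);
    }
    return 0;
}
```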
  • Patent number: 7688746
    Abstract: Embodiments of the present invention provide a dynamic resource allocator to allocate resources for performance optimization in, for example, a computer system. The dynamic resource allocator allocates a resource to one or more threads associated with an application based on a performance rate. Embodiments of the present invention may further include a performance monitor to monitor the performance rate of the one or more threads. The dynamic resource allocator allocates an additional resource to the one or more threads if they are performing above a performance threshold. In embodiments of the present invention, the dynamic resource allocation strategy may be decided based on, for example, optimizing the overall system throughput, minimizing power consumption, meeting system performance goals (e.g., real-time requirements), user-specified performance priorities, and/or application-specified performance priorities.
    Type: Grant
    Filed: December 29, 2003
    Date of Patent: March 30, 2010
    Assignee: Intel Corporation
    Inventors: Per H. Hammarlund, Melih Ozgul
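A minimal sketch of the feedback loop, assuming a monitor that reports a per-thread performance rate and a spare-resource pool; the numbers and the greedy policy are invented:
```c
/* Feedback allocation: grant an extra resource unit to threads running
 * above a performance threshold. */
#include <stdio.h>

#define NUM_THREADS 3
#define PERF_THRESHOLD 0.8

int main(void)
{
    double perf_rate[NUM_THREADS] = { 0.95, 0.40, 0.85 }; /* monitor output */
    int resources[NUM_THREADS]    = { 2, 2, 2 };
    int free_pool = 2;                                    /* spare units */

    for (int t = 0; t < NUM_THREADS && free_pool > 0; t++) {
        if (perf_rate[t] > PERF_THRESHOLD) {
            /* The thread makes good use of what it has: give it more.
             * Other strategies from the abstract (throughput, power,
             * real-time goals, priorities) would change this test. */
            resources[t]++;
            free_pool--;
        }
    }
    for (int t = 0; t < NUM_THREADS; t++)
        printf("thread %d: rate %.2f, resources %d\n",
               t, perf_rate[t], resources[t]);
    return 0;
}
```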
  • Patent number: 7640419
    Abstract: Embodiments of the present invention relate to a memory management scheme and apparatus that enables efficient memory renaming. The method includes computing a store address, writing the store address in a first storage, writing data associated with the store address to a memory, de-allocating the store address from the first storage, allocating the store address in a second storage, predicting a load instruction to be memory renamed, computing a load store source index, computing a load address, disambiguating the memory renamed load instruction, and retiring the memory renamed load instruction if the store instruction is still allocated in at least one of the first storage and the second storage and should have effectively provided the full data to the load. The method may also include re-executing the load instruction without memory renaming if the store instruction is in neither the first storage nor the second storage.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: December 29, 2009
    Assignee: Intel Corporation
    Inventors: Sebastien Hily, Per H. Hammarlund
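A toy model of the retirement check implied above: the renamed load may retire only if its source store's address is still present in the first storage (pre-writeback) or the second storage (post-writeback); both tables are simplified stand-ins:
```c
/* Two-storage retirement check for a memory-renamed load. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 4

static uint64_t first_storage[SLOTS];   /* store addresses awaiting writeback */
static uint64_t second_storage[SLOTS];  /* store addresses already written */

static bool in_table(const uint64_t *t, uint64_t addr)
{
    for (int i = 0; i < SLOTS; i++)
        if (t[i] == addr) return true;
    return false;
}

/* Move a store's address from the first to the second storage when its
 * data is written to memory (de-allocate + allocate, in the abstract's
 * terms). */
static void writeback(int slot)
{
    second_storage[slot] = first_storage[slot];
    first_storage[slot] = 0;
}

static bool renamed_load_may_retire(uint64_t store_addr)
{
    return in_table(first_storage, store_addr) ||
           in_table(second_storage, store_addr);
}

int main(void)
{
    first_storage[0] = 0x1000;          /* store address computed */
    writeback(0);                       /* data reaches memory */
    printf("retire renamed load? %s\n",
           renamed_load_may_retire(0x1000) ? "yes" : "no, re-execute");
    return 0;
}
```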
  • Patent number: 7529913
    Abstract: Embodiments of the present invention relate to a method and system for providing virtual identifiers corresponding to physical registers in a computer processor. According to the embodiments, the virtual identifiers may be used to represent the physical registers during operations in a pipeline of the processor.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: May 5, 2009
    Assignee: Intel Corporation
    Inventors: Avinash Sodani, Per H. Hammarlund, Stephan J. Jourdan
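A bare-bones illustration of carrying a small virtual identifier through the pipeline in place of a physical register number, with a lookup table mapping each virtual ID to its physical register; the table sizes are arbitrary:
```c
/* Virtual identifiers standing in for physical registers in the pipeline. */
#include <stdio.h>

#define NUM_VIRTUAL 8      /* virtual IDs that live in the pipeline */
#define NUM_PHYSICAL 64    /* actual register file entries */

static int virt_to_phys[NUM_VIRTUAL];
static int next_virt;

/* Allocate a virtual ID for a newly renamed destination register. */
static int alloc_virtual(int phys_reg)
{
    int v = next_virt++ % NUM_VIRTUAL;
    virt_to_phys[v] = phys_reg;
    return v;
}

int main(void)
{
    int v = alloc_virtual(37);  /* a uop writes physical register 37 */
    /* Later pipeline stages carry only the narrow `v`; the physical
     * register number is recovered by a table lookup when needed. */
    printf("virtual %d -> physical %d (of %d)\n",
           v, virt_to_phys[v], NUM_PHYSICAL);
    return 0;
}
```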
  • Patent number: 7502892
    Abstract: Embodiments of the present invention relate to cache coherency. In an embodiment of the invention, a cache includes one or more cache lines. A store pipeline may retrieve a tag associated with one of the cache lines. The data associated with the cache line may not be retrieved, and the cache line may be updated if, based on the tag, the cache line is determined to be in a modified or exclusive state.
    Type: Grant
    Filed: December 30, 2003
    Date of Patent: March 10, 2009
    Assignee: Intel Corporation
    Inventors: Per H. Hammarlund, Hermann W. Gartler
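A sketch of the tag-only fast path, assuming a toy single-line cache model: the store reads just the tag and state, and writes in place when the line is Modified or Exclusive:
```c
/* Tag-only store fast path: write hits on M/E lines without a data read. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef struct {
    uint64_t tag;
    mesi_t   state;
    uint8_t  data[64];
} cache_line_t;

/* Returns true if the store completed via the tag-only path. */
static bool store_tag_only(cache_line_t *l, uint64_t addr, uint8_t byte)
{
    if (l->tag != (addr >> 6))
        return false;                 /* miss: take the slow path */
    if (l->state != MODIFIED && l->state != EXCLUSIVE)
        return false;                 /* shared/invalid: coherence work first */
    l->data[addr & 63] = byte;        /* write without a prior data read */
    l->state = MODIFIED;
    return true;
}

int main(void)
{
    cache_line_t line = { .tag = 0x40, .state = EXCLUSIVE };
    printf("fast store: %s\n",
           store_tag_only(&line, 0x1005, 0xAB) ? "hit E/M line" : "slow path");
    return 0;
}
```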
  • Patent number: 7502912
    Abstract: A method and apparatus for rescheduling operations in a processor. More particularly, the present invention relates to optimally using a scheduler resource in a processor by analyzing, predicting, and sorting the write order of instructions into the scheduler so that the time the instructions sit idle in the scheduler is minimized. The analysis, prediction, and sorting may be done between an instruction queue and a scheduler by using delay units. The prediction can be based on history (latency, dependency, and resource) or on a general prediction scheme.
    Type: Grant
    Filed: December 30, 2003
    Date of Patent: March 10, 2009
    Assignee: Intel Corporation
    Inventors: Avinash Sodani, Per H. Hammarlund, Stephan J. Jourdan
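An illustration of the delay-unit idea: each micro-op carries a predicted ready time, and the delay units hold it so it enters the scheduler near its issue time rather than sitting idle there; the predictions are fabricated:
```c
/* Delay units reorder the write order into the scheduler by predicted
 * ready time. */
#include <stdio.h>

typedef struct { int id; int predicted_ready; } uop_t;

int main(void)
{
    /* Program order out of the instruction queue; ready times would come
     * from latency/dependency/resource history in the abstract's scheme. */
    uop_t iq[] = { {0, 5}, {1, 1}, {2, 3} };
    int n = sizeof iq / sizeof iq[0];

    /* Each cycle, release only the uops whose predicted ready time has
     * arrived; uop 1 enters first despite being second in program order. */
    for (int cycle = 0; cycle <= 5; cycle++)
        for (int i = 0; i < n; i++)
            if (iq[i].predicted_ready == cycle)
                printf("cycle %d: uop %d enters scheduler\n", cycle, iq[i].id);
    return 0;
}
```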
  • Patent number: 7174428
    Abstract: Embodiments of the present invention provide a method, apparatus and system for memory renaming. In one embodiment, a decode unit may decode a load instruction. If the load instruction is predicted to be memory renamed, the load instruction may have a predicted store identifier associated with the load instruction. The decode unit may transform the load instruction that is predicted to be memory renamed into a data move instruction and a load check instruction. The data move instruction may read data from the cache based on the predicted store identifier, and the load check instruction may compare an identifier associated with an identified source store with the predicted store identifier. A retirement unit may retire the load instruction if the predicted store identifier matches the identifier associated with the identified source store.
    Type: Grant
    Filed: December 29, 2003
    Date of Patent: February 6, 2007
    Assignee: Intel Corporation
    Inventors: Sebastien Hily, Per H. Hammarlund, Avinash Sodani
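A sketch of the decode-time split, assuming a small store buffer indexed by store ID: the data-move half forwards data using only the predicted ID, and the load-check half compares the prediction with the store found by disambiguation; the IDs and tables are invented:
```c
/* Decode-time split of a memory-renamed load into move + check. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define SB_SLOTS 4
static uint64_t sb_data[SB_SLOTS];   /* store buffer payloads by store ID */

/* Data-move half: forward the store's data using only the predicted ID. */
static uint64_t data_move(int predicted_store_id)
{
    return sb_data[predicted_store_id];
}

/* Load-check half: the load retires only if the prediction named the store
 * that disambiguation actually identified as the source. */
static bool load_check(int predicted_store_id, int actual_store_id)
{
    return predicted_store_id == actual_store_id;
}

int main(void)
{
    sb_data[2] = 0xDEADBEEF;
    int predicted = 2;
    uint64_t v = data_move(predicted);
    int actual = 2;                        /* from address disambiguation */
    printf("value %#llx, %s\n", (unsigned long long)v,
           load_check(predicted, actual) ? "retire load"
                                         : "flush and re-execute");
    return 0;
}
```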
  • Patent number: 7130965
    Abstract: Embodiments of the present invention relate to a memory management scheme and apparatus that enables efficient cache memory management. The method includes writing an entry to a store buffer at execute time; determining if the entry's address is in a first-level cache associated with the store buffer before retirement; and setting a status bit associated with the entry in said store buffer if the address is in the cache in either exclusive or modified state. The method further includes immediately writing the entry to the first-level cache at or after retirement when the status bit is set; and de-allocating the entry from said store buffer at retirement. The method may further comprise resetting the status bit if the cache line is allocated over or is evicted from the cache before the store buffer entry attempts to write to the cache.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: October 31, 2006
    Assignee: Intel Corporation
    Inventors: Per H. Hammarlund, Stephan Jourdan, Sebastien Hily, Aravindh Baktha, Hermann Gartler
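A toy rendering of the status-bit scheme: the execute-time cache probe sets a bit on the store-buffer entry when the line is already Exclusive or Modified, eviction clears it, and retirement consults it; all structures are illustrative:
```c
/* Execute-time status bit on a store-buffer entry. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef struct {
    uint64_t addr;
    uint8_t  value;
    bool     line_ready;   /* status bit: line held in E or M at execute */
} sb_entry_t;

static mesi_t l1_probe(uint64_t addr)
{
    (void)addr;
    return EXCLUSIVE;      /* pretend the tag lookup hit an E line */
}

static void execute_store(sb_entry_t *e, uint64_t addr, uint8_t v)
{
    e->addr = addr;
    e->value = v;
    mesi_t s = l1_probe(addr);
    e->line_ready = (s == EXCLUSIVE || s == MODIFIED);
}

/* If the line is evicted or allocated over before retirement, the bit is
 * cleared so retirement falls back to the normal ownership path. */
static void on_eviction(sb_entry_t *e, uint64_t evicted_line)
{
    if ((e->addr >> 6) == (evicted_line >> 6))
        e->line_ready = false;
}

int main(void)
{
    sb_entry_t e;
    execute_store(&e, 0x3000, 42);
    printf("retire: %s\n", e.line_ready ? "write immediately" : "wait");
    on_eviction(&e, 0x3000);   /* the line is evicted before retirement */
    printf("retire: %s\n", e.line_ready ? "write immediately" : "wait");
    return 0;
}
```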
  • Patent number: 6990551
    Abstract: A system and method for reducing linear address aliasing is described. In one embodiment, a portion of a linear address is combined with a process identifier, e.g., a page directory base pointer, to form an adjusted-linear address. The page directory base pointer is unique to a process, and combining it with a portion of the linear address produces an adjusted-linear address that provides a high probability of no aliasing. A portion of the adjusted-linear address is used to search an adjusted-linear-addressed cache memory for a data block specified by the linear address. If the data block does not reside in the adjusted-linear-addressed cache memory, then a replacement policy selects one of the cache lines in the adjusted-linear-addressed cache memory and replaces the data block of the selected cache line with a data block located at a physical address produced from translating the linear address.
    Type: Grant
    Filed: August 13, 2004
    Date of Patent: January 24, 2006
    Assignee: Intel Corporation
    Inventors: Herbert H. J. Hum, Stephan J. Jourdan, Per H. Hammarlund
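A sketch of indexing a cache with an adjusted-linear address, using XOR as an assumed combining function (the abstract only says a portion of the linear address is combined with a process identifier such as the page directory base pointer):
```c
/* Adjusted-linear cache indexing: mix a process-unique value into the
 * linear address so identical linear addresses in different processes
 * rarely alias. */
#include <stdio.h>
#include <stdint.h>

#define CACHE_SETS 256   /* toy cache: 256 sets of 64-byte lines */

static unsigned cache_index(uint64_t linear, uint64_t pdbr)
{
    uint64_t adjusted = linear ^ pdbr;       /* adjusted-linear address */
    return (unsigned)((adjusted >> 6) % CACHE_SETS);
}

int main(void)
{
    uint64_t linear = 0x7f0000401000;
    /* Two processes using the same linear address land in different sets. */
    printf("process A: set %u\n", cache_index(linear, 0x111000));
    printf("process B: set %u\n", cache_index(linear, 0x25e000));
    return 0;
}
```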
  • Patent number: 6981129
    Abstract: Breaking replay dependency loops in a processor using a rescheduled replay queue. The processor comprises a replay queue to receive a plurality of instructions, and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and increments a counter for each of the plurality of instructions to reflect the number of times each of the plurality of instructions has been executed. The scheduler also dispatches each instruction to the execution unit either when the counter does not exceed a maximum number of replays or, if the counter exceeds the maximum number of replays, when the instruction is safe to execute. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.
    Type: Grant
    Filed: November 2, 2000
    Date of Patent: December 27, 2005
    Assignee: Intel Corporation
    Inventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
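A sketch of the replay-budget policy, with a stand-in checker and safety test: the instruction is dispatched speculatively until its replay counter exceeds the maximum, after which it is held until known safe, breaking the replay loop:
```c
/* Replay counter with a speculation budget. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_REPLAYS 3

/* Stand-in checker: pretend execution keeps failing for the first 4 tries. */
static bool executed_ok(int replays) { return replays >= 4; }

/* Stand-in safety test: pretend the instruction's sources become known
 * good (e.g., a long-latency load returns) from cycle 6 onward. */
static bool safe_to_execute(int cycle) { return cycle >= 6; }

int main(void)
{
    int replays = 0;
    for (int cycle = 0; cycle < 16; cycle++) {
        if (replays > MAX_REPLAYS && !safe_to_execute(cycle)) {
            printf("cycle %d: held in replay queue until safe\n", cycle);
            continue;   /* replay budget spent: no more speculation */
        }
        if (executed_ok(replays)) {
            printf("cycle %d: retires after %d replays\n", cycle, replays);
            break;
        }
        replays++;      /* checker returns it to the replay queue */
        printf("cycle %d: replay #%d\n", cycle, replays);
    }
    return 0;
}
```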
  • Patent number: 6877086
    Abstract: Rescheduling multiple micro-operations in a processor using a replay queue. The processor comprises a replay queue to receive a plurality of instructions and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and dispatches each instruction to the execution unit. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.
    Type: Grant
    Filed: November 2, 2000
    Date of Patent: April 5, 2005
    Assignee: Intel Corporation
    Inventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
  • Patent number: 6798364
    Abstract: A method and apparatus for variable length coding is described. A method comprises receiving a group of data having a group of set values, identifying a group of positions of the group of set values within the group of data without branching, and, for each of the group of positions, encoding a run of non-set values preceding that position.
    Type: Grant
    Filed: February 5, 2002
    Date of Patent: September 28, 2004
    Assignee: Intel Corporation
    Inventors: Yen-Kuang Chen, Matthew J. Holliman, Herbert Hum, Per H. Hammarlund, Thomas Huff, William W. Macy
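A branch-free way to recover set-bit positions and the zero-runs preceding them, in the spirit of the abstract; count-trailing-zeros replaces a per-bit conditional scan, though the (run, position) output shown is illustrative rather than the patent's coding:
```c
/* Branch-free run extraction: find each set bit with ctz and emit the run
 * of zeros since the previous set bit. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t bits = 0x8212;   /* set bits at positions 1, 4, 9, 15 */
    int prev = -1;

    while (bits) {
        int pos = __builtin_ctz(bits);   /* lowest set position, no scan loop */
        int run = pos - prev - 1;        /* zeros since the previous set bit */
        printf("run of %d zeros, set bit at %d\n", run, pos);
        prev = pos;
        bits &= bits - 1;                /* clear the lowest set bit */
    }
    return 0;
}
```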