Patents by Inventor Stefanos Kaxiras
Stefanos Kaxiras has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12001845Abstract: An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.Type: GrantFiled: October 15, 2020Date of Patent: June 4, 2024Assignee: Arm LimitedInventors: Mbou Eyole, Stefanos Kaxiras
-
Patent number: 11899940Abstract: When load requests are generated to support data processing operations, the load requests are buffered in pending load buffer circuitry prior to being carried out. Coalescing circuitry determines for a first load request whether a set of one or more subsequent load requests buffered in the pending load buffer circuitry satisfies an address proximity condition. The address proximity condition is satisfied when all data items identified by the set of one or more subsequent load requests are comprised within a series of data items which will be retrieved from the memory system in response to the first load request. When the address proximity condition is satisfied, forwarding of the set of one or more subsequent load requests is suppressed.Type: GrantFiled: October 7, 2020Date of Patent: February 13, 2024Assignee: Arm LimitedInventors: Mbou Eyole, Stefanos Kaxiras
-
Patent number: 11886881Abstract: Apparatuses and methods are provided, relating to the control of data processing in devices which comprise both decoupled access-execute processing circuitry and prefetch circuitry. Control of the access portion of the decoupled access-execute processing circuitry may be dependent on a performance metric of the prefetch circuitry. Alternatively or in addition, control of the prefetch circuitry may be dependent on a performance metric of the access portion.Type: GrantFiled: December 21, 2020Date of Patent: January 30, 2024Assignee: Arm LimitedInventors: Mbou Eyole, Michiel Willem Van Tol, Stefanos Kaxiras
-
Publication number: 20230120783Abstract: Apparatuses and methods are provided, relating to the control of data processing in devices which comprise both decoupled access-execute processing circuitry and prefetch circuitry. Control of the access portion of the decoupled access-execute processing circuitry may be dependent on a performance metric of the prefetch circuitry. Alternatively or in addition, control of the prefetch circuitry may be dependent on a performance metric of the access portion.Type: ApplicationFiled: December 21, 2020Publication date: April 20, 2023Inventors: Mbou EYOLE, Michiel Willem VAN TOL, Stefanos KAXIRAS
-
Publication number: 20220391214Abstract: An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.Type: ApplicationFiled: October 15, 2020Publication date: December 8, 2022Inventors: Mbou EYOLE, Stefanos KAXIRAS
-
Publication number: 20220391101Abstract: When load requests are generated to support data processing operations, the load requests are buffered in pending load buffer circuitry prior to being carried out. Coalescing circuitry determines for a first load request whether a set of one or more subsequent load requests buffered in the pending load buffer circuitry satisfies an address proximity condition. The address proximity condition is satisfied when all data items identified by the set of one or more subsequent load requests are comprised within a series of data items which will be retrieved from the memory system in response to the first load request. When the address proximity condition is satisfied, forwarding of the set of one or more subsequent load requests is suppressed.Type: ApplicationFiled: October 7, 2020Publication date: December 8, 2022Inventors: Mbou EYOLE, Stefanos KAXIRAS
-
Patent number: 11334485Abstract: A computer system for dynamic enforcement of store atomicity includes multiple processor cores, local cache memory for each processor core, a shared memory, a separate store buffer for each processor core for executed stores that are not yet performed and a coherence mechanism. A first processor core load on a first processor core receives a value at a first time from a first processor core store in the store buffer and prevents any other first processor core load younger than the first processor core load in program order from committing until a second time when the first processor core store is performed. Between the first time and the second time any load younger in program load than the first processor core load and having an address matched by coherence invalidation or an address matched by an eviction is squashed.Type: GrantFiled: December 16, 2019Date of Patent: May 17, 2022Assignee: ETA SCALE ABInventors: Stefanos Kaxiras, Alberto Ros
-
Patent number: 11237966Abstract: Synchronization events associated with cache coherence are monitored without using invalidations. A callback-read is issued to a memory address associated with the synchronization event, which callback-read either reads the last value written in the memory address or blocks until a next write takes place in the memory address and reads a newly written value.Type: GrantFiled: June 28, 2019Date of Patent: February 1, 2022Assignee: ETA SCALE ABInventors: Stefanos Kaxiras, Alberto Ros
-
Patent number: 11188464Abstract: Methods and systems for self-invalidating cachelines in a computer system having a plurality of cores are described. A first one of the plurality of cores, requests to load a memory block from a cache memory local to the first one of the plurality of cores, which request results in a cache miss. This results in checking a read-after-write detection structure to determine if a race condition exists for the memory block. If a race condition exists for the memory block, program order is enforced by the first one of the plurality of cores at least between any older loads and any younger loads with respect to the load that detects the prior store in the first one of the plurality of cores that issued the load of the memory block and causing one or more cache lines in the local cache memory to be self-invalidated.Type: GrantFiled: December 11, 2019Date of Patent: November 30, 2021Assignee: ETA SCALE ABInventors: Alberto Ros, Stefanos Kaxiras
-
Publication number: 20210365554Abstract: A system and method for mitigating micro-architectural replay attacks in a processing system by delaying speculative execution on the processing system of a set of processor instructions upon detection that the set of processor instructions are part of a micro-architectural replay attack by detecting repeating speculative execution of the set of processor instructions interleaved with misspeculation and squashing of the set of processor instructions.Type: ApplicationFiled: May 25, 2021Publication date: November 25, 2021Inventors: Christos SAKALIS, Stefanos KAXIRAS, Magnus SJÄLANDER
-
Patent number: 11163576Abstract: A system and method for efficiently preventing visible side-effects in the memory hierarchy during speculative execution is disclosed. Hiding the side-effects of executed instructions in the whole memory hierarchy is both expensive, in terms of performance and energy, and complicated. A system and method is disclosed to hide the side-effects of speculative loads in the cache(s) until the earliest time these speculative loads become non-speculative. A refinement is disclosed where loads that hit in the L1 cache are allowed to proceed by keeping their side-effects on the L1 cache hidden until these loads become non-speculative, and all other speculative loads that miss in the cache(s) are prevented from executing until they become non-speculative. To limit the performance deterioration caused by these delayed loads, a system and method is disclosed that augments the cache(s) with a value predictor or a re-computation engine that supplies predicted or recomputed values to the loads that missed in the cache(s).Type: GrantFiled: March 20, 2020Date of Patent: November 2, 2021Assignee: ETA SCALE ABInventors: Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, Magnus Själander
-
Patent number: 11119920Abstract: A method for performing store buffer coalescing in a multiprocessor computer system includes forming, in a coalescing store buffer associated with a core in said multiprocessor system, an atomic group of writes; and performing each individual write in said atomic group in an order which is a function of an address in a memory system to which each of the writes in said atomic group are being written.Type: GrantFiled: April 18, 2019Date of Patent: September 14, 2021Assignee: ETA SCALE ABInventors: Alberto Ros, Stefanos Kaxiras
-
Patent number: 11106468Abstract: Methods and systems for maintaining validity of a memory model in a multiple core computer system are described. A first core prevents a store instruction from being performed by another core until a condition is met which enables reordered instructions to validly execute.Type: GrantFiled: May 23, 2018Date of Patent: August 31, 2021Assignee: ETA SCALE ABInventors: Alberto Ros, Stefanos Kaxiras
-
Patent number: 11068410Abstract: According to embodiments described herein, the hierarchical complexity for coherence protocols associated with clustered cache architectures can be encapsulated in a simple function, i.e., that of determining when a data block is shared entirely within a cluster (i.e., a sub-tree of the hierarchy) and is private from the outside. This allows embodiments to eliminate complex recursive coherence operations that span the hierarchy and instead employ simple coherence mechanisms such as self-invalidation and write-through but which are restricted to operate where a data block is shared. Thus embodiments recognize that, in the context of clustered cache hierarchies, data can be shared entirely within one cluster but can be private (unshared) to this cluster when viewed from the perspective of other clusters. This characteristic of the data can be determined and then used to locally simplify coherence protocols.Type: GrantFiled: March 4, 2019Date of Patent: July 20, 2021Assignee: ETA SCALE ABInventors: Alberto Ros, Stefanos Kaxiras
-
Patent number: 10915466Abstract: Caches may be vulnerable to side-channel attacks, such as Spectre and Meltdown, that involve speculative execution of instructions, revealing information about a cache that the attacker is not permitted to access. Access permission may be stored in the cache, such as in an entry of a cache table or in the region information for a cache table. Optionally, the access permission may be re-checked if the access permission changes while a memory instruction is pending. Optionally, a random index value may be stored in a cache and used, at least in part, to identify a memory location of a cacheline. Optionally, cachelines that are involved in speculative loads for memory instructions may be marked as speculative. On condition of resolving the speculative load as non-speculative, the cacheline may be marked as non-speculative; and on condition of resolving the speculative load as mis-speculated, the cacheline may be removed from the cache.Type: GrantFiled: February 27, 2019Date of Patent: February 9, 2021Assignee: Samsung Electronics Co., Ltd.Inventors: Erik Ernst Hagersten, David Black-Schaffer, Stefanos Kaxiras
-
Publication number: 20200301712Abstract: A system and method for efficiently preventing visible side-effects in the memory hierarchy during speculative execution is disclosed. Hiding the side-effects of executed instructions in the whole memory hierarchy is both expensive, in terms of performance and energy, and complicated. A system and method is disclosed to hide the side-effects of speculative loads in the cache(s) until the earliest time these speculative loads become non-speculative. A refinement is disclosed where loads that hit in the L1 cache are allowed to proceed by keeping their side-effects on the L1 cache hidden until these loads become non-speculative, and all other speculative loads that miss in the cache(s) are prevented from executing until they become non-speculative. To limit the performance deterioration caused by these delayed loads, a system and method is disclosed that augments the cache(s) with a value predictor or a re-computation engine that supplies predicted or recomputed values to the loads that missed in the cache(s).Type: ApplicationFiled: March 20, 2020Publication date: September 24, 2020Inventors: Christos SAKALIS, Stefanos KAXIRAS, Alberto ROS, Alexandra JIMBOREAN, Magnus SJÄLANDER
-
Publication number: 20200192801Abstract: A computer system for dynamic enforcement of store atomicity includes multiple processor cores, local cache memory for each processor core, a shared memory, a separate store buffer for each processor core for executed stores that are not yet performed and a coherence mechanism. A first processor core load on a first processor core receives a value at a first time from a first processor core store in the store buffer and prevents any other first processor core load younger than the first processor core load in program order from committing until a second time when the first processor core store is performed. Between the first time and the second time any load younger in program load than the first processor core load and having an address matched by coherence invalidation or an address matched by an eviction is squashed.Type: ApplicationFiled: December 16, 2019Publication date: June 18, 2020Inventors: Stefanos KAXIRAS, Alberto ROS
-
Patent number: 10671543Abstract: Methods and systems which, for example, reduce energy usage in cache memories are described. Cache location information regarding the location of cachelines which are stored in a tracked portion of a memory hierarchy is stored in a cache location table. Address tags are stored with corresponding location information in the cache location table to associate the address tag with the cacheline and its cache location information. When a cacheline is moved to a new location in the memory hierarchy, the cache location table is updated so that the cache location information indicates where the cacheline is located within the memory hierarchy.Type: GrantFiled: November 20, 2014Date of Patent: June 2, 2020Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Erik Hagersten, Andreas Sembrant, David Black-Schaffer, Stefanos Kaxiras
-
Publication number: 20200110703Abstract: Methods and systems for self-invalidating cachelines in a computer system having a plurality of cores are described. A first one of the plurality of cores, requests to load a memory block from a cache memory local to the first one of the plurality of cores, which request results in a cache miss. This results in checking a read-after-write detection structure to determine if a race condition exists for the memory block. If a race condition exists for the memory block, program order is enforced by the first one of the plurality of cores at least between any older loads and any younger loads with respect to the load that detects the prior store in the first one of the plurality of cores that issued the load of the memory block and causing one or more cache lines in the local cache memory to be self-invalidated.Type: ApplicationFiled: December 11, 2019Publication date: April 9, 2020Inventors: Alberto ROS, Stefanos KAXIRAS
-
Patent number: 10528471Abstract: Methods and systems for self-invalidating cachelines in a computer system having a plurality of cores are described. A first one of the plurality of cores, requests to load a memory block from a cache memory local to the first one of the plurality of cores, which request results in a cache miss. This results in checking a read-after-write detection structure to determine if a race condition exists for the memory block. If a race condition exists for the memory block, program order is enforced by the first one of the plurality of cores at least between any older loads and any younger loads with respect to the load that detects the prior store in the first one of the plurality of cores that issued the load of the memory block and causing one or more cache lines in the local cache memory to be self-invalidated.Type: GrantFiled: December 27, 2017Date of Patent: January 7, 2020Assignee: ETA SCALE ABInventors: Alberto Ros, Stefanos Kaxiras