Patents by Inventor Stijn EYERMAN

Stijn EYERMAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12657026
    Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
    Type: Grant
    Filed: June 23, 2022
    Date of Patent: June 16, 2026
    Assignee: Intel Corporation
    Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
  • Publication number: 20260126996
    Abstract: Techniques and mechanisms for tile prefetching to be performed based on both intra-tile stride characteristics and inter-tile stride characteristics. In an embodiment, a prefetch circuit of a processor core detects that multiple demand fetch instructions target different respective tiles of a matrix. Based on the multiple demand fetch instructions, fetch pattern information is registered and made available for future reference to facilitate detection of a later instance of the fetch pattern. Fetch pattern information corresponding to a first demand fetch instruction comprises both an intra-tile stride and an inter-tile stride. In another embodiment, the prefetch circuit generates micro-operations, based on the fetch pattern information, to prefetch one or more tiles of the matrix.
    Type: Application
    Filed: November 4, 2024
    Publication date: May 7, 2026
    Applicant: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman
  • Publication number: 20260119408
    Abstract: An apparatus is provided comprising interface circuitry, machine-readable instructions, and processing circuitry to execute the machine-readable instructions. The machine-readable instructions include instructions to maintain at least three page classes for a system memory. The at least three page classes comprise an uncompressed page class, a first compressed page class, and a second compressed page class. The machine-readable instructions further include instructions to determine a target page class from the at least three page classes for a page based on a determined status indicator corresponding to the page. The status indicator comprises at least a write-recency indicator and a read-recency indicator. The machine-readable instructions further include instructions to, in response to a determination that the target page class does not match a current page class of the page, migrate the page to the target page class.
    Type: Application
    Filed: December 22, 2025
    Publication date: April 30, 2026
    Inventors: Stijn EYERMAN, Wim HEIRMAN, Vinodh GOPAL, Wajdi FEGHALI
  • Patent number: 12585394
    Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
    Type: Grant
    Filed: May 25, 2022
    Date of Patent: March 24, 2026
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Ibrahim Hur
  • Publication number: 20260023569
    Abstract: Techniques for speculative invocation of accelerators in out-of-order pipelines are described. In some examples, a processor core at least comprises: decoder circuitry to at least decode an accelerator task instruction, scheduling circuitry to at least schedule the decoded accelerator task instruction to execute on an accelerator, a port coupled to the accelerator, and at least one register to store a result of the decoded accelerator task instruction. The processor core is coupled to the accelerator, which is to execute the decoded accelerator task instruction and provide the result to the processor core through the port coupled to the accelerator.
    Type: Application
    Filed: September 26, 2025
    Publication date: January 22, 2026
    Inventors: Gerasimos Gerogiannis, Stijn Eyerman, Wim Heirman
  • Publication number: 20260023564
    Abstract: Techniques for using accelerators are described. In some examples, a system includes a processor core at least comprising: decoder circuitry to at least decode an accelerator task instruction to be executed by an accelerator, scheduling circuitry to at least schedule the decoded accelerator task instruction to execute on an accelerator, and at least one register to store a result of an execution of the decoded accelerator task instruction; an interface coupled to a port of the processor core and the accelerator, wherein the interface is to retrieve data for the accelerator and provide the result of the accelerator to one or more registers of the processor core; and the accelerator to execute the decoded accelerator task instruction.
    Type: Application
    Filed: September 27, 2025
    Publication date: January 22, 2026
    Inventors: Gerasimos Gerogiannis, Stijn Eyerman, Wim Heirman
  • Publication number: 20250355837
    Abstract: Methods and apparatus for a data access pattern profiler for memory compression scheme selection are described herein. Respective data are stored as uncompressed data and compressed data in the system memory, in which data are stored using multiple compression schemes using different chunk sizes. In conjunction with servicing memory Read requests from the compressed data, access patterns are profiled to generate profiled access patterns that are used to determine compression schemes to use to selectively recompress portions of the compressed data. Virtual memory areas are allocated for storing compressible data structures and divided into compressed memory regions (CMRs). Accesses to sampled pages in a CMR are profiled to generate the profiled access pattern for the CMR, which is used to determine whether the CMR's compression scheme should be changed and what scheme to use for recompression.
    Type: Application
    Filed: August 4, 2025
    Publication date: November 20, 2025
    Inventors: Stijn EYERMAN, Wim HEIRMAN, Vinodh GOPAL, Wajdi FEGHALI
  • Publication number: 20250328245
    Abstract: Methods and apparatus for variable chunk size memory compression. A physical address space for system memory is partitioned into an uncompressed partition in which data are stored without compression and a compressed partition in which compressed data are stored using a plurality of chunk sizes. In response to a memory Read request, when it is determined that the requested data are stored in a compressed partition, the location of a compressed chunk on a memory device containing the data is determined, the data are retrieved and decompressed, and the decompressed data are returned to the core issuing the memory Read request. A compressed page table (CPT) is maintained containing entries having fields encoding a chunk size, a device address corresponding to a start of the compressed page, and one or more fields denoting sizes of each chunk in the compressed page.
    Type: Application
    Filed: June 30, 2025
    Publication date: October 23, 2025
    Inventors: Stijn EYERMAN, Wim HEIRMAN, Vinodh GOPAL, Wajdi FEGHALI
  • Publication number: 20250321738
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to the last level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: June 13, 2025
    Publication date: October 16, 2025
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 12333305
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to the last level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: September 26, 2020
    Date of Patent: June 17, 2025
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Publication number: 20250190360
    Abstract: Methods and apparatus for Operating System (OS)-transparent memory decompression with hardware acceleration. A physical address space for system memory is partitioned into compressed and uncompressed partitions. A core issues a memory Read request and on-chip L1, L2, and a last level cache (LLC) are checked, with misses leading to page table lookups to determine where in system memory the requested data are stored. When stored in the compressed partition, a compressed page table is searched to find the location of the compressed form of the data on a memory device. The compressed data are read from the memory device, decompressed using hardware acceleration and returned to the requesting core without writing the data to the uncompressed partition. Under one approach, a compressed page containing the requested data is decompressed and written to the LLC. When data (e.g.
    Type: Application
    Filed: February 10, 2025
    Publication date: June 12, 2025
    Inventors: Stijn EYERMAN, Wim HEIRMAN, Vinodh GOPAL, Wajdi FEGHALI
  • Publication number: 20250068422
    Abstract: Methods, apparatus, and computer programs are disclosed for context switching. In some embodiments, a method comprises dedicating a first subset of a plurality of vector registers to a first thread of a plurality of threads for thread execution; and responsive to a context switch from the first thread to a second thread, bypassing saving a state of the first subset of the plurality of vector registers; and saving a state of a second subset of the plurality of vector registers, wherein the second subset of the plurality of vector registers is not dedicated to the first thread, and wherein the first and second subsets are mutually exclusive.
    Type: Application
    Filed: November 8, 2024
    Publication date: February 27, 2025
    Inventors: Duane GALBI, Christopher J. HUGHES, Dan BAUM, H. Peter ANVIN, Stijn EYERMAN
  • Patent number: 12050915
    Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: July 30, 2024
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Publication number: 20230418612
    Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
    Type: Application
    Filed: June 23, 2022
    Publication date: December 28, 2023
    Applicant: Intel Corporation
    Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
  • Patent number: 11526483
    Abstract: Methods, apparatus, systems and articles of manufacture to build a storage architecture for graph data are disclosed herein. Disclosed example apparatus include a neighbor identifier to identify respective sets of neighboring vertices of a graph. The neighboring vertices included in the respective sets are adjacent to respective ones of a plurality of vertices of the graph, and the respective sets of neighboring vertices are represented as respective lists of neighboring vertex identifiers. The apparatus also includes an element creator to create, in a cache memory, an array of elements that are unpopulated. The array elements have lengths equal to a length of a cache line. In addition, the apparatus includes an element populater to populate the elements with neighboring vertex identifiers. Each of the elements stores neighboring vertex identifiers of respective ones of the lists of neighboring vertex identifiers.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: December 13, 2022
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Jason M. Howard, Ibrahim Hur, Ivan B. Ganev, Fabrizio Petrini, Joshua B. Fryman
  • Publication number: 20220283719
    Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
    Type: Application
    Filed: May 25, 2022
    Publication date: September 8, 2022
    Applicant: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Ibrahim Hur
  • Publication number: 20220229677
    Abstract: A distributed simulation system is provided that includes a timing simulator and functional simulator(s) on different computing nodes to simulate a graph processing system. The functional simulators are to simulate execution of sets of instructions on the graph processing system and to send information associated with the simulated sets of instructions to the timing simulator over a network. The timing simulator is to determine timing information associated with execution of the sets of instructions sent by the functional simulators and send the timing information to the functional simulators over the network. The timing simulator may determine a global synchronization point for the functional simulators and send the timing information for the sets of instructions to respective functional simulators at the global synchronization point. Each functional simulator may stall simulation of further instructions until the timing information for its set of instructions is received from the timing simulator.
    Type: Application
    Filed: April 2, 2022
    Publication date: July 21, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur
  • Publication number: 20220197656
    Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Publication number: 20220100511
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to the last level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 26, 2020
    Publication date: March 31, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 11256626
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristics (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance with its respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
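
The idea in the abstract of patent 11256626 above, a prefetcher that picks its parameters per memory region, can be illustrated with a minimal software sketch. This is not the patented design: the class names, the simple stride detector, the 64-byte line size, and the choice to look up parameters by the region of the demand address are all assumptions made for illustration, and the throttle threshold is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_SIZE = 64  # assumed cache line size in bytes


@dataclass(frozen=True)
class PrefetchParams:
    # Hypothetical names for two of the parameters listed in the abstract.
    distance: int            # how many strides ahead to prefetch
    training_to_stable: int  # repeated-stride confirmations needed first


@dataclass(frozen=True)
class Region:
    start: int               # inclusive start address
    end: int                 # exclusive end address
    params: PrefetchParams


class RegionAwarePrefetcher:
    """Toy stride prefetcher whose parameters depend on the memory region."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = regions
        self.last_addr: Optional[int] = None
        self.stride: Optional[int] = None
        self.confirmations = 0

    def _params_for(self, addr: int) -> Optional[PrefetchParams]:
        # Linear search is enough for a sketch with a handful of regions.
        for region in self.regions:
            if region.start <= addr < region.end:
                return region.params
        return None

    def access(self, addr: int) -> Optional[int]:
        """Observe one demand access; return a prefetch address or None."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                self.confirmations += 1   # same stride seen again
            else:
                self.stride = stride      # retrain on the new stride
                self.confirmations = 0
        self.last_addr = addr

        params = self._params_for(addr)
        if params is None or not self.stride:
            return None
        if self.confirmations < params.training_to_stable:
            return None                   # still training
        return addr + self.stride * params.distance


# Two regions with different (assumed) NUMA characteristics: the far region
# has higher latency, so it is given a larger prefetch distance.
NEAR = Region(0x00000, 0x10000, PrefetchParams(distance=2, training_to_stable=1))
FAR = Region(0x10000, 0x20000, PrefetchParams(distance=8, training_to_stable=1))


def run_trace(base: int, count: int = 3) -> List[Optional[int]]:
    """Feed `count` sequential line accesses starting at `base`."""
    prefetcher = RegionAwarePrefetcher([NEAR, FAR])
    return [prefetcher.access(base + i * LINE_SIZE) for i in range(count)]
```

With these assumed settings, the same sequential access pattern yields a prefetch two lines ahead in the near region and eight lines ahead in the far region, which is the kind of per-region behavior the abstract describes.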