Patents by Inventor Wim Heirman

Wim Heirman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230418612
    Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
    Type: Application
    Filed: June 23, 2022
    Publication date: December 28, 2023
    Applicant: Intel Corporation
    Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
  • Publication number: 20220283719
    Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
    Type: Application
    Filed: May 25, 2022
    Publication date: September 8, 2022
    Applicant: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Ibrahim Hur
  • Publication number: 20220229677
    Abstract: A distributed simulation system is provided that includes a timing simulator and functional simulator(s) on different computing nodes to simulate a graph processing system. The functional simulators are to simulate execution of a set of instructions on the graph processing system and to send information associated with the simulated set of instructions to the timing simulator over the network. The timing simulator is to determine timing information associated with execution of the sets of instructions sent by the functional simulators and send the timing information to the functional simulators over the network. The timing simulator may determine a global synchronization point for the functional simulators and send the timing information for the sets of instructions to respective functional simulators at the global synchronization point. The functional simulators may stall simulation of further instructions until the timing information for its set of instructions is received from the timing simulator.
    Type: Application
    Filed: April 2, 2022
    Publication date: July 21, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur
  • Publication number: 20220197821
    Abstract: Techniques and mechanisms for providing information to determine whether a software prefetch instruction is to be executed. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction—which also corresponds to the identifier—is prevented based on the registry entry.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur
  • Publication number: 20220197656
    Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Inventors: WIM HEIRMAN, STIJN EYERMAN, IBRAHIM HUR
  • Publication number: 20220100511
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to LLC upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 26, 2020
    Publication date: March 31, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 11256626
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
  • Publication number: 20220012126
    Abstract: A translation cache and configurable error checking and correction (“ECC”) memory reduces ECC memory overhead. The translation cache supports a configurable ECC memory capable of storing a portion of a cache line, along with any ECC data, in corresponding parts of memory devices to reduce the ECC memory overhead in a memory subsystem. The corresponding parts include any same one of an upper, lower, left or right part of memory devices in a memory module, including dynamic random access memory (“DRAM”) devices in a dual inline memory module (“DIMM”).
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Duane E. GALBI, Wim HEIRMAN, Dimitrios ZIAKAS
  • Patent number: 11010182
    Abstract: A method for simulating a set of instructions to be executed on a processor including performing a performance simulation of the processor over a number of simulation cycles. Modeling, in a frontend component, branch prediction and instruction cache is performed providing instructions to the instruction window, and modeling of an instruction window for the cycle is performed. From the simulation, a performance parameter of the processor is obtained without modeling a reorder buffer, issue queue(s), register renaming, load-store queue(s) and other buffers of the processor.
    Type: Grant
    Filed: June 17, 2013
    Date of Patent: May 18, 2021
    Assignee: UNIVERSITEIT GENT
    Inventors: Lieven Eeckhout, Stijn Eyerman, Wim Heirman, Trevor E. Carlson
  • Patent number: 10942851
    Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert the first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for a first memory access instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
  • Patent number: 10929132
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to access a compressed graphic list. In one example, a processor includes fetch and decode circuitry to fetch and decode the single instruction to access the compressed graphic list, and execution circuitry to execute the decoded single instruction to cause access to the compressed graphic list by: receiving, from a load store queue, at a first op-engine associated with a first data location, an indirection request, computing, via the first op-engine, a second data location associated with a second op-engine, computing, via the second op-engine, a third data location associated with a third op-engine responsive to the indirection request, and providing, via the third op-engine, a data response to the load store queue responsive to receiving data from the third data location.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Robert Pawlowski, Scott Hagan Schmittel, Joshua Fryman, Wim Heirman, Jason Howard, Ankit More, Shaden Smith, Scott Cline
  • Patent number: 10877886
    Abstract: Embodiment of this disclosure provides a mechanism to store cache lines in dedicated cache of an idle core. In one embodiment, a multi-core processor comprising a first core, a second core, a first cache, a second cache, a third cache, and a cache controller unit is provided. The cache controller is operatively coupled to at least the first cache, the second cache, and the third cache. The cache controller is to evict a first line from the first cache, wherein the first core is in an active state. Responsive to the evicting of the first line, the first line is stored in the third cache. Responsive to storing the first line, a second line is evicted from the third cache. Responsive to evicting the second line, the second line is stored in the second cache when the second core is in an idle state.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: December 29, 2020
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur, Erik Hallnor
  • Publication number: 20200233806
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Application
    Filed: April 1, 2020
    Publication date: July 23, 2020
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
  • Patent number: 10684858
    Abstract: Disclosed embodiments relate to an indirect memory fetch (IMF) unit. In one example, an apparatus includes circuitry to fetch and decode an instruction specifying a sparse operand array including N operands, and an index array including N contiguously-addressed indices. The apparatus further includes a processing engine associated with an IMF unit to respond to the decoded instruction by initializing the IMF unit to fetch the N operands in order, probing the IMF unit to determine that a fetched operand is ready to retrieve, retrieving the fetched operand from the IMF unit, and repeating the probing and retrieving until all N operands have been retrieved. The IMF unit, independent of the processing engine, is to fetch the N contiguously-addressed indices from the index array, use the N fetched indices to calculate memory addresses for the N operands, and issue a plurality of read requests to fetch the N operands in order.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: June 16, 2020
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
  • Publication number: 20200174929
    Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert the first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for a first memory access instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: November 29, 2018
    Publication date: June 4, 2020
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
  • Patent number: 10621099
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
  • Publication number: 20200004684
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 2, 2020
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
  • Publication number: 20190369998
    Abstract: Disclosed embodiments relate to an indirect memory fetch (IMF) unit. In one example, an apparatus includes circuitry to fetch and decode an instruction specifying a sparse operand array including N operands, and an index array including N contiguously-addressed indices. The apparatus further includes a processing engine associated with an IMF unit to respond to the decoded instruction by initializing the IMF unit to fetch the N operands in order, probing the IMF unit to determine that a fetched operand is ready to retrieve, retrieving the fetched operand from the IMF unit, and repeating the probing and retrieving until all N operands have been retrieved. The IMF unit, independent of the processing engine, is to fetch the N contiguously-addressed indices from the index array, use the N fetched indices to calculate memory addresses for the N operands, and issue a plurality of read requests to fetch the N operands in order.
    Type: Application
    Filed: June 1, 2018
    Publication date: December 5, 2019
    Inventors: Stijn EYERMAN, Wim HEIRMAN, Kristof DU BOIS, Ibrahim HUR, Joshua B. FRYMAN
  • Patent number: 10489297
    Abstract: An example processor that includes a register, a cache, a processor core, and a programmable logic circuit. The register may store a first prefetch value indicating a first amount of time to prefetch data from a memory prior to an execution of a subsequent instruction that uses the data. The processor core may be coupled to the cache and the register. The processor core may execute a prefetch instruction to access the data from the memory, store a copy of the data in the cache, and execute the subsequent instruction. The programmable logic circuit may be coupled to the processor core. The programmable logic circuit may determine whether the first amount of time is insufficient to prefetch the data for the execution of the subsequent instruction and change the first prefetch value to a second prefetch value when the first amount of time is insufficient.
    Type: Grant
    Filed: February 22, 2017
    Date of Patent: November 26, 2019
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Yves Vandriessche, Ibrahim Hur
  • Publication number: 20190303294
    Abstract: Embodiment of this disclosure provides a mechanism to store cache lines in dedicated cache of an idle core. In one embodiment, a multi-core processor comprising a first core, a second core, a first cache, a second cache, a third cache, and a cache controller unit is provided. The cache controller is operatively coupled to at least the first cache, the second cache, and the third cache. The cache controller is to evict a first line from the first cache, wherein the first core is in an active state. Responsive to the evicting of the first line, the first line is stored in the third cache. Responsive to storing the first line, a second line is evicted from the third cache. Responsive to evicting the second line, the second line is stored in the second cache when the second core is in an idle state.
    Type: Application
    Filed: March 29, 2018
    Publication date: October 3, 2019
    Inventors: Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur, Erik Hallnor