Patents by Inventor Ibrahim Hur

Ibrahim Hur has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12657026
    Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
    Type: Grant
    Filed: June 23, 2022
    Date of Patent: June 16, 2026
    Assignee: Intel Corporation
    Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
  • Patent number: 12632254
    Abstract: A system simulator simulates operations of a plurality of interconnected devices in a simulation of a computing system. The system simulator implements a communication runtime in the simulation to receive a packet generated by a simulation of a first one of the plurality of devices to be sent to a simulation of a second one of the plurality of devices in the simulation. The communication runtime buffers the packet in its internal buffer and receives a query from the simulation of the second device based on available buffer capacity in the simulation of the second device. The packet is sent from the communication runtime buffer to the simulation of the second device based on the query to simulate transmission of the packet from the first device to the second device on a link.
    Type: Grant
    Filed: April 1, 2022
    Date of Patent: May 19, 2026
    Assignee: Intel Corporation
    Inventors: Samkit Jain, Izajasz Piotr Wrosz, Nicholas M. Pepperling, Joshua B. Fryman, Balasubramanian Seshasayee, Ibrahim Hur
  • Patent number: 12585394
    Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
    Type: Grant
    Filed: May 25, 2022
    Date of Patent: March 24, 2026
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Ibrahim Hur
  • Publication number: 20250321738
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to a last-level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: June 13, 2025
    Publication date: October 16, 2025
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 12423075
    Abstract: Embodiments of apparatuses, methods, and systems for code prefetching are described. In an embodiment, an apparatus includes an instruction decoder, load circuitry, and execution circuitry. The instruction decoder is to decode a code prefetch instruction. The code prefetch instruction is to specify a first instruction to be prefetched. The load circuitry is to prefetch the first instruction in response to the decoded code prefetch instruction. The execution circuitry is to execute the first instruction at a fetch stage of a pipeline.
    Type: Grant
    Filed: September 26, 2020
    Date of Patent: September 23, 2025
    Assignee: Intel Corporation
    Inventors: Ahmad Yasin, Lihu Rappoport, Jared W. Stark, Jeffrey Baxter, Israel Diamand, Pavel Fridman, Ibrahim Hur, Nir Tell
  • Patent number: 12333305
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to a last-level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: September 26, 2020
    Date of Patent: June 17, 2025
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 12111772
    Abstract: Techniques and mechanisms for providing information to determine whether a software prefetch instruction is to be executed are described. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction, which also corresponds to the identifier, is prevented based on the registry entry.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur
  • Patent number: 12050915
    Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: July 30, 2024
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Patent number: 11960922
    Abstract: In an embodiment, a processor comprises: an execution circuit to execute instructions; at least one cache memory coupled to the execution circuit; and a table storage element coupled to the at least one cache memory, the table storage element to store a plurality of entries each to store object metadata of an object used in a code sequence. The processor is to use the object metadata to provide user space multi-object transactional atomic operation of the code sequence. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Joshua B. Fryman, Jason M. Howard, Ibrahim Hur, Robert Pawlowski
  • Publication number: 20230418612
    Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
    Type: Application
    Filed: June 23, 2022
    Publication date: December 28, 2023
    Applicant: Intel Corporation
    Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
  • Patent number: 11526483
    Abstract: Methods, apparatus, systems and articles of manufacture to build a storage architecture for graph data are disclosed herein. Disclosed example apparatus include a neighbor identifier to identify respective sets of neighboring vertices of a graph. The neighboring vertices included in the respective sets are adjacent to respective ones of a plurality of vertices of the graph, and the respective sets of neighboring vertices are represented as respective lists of neighboring vertex identifiers. The apparatus also includes an element creator to create, in a cache memory, an array of elements that are unpopulated. The array elements have lengths equal to a length of a cache line. In addition, the apparatus includes an element populater to populate the elements with neighboring vertex identifiers. Each of the elements stores neighboring vertex identifiers of respective ones of the lists of neighboring vertex identifiers.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: December 13, 2022
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Jason M. Howard, Ibrahim Hur, Ivan B. Ganev, Fabrizio Petrini, Joshua B. Fryman
  • Publication number: 20220283719
    Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
    Type: Application
    Filed: May 25, 2022
    Publication date: September 8, 2022
    Applicant: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Ibrahim Hur
  • Publication number: 20220229677
    Abstract: A distributed simulation system is provided that includes a timing simulator and functional simulator(s) on different computing nodes to simulate a graph processing system. The functional simulators are to simulate execution of a set of instructions on the graph processing system and to send information associated with the simulated set of instructions to the timing simulator over a network. The timing simulator is to determine timing information associated with execution of the sets of instructions sent by the functional simulators and send the timing information to the functional simulators over the network. The timing simulator may determine a global synchronization point for the functional simulators and send the timing information for the sets of instructions to respective functional simulators at the global synchronization point. Each functional simulator may stall simulation of further instructions until the timing information for its set of instructions is received from the timing simulator.
    Type: Application
    Filed: April 2, 2022
    Publication date: July 21, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur
  • Publication number: 20220224605
    Abstract: A system simulator simulates operations of a plurality of interconnected devices in a simulation of a computing system. The system simulator implements a communication runtime in the simulation to receive a packet generated by a simulation of a first one of the plurality of devices to be sent to a simulation of a second one of the plurality of devices in the simulation. The communication runtime buffers the packet in its internal buffer and receives a query from the simulation of the second device based on available buffer capacity in the simulation of the second device. The packet is sent from the communication runtime buffer to the simulation of the second device based on the query to simulate transmission of the packet from the first device to the second device on a link.
    Type: Application
    Filed: April 1, 2022
    Publication date: July 14, 2022
    Applicant: Intel Corporation
    Inventors: Samkit Jain, Izajasz Piotr Wrosz, Nicholas M. Pepperling, Joshua B. Fryman, Balasubramanian Seshasayee, Ibrahim Hur
  • Publication number: 20220222397
    Abstract: A distributed simulation system is provided that includes a plurality of computing nodes interconnected via a network implementing a Message Passing Interface (MPI) protocol. Each computing node is to simulate hardware logic of a core of a graph processing system and to simulate a respective system memory portion of the graph processing system.
    Type: Application
    Filed: April 1, 2022
    Publication date: July 14, 2022
    Applicant: Intel Corporation
    Inventors: Samkit Jain, Nicholas M. Pepperling, Izajasz Piotr Wrosz, Joshua B. Fryman, Ibrahim Hur
  • Publication number: 20220197656
    Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Publication number: 20220197821
    Abstract: Techniques and mechanisms for providing information to determine whether a software prefetch instruction is to be executed are described. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction, which also corresponds to the identifier, is prevented based on the registry entry.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur
  • Publication number: 20220100511
    Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to a last-level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 26, 2020
    Publication date: March 31, 2022
    Applicant: Intel Corporation
    Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
  • Publication number: 20220091987
    Abstract: In an embodiment, a processor comprises: an execution circuit to execute instructions; at least one cache memory coupled to the execution circuit; and a table storage element coupled to the at least one cache memory, the table storage element to store a plurality of entries each to store object metadata of an object used in a code sequence. The processor is to use the object metadata to provide user space multi-object transactional atomic operation of the code sequence. Other embodiments are described and claimed.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Inventors: Joshua B. Fryman, Jason M. Howard, Ibrahim Hur, Robert Pawlowski
  • Patent number: 11256626
    Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristics (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance with its respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois