Patents by Inventor Mohammad Abdallah

Mohammad Abdallah has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10564975
    Abstract: A global front end scheduler to schedule instruction sequences to a plurality of virtual cores implemented via a plurality of partitionable engines. The global front end scheduler includes a thread allocation array to store a set of allocation thread pointers to point to a set of buckets in a bucket buffer in which execution blocks for respective threads are placed, a bucket buffer to provide a matrix of buckets, the bucket buffer including storage for the execution blocks, and a bucket retirement array to store a set of retirement thread pointers that track a next execution block to retire for a thread.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
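The bucket-based allocation and retirement tracking described in the abstract above can be sketched as a toy software model. All names here are hypothetical and this is only an illustration of the pointer mechanics, not the patented hardware:

```python
class ThreadBuckets:
    """Toy model of one thread's buckets in the bucket buffer:
    an allocation pointer names where the next execution block is
    placed, and a retirement pointer names which block retires next."""

    def __init__(self, nbuckets):
        self.buckets = [None] * nbuckets
        self.alloc_ptr = 0   # entry from the thread allocation array
        self.retire_ptr = 0  # entry from the bucket retirement array

    def allocate(self, execution_block):
        # place the thread's next execution block and advance the pointer
        self.buckets[self.alloc_ptr] = execution_block
        self.alloc_ptr = (self.alloc_ptr + 1) % len(self.buckets)

    def retire(self):
        # retire the next execution block in program order
        block = self.buckets[self.retire_ptr]
        self.buckets[self.retire_ptr] = None
        self.retire_ptr = (self.retire_ptr + 1) % len(self.buckets)
        return block

tb = ThreadBuckets(nbuckets=4)
tb.allocate("block0")
tb.allocate("block1")
first_retired = tb.retire()
```

In the real design one such pointer pair exists per thread, indexing into a shared matrix of buckets.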
  • Patent number: 10521239
    Abstract: A method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit on the optimized microinstruction sequence.
    Type: Grant
    Filed: October 3, 2016
    Date of Patent: December 31, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10514926
    Abstract: A microprocessor implemented method for performing early dependency resolution and data forwarding is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each current guest branch instruction in the native address space fetched during execution, performing (a) determining a youngest prior guest branch target stored in a guest branch target register, wherein the guest branch target register is operable to speculatively store a plurality of prior guest branch targets corresponding to prior guest branch instructions; (b) determining a current branch target for a respective current guest branch instruction by adding an offset value for the respective current guest branch instruction to the youngest prior guest branch target; and (c) creating an entry in the guest branch target register for the current branch target.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
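Steps (a) through (c) of the abstract above can be illustrated with a small sketch. The class and field names are hypothetical; this models only the target arithmetic, not the speculative hardware:

```python
class GuestBranchTargetRegister:
    """Toy model of the guest branch target register: speculatively
    holds prior guest branch targets, youngest last."""

    def __init__(self, initial_target):
        self._targets = [initial_target]  # oldest .. youngest

    def youngest(self):
        # (a) the youngest prior guest branch target
        return self._targets[-1]

    def resolve(self, offset):
        # (b) current target = youngest prior target + branch offset
        target = self.youngest() + offset
        # (c) create an entry for the newly resolved target
        self._targets.append(target)
        return target

gbtr = GuestBranchTargetRegister(initial_target=0x1000)
t1 = gbtr.resolve(offset=0x40)
t2 = gbtr.resolve(offset=-0x10)
```

Because each target is derived from the previous one, the register can run ahead of actual branch execution, which is what makes the early dependency resolution possible.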
  • Patent number: 10503514
    Abstract: A method of managing a reduced size register view data structure in a processor, where the method includes receiving an incoming instruction sequence using a global front end, grouping instructions from the incoming instruction sequence to form instruction blocks, populating a register view data structure, wherein the register view data structure stores register information referenced by the instruction blocks as a set of register templates, generating a set of snapshots of the register templates to reduce a size of the register view data structure, and tracking a state of the processor to handle a branch misprediction using the register view data structure in accordance with execution of the instruction blocks.
    Type: Grant
    Filed: November 7, 2017
    Date of Patent: December 10, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Publication number: 20190370038
    Abstract: An apparatus and method for providing support for execution of optimized code. The apparatus includes a processor that is configured to convert guest code to native code and monitor access to an indicated memory address range associated with a read-only portion of the memory and to detect access to the indicated memory address range. The processor is further configured to raise an exception in response to memory access to the indicated memory address range and determine an access property of the indicated memory address range.
    Type: Application
    Filed: July 24, 2017
    Publication date: December 5, 2019
    Inventors: Micah VILLMOW, Kevin LAWTON, Ravishankar RAO, Mohammad A. ABDALLAH
  • Publication number: 20190361704
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
    Type: Application
    Filed: August 9, 2019
    Publication date: November 28, 2019
    Applicant: Intel Corporation
    Inventor: Mohammad ABDALLAH
  • Patent number: 10467010
    Abstract: A method for performing memory disambiguation in an out-of-order microprocessor pipeline is disclosed. The method comprises storing a tag with a load operation, wherein the tag is an identification number representing a store instruction nearest to the load operation, wherein the store instruction is older with respect to the load operation and wherein the store has potential to result in a RAW violation in conjunction with the load operation. The method also comprises issuing the load operation from an instruction scheduling module. Further, the method comprises acquiring data for the load operation speculatively after the load operation has arrived at a load store queue module. Finally, the method comprises determining if an identification number associated with a last contiguous issued store with respect to the load operation is equal to or greater than the tag and gating a validation process for the load operation in response to the determination.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abdallah, Mandeep Singh
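The gating check at the end of the abstract above reduces to a single comparison. The following sketch uses hypothetical names and models only that comparison, not the load store queue itself:

```python
def may_validate_load(load_tag, last_contiguous_store_id):
    """Gate the validation of a speculatively executed load.

    load_tag: ID of the nearest older store that could alias the load
              (the tag stored with the load operation).
    last_contiguous_store_id: ID of the last contiguously issued store.

    Validation may proceed only once every store up to and including
    the tagged one has issued, i.e. the last contiguous issued store's
    ID is equal to or greater than the tag."""
    return last_contiguous_store_id >= load_tag

# Load tagged with potentially-aliasing older store #7:
blocked = may_validate_load(7, 5)   # store 7 not yet issued: keep gating
ready = may_validate_load(7, 7)     # safe to validate the loaded data
```

If the check fails, the load's speculatively acquired data simply waits; a RAW violation detected at validation time would force a replay.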
  • Publication number: 20190286445
    Abstract: Fast unaligned memory access. In accordance with a first embodiment of the present invention, a computing device includes a load queue memory structure configured to queue load operations and a store queue memory structure configured to queue store operations. The computing device also includes at least one bit configured to indicate the presence of an unaligned address component for an entry of said load queue memory structure, and at least one bit configured to indicate the presence of an unaligned address component for an entry of said store queue memory structure. The load queue memory may also include memory configured to indicate data forwarding of an unaligned address component from said store queue memory structure to said load queue memory structure.
    Type: Application
    Filed: June 6, 2019
    Publication date: September 19, 2019
    Inventors: Mandeep SINGH, Mohammad ABDALLAH
  • Patent number: 10417000
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
    Type: Grant
    Filed: October 13, 2017
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10394563
    Abstract: A method for converting guest instructions into native instructions is disclosed. The method comprises accessing a guest instruction and performing a first level translation of the guest instruction. The performing comprises: (a) comparing the guest instruction to a plurality of group masks and a plurality of tags stored in multi-level conversion tables by pattern matching subfields of the guest instruction in a hierarchical manner, wherein the conversion tables store mappings of guest instruction bit-fields to corresponding native instruction bit-fields; and (b) responsive to a hit in a conversion table, substituting a bit-field in the guest instruction with a corresponding native equivalent of the bit-field. The method further comprises performing a second level translation of the guest instruction using a second level conversion table and outputting a resulting native instruction when the second level translation proceeds to completion.
    Type: Grant
    Filed: November 16, 2016
    Date of Patent: August 27, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
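The first-level translation in the abstract above is essentially pattern matching against (mask, tag) pairs followed by bit-field substitution. The sketch below is a hypothetical single-level simplification; the table contents and names are invented for illustration:

```python
def first_level_translate(guest_insn, conversion_table):
    """Match a guest instruction against (mask, tag, native_bits)
    entries; on a hit, substitute the matched bit-field with its
    native equivalent. Toy flat table standing in for the patented
    hierarchical multi-level tables."""
    for mask, tag, native_bits in conversion_table:
        if guest_insn & mask == tag:
            # keep the unmatched bits, swap in the native bit-field
            return (guest_insn & ~mask) | native_bits
    return None  # miss: fall through to a slower translation path

# Hypothetical table: guest opcodes of the form 0x3? map to native 0xA?
table = [(0xF0, 0x30, 0xA0)]
hit = first_level_translate(0x31, table)
miss = first_level_translate(0x41, table)
```

The second-level translation in the patent refines this result with a further table lookup before the native instruction is emitted.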
  • Patent number: 10372454
    Abstract: A method for allocation of a segmented interconnect in an integrated circuit is disclosed. The method comprises receiving a plurality of requests from a plurality of resource consumers of a plurality of engines to access a plurality of resources, wherein the resources are spread across the plurality of engines and contain data for supporting execution of multiple code sequences. The method also comprises contending for the plurality of resources in accordance with requests from the plurality of resource consumers. Finally, the method comprises accessing the plurality of resources via a global interconnect structure, wherein the global interconnect structure has a finite number of buses accessible each clock cycle, and wherein the global interconnect structure comprises a plurality of global segment buses.
    Type: Grant
    Filed: November 17, 2016
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Publication number: 20190235877
    Abstract: A method and apparatus including a cache controller coupled to a cache memory, wherein the cache controller receives a plurality of cache access requests, performs a pre-sorting of the plurality of cache access requests by a first stage of the cache controller to order the plurality of cache access requests, wherein the first stage functions by performing a presorting and pre-clustering process on the plurality of cache access requests in parallel to map the plurality of cache access requests from a first position to a second position corresponding to ports or banks of a cache memory, performs the combining and splitting of the plurality of cache access requests by a second stage of the cache controller, and applies the plurality of cache access requests to the cache memory at line speed.
    Type: Application
    Filed: April 12, 2019
    Publication date: August 1, 2019
    Inventor: Mohammad ABDALLAH
  • Publication number: 20190227982
    Abstract: An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.
    Type: Application
    Filed: April 1, 2019
    Publication date: July 25, 2019
    Inventor: Mohammad A. Abdallah
  • Patent number: 10360031
    Abstract: Fast unaligned memory access. In accordance with a first embodiment of the present invention, a computing device includes a load queue memory structure configured to queue load operations and a store queue memory structure configured to queue store operations. The computing device also includes at least one bit configured to indicate the presence of an unaligned address component for an entry of said load queue memory structure, and at least one bit configured to indicate the presence of an unaligned address component for an entry of said store queue memory structure. The load queue memory may also include memory configured to indicate data forwarding of an unaligned address component from said store queue memory structure to said load queue memory structure.
    Type: Grant
    Filed: October 21, 2011
    Date of Patent: July 23, 2019
    Assignee: Intel Corporation
    Inventors: Mandeep Singh, Mohammad Abdallah
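The per-entry unaligned indicator bit described in the abstract above can be sketched as follows. The entry fields and the 8-byte alignment granularity are hypothetical choices for illustration:

```python
from dataclasses import dataclass

@dataclass
class LoadQueueEntry:
    """Toy load queue entry carrying the unaligned indicator bit."""
    address: int
    size: int
    unaligned: bool = False

def enqueue_load(address, size, alignment=8):
    # Set the per-entry bit when the access straddles an
    # alignment boundary (hypothetical 8-byte granularity).
    crosses = (address % alignment) + size > alignment
    return LoadQueueEntry(address, size, unaligned=crosses)

straddling = enqueue_load(0x1006, 4)  # bytes 0x1006..0x1009 cross 0x1008
aligned = enqueue_load(0x1000, 4)
```

A matching bit on store queue entries lets the forwarding logic recognize when an unaligned address component must be forwarded from the store queue to the load queue.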
  • Patent number: 10353680
    Abstract: A system for an agnostic runtime architecture. The system includes a system emulation/virtualization converter, an application code converter, and a system converter, wherein the system emulation/virtualization converter and the application code converter implement a system emulation process, and wherein the system converter implements a system and application conversion process for executing code from a guest image, the code being executed by either the system converter or the system emulator. The system further includes a run-ahead run-time guest instruction conversion/decoding process, and a prefetching process where guest code is pre-fetched from the target of guest branches in an instruction sequence.
    Type: Grant
    Filed: July 23, 2015
    Date of Patent: July 16, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10346302
    Abstract: A method for maintaining the coherency of a store coalescing cache and a load cache is disclosed. As a part of the method, responsive to a write-back of an entry from a level one store coalescing cache to a level two cache, the entry is written into the level two cache and into the level one load cache. The writing of the entry into the level two cache and into the level one load cache is executed at the speed of access of the level two cache.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
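The coherency rule in the abstract above, writing the evicted entry into both the level two cache and the level one load cache in one step, can be sketched with a toy dict-based model (real caches index by set and way; all names here are hypothetical):

```python
def write_back(entry_addr, entry_data, l2_cache, l1_load_cache):
    """On write-back of an entry from the L1 store coalescing cache,
    install it in the L2 cache AND the L1 load cache together, so the
    load cache never holds data staler than L2."""
    l2_cache[entry_addr] = entry_data
    l1_load_cache[entry_addr] = entry_data

l2, l1_load = {}, {}
write_back(0x80, b"\x2a", l2, l1_load)
```

Because both installs happen in the same write-back operation, the update proceeds at the access speed of the level two cache rather than requiring a separate coherence round-trip.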
  • Patent number: 10310987
    Abstract: Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L1TLB) miss corresponding to a request for a virtual address to physical address translation, searching a cache that includes virtual addresses and page sizes that correspond to translation table entries (TTEs) that have been evicted from the L1TLB, where a page size is identified, and searching a second level TLB and identifying a physical address that is contained in the second level TLB. Access is provided to the identified physical address.
    Type: Grant
    Filed: August 15, 2017
    Date of Patent: June 4, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
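The lookup sequence in the abstract above can be sketched as a toy model: on an L1 TLB miss, a small cache of evicted (virtual page, page size) pairs supplies the page size, which determines the granularity used to probe the second-level TLB. Dict lookups stand in for hardware CAM/set-associative searches, and all names are hypothetical:

```python
def translate_on_l1_miss(vaddr, evicted_page_sizes, l2_tlb):
    """evicted_page_sizes: maps 4 KiB virtual page numbers of entries
    evicted from the L1 TLB to their true page sizes.
    l2_tlb: maps page-size-aligned virtual page bases to physical bases.
    Returns the physical address, or None on a second-level miss."""
    # 1. identify the page size from the eviction cache (default 4 KiB)
    page_size = evicted_page_sizes.get(vaddr >> 12, 4096)
    # 2. probe the second-level TLB at the right page granularity
    vpn = vaddr & ~(page_size - 1)
    base = l2_tlb.get(vpn)
    if base is None:
        return None  # miss: fall back to a page table walk
    return base | (vaddr & (page_size - 1))

# 4 KiB translation (no eviction-cache entry, default size applies):
pa_small = translate_on_l1_miss(0x1234ABC, {}, {0x1234000: 0x8000000})
# 2 MiB translation whose size was recorded on eviction from the L1 TLB:
pa_large = translate_on_l1_miss(
    0x6001234, {0x6001: 0x200000}, {0x6000000: 0xC000000})
```

Knowing the page size before the second-level probe is what avoids searching the unified TLB once per supported page size.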
  • Patent number: 10303484
    Abstract: A method for line speed interconnect processing. The method includes receiving initial inputs from an input communications path, performing a pre-sorting of the initial inputs by using a first stage interconnect parallel processor to create intermediate inputs, and performing the final combining and splitting of the intermediate inputs by using a second stage interconnect parallel processor to create resulting outputs. The method further includes transmitting the resulting outputs out of the second stage at line speed.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
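The two-stage flow in the abstract above (pre-sort, then combine and split) can be sketched with a deliberately simplified software model. The per-port clustering, byte-count payloads, and bus width are all hypothetical choices, not details from the patent:

```python
def line_speed_route(requests, num_ports, bus_width):
    """Stage 1 pre-sorts initial inputs by destination port to form
    intermediate inputs; stage 2 combines same-port payloads and
    splits anything wider than one bus transfer into multiple outputs."""
    # stage 1: pre-sort into per-port clusters (the intermediate inputs)
    clusters = [[] for _ in range(num_ports)]
    for port, nbytes in requests:
        clusters[port % num_ports].append(nbytes)
    # stage 2: combine each cluster, then split into bus-width transfers
    outputs = []
    for port, cluster in enumerate(clusters):
        total = sum(cluster)
        while total > 0:
            outputs.append((port, min(total, bus_width)))
            total -= bus_width
    return outputs

# Two ports, 4-byte bus: requests of 3 and 2 bytes to port 0 combine,
# then split into one full transfer and one remainder.
routed = line_speed_route([(0, 3), (1, 5), (0, 2)], num_ports=2, bus_width=4)
```

In the patented design each stage is an interconnect parallel processor, so both steps complete without stalling the line rate.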
  • Publication number: 20190155603
    Abstract: System and method for multiplexing vector comparison. The system and method access a first vector having a vector length. The first vector includes a plurality of vector portions having a vector portion length. In addition, the method accesses a second vector of the vector length. The second vector includes the same quantity of vector portions as the plurality of vector portions, and the vector portions of the second vector are of the vector portion length. The method further includes performing a comparison of each of the plurality of vector portions of the first vector to each of the plurality of vector portions of the second vector and storing a result of the comparing in a third vector with at least one bit of the third vector corresponding to each comparison of the vector portions.
    Type: Application
    Filed: July 24, 2017
    Publication date: May 23, 2019
    Inventors: Micah VILLMOW, Mohammad A. ABDALLAH
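The all-to-all portion comparison described in the abstract above can be sketched on packed integers. The portion ordering and the bit layout of the result vector are hypothetical interpretations chosen for illustration:

```python
def compare_vector_portions(vec_a, vec_b, portions, portion_bits):
    """Compare each portion of vec_a against each portion of vec_b,
    packing one result bit per (i, j) comparison into a third vector.
    Vectors are packed integers, portion 0 in the low bits."""
    mask = (1 << portion_bits) - 1
    result = 0
    bit = 0
    for i in range(portions):
        a = (vec_a >> (i * portion_bits)) & mask
        for j in range(portions):
            b = (vec_b >> (j * portion_bits)) & mask
            if a == b:
                result |= 1 << bit  # one bit per comparison
            bit += 1
    return result

# Two 8-bit portions: [0x01, 0x02] compared against [0x02, 0x01].
matches = compare_vector_portions(0x0201, 0x0102, portions=2, portion_bits=8)
```

In hardware the n-squared comparisons run in parallel, which is what the multiplexing arrangement in the patent provides.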
  • Publication number: 20190155609
    Abstract: A microprocessor implemented method of speculatively maintaining a guest return address stack (GRAS) in a fetch stage of a microprocessor pipeline. The method includes mapping instructions in a guest address space to corresponding instructions in a native address space. For each of one or more function calls made in the native address space, performing the following: (a) pushing a current entry into the GRAS responsive to the function call, where the current entry includes a guest target return address and a corresponding native target return address associated with the function call; (b) popping the current entry from the GRAS responsive to processing a return instruction; (c) comparing the current entry with an entry popped from a return address stack (RAS) maintained at a later stage of the pipeline; and (d) responsive to a mismatch, fetching instructions from the return address in the entry popped from the RAS.
    Type: Application
    Filed: January 18, 2019
    Publication date: May 23, 2019
    Inventor: Mohammad A. ABDALLAH
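Steps (a) through (d) of the abstract above can be sketched with a minimal software model of the fetch-stage stack. The global list and entry layout are hypothetical simplifications; recovery details such as flushing the GRAS after a mismatch are omitted:

```python
# Toy model of the speculative guest return address stack (GRAS)
# maintained at fetch. Entries pair a guest return target with its
# corresponding native return target.
gras = []

def on_call(guest_ret, native_ret):
    # (a) push an entry on each function call seen at fetch
    gras.append((guest_ret, native_ret))

def on_return(ras_entry):
    # (b) pop the speculative entry when a return is processed
    spec = gras.pop()
    # (c) compare against the entry popped from the later-stage RAS
    if spec != ras_entry:
        # (d) mismatch: refetch from the RAS entry's native return address
        return ras_entry[1]
    return spec[1]

on_call(0x100, 0x800)
ret_hit = on_return((0x100, 0x800))    # speculation confirmed
on_call(0x200, 0x900)
ret_miss = on_return((0x300, 0xA00))   # mismatch: RAS target wins
```

Keeping the speculative stack at the fetch stage lets return targets be predicted before the later pipeline stages resolve them, with the RAS comparison catching any mis-speculation.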