Patents by Inventor Rami Mohammad Al Sheikh

Rami Mohammad Al Sheikh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10379863
    Abstract: Systems and methods for constructing an instruction slice for prefetching data of a data-dependent load instruction include a slicer for identifying a load instruction in an instruction sequence as a first occurrence of a qualified load instruction which will miss in a last-level cache. A commit buffer stores information pertaining to the first occurrence of the qualified load instruction and shadow instructions which follow. For a second occurrence of the qualified load instruction, an instruction slice is constructed from the information in the commit buffer to form a slice payload. A pre-execution engine pre-executes the instruction slice based on the slice payload to determine an address from which data is to be fetched for execution of a third and any subsequent occurrences of the qualified load instruction. The data is prefetched from the determined address for the third and any subsequent occurrence of the qualified load instruction.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: August 13, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Shivam Priyadarshi, Rami Mohammad A. Al Sheikh, Brandon Dwiel, Derek Hower
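The slice-construction scheme above can be sketched as a toy software model (this is an illustration of the idea, not the patented hardware; the `SlicePrefetcher` class, its method names, and the dict-based memory are all assumptions for the sketch):

```python
class SlicePrefetcher:
    """Toy model of slice-based prefetching for a data-dependent load:
    the first occurrence of the delinquent load records its
    address-generating slice (the commit-buffer analogue); from the
    second occurrence on, the slice is pre-executed on the loaded value
    to compute and prefetch the address of the next occurrence."""

    def __init__(self):
        self.slice_ops = None     # recorded slice operations
        self.occurrences = 0
        self.prefetched = set()   # addresses already prefetched

    def on_load(self, addr, slice_ops, memory):
        hit = addr in self.prefetched          # data already prefetched?
        self.occurrences += 1
        if self.occurrences == 1:
            self.slice_ops = list(slice_ops)   # first occurrence: record slice
        else:
            value = memory[addr]               # pre-execute the slice
            for op in self.slice_ops:
                value = op(value)
            self.prefetched.add(value)         # prefetch the computed address
        return hit
```

For a pointer chase where each load's value is the next address (an empty slice), the third occurrence already finds its data prefetched.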
  • Patent number: 10353819
    Abstract: Next line prefetchers employing initial high prefetch prediction confidence states for throttling next line prefetches in processor-based systems are disclosed. A next line prefetcher prefetches a next memory line into cache memory in response to a read operation. To mitigate prefetch mispredictions, the next line prefetcher is throttled to cease prefetching after the prefetch prediction confidence state becomes a no-next-line-prefetch state indicating a number of incorrect predictions. Instead of the initial prefetch prediction confidence state being set to the no-next-line-prefetch state, which would have to be built up by correct predictions before a next line prefetch is performed, the initial prefetch prediction confidence state is set to a next-line-prefetch state to allow next line prefetching. Thus, the next line prefetcher starts prefetching next lines without first requiring correct predictions to be "built up" in the prefetch prediction confidence state.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: July 16, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Brandon Dwiel, Rami Mohammad Al Sheikh
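A minimal sketch of the throttling idea above (a toy Python model; the counter width and feedback interface are illustrative assumptions, not from the patent):

```python
class NextLinePrefetcher:
    """Toy throttled next-line prefetcher. Per the abstract's key idea,
    the prefetch prediction confidence counter starts in a state that
    allows prefetching (here: its maximum), rather than starting at zero
    and having to be built up by correct predictions first."""

    MAX_CONFIDENCE = 3

    def __init__(self):
        self.confidence = self.MAX_CONFIDENCE  # initial state permits prefetch

    def on_read(self, line):
        if self.confidence > 0:
            return line + 1   # prefetch the next memory line
        return None           # throttled: no-next-line-prefetch state

    def feedback(self, prediction_correct):
        if prediction_correct:
            self.confidence = min(self.MAX_CONFIDENCE, self.confidence + 1)
        else:
            self.confidence = max(0, self.confidence - 1)
```

The prefetcher is active from the first read, throttles off after a run of mispredictions, and recovers on correct ones.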
  • Patent number: 10303608
    Abstract: A first load instruction specifying a first virtual address misses in a data cache. A delta value is received based on a program counter value of the first load instruction. A second virtual address is computed based on the delta value and the first virtual address. Data associated with the second virtual address is then prefetched from a main memory to the data cache prior to a second load instruction specifying the second virtual address missing in the data cache.
    Type: Grant
    Filed: August 22, 2017
    Date of Patent: May 28, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Rami Mohammad Al Sheikh, Shivam Priyadarshi, Brandon Dwiel, David John Palframan, Derek Hower, Muntaquim Faruk Chowdhury
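The delta-based prefetch described above can be sketched as a toy model (the training rule and table layout here are illustrative assumptions):

```python
class DeltaPrefetcher:
    """Toy PC-indexed delta prefetcher: on a data-cache miss, the delta
    learned for the missing load's program counter is added to the miss
    virtual address, and the resulting address is prefetched before the
    corresponding load misses."""

    def __init__(self):
        self.deltas = {}      # pc -> learned address delta
        self.last_miss = {}   # pc -> previous miss virtual address

    def on_miss(self, pc, vaddr):
        prefetch_addr = None
        if pc in self.deltas:
            prefetch_addr = vaddr + self.deltas[pc]        # predict next miss
        if pc in self.last_miss:
            self.deltas[pc] = vaddr - self.last_miss[pc]   # train the delta
        self.last_miss[pc] = vaddr
        return prefetch_addr
```

After two misses at a fixed stride, the third miss address is predicted.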
  • Publication number: 20190155608
    Abstract: Aspects of the present disclosure include a method, a device, and a computer-readable medium for restarting an instruction pipeline of a processor that includes a decoupled fetcher. A method comprises detecting, in a processor, a re-fetch event, wherein the processor includes an instruction unit (IU) configured to fetch instructions from a decoupled fetcher (DCF), and simultaneously flushing the IU and the DCF in response to detecting the re-fetch event.
    Type: Application
    Filed: November 16, 2018
    Publication date: May 23, 2019
    Inventors: Arthur PERAIS, Michael Scott MCILVAINE, Rami Mohammad A. AL SHEIKH, Robert Douglas CLANCY, Luke YEN, Rodney Wayne SMITH
  • Patent number: 10255074
    Abstract: Selective flushing of instructions in an instruction pipeline in a processor back to an execution-determined target address in response to a precise interrupt is disclosed. A selective instruction pipeline flush controller determines if a precise interrupt has occurred for an executed instruction in the instruction pipeline. The selective instruction pipeline flush controller determines if an instruction at the correct resolved target address of the instruction that caused the precise interrupt is contained in the instruction pipeline. If so, the selective instruction pipeline flush controller can selectively flush instructions back to the instruction in the pipeline that contains the correct resolved target address to reduce the amount of new instruction fetching.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: April 9, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Rami Mohammad Al Sheikh, Harold Wade Cain, III
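The selective-flush decision above can be sketched as follows (a toy model; representing the pipeline as a list of program counters and keeping everything from the target onward are simplifying assumptions):

```python
def selective_flush(younger_insts, resolved_target):
    """Toy selective-flush decision: younger_insts lists (oldest first)
    the program counters of instructions in the pipeline after the one
    that caused the precise interrupt. If the correct resolved target is
    already in the pipeline, only the instructions ahead of it are
    flushed and the rest are kept, reducing new instruction fetching;
    otherwise everything younger is flushed."""
    for i, pc in enumerate(younger_insts):
        if pc == resolved_target:
            return younger_insts[i:]   # keep from the target onward
    return []                          # full flush; refetch from target
```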
  • Publication number: 20190095621
    Abstract: Aspects of the present disclosure are directed to detecting and responding to injected faults. In some examples, fault injections are detected in a pipeline processor using transactional memory by comparing a predicted value (e.g. from a Value Predictor) against a subsequently loaded or computed reference value, and then detecting the fault based on the result of the comparison. If the predicted value is found to differ from the subsequently loaded or calculated value, the difference is deemed to be due to a fault and actions are taken to address the fault, such as by using deception or blinding of observable values. In some examples, the Value Predictor is modified to perform the comparison to detect the fault. The Value Predictor then notifies Transactional Hardware, which responds to the fault. In other examples described herein, the Value Predictor is unchanged and the Transactional Hardware detects and corrects the fault.
    Type: Application
    Filed: September 27, 2017
    Publication date: March 28, 2019
    Inventors: Rosario CAMMAROTA, Rami Mohammad A. AL SHEIKH, Wenjia RUAN
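The comparison-based fault detection above can be sketched as a toy model (the class, the zero-value blinding, and the fault log are illustrative assumptions, not the disclosed hardware):

```python
class FaultDetectingLoader:
    """Toy model of prediction-based fault detection: a value predictor
    supplies an expected value for an address; the loaded reference
    value is compared against it, a mismatch is treated as an injected
    fault, and the observable result is blinded (returned as zero here)
    instead of exposing the faulty value."""

    def __init__(self, predictions):
        self.predictions = predictions   # addr -> high-confidence prediction
        self.faults = []

    def load(self, addr, memory):
        value = memory[addr]
        predicted = self.predictions.get(addr)
        if predicted is not None and predicted != value:
            self.faults.append(addr)     # fault detected via the comparison
            return 0                     # blind the observable value
        return value
```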
  • Publication number: 20190087192
    Abstract: Systems and methods for constructing an instruction slice for prefetching data of a data-dependent load instruction include a slicer for identifying a load instruction in an instruction sequence as a first occurrence of a qualified load instruction which will miss in a last-level cache. A commit buffer stores information pertaining to the first occurrence of the qualified load instruction and shadow instructions which follow. For a second occurrence of the qualified load instruction, an instruction slice is constructed from the information in the commit buffer to form a slice payload. A pre-execution engine pre-executes the instruction slice based on the slice payload to determine an address from which data is to be fetched for execution of a third and any subsequent occurrences of the qualified load instruction. The data is prefetched from the determined address for the third and any subsequent occurrence of the qualified load instruction.
    Type: Application
    Filed: September 21, 2017
    Publication date: March 21, 2019
    Inventors: Shivam PRIYADARSHI, Rami Mohammad A. AL SHEIKH, Brandon DWIEL, Derek HOWER
  • Patent number: 10223278
    Abstract: Systems and methods are directed to selectively bypassing allocation of cache lines in a cache. A bypass predictor table is provided with reuse counters to track reuse characteristics of cache lines, based on memory regions to which the cache lines belong in memory. A contender reuse counter provides an indication of a likelihood of reuse of a contender cache line in the cache pursuant to a miss in the cache for the contender cache line, and a victim reuse counter provides an indication of a likelihood of reuse for a victim cache line that will be evicted if the contender cache line is allocated in the cache. A decision whether to allocate the contender cache line in the cache or bypass allocation of the contender cache line in the cache is based on the contender reuse counter value and the victim reuse counter value.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: March 5, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Shivam Priyadarshi, Brandon Harley Anthony Dwiel, Rami Mohammad A. Al Sheikh, Harold Wade Cain, III
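The contender-versus-victim decision above can be sketched in a few lines (a toy model; the 4 KiB region size and unbounded counters are illustrative assumptions):

```python
class BypassPredictor:
    """Toy region-based bypass predictor: reuse counters are kept per
    memory region; on a miss, the contender line's region counter is
    compared with the would-be victim's, and allocation is bypassed
    when the victim looks more likely to be reused."""

    REGION_SHIFT = 12   # assume 4 KiB regions (illustrative)

    def __init__(self):
        self.reuse = {}   # region -> reuse counter

    def on_hit(self, addr):
        region = addr >> self.REGION_SHIFT
        self.reuse[region] = self.reuse.get(region, 0) + 1

    def should_allocate(self, contender_addr, victim_addr):
        contender = self.reuse.get(contender_addr >> self.REGION_SHIFT, 0)
        victim = self.reuse.get(victim_addr >> self.REGION_SHIFT, 0)
        return contender >= victim   # otherwise bypass the allocation
```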
  • Publication number: 20190065964
    Abstract: A method and apparatus for predicting instruction load values in a processor. While a program is executing, the processor trains predictors in order to predict load values. In particular, four different kinds of predictors are trained: the Last Value Predictor (LVP), which captures loads that encounter very few distinct values; the Stride Address Predictor (SAP), which captures loads whose addresses follow a stride (offset) pattern; the Content Address Predictor (CAP), which captures non-stride load addresses; and the Context Value Predictor (CVP), which captures non-stride load values in a particular context. Training methods and the use of such predictors are disclosed.
    Type: Application
    Filed: August 30, 2017
    Publication date: February 28, 2019
    Inventors: Rami Mohammad A. AL SHEIKH, Derek HOWER
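Two of the four predictors named above can be sketched directly (toy per-PC tables; the CAP and CVP are omitted, and these training rules are illustrative assumptions):

```python
class LastValuePredictor:
    """Toy LVP: predicts that a load returns the value it returned last
    time (captures loads that encounter very few distinct values)."""
    def __init__(self):
        self.last_value = {}
    def predict(self, pc):
        return self.last_value.get(pc)
    def train(self, pc, value):
        self.last_value[pc] = value

class StrideAddressPredictor:
    """Toy SAP: predicts the next load address as the last observed
    address plus the learned stride for that load."""
    def __init__(self):
        self.last_addr = {}
        self.stride = {}
    def predict(self, pc):
        if pc in self.stride:
            return self.last_addr[pc] + self.stride[pc]
        return None
    def train(self, pc, addr):
        if pc in self.last_addr:
            self.stride[pc] = addr - self.last_addr[pc]
        self.last_addr[pc] = addr
```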
  • Publication number: 20190065384
    Abstract: A request to access data at a first physical address misses in a private cache of a processor. A confidence value is received for the first physical address based on a hash value of the first physical address. A determination is made that the received confidence value exceeds a threshold value. In response, a speculative read request specifying the first physical address is issued to a memory controller of a main memory to expedite a miss for the data at the first physical address in a shared cache.
    Type: Application
    Filed: August 22, 2017
    Publication date: February 28, 2019
    Inventors: Rami Mohammad AL SHEIKH, Shivam PRIYADARSHI, Brandon DWIEL, David John PALFRAMAN, Derek HOWER
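The confidence-gated speculative read above can be sketched as a toy model (the table size, counter range, threshold, and use of Python's built-in `hash` are all illustrative assumptions):

```python
class SpeculativeReadIssuer:
    """Toy model: a small table indexed by a hash of the physical
    address holds a confidence that a private-cache miss will also miss
    in the shared cache; when the confidence exceeds a threshold, a
    speculative read is issued to the memory controller in parallel
    with the shared-cache lookup."""

    def __init__(self, threshold=2, table_size=64):
        self.threshold = threshold
        self.table = [0] * table_size

    def _index(self, paddr):
        return hash(paddr) % len(self.table)   # stand-in address hash

    def issue_speculative_read(self, paddr):
        return self.table[self._index(paddr)] > self.threshold

    def on_shared_cache_result(self, paddr, was_miss):
        i = self._index(paddr)
        if was_miss:
            self.table[i] = min(7, self.table[i] + 1)   # grow confidence
        else:
            self.table[i] = max(0, self.table[i] - 1)
```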
  • Publication number: 20190065375
    Abstract: A first load instruction specifying a first virtual address misses in a data cache. A delta value is received based on a program counter value of the first load instruction. A second virtual address is computed based on the delta value and the first virtual address. Data associated with the second virtual address is then prefetched from a main memory to the data cache prior to a second load instruction specifying the second virtual address missing in the data cache.
    Type: Application
    Filed: August 22, 2017
    Publication date: February 28, 2019
    Inventors: Rami Mohammad AL SHEIKH, Shivam PRIYADARSHI, Brandon DWIEL, David John PALFRAMAN, Derek HOWER, Muntaquim Faruk CHOWDHURY
  • Patent number: 10203745
    Abstract: A scheduler and method for dynamic power reduction, e.g., in a processor core, are proposed. In conventional processor cores, for example, the scheduler precharges the grant lines of many instructions only to discharge the great majority of the precharged lines in the same cycle. To reduce power consumption, selective precharge and/or selective evaluation are proposed. With selective precharge, the grant lines of instructions that will evaluate to false (e.g., invalid instructions) are not precharged in a cycle. With selective evaluation, among the precharged instructions, instructions that are not ready are not evaluated in the same cycle. In this way, power consumption is reduced by avoiding unnecessary precharge and discharge.
    Type: Grant
    Filed: March 30, 2016
    Date of Patent: February 12, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Milind Ram Kulkarni, Rami Mohammad A. Al Sheikh, Raguram Damodaran
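The two filtering stages above can be sketched as a toy model (the tuple representation of scheduler slots is an illustrative assumption):

```python
def schedule_cycle(entries):
    """Toy model of selective precharge and selective evaluation:
    entries are (name, valid, ready) scheduler slots. Only valid
    entries are 'precharged', and only the ready subset of those is
    'evaluated', modelling the power saved by skipping grant lines
    that would only be discharged again."""
    precharged = [e for e in entries if e[1]]                       # selective precharge
    evaluated = [name for name, _, ready in precharged if ready]    # selective evaluation
    return len(precharged), evaluated
```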
  • Patent number: 10185668
    Abstract: Systems and methods relate to cost-aware cache management policies. In a cost-aware least recently used (LRU) replacement policy, temporal locality as well as miss cost is taken into account in selecting a cache line for replacement, wherein the miss cost is based on an associated operation type including instruction cache read, data cache read, data cache write, prefetch, and write back. In a cost-aware dynamic re-reference interval prediction (DRRIP) based cache management policy, miss costs associated with operation types pertaining to a cache line are considered for assigning re-reference interval prediction values (RRPV) for inserting the cache line, pursuant to a cache miss and for updating the RRPV upon a hit for the cache line. The operation types comprise instruction cache access, data cache access, prefetch, and write back. These policies improve victim selection, while minimizing cache thrashing and scans.
    Type: Grant
    Filed: September 20, 2016
    Date of Patent: January 22, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Rami Mohammad A. Al Sheikh, Shivam Priyadarshi, Harold Wade Cain, III
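The cost-aware victim selection above can be sketched as follows (the abstract names the operation types but not their relative costs, so the weights and the age-minus-cost score are assumptions for illustration):

```python
# Illustrative miss-cost weights by operation type (assumed, not from
# the patent).
MISS_COST = {"icache_read": 4, "dcache_read": 3, "dcache_write": 2,
             "prefetch": 1, "writeback": 1}

def choose_victim(lines):
    """Toy cost-aware LRU victim selection over (tag, lru_age, op_type)
    tuples, where a larger lru_age means less recently used: prefer the
    line that is both stale and cheap to miss on again."""
    return max(lines, key=lambda line: line[1] - MISS_COST[line[2]])[0]
```

Between two equally old lines, the one whose re-fetch is cheapest (e.g., a prefetch) is evicted first.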
  • Publication number: 20190018798
    Abstract: Systems and methods relate to cost-aware cache management policies. In a cost-aware least recently used (LRU) replacement policy, temporal locality as well as miss cost is taken into account in selecting a cache line for replacement, wherein the miss cost is based on an associated operation type including instruction cache read, data cache read, data cache write, prefetch, and write back. In a cost-aware dynamic re-reference interval prediction (DRRIP) based cache management policy, miss costs associated with operation types pertaining to a cache line are considered for assigning re-reference interval prediction values (RRPV) for inserting the cache line, pursuant to a cache miss and for updating the RRPV upon a hit for the cache line. The operation types comprise instruction cache access, data cache access, prefetch, and write back. These policies improve victim selection, while minimizing cache thrashing and scans.
    Type: Application
    Filed: September 18, 2018
    Publication date: January 17, 2019
    Inventors: Rami Mohammad AL SHEIKH, Shivam PRIYADARSHI, Harold Wade CAIN, III
  • Publication number: 20190004806
    Abstract: Systems and methods for branch prediction of fixed direction branch instructions involve Bloom Filters. A taken Bloom Filter records instances of a branch instruction being taken or having resolved in a taken direction; while a not-taken Bloom Filter records instances of a branch instruction not being taken, or having resolved in a not-taken direction. For a branch instruction to be executed, the taken Bloom Filter and the not-taken Bloom Filter are accessed and a direction of execution for the branch instruction is predicted using at least one of the taken Bloom Filter or the not-taken Bloom Filter.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 3, 2019
    Inventor: Rami Mohammad A. AL SHEIKH
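The two-filter scheme above can be sketched with a minimal Bloom filter (filter size, hash count, and the predict-taken-only-when-unambiguous policy are illustrative assumptions; the claim covers using either or both filters):

```python
class BloomFilter:
    """Minimal Bloom filter over a small bit array with two hashes."""
    def __init__(self, size=256):
        self.bits = [False] * size
    def _indices(self, key):
        return [hash((key, salt)) % len(self.bits) for salt in (0, 1)]
    def add(self, key):
        for i in self._indices(key):
            self.bits[i] = True
    def maybe_contains(self, key):
        return all(self.bits[i] for i in self._indices(key))

class BloomBranchPredictor:
    """Toy fixed-direction predictor: resolved directions are recorded
    in a taken filter and a not-taken filter; a branch present only in
    the taken filter is predicted taken, everything else not taken."""
    def __init__(self):
        self.taken = BloomFilter()
        self.not_taken = BloomFilter()
    def train(self, pc, taken):
        (self.taken if taken else self.not_taken).add(pc)
    def predict(self, pc):
        return (self.taken.maybe_contains(pc)
                and not self.not_taken.maybe_contains(pc))
```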
  • Publication number: 20190004805
    Abstract: Systems and methods pertain to a branch prediction table comprising one or more entries. Each entry comprises one or more branch prediction counters corresponding to one or more instructions in a fetch group of instructions fetched for processing in a processor. Two or more tag fields are associated with each entry, wherein the two or more tag fields correspond to two or more fetch groups. Each of the two or more fetch groups comprises at least one branch instruction for which at least one of the one or more branch prediction counters is used for making a branch prediction. In the event of a miss in the branch prediction table, the branch prediction counters and the two or more tag fields are updated in a manner which enables constructive aliasing and prevents destructive aliasing.
    Type: Application
    Filed: June 28, 2017
    Publication date: January 3, 2019
    Inventor: Rami Mohammad A. AL SHEIKH
  • Publication number: 20190004803
    Abstract: Systems and methods for branch prediction include a processor configured to execute at least one branch instruction. The processor includes a branch prediction mechanism configured to provide a branch prediction for the at least one branch instruction and a statistical correction table (SCT) configured to indicate whether the branch prediction accuracy of the branch prediction mechanism is worse than a statistical bias for the branch instruction. An execution pipeline of the processor is configured to speculatively execute the branch instruction in the direction corresponding to the statistical bias if, at least, the branch prediction accuracy is worse than the statistical bias.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 3, 2019
    Inventor: Rami Mohammad A. AL SHEIKH
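The override logic above can be sketched as a toy statistical corrector (the per-branch counters and the running-majority definition of "bias" are illustrative assumptions):

```python
class StatisticalCorrector:
    """Toy SCT: per branch, count how often the baseline predictor was
    right versus how often the branch's statistically biased direction
    was right; when the bias outperforms the predictor, override the
    prediction with the biased direction."""

    def __init__(self):
        # pc -> [predictor_correct, bias_correct, taken_count, total]
        self.stats = {}

    def train(self, pc, predicted, actual):
        s = self.stats.setdefault(pc, [0, 0, 0, 0])
        bias = s[3] > 0 and s[2] * 2 >= s[3]   # majority direction so far
        s[0] += int(predicted == actual)
        s[1] += int(bias == actual)
        s[2] += int(actual)
        s[3] += 1

    def final_prediction(self, pc, predicted):
        s = self.stats.get(pc)
        if s and s[1] > s[0]:         # bias beats the predictor
            return s[2] * 2 >= s[3]   # follow the statistical bias
        return predicted
```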
  • Patent number: 10089114
    Abstract: A scheduler with a picker block capable of dispatching multiple instructions per cycle is disclosed. The picker block may comprise an inter-group picker and an intra-group picker. The inter-group picker may be configured to pick multiple ready groups when there are two or more ready groups among a plurality of groups of instructions, and pick a single ready group when the single ready group is the only ready group among the plurality of groups. The intra-group picker may be configured to pick one ready instruction from each of the multiple ready groups when the inter-group picker picks the multiple ready groups, and to pick multiple ready instructions from the single ready group when the inter-group picker picks the single ready group.
    Type: Grant
    Filed: March 30, 2016
    Date of Patent: October 2, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Milind Ram Kulkarni, Rami Mohammad A. Al Sheikh, Raguram Damodaran
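The two-level picking described above can be sketched as a toy function (the dispatch width of 2 and the take-the-oldest-ready policy within a group are illustrative assumptions):

```python
def pick(groups, width=2):
    """Toy two-level picker: groups is a list of groups, each a list of
    (instruction, ready) tuples. With two or more ready groups, the
    inter-group picker selects multiple groups and the intra-group
    picker takes one ready instruction from each; with a single ready
    group, multiple ready instructions are taken from that group."""
    ready = [[inst for inst, ok in group if ok] for group in groups]
    ready = [group for group in ready if group]
    if len(ready) >= 2:
        return [group[0] for group in ready[:width]]  # one per group
    if len(ready) == 1:
        return ready[0][:width]                       # several from one group
    return []
```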
  • Publication number: 20180081811
    Abstract: Systems and methods for dynamically partitioning a shared cache, include dynamically determining a probability to be associated with each one of two or more processors configured to access the shared cache. Based on the probability for a processor, a first cache line of the processor is inserted in a most recently used (MRU) position of a least recently used (LRU) stack associated with the shared cache, pursuant to a miss in the shared cache for the first cache line. Based on the probability for the processor, a second cache line is promoted to the MRU position of the LRU stack, pursuant to a hit in the shared cache for the second cache line. The probability for the processor is determined based on hill-climbing, wherein fluctuations in the probability are reduced, local maxima are prevented, and the probability is prevented from falling below a threshold.
    Type: Application
    Filed: September 20, 2016
    Publication date: March 22, 2018
    Inventors: Rami Mohammad A. AL SHEIKH, Harold Wade CAIN, III
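The probabilistic insertion and hill-climbing adjustment above can be sketched as a toy per-processor policy (step size, floor, and the reward signal are illustrative assumptions):

```python
import random

class ProbabilisticInsertionPolicy:
    """Toy per-processor policy for a shared cache: with probability p a
    missing line is inserted at MRU (and a hitting line promoted to
    MRU); adjust() hill-climbs p on an observed reward such as hit
    rate, reversing direction when the reward drops and clamping p to a
    floor so it cannot collapse to zero."""

    def __init__(self, p=0.5, step=0.05, floor=0.1, seed=0):
        self.p, self.step, self.floor = p, step, floor
        self.last_reward = None
        self.rng = random.Random(seed)

    def insert_at_mru(self):
        return self.rng.random() < self.p   # probabilistic MRU insertion

    def adjust(self, reward):
        if self.last_reward is not None and reward < self.last_reward:
            self.step = -self.step          # hill-climbing: turn around
        self.p = min(1.0, max(self.floor, self.p + self.step))
        self.last_reward = reward
```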
  • Publication number: 20180081691
    Abstract: Replaying speculatively dispatched load-dependent instructions in response to a cache miss for a producing load instruction in an out-of-order processor (OoP) is disclosed. To allow a scheduler circuit to restore register dependencies in a register dependency tracking circuit for a replay operation in response to a cache miss for execution of a load instruction, the scheduler circuit includes a replay circuit. The replay circuit includes a load dependency tracking circuit, in which it tracks the dependencies of dispatched load instructions. The replay circuit uses these tracked dependencies to restore register dependencies for the dispatched load instructions in the register dependency tracking circuit in response to a replay operation. Thus, a load instruction does not have to be re-allocated to restore the register dependencies used for re-dispatching load-dependent instructions.
    Type: Application
    Filed: September 21, 2016
    Publication date: March 22, 2018
    Inventors: Milind Ram Kulkarni, Rami Mohammad Al Sheikh, Raguram Damodaran