Patents by Inventor Muawya M. Al-Otoom

Muawya M. Al-Otoom has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multi-table signature prefetch

Patent number: 11630670

Abstract: Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique and second signature prefetch table circuitry indicates prefetch addresses for signatures generated using a second, different signature generation technique. Signature prefetch circuitry, in response to a prefetch training event, determines a first signature according to the first technique and a second signature according to the second technique and selects one but not both of the first and second signature prefetch tables to train using the first signature or the second signature.

Type: Grant

Filed: July 21, 2021

Date of Patent: April 18, 2023

Assignee: Apple Inc.

Inventors: Douglas C. Holman, Ian D. Kountanis, Amit Kumar, Muawya M. Al-Otoom
Multi-table Signature Prefetch

Publication number: 20230023860

Abstract: Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique and second signature prefetch table circuitry that indicates prefetch addresses for signatures generated using a second, different signature generation technique. Signature prefetch circuitry, in response to a prefetch training event: determines a first signature according to the first technique and a second signature according to the second technique and selects one but not both of the first and second signature prefetch tables to train using the first signature or the second signature.

Type: Application

Filed: July 21, 2021

Publication date: January 26, 2023

Inventors: Douglas C. Holman, Ian D. Kountanis, Amit Kumar, Muawya M. Al-Otoom
Zero cycle load bypass in a decode group

Patent number: 11416254

Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.

Type: Grant

Filed: December 5, 2019

Date of Patent: August 16, 2022

Assignee: Apple Inc.

Inventors: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
Indirect branch predictor based on register operands

Patent number: 11379240

Abstract: In an embodiment, an indirect branch predictor generates indirect branch predictions based on one or more register values. The register values may be the contents of registers on which the indirect branch instruction is directly or indirectly dependent for generating the branch target address, for example. In an embodiment, at least one of the registers may be a source for a load instruction, and the indirect branch may be dependent (directly or indirectly) on the target of the load. In an embodiment, the indirect branch predictor may be one of at least two indirect branch predictors in a processor. The other indirect branch predictor may be based on a fetch address, or PC, associated with the indirect branch instruction. The other indirect branch predictor may generate a first predicted target address, and the indirect branch predictor may generate a second predicted target address for the same indirect branch instruction.

Type: Grant

Filed: January 31, 2020

Date of Patent: July 5, 2022

Assignee: Apple Inc.

Inventors: Muawya M. Al-Otoom, Ian D. Kountanis, Conrado Blasco, Haoyan Jia, Amit Kumar
History file for previous register mapping storage and last reference indication

Patent number: 11200062

Abstract: Systems, apparatuses, and methods for implementing a physical register last reference scheme are described. A system includes a processor with a mapper, history file, and freelist. When an entry in the mapper is updated with a new architectural register-to-physical register mapping, the processor creates a new history file entry for the given instruction that caused the update. The processor also searches the mapper to determine if the old physical register that was previously stored in the mapper entry is referenced by any other mapper entries. If there are no other mapper entries that reference this old physical register, then a last reference indicator is stored in the new history file entry. When the given instruction retires, the processor checks the last reference indicator in the history file entry to determine whether the old physical register can be returned to the freelist of available physical registers.

Type: Grant

Filed: August 26, 2019

Date of Patent: December 14, 2021

Assignee: Apple Inc.

Inventors: Deepankar Duggal, Conrado Blasco, Muawya M. Al-Otoom, Richard F. Russo
Indirect Branch Predictor Based on Register Operands

Publication number: 20210240477

Abstract: In an embodiment, an indirect branch predictor generates indirect branch predictions based on one or more register values. The register values may be the contents of registers on which the indirect branch instruction is directly or indirectly dependent for generating the branch target address, for example. In an embodiment, at least one of the registers may be a source for a load instruction, and the indirect branch may be dependent (directly or indirectly) on the target of the load. In an embodiment, the indirect branch predictor may be one of at least two indirect branch predictors in a processor. The other indirect branch predictor may be based on a fetch address, or PC, associated with the indirect branch instruction. The other indirect branch predictor may generate a first predicted target address, and the indirect branch predictor may generate a second predicted target address for the same indirect branch instruction.

Type: Application

Filed: January 31, 2020

Publication date: August 5, 2021

Inventors: Muawya M. Al-Otoom, Ian D. Kountanis, Conrado Blasco, Haoyan Jia, Amit Kumar
ZERO CYCLE LOAD BYPASS

Publication number: 20210173654

Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.

Type: Application

Filed: December 5, 2019

Publication date: June 10, 2021

Inventors: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
LAST PHYSICAL REGISTER REFERENCE SCHEME

Publication number: 20210064376

Abstract: Systems, apparatuses, and methods for implementing a physical register last reference scheme are described. A system includes a processor with a mapper, history file, and freelist. When an entry in the mapper is updated with a new architectural register-to-physical register mapping, the processor creates a new history file entry for the given instruction that caused the update. The processor also searches the mapper to determine if the old physical register that was previously stored in the mapper entry is referenced by any other mapper entries. If there are no other mapper entries that reference this old physical register, then a last reference indicator is stored in the new history file entry. When the given instruction retires, the processor checks the last reference indicator in the history file entry to determine whether the old physical register can be returned to the freelist of available physical registers.

Type: Application

Filed: August 26, 2019

Publication date: March 4, 2021

Inventors: Deepankar Duggal, Conrado Blasco, Muawya M. Al-Otoom, Richard F. Russo
System and method for predicting memory dependence when a source register of a push instruction matches the destination register of a pop instruction

Patent number: 10838729

Abstract: A system and method for efficiently reducing the latency and power of memory access operations. A processor includes a stack pointer (SP) load-store dependence (LSD) predictor which predicts whether a memory dependence exists on a store instruction. The processor also includes a register file (RF) LSD predictor which predicts whether a memory dependence exists on a store instruction or a load instruction by a subsequent load instruction in program order. Each of the SP-LSD predictor and the RF-LSD predictor predicts and performs register renaming in a pipeline stage earlier than a renaming pipeline stage. The RF-LSD predictor also determines whether any intervening instructions between a producer memory instruction and a consumer memory instruction modify a predicted dependence.

Type: Grant

Filed: March 21, 2018

Date of Patent: November 17, 2020

Assignee: Apple Inc.

Inventors: Muawya M. Al-Otoom, Conrado Blasco, Deepankar Duggal, Kulin N. Kothari, Richard F. Russo
Branch prediction system

Patent number: 10719327

Abstract: In some embodiments, a branch prediction unit includes a plurality of branch prediction circuits and selection logic. At least two of the branch prediction circuits are configured, based on an address of a branch instruction and different sets of history information, to provide a corresponding branch prediction for the branch instruction. At least one storage element of the at least two branch prediction circuits is set associative. The selection logic is configured to select a particular branch prediction output by one of the branch prediction circuits as a current branch prediction output of the branch prediction unit. In some instances, the branch prediction unit may be less likely to replace branch prediction information, as compared to a different branch prediction unit that does not include a set associative storage element. In some embodiments, this arrangement may lead to increased performance of the branch prediction unit.

Type: Grant

Filed: May 19, 2015

Date of Patent: July 21, 2020

Assignee: Apple Inc.

Inventors: Muawya M. Al-Otoom, Ian D. Kountanis, Conrado Blasco
Accelerated interlane vector reduction instructions

Patent number: 10209989

Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.

Type: Grant

Filed: March 7, 2017

Date of Patent: February 19, 2019

Assignee: Intel Corporation

Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
ACCELERATED INTERLANE VECTOR REDUCTION INSTRUCTIONS

Publication number: 20170242699

Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.

Type: Application

Filed: March 7, 2017

Publication date: August 24, 2017

Inventors: PAUL CAPRIOLI, ABHAY S. KANHERE, JEFFREY J. COOK, MUAWYA M. AL-OTOOM
HARDWARE PROFILING MECHANISM TO ENABLE PAGE LEVEL AUTOMATIC BINARY TRANSLATION

Publication number: 20170212825

Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.

Type: Application

Filed: January 10, 2017

Publication date: July 27, 2017

Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
Instruction and logic to control transfer in a partial binary translation system

Patent number: 9652234

Abstract: A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determinations returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.

Type: Grant

Filed: September 30, 2011

Date of Patent: May 16, 2017

Assignee: Intel Corporation

Inventors: Paul Caprioli, Martin G. Dixon, Brett L. Toll, Muawya M. Al-Otoom, Omar M. Shaikh
Cache for patterns of instructions with multiple forward control transfers

Patent number: 9632791

Abstract: Techniques are disclosed relating to a cache for patterns of instructions. In some embodiments, an apparatus includes an instruction cache and is configured to detect a pattern of execution of instructions by an instruction processing pipeline. The pattern of execution may involve execution of only instructions in a particular group of instructions. The instructions may include multiple backward control transfers and/or a control transfer instruction that is taken in one iteration of the pattern and not taken in another iteration of the pattern. The apparatus may be configured to store the instructions in the instruction cache and fetch and execute the instructions from the instruction cache. The apparatus may include a branch predictor dedicated to predicting the direction of control transfer instructions for the instruction cache. Various embodiments may reduce power consumption associated with instruction processing.

Type: Grant

Filed: January 21, 2014

Date of Patent: April 25, 2017

Assignee: Apple Inc.

Inventors: Muawya M. Al-Otoom, Ian D. Kountanis, Ronald P. Hall, Michael L. Karm
System, Method, and Apparatus for Improving Throughput of Consecutive Transactional Memory Regions

Publication number: 20170097891

Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.

Type: Application

Filed: December 16, 2016

Publication date: April 6, 2017

Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
System, Method, and Apparatus for Improving Throughput of Consecutive Transactional Memory Regions

Publication number: 20170097826

Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.

Type: Application

Filed: December 16, 2016

Publication date: April 6, 2017

Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
Accelerated interlane vector reduction instructions

Patent number: 9588766

Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.

Type: Grant

Filed: September 28, 2012

Date of Patent: March 7, 2017

Assignee: Intel Corporation

Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
Hardware profiling mechanism to enable page level automatic binary translation

Patent number: 9542191

Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.

Type: Grant

Filed: March 30, 2012

Date of Patent: January 10, 2017

Assignee: Intel Corporation

Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
System, Method, and Apparatus for Improving Throughput of Consecutive Transactional Memory Regions

Publication number: 20160350221

Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.

Type: Application

Filed: August 9, 2016

Publication date: December 1, 2016

Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom

1 2 next