Patents by Inventor Arunachalam Annamalai

Arunachalam Annamalai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11256505
    Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: February 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
  • Patent number: 11055098
    Abstract: A processor includes a branch target buffer (BTB) having a plurality of entries whereby each entry corresponds to an associated instruction pointer value that is predicted to be a branch instruction. Each BTB entry stores a predicted branch target address for the branch instruction, and further stores information indicating whether the next branch in the block of instructions associated with the predicted branch target address is predicted to be a return instruction. In response to the BTB indicating that the next branch is predicted to be a return instruction, the processor initiates an access to a return stack that stores the return address for the predicted return instruction. By initiating access to the return stack responsive to the return prediction stored at the BTB, the processor reduces the delay in identifying the return address, thereby improving processing efficiency.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: July 6, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Aparna Thyagarajan, Marius Evers, Arunachalam Annamalai
  • Publication number: 20210191722
    Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
    Type: Application
    Filed: February 5, 2021
    Publication date: June 24, 2021
    Inventors: Arunachalam ANNAMALAI, Marius EVERS, Aparna THYAGARAJAN, Anthony JARVIS
  • Patent number: 10915322
    Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
    Type: Grant
    Filed: September 18, 2018
    Date of Patent: February 9, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
  • Patent number: 10896044
    Abstract: The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
    Type: Grant
    Filed: June 21, 2018
    Date of Patent: January 19, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
  • Publication number: 20200089498
    Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
    Type: Application
    Filed: September 18, 2018
    Publication date: March 19, 2020
    Inventors: Arunachalam ANNAMALAI, Marius EVERS, Aparna THYAGARAJAN
  • Publication number: 20200034151
    Abstract: A processor includes a branch target buffer (BTB) having a plurality of entries whereby each entry corresponds to an associated instruction pointer value that is predicted to be a branch instruction. Each BTB entry stores a predicted branch target address for the branch instruction, and further stores information indicating whether the next branch in the block of instructions associated with the predicted branch target address is predicted to be a return instruction. In response to the BTB indicating that the next branch is predicted to be a return instruction, the processor initiates an access to a return stack that stores the return address for the predicted return instruction. By initiating access to the return stack responsive to the return prediction stored at the BTB, the processor reduces the delay in identifying the return address, thereby improving processing efficiency.
    Type: Application
    Filed: July 24, 2018
    Publication date: January 30, 2020
    Inventors: Aparna THYAGARAJAN, Marius EVERS, Arunachalam ANNAMALAI
  • Publication number: 20190391813
    Abstract: The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
    Type: Application
    Filed: June 21, 2018
    Publication date: December 26, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams