Patents by Inventor Kulin N. Kothari

Kulin N. Kothari has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11829763
    Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
    Type: Grant
    Filed: August 13, 2019
    Date of Patent: November 28, 2023
    Assignee: Apple Inc.
    Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
  • Patent number: 11809874
    Abstract: A processor may include an instruction distribution circuit and a plurality of execution pipelines. The instruction distribution circuit may distribute a conditional instruction to a first execution pipeline for execution when the conditional instruction is associated with a prediction of a high confidence level, or to a second execution pipeline for execution when the conditional instruction is associated with a prediction of a low confidence level. The second execution pipeline, not the first execution pipeline, may directly instruct the processor to obtain an instruction from a target address for execution, when the conditional instruction is mispredicted. Thus, when the conditional instruction is distributed to the first execution pipeline for execution and determined to be mispredicted, the first execution pipeline may cause the conditional instruction to be re-executed in the second execution pipeline to cause the instruction from the correct target address to be obtained for execution.
    Type: Grant
    Filed: February 1, 2022
    Date of Patent: November 7, 2023
    Assignee: Apple Inc.
    Inventors: Ethan R. Schuchman, Niket K. Choudhary, Kulin N. Kothari, Haoyan Jia, Ian D. Kountanis, Douglas C. Holman, Wei-Han Lien, Pruthivi Vuyyuru
  • Publication number: 20230244495
    Abstract: A processor may include an instruction distribution circuit and a plurality of execution pipelines. The instruction distribution circuit may distribute a conditional instruction to a first execution pipeline for execution when the conditional instruction is associated with a prediction of a high confidence level, or to a second execution pipeline for execution when the conditional instruction is associated with a prediction of a low confidence level. The second execution pipeline, not the first execution pipeline, may directly instruct the processor to obtain an instruction from a target address for execution, when the conditional instruction is mispredicted. Thus, when the conditional instruction is distributed to the first execution pipeline for execution and determined to be mispredicted, the first execution pipeline may cause the conditional instruction to be re-executed in the second execution pipeline to cause the instruction from the correct target address to be obtained for execution.
    Type: Application
    Filed: February 1, 2022
    Publication date: August 3, 2023
    Applicant: Apple Inc.
    Inventors: Ethan R. Schuchman, Niket K. Choudhary, Kulin N. Kothari, Haoyan Jia, Ian D. Kountanis, Douglas C. Holman, Wei-Han Lien, Pruthivi Vuyyuru
  • Publication number: 20230244494
    Abstract: A processor may include a bias prediction circuit and an instruction prediction circuit to provide respective predictions for a conditional instruction. The bias prediction circuit may provide a bias prediction whether a condition of the conditional instruction is biased true or biased false. The instruction prediction circuit may provide an instruction prediction whether the condition of the conditional instruction is true or false. Responsive to a bias prediction that the condition of the conditional instruction is biased true or biased false, the processor may use the bias prediction from the bias prediction circuit to speculatively process the conditional instruction. Otherwise, the processor may use the instruction prediction from the instruction prediction circuit to speculatively process the conditional instruction.
    Type: Application
    Filed: February 1, 2022
    Publication date: August 3, 2023
    Applicant: Apple Inc.
    Inventors: Ian D. Kountanis, Douglas C. Holman, Wei-Han Lien, Pruthivi Vuyyuru, Ethan R. Schuchman, Niket K. Choudhary, Kulin N. Kothari, Haoyan Jia
  • Patent number: 11416254
    Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: August 16, 2022
    Assignee: Apple Inc.
    Inventors: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
  • Patent number: 11175917
    Abstract: In an embodiment, a processor comprises a reservation station that issues a first load operation for execution, a store queue, and a replayed load buffer coupled in parallel with the reservation station. During execution of the first load operation, the store queue detects that the first load operation hits on a first store operation in the store queue that lacks store data and causes a replay of the first load operation. The replayed load buffer captures an identifier of the first load operation and the first store operation based on the replay of the first load operation, wherein the replayed load buffer monitors the reservation station for issuance of a first store data operation corresponding to the first store operation and issues the first load operation for reexecution based on the issuance of the first store data operation.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: November 16, 2021
    Assignee: Apple Inc.
    Inventors: Mridul Agarwal, Kulin N. Kothari, Nikhil Gupta
  • Publication number: 20210173654
    Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
    Type: Application
    Filed: December 5, 2019
    Publication date: June 10, 2021
    Inventors: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
  • Patent number: 10983801
    Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out of order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation, such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: April 20, 2021
    Assignee: Apple Inc.
    Inventors: Kulin N. Kothari, Mridul Agarwal
  • Publication number: 20210072997
    Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out of order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation, such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.
    Type: Application
    Filed: September 6, 2019
    Publication date: March 11, 2021
    Inventors: Kulin N. Kothari, Mridul Agarwal
  • Publication number: 20210049015
    Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
    Type: Application
    Filed: August 13, 2019
    Publication date: February 18, 2021
    Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
  • Patent number: 10838729
    Abstract: A system and method for efficiently reducing the latency and power of memory access operations. A processor includes a stack pointer (SP) load-store dependence (LSD) predictor which predicts whether a memory dependence exists on a store instruction. The processor also includes a register file (RF) LSD predictor which predicts whether a memory dependence exists on a store instruction or a load instruction by a subsequent load instruction in program order. Each of the SP-LSD predictor and the RF-LSD predictor predicts and performs register renaming in a pipeline stage earlier than a renaming pipeline stage. The RF-LSD predictor also determines whether any intervening instructions between a producer memory instruction and a consumer memory instruction modify a predicted dependence.
    Type: Grant
    Filed: March 21, 2018
    Date of Patent: November 17, 2020
    Assignee: Apple Inc.
    Inventors: Muawya M. Al-Otoom, Conrado Blasco, Deepankar Duggal, Kulin N. Kothari, Richard F. Russo
  • Patent number: 10628164
    Abstract: A system and method for efficiently handling speculative execution. A load store unit (LSU) of a processor stores a commit candidate pointer, which points to a given store instruction buffered in the store queue. The given store instruction is an oldest store instruction not currently permitted to commit to the data cache. The LSU receives a first pointer from the mapping unit, which points to an oldest instruction of non-dispatched branches and unresolved system instructions. The LSU receives a second pointer from the execution unit, which points to an oldest unresolved, issued branch instruction. When the LSU determines the commit candidate pointer is older than each of the first pointer and the second pointer, the commit candidate pointer is updated to point to an oldest store instruction younger than the given store instruction stored in the store queue. The given store instruction is permitted to commit to the data cache.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: April 21, 2020
    Assignee: Apple Inc.
    Inventors: Kulin N. Kothari, Mridul Agarwal, Aditya Kesiraju, Deepankar Duggal, Sean M. Reynolds
  • Patent number: 10437595
    Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.
    Type: Grant
    Filed: March 15, 2016
    Date of Patent: October 8, 2019
    Assignee: Apple Inc.
    Inventors: Pradeep Kanapathipillai, Stephan G. Meier, Gerard R. Williams, III, Mridul Agarwal, Kulin N. Kothari
  • Patent number: 10228951
    Abstract: Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, wherein the first store instruction is older than the second store instruction. In response to determining the second store instruction is ready to commit to the memory hierarchy, the processor may allow the second store instruction to commit before the first store instruction, in response to determining that all store instructions in the store queue older than the second store instruction are non-speculative. However, if it is determined that at least one store instruction in the store queue older than the second store instruction is speculative, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: March 12, 2019
    Assignee: Apple Inc.
    Inventors: Kulin N. Kothari, Mridul Agarwal, Pradeep Kanapathipillai
  • Patent number: 9494997
    Abstract: In some embodiments, a system may include a sub-hierarchy clock control. In some embodiments, the system may include a master unit. The master unit may include an interface unit electrically coupled to a slave unit. The interface unit may monitor, during use, usage requests of the slave unit by the master unit. In some embodiments, the interface unit may turn off clocks to the slave unit during periods of nonuse. In some embodiments, the interface unit may determine if a predetermined period of time elapses before turning on clocks to the slave unit such that turning off the slave unit resulted in the system achieving greater efficiency. In some embodiments, the interface unit may maintain, during use, power to the slave unit during periods of nonuse. The interface unit may maintain power to the slave unit during periods of nonuse such that data stored in the slave unit is preserved.
    Type: Grant
    Filed: June 16, 2014
    Date of Patent: November 15, 2016
    Assignee: Apple Inc.
    Inventors: Kulin N. Kothari, Pradeep Kanapathipillai, Chetana N. Keltcher, Pankaj Raghuvanshi
  • Patent number: 9477478
    Abstract: The disclosure relates to predicting simple and polymorphic branch instructions. An embodiment of the disclosure detects that a program instruction is a branch instruction, determines whether a program counter for the branch instruction is stored in a program counter filter, and, if the program counter is stored in the program counter filter, prevents the program counter from being stored in a first level predictor.
    Type: Grant
    Filed: May 16, 2012
    Date of Patent: October 25, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Kulin N. Kothari, Michael William Morrow, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Daren Eugene Streett
  • Publication number: 20150362978
    Abstract: In some embodiments, a system may include a sub-hierarchy clock control. In some embodiments, the system may include a master unit. The master unit may include an interface unit electrically coupled to a slave unit. The interface unit may monitor, during use, usage requests of the slave unit by the master unit. In some embodiments, the interface unit may turn off clocks to the slave unit during periods of nonuse. In some embodiments, the interface unit may determine if a predetermined period of time elapses before turning on clocks to the slave unit such that turning off the slave unit resulted in the system achieving greater efficiency. In some embodiments, the interface unit may maintain, during use, power to the slave unit during periods of nonuse. The interface unit may maintain power to the slave unit during periods of nonuse such that data stored in the slave unit is preserved.
    Type: Application
    Filed: June 16, 2014
    Publication date: December 17, 2015
    Inventors: Kulin N. Kothari, Pradeep Kanapathipillai, Chetana N. Keltcher, Pankaj Raghuvanshi
  • Patent number: 8966230
    Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: February 24, 2015
    Assignee: Intel Corporation
    Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
  • Publication number: 20140281391
    Abstract: A processor to store a constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: James Norris Dieffenderfer, Michael William Morrow, Rodney Wayne Smith, Jeffery M. Schottmiller, Daniel S. Higdon, Michael Scott McIlvaine, Brian Michael Stempel, Kulin N. Kothari
  • Publication number: 20130311760
    Abstract: The disclosure relates to predicting simple and polymorphic branch instructions. An embodiment of the disclosure detects that a program instruction is a branch instruction, determines whether a program counter for the branch instruction is stored in a program counter filter, and, if the program counter is stored in the program counter filter, prevents the program counter from being stored in a first level predictor.
    Type: Application
    Filed: May 16, 2012
    Publication date: November 21, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Kulin N. Kothari, Michael William Morrow, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Daren Eugene Streett
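
Illustrative note: the abstract of patent 11829763 (also published as 20210049015) describes a learning table that promotes an entry into a prediction table once its confidence indication meets a threshold. The Python sketch below is a rough software model of that flow, not the patented hardware design. Several details are assumptions that the abstract does not state: both tables are indexed by the load's program counter, the threshold is 2, a learning table miss allocates a new entry, and an address mismatch resets the confidence. The class and method names are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 2  # assumed value; the abstract gives no number


@dataclass
class LearningEntry:
    address: int         # last address observed for this load
    confidence: int = 0  # level of confidence indication


class LoadAddressPredictor:
    """Toy model of a learning table feeding a prediction table."""

    def __init__(self, threshold=CONFIDENCE_THRESHOLD):
        self.threshold = threshold
        self.learning = {}    # load PC -> LearningEntry (indexing by PC is assumed)
        self.prediction = {}  # load PC -> predicted address

    def predict(self, pc):
        """Return the predicted address on a prediction table hit, else None."""
        return self.prediction.get(pc)

    def train(self, pc, resolved_address):
        """Update the learning table after a prediction table miss,
        once the load's actual address has been determined."""
        entry = self.learning.get(pc)
        if entry is None:
            # Assumption: a learning table miss allocates a fresh entry.
            self.learning[pc] = LearningEntry(resolved_address)
            return
        if entry.address == resolved_address:
            # Stored address matches the determined address: raise confidence.
            entry.confidence += 1
        else:
            # Assumption: a mismatch retrains the entry from zero confidence.
            entry.address = resolved_address
            entry.confidence = 0
        if entry.confidence >= self.threshold:
            # Threshold met: allocate the entry's information in the
            # prediction table so the next lookup hits.
            self.prediction[pc] = entry.address


# Usage: the same load resolves to the same address three times.
predictor = LoadAddressPredictor()
for _ in range(3):
    if predictor.predict(0x40) is None:
        predictor.train(0x40, 0x1000)
predicted = predictor.predict(0x40)  # now a prediction table hit
```

In this model the first observation only allocates the learning entry, so a load needs one allocation plus two matching observations before its address is predicted. A real design would also need a policy for removing prediction table entries that mispredict, which the abstract does not cover.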