Patents by Inventor Vignyan Reddy Kothinti Naresh

Vignyan Reddy Kothinti Naresh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210089312
    Abstract: Tracking and communication of direct/indirect source dependencies of producer instructions executed in a processor to source-dependent consumer instructions to facilitate processor optimizations. The processor includes an instruction processing circuit configured to process and execute fetched instructions in an instruction stream according to a dataflow execution. The instruction processing circuit includes mechanisms to communicate dependencies to dependent consumer instructions in an instruction pipeline to facilitate processor optimizations, such as replay of consumer instructions. The instruction processing circuit is configured to track, in a data structure circuit, producer instructions in the instruction pipeline and the dependencies of consumer instructions on those producer instructions before the instructions are scheduled for execution.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventor: Vignyan Reddy KOTHINTI NARESH
  • Publication number: 20210089308
    Abstract: A processor element in a processor-based system is configured to fetch one or more instructions associated with a program binary, where the one or more instructions include an instruction having an immediate operand. The processor element is configured to determine if the immediate operand is a reference to a wide immediate operand. In response to determining that the immediate operand is a reference to a wide immediate operand, the processor element is configured to retrieve the wide immediate operand from a common intermediate lookup table (CILT) in the program binary, where the immediate operand indexes the wide immediate operand in the CILT. The processor element is then configured to process the instruction having the immediate operand such that the immediate operand is replaced with the wide immediate operand from the CILT.
    Type: Application
    Filed: September 23, 2019
    Publication date: March 25, 2021
    Inventors: Arthur PERAIS, Rodney Wayne SMITH, Shivam PRIYADARSHI, Rami Mohammad AL SHEIKH, Vignyan Reddy KOTHINTI NARESH
  • Publication number: 20210064541
    Abstract: Deferring cache state updates in a non-speculative cache memory in a processor-based system in response to a speculative data request until the speculative data request becomes non-speculative is disclosed. The updating of at least one cache state in the cache memory resulting from a data request is deferred until the data request becomes non-speculative. Thus, a cache state in the cache memory is not updated for requests resulting from mispredictions. Deferring the updating of a cache state in the cache memory can include deferring the storing of received speculative requested data in the main data array of the cache memory as a result of a cache miss until the data request becomes non-speculative. The received speculative requested data can first be stored in a speculative buffer memory associated with a cache memory, and then stored in the main data array if the data request becomes non-speculative.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Vignyan Reddy KOTHINTI NARESH, Arthur PERAIS, Rami Mohammad AL SHEIKH, Shivam PRIYADARSHI
  • Patent number: 10929139
    Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. An OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: February 23, 2021
    Assignee: Qualcomm Incorporated
    Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10896041
    Abstract: Enabling early execution of move-immediate instructions having variable immediate value sizes in processor-based devices is disclosed. In one exemplary embodiment, a processor-based device provides a move-immediate logic circuit that detects a move-immediate instruction comprising an immediate value and a destination register. For frequently encountered immediate values, the move-immediate logic circuit allocates a physical register from an immediate physical register file (IPRF), and writes an IPRF tag corresponding to the allocated IPRF register into a most-recent mapping table (MRT) entry for the destination register. Subsequent move-immediate instructions embedding the same immediate value, as well as other dependent instructions, may then obtain the immediate value from the IPRF register by accessing the MRT entry.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 19, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shivam Priyadarshi, Arthur Perais, Vignyan Reddy Kothinti Naresh, Yusuf Cagatay Tekmen, Rami Mohammad Al Sheikh, Rodney Wayne Smith
  • Patent number: 10877768
    Abstract: Minimizing traversal of a processor reorder buffer (ROB) for register rename map table (RMT) state recovery for interrupted instruction recovery in a processor. Instructions may execute out of order in a processor. Information about the logical register-to-physical register mapping resulting from each instruction is stored in entries in program order in the ROB. When the pipeline is interrupted by an instruction that fails to execute, changing program flow, all instructions following the interrupting instruction may be flushed from the processor pipeline. It is important to return the state of the RMT to the state that existed when the interrupting instruction entered the pipeline. To recover the RMT state in response to an interrupting instruction, register mapping information in the ROB entries is traversed to either undo the younger instructions that entered the pipeline after the interrupting instruction or replay the older instructions that entered the pipeline before the interrupting instruction.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: December 29, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shivam Priyadarshi, Yusuf Cagatay Tekmen, Kiran Ravi Seth, Rodney Wayne Smith, Vignyan Reddy Kothinti Naresh
  • Publication number: 20200394040
    Abstract: Limiting replay of load-based control-independent (CI) instructions in speculative misprediction recovery in a processor. In misprediction recovery, a load-based CI instruction is designated as a load-based CI, data-dependent (CIDD) instruction if it consumed data forwarded from a store-based instruction. During the misprediction recovery, replayed load-based CIDD instructions reevaluate the source of their memory load to obtain the correct data instead of consuming potentially faulty data that may have been forwarded by a store-based instruction existing only in a mispredicted instruction control flow path. Limiting the replay of load-based CI instructions to only those determined to be load-based CIDD instructions can reduce execution delay and power consumption in an instruction pipeline.
    Type: Application
    Filed: June 13, 2019
    Publication date: December 17, 2020
    Inventors: Vignyan Reddy KOTHINTI NARESH, Shivam PRIYADARSHI
  • Publication number: 20200310814
    Abstract: Systems and methods for reusing load instructions by a processor without accessing a data cache include a load-store execution unit (LSU) of the processor, the LSU being configured to determine if a prior execution of a first load instruction loaded data from a first cache line of the data cache and if a current execution of a second load instruction will load the data from the first cache line of the data cache. Further, the LSU determines if reusing the data from the prior execution of the first load instruction for the current execution of the second load instruction will lead to functional errors. If there are no functional errors, the data from the prior execution of the first load instruction is reused for the current execution of the second load instruction, without accessing the data cache for the current execution of the second load instruction.
    Type: Application
    Filed: March 28, 2019
    Publication date: October 1, 2020
    Inventor: Vignyan Reddy KOTHINTI NARESH
  • Patent number: 10783011
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the free-GPR count is adequate. The method contemplates counting the number of register writers (instructions that will write to GPRs) in blocks of code that are in the process of executing, and counting the available GPRs instead of merely allocating them for dedicated use by a block of code or an instruction in a block of code. Because a block does not run if there are not enough GPRs available for it, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: September 22, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10725782
    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.
    Type: Grant
    Filed: September 12, 2017
    Date of Patent: July 28, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Publication number: 20200104163
    Abstract: Providing predictive instruction dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based devices is disclosed. In this regard, an OOP-based device includes a system resource that may be consumed or otherwise occupied by instructions, as well as an execution pipeline comprising a decode stage and a dispatch stage. The OOP further maintains a running count and a resource usage threshold. Upon receiving an instruction block, the decode stage extracts a proxy value that indicates an approximate predicted count of instructions within the instruction block that will consume a system resource. The decode stage then increments the running count by the proxy value. The dispatch stage compares the running count to the resource usage threshold before dispatching any younger instruction blocks. If the running count exceeds the resource usage threshold, the dispatch stage blocks dispatching of younger instruction blocks until the running count no longer exceeds the resource usage threshold.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Inventors: Lisa Ru-feng Hsu, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10437592
    Abstract: Reduced logic level operation folding of context history in a history register in a prediction system for a processor-based system is disclosed. The prediction system includes a prediction circuit employing reduced operation folding of the history register for indexing a prediction table containing prediction values used to process a consumer instruction when its value has not yet been resolved. To avoid having to perform successive logic folding operations to produce a folded context history of the resultant reduced bit width, a reduced logic level folding operation of the resultant reduced bit width is employed. This operation uses the current folded context history, derived from the previous contents of the history register, as a basis for determining the new folded context history. In this manner, logic folding of the history register is faster and consumes less power because fewer logic operations are performed.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: October 8, 2019
    Assignee: Qualcomm Incorporated
    Inventors: Anil Krishna, Yongseok Yi, Vignyan Reddy Kothinti Naresh
  • Patent number: 10331447
    Abstract: Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems is disclosed. In one aspect, a processor-based system provides a branch prediction circuit including a CRAS. Each CRAS entry within the CRAS includes an address field and a counter field. When a call instruction is encountered, a return address of the call instruction is compared to the address field of a top CRAS entry indicated by a CRAS top-of-stack (TOS) index. If the return address matches the top CRAS entry, the counter field of the top CRAS entry is incremented instead of adding a new CRAS entry for the return address. When a return instruction is subsequently encountered in the instruction stream, the counter field of the top CRAS entry is decremented if its value is greater than zero (0), or, if not, the top CRAS entry is removed from the CRAS.
    Type: Grant
    Filed: August 30, 2017
    Date of Patent: June 25, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Anil Krishna
  • Patent number: 10255074
    Abstract: Selective flushing of instructions in an instruction pipeline in a processor back to an execution-determined target address in response to a precise interrupt is disclosed. A selective instruction pipeline flush controller determines if a precise interrupt has occurred for an executed instruction in the instruction pipeline. The selective instruction pipeline flush controller determines if an instruction at the correct resolved target address of the instruction that caused the precise interrupt is contained in the instruction pipeline. If so, the selective instruction pipeline flush controller can selectively flush instructions back to the instruction in the pipeline that contains the correct resolved target address to reduce the amount of new instruction fetching.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: April 9, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Rami Mohammad Al Sheikh, Harold Wade Cain, III
  • Publication number: 20190087241
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the free-GPR count is adequate. The method contemplates counting the number of register writers (instructions that will write to GPRs) in blocks of code that are in the process of executing, and counting the available GPRs instead of merely allocating them for dedicated use by a block of code or an instruction in a block of code. Because a block does not run if there are not enough GPRs available for it, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Application
    Filed: September 21, 2017
    Publication date: March 21, 2019
    Inventors: Vignyan Reddy KOTHINTI NARESH, Gregory Michael WRIGHT
  • Publication number: 20190087184
    Abstract: Systems and methods are directed to instruction execution in a computer system having an out-of-order instruction picker, such as those typically used in computing systems capable of executing multiple instructions in parallel. Such systems are typically block-based, with multiple instructions grouped in execution units such as Reservation Station (RSV) Arrays. If an event such as an exception, page fault, or similar event occurs, the block may have to be swapped out, that is, removed from execution, until the event clears. When the event clears, the block is typically brought back for execution, but it is usually assigned a different RSV Array and re-executed from the beginning of the block. Tagging instructions that may cause such events, and then untagging them by resetting the tag once they have executed, can eliminate much of this unnecessary re-execution of instructions.
    Type: Application
    Filed: September 15, 2017
    Publication date: March 21, 2019
    Inventors: Vignyan Reddy KOTHINTI NARESH, Lisa HSU, Vinay MURTHY, Anil KRISHNA, Gregory WRIGHT, III
  • Publication number: 20190079772
    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry.
    Type: Application
    Filed: September 12, 2017
    Publication date: March 14, 2019
    Inventors: Anil Krishna, Yongseok Yi, Eric Rotenberg, Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
  • Patent number: 10223118
    Abstract: Providing references to previously decoded instructions of recently provided instructions to be executed by a processor is disclosed herein. In one aspect, a low-resource micro-operation controller is provided. Responsive to an instruction pipeline receiving an instruction address, the low-resource micro-operation controller is configured to determine if the received instruction address corresponds to an instruction address in a short history table. The short history table includes instruction addresses of recently provided instructions having micro-ops in a post-decode queue. If the received instruction address corresponds to an instruction address in the short history table, the low-resource micro-operation controller is configured to provide a reference (e.g., a pointer) to the fetch stage that corresponds to the entry in the post-decode queue in which the micro-ops corresponding to the instruction address are stored.
    Type: Grant
    Filed: March 24, 2016
    Date of Patent: March 5, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Shivam Priyadarshi, Raguram Damodaran
  • Publication number: 20190065197
    Abstract: Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems is disclosed. In one aspect, a processor-based system provides a branch prediction circuit including a CRAS. Each CRAS entry within the CRAS includes an address field and a counter field. When a call instruction is encountered, a return address of the call instruction is compared to the address field of a top CRAS entry indicated by a CRAS top-of-stack (TOS) index. If the return address matches the top CRAS entry, the counter field of the top CRAS entry is incremented instead of adding a new CRAS entry for the return address. When a return instruction is subsequently encountered in the instruction stream, the counter field of the top CRAS entry is decremented if its value is greater than zero (0), or, if not, the top CRAS entry is removed from the CRAS.
    Type: Application
    Filed: August 30, 2017
    Publication date: February 28, 2019
    Inventors: Vignyan Reddy Kothinti Naresh, Anil Krishna
  • Publication number: 20190065196
    Abstract: Reduced logic level operation folding of context history in a history register in a prediction system for a processor-based system is disclosed. The prediction system includes a prediction circuit employing reduced operation folding of the history register for indexing a prediction table containing prediction values used to process a consumer instruction when its value has not yet been resolved. To avoid having to perform successive logic folding operations to produce a folded context history of the resultant reduced bit width, a reduced logic level folding operation of the resultant reduced bit width is employed. This operation uses the current folded context history, derived from the previous contents of the history register, as a basis for determining the new folded context history. In this manner, logic folding of the history register is faster and consumes less power because fewer logic operations are performed.
    Type: Application
    Filed: August 24, 2017
    Publication date: February 28, 2019
    Inventors: Anil Krishna, Yongseok Yi, Vignyan Reddy Kothinti Naresh
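The wide-immediate lookup described in publication 20210089308 can be illustrated with a small Python sketch. The encoding here is entirely hypothetical: it assumes a 12-bit immediate field whose top bit (CILT_FLAG) marks the field as an index into the common intermediate lookup table (CILT) rather than a literal value. The abstract does not say how a reference is distinguished from an ordinary immediate.

```python
from typing import List

IMM_BITS = 12                       # hypothetical width of the immediate field
CILT_FLAG = 1 << (IMM_BITS - 1)     # hypothetical marker bit: "this is a CILT index"


def resolve_immediate(imm_field: int, cilt: List[int]) -> int:
    """Return the operand value the instruction should actually use.

    If the (hypothetical) flag bit is set, the remaining bits index the CILT
    carried in the program binary, and the wide immediate found there
    replaces the short immediate.
    """
    if imm_field & CILT_FLAG:
        index = imm_field & (CILT_FLAG - 1)   # strip the flag to get the table index
        return cilt[index]
    return imm_field                          # ordinary immediate, used as-is


# Hypothetical CILT holding constants too wide for the 12-bit field.
cilt = [0x0000_DEAD_BEEF_0000, 0x1234_5678_9ABC]
print(hex(resolve_immediate(0x02A, cilt)))            # 0x2a (literal immediate)
print(hex(resolve_immediate(CILT_FLAG | 1, cilt)))    # 0x123456789abc (from cilt[1])
```

Keeping the wide constants in a shared table lets the instruction encoding stay fixed-width while still reaching large values, at the cost of one extra lookup when the flag is set.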
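The deferred cache fill in publication 20210064541 can be modeled in a few lines. In this sketch a Python dict stands in for the main data array, and speculative misses park their data in a separate speculative buffer keyed by a request ID until the request is resolved. The names (DeferredFillCache, resolve, the request IDs) are illustrative assumptions; replacement state, capacity, and hits in the speculative buffer are deliberately not modeled.

```python
from typing import Dict, Hashable, Tuple


class DeferredFillCache:
    """Toy cache whose main array only changes for non-speculative requests."""

    def __init__(self, backing_memory: Dict[int, int]) -> None:
        self.backing = backing_memory                            # next memory level
        self.main_array: Dict[int, int] = {}                     # address -> data
        self.spec_buffer: Dict[Hashable, Tuple[int, int]] = {}   # req id -> (address, data)

    def load(self, req_id: Hashable, address: int, speculative: bool) -> int:
        if address in self.main_array:
            return self.main_array[address]               # hit: no fill needed
        data = self.backing[address]                      # miss: fetch from memory
        if speculative:
            self.spec_buffer[req_id] = (address, data)    # defer the cache update
        else:
            self.main_array[address] = data
        return data

    def resolve(self, req_id: Hashable, on_correct_path: bool) -> None:
        """Called when a speculative request becomes non-speculative or is squashed."""
        entry = self.spec_buffer.pop(req_id, None)
        if entry is not None and on_correct_path:
            address, data = entry
            self.main_array[address] = data               # now safe to install


cache = DeferredFillCache({0x40: 1, 0x80: 2})
cache.load("a", 0x40, speculative=True)
cache.load("b", 0x80, speculative=True)
cache.resolve("a", on_correct_path=True)     # 0x40 is installed
cache.resolve("b", on_correct_path=False)    # 0x80 never reaches the main array
print(sorted(cache.main_array))              # [64]
```

The last line shows the property the abstract describes: a request that turns out to come from a misprediction leaves the main data array unchanged.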
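Patent 10929139 (and publication 20200104163) describes a running count, a per-block proxy value extracted at decode, and a resource usage threshold checked at dispatch. A minimal sketch of that bookkeeping follows. How and when the running count is decremented is an assumption, since the abstract only says dispatch resumes once the count no longer exceeds the threshold.

```python
class DispatchThrottle:
    """Toy model of predictive instruction dispatch throttling."""

    def __init__(self, resource_usage_threshold: int) -> None:
        self.threshold = resource_usage_threshold
        self.running_count = 0

    def on_decode(self, proxy_value: int) -> None:
        # Decode stage: add the block's predicted number of resource consumers.
        self.running_count += proxy_value

    def on_release(self, count: int) -> None:
        # ASSUMPTION: the count drops as instructions free the resource
        # (for example at commit); the abstract does not say when this happens.
        self.running_count = max(0, self.running_count - count)

    def may_dispatch_younger(self) -> bool:
        # Dispatch stage: hold younger blocks while the count exceeds the threshold.
        return self.running_count <= self.threshold


throttle = DispatchThrottle(resource_usage_threshold=16)
throttle.on_decode(10)
print(throttle.may_dispatch_younger())   # True: 10 <= 16
throttle.on_decode(8)
print(throttle.may_dispatch_younger())   # False: 18 > 16, younger blocks stall
throttle.on_release(4)
print(throttle.may_dispatch_younger())   # True: 14 <= 16
```

Because the proxy value is a prediction made at decode, the check runs before the resource is actually occupied, which is what lets dispatch hold back younger blocks ahead of an overflow.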
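The two recovery directions named in patent 10877768 (undoing younger instructions or replaying older ones) can be sketched as follows. The model is an assumption for illustration: the ROB is a Python list ordered from oldest (index 0) to youngest, each entry records its destination logical register with the new and previous physical registers, committed_rmt is the mapping at the ROB head, flush_from is the index of the first instruction to be flushed, and "minimizing traversal" is interpreted as picking whichever direction visits fewer entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rmt = Dict[int, int]   # logical register -> physical register


@dataclass
class RobEntry:
    logical: Optional[int]   # destination logical register, or None if none
    new_phys: int = -1       # physical register allocated by this instruction
    prev_phys: int = -1      # mapping it replaced in the RMT


def recover_rmt(committed_rmt: Rmt, current_rmt: Rmt,
                rob: List[RobEntry], flush_from: int) -> Tuple[Rmt, int]:
    """Rebuild the RMT as it was just before rob[flush_from] was renamed.

    Returns the recovered map and the number of ROB entries traversed.
    """
    undo_cost = len(rob) - flush_from     # entries at or after the recovery point
    replay_cost = flush_from              # entries before the recovery point
    if undo_cost <= replay_cost:
        rmt = dict(current_rmt)
        for entry in reversed(rob[flush_from:]):   # youngest first: undo mappings
            if entry.logical is not None:
                rmt[entry.logical] = entry.prev_phys
        return rmt, undo_cost
    rmt = dict(committed_rmt)
    for entry in rob[:flush_from]:                 # oldest first: replay mappings
        if entry.logical is not None:
            rmt[entry.logical] = entry.new_phys
    return rmt, replay_cost


committed = {0: 100, 1: 101}
rob = [RobEntry(0, 200, 100), RobEntry(1, 201, 101), RobEntry(None),
       RobEntry(0, 202, 200), RobEntry(1, 203, 201)]
current = {0: 202, 1: 203}
print(recover_rmt(committed, current, rob, 3))   # ({0: 200, 1: 201}, 2) via undo
print(recover_rmt(committed, current, rob, 1))   # ({0: 200, 1: 101}, 1) via replay
```

Both paths produce the same map; only the number of entries walked differs, which is the cost the patent title says it minimizes.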
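The counting approach in patent 10783011 (publication 20190087241) replaces per-block register allocation with a single free-GPR count. A minimal sketch, assuming a block's register-writer count is known before it starts and that its GPRs return to the pool when the block completes (the abstract does not specify when they are released); the class names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CodeBlock:
    name: str
    register_writers: int   # instructions in the block that will write a GPR


class FreeGprCounter:
    """Admit a block only if enough GPRs are free, tracking a count, not specific registers."""

    def __init__(self, total_gprs: int) -> None:
        self.free_gprs = total_gprs

    def try_start(self, block: CodeBlock) -> bool:
        if block.register_writers > self.free_gprs:
            return False                         # not enough GPRs: the block waits
        self.free_gprs -= block.register_writers
        return True

    def on_complete(self, block: CodeBlock) -> None:
        # ASSUMPTION: all of a block's GPRs are returned when it completes.
        self.free_gprs += block.register_writers


pool = FreeGprCounter(total_gprs=8)
a, b = CodeBlock("A", 5), CodeBlock("B", 4)
print(pool.try_start(a))   # True, 3 GPRs left
print(pool.try_start(b))   # False, B waits instead of starting and stalling
pool.on_complete(a)
print(pool.try_start(b))   # True, 4 GPRs left
```

Refusing to start B up front is what avoids the deadlock case where two partially started blocks each hold registers the other needs.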
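The global polarity indicator in patent 10725782 (publication 20190079772) can be sketched with a table of small saturating counters. The 2-bit counter width, the "max minus raw" reading of inverse polarity, and the method names are assumptions. The sketch also shows one consequence of the scheme: flipping the single global bit reinterprets every entry's usefulness at once without rewriting any entry.

```python
from typing import List


class UsefulnessTable:
    def __init__(self, num_entries: int, max_value: int = 3) -> None:
        self.raw: List[int] = [0] * num_entries   # stored usefulness indicators
        self.max_value = max_value                # assumed 2-bit saturating counters
        self.global_polarity = True               # set: raw value == usefulness

    def usefulness(self, i: int) -> int:
        raw = self.raw[i]
        return raw if self.global_polarity else self.max_value - raw

    def mark_useful(self, i: int) -> None:
        # Raise effective usefulness; the stored value moves in whichever
        # direction the current polarity dictates, saturating at the ends.
        if self.global_polarity:
            self.raw[i] = min(self.max_value, self.raw[i] + 1)
        else:
            self.raw[i] = max(0, self.raw[i] - 1)

    def mark_useless(self, i: int) -> None:
        if self.global_polarity:
            self.raw[i] = max(0, self.raw[i] - 1)
        else:
            self.raw[i] = min(self.max_value, self.raw[i] + 1)

    def flip_polarity(self) -> None:
        self.global_polarity = not self.global_polarity


table = UsefulnessTable(num_entries=2)
for _ in range(3):
    table.mark_useful(0)
print(table.usefulness(0), table.usefulness(1))   # 3 0
table.flip_polarity()
print(table.usefulness(0), table.usefulness(1))   # 0 3
```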
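The idea in patent 10437592 (publication 20190065196) of deriving the new folded context history from the current folded value, instead of re-folding the whole history register, corresponds to the incremental folding update widely used in TAGE-style predictors. The sketch below uses that well-known update; it is not claimed to be the patent's specific circuit. The history keeps the newest outcome in bit 0, and full_fold is the multi-level XOR reduction that the incremental update avoids.

```python
def full_fold(history: int, length: int, width: int) -> int:
    """Reference: XOR all width-bit chunks of a length-bit history together."""
    mask = (1 << width) - 1
    h = history & ((1 << length) - 1)
    folded = 0
    while h:
        folded ^= h & mask
        h >>= width
    return folded


class FoldedHistory:
    """Maintain a width-bit fold of a length-bit history with constant work per update."""

    def __init__(self, length: int, width: int) -> None:
        self.length = length
        self.width = width
        self.history = 0   # raw history register, newest outcome in bit 0
        self.folded = 0    # running fold of self.history

    def push(self, outcome: int) -> None:
        outgoing = (self.history >> (self.length - 1)) & 1   # oldest bit, about to drop out
        self.history = ((self.history << 1) | outcome) & ((1 << self.length) - 1)
        f = (self.folded << 1) | outcome                 # shift the fold, insert the new bit
        f ^= outgoing << (self.length % self.width)      # cancel the bit that left the window
        f ^= f >> self.width                             # wrap the bit shifted out of the top into bit 0
        self.folded = f & ((1 << self.width) - 1)


fh = FoldedHistory(length=37, width=10)
for i in range(200):
    fh.push(1 if (i * i + 3 * i) % 5 < 2 else 0)
    assert fh.folded == full_fold(fh.history, fh.length, fh.width)
print("incremental fold matches full fold")
```

Each update touches only a few bits of the previous fold, whereas full_fold needs a chain of XORs whose depth grows with length / width.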
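The counter-based compression in patent 10331447 (publication 20190065197) can be modeled with a Python list whose last element plays the role of the entry at the CRAS top-of-stack (TOS) index. Stack capacity, overflow handling, and recovery after branch mispredictions are not modeled, and the class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrasEntry:
    address: int   # return address held by this entry
    counter: int   # extra consecutive pushes of the same address


class CompressedReturnAddressStack:
    """Toy CRAS: repeated pushes of one return address share a single entry."""

    def __init__(self) -> None:
        self.entries: List[CrasEntry] = []   # last element is the TOS entry

    def on_call(self, return_address: int) -> None:
        top = self.entries[-1] if self.entries else None
        if top is not None and top.address == return_address:
            top.counter += 1                 # compress: reuse the top entry
        else:
            self.entries.append(CrasEntry(return_address, 0))

    def on_return(self) -> Optional[int]:
        if not self.entries:
            return None                      # empty stack: no prediction
        top = self.entries[-1]
        predicted = top.address
        if top.counter > 0:
            top.counter -= 1                 # pop one compressed copy
        else:
            self.entries.pop()               # last copy: remove the entry
        return predicted


cras = CompressedReturnAddressStack()
cras.on_call(0x100)             # outer call
for _ in range(5):
    cras.on_call(0x400)         # five levels of direct recursion, same call site
print(len(cras.entries))        # 2
print([hex(cras.on_return()) for _ in range(6)])
```

Five recursive calls occupy one entry instead of five, which is the space saving the abstract targets for deep recursion.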