Patents by Inventor Ravindra N. Bhargava

Ravindra N. Bhargava has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190196995
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan
  • Publication number: 20190196720
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes one or more computing resources and a memory controller coupled to a memory device. The memory controller determines a memory access request targets a given bank of multiple banks. An access history is updated for the given bank based on whether the memory access request hits on an open page within the given bank and a page hit rate for the given bank is determined. The memory controller sets an idle cycle limit based on the page hit rate. The idle cycle limit is a maximum amount of time the given bank will be held open before closing the given bank while the bank is idle. The idle cycle limit is based at least in part on a page hit rate for the bank.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Kevin M. Brandl
  • Publication number: 20190196987
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as an amount of time the pending queue in the memory controller for storing outstanding memory requests does not store any memory requests targeting the given rank. The memory controller determines the total bank latency as an amount of time for refreshing a number of banks which have not yet been refreshed in the given rank with per-bank refresh operations. If there are no pending requests targeting the given rank, each of the predicted latency and the total bank latency is used to select between per-bank and all-bank refresh operations.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
  • Publication number: 20190188137
    Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
    Type: Application
    Filed: December 18, 2017
    Publication date: June 20, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
  • Publication number: 20190179760
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. External system memory is used as a last-level cache and includes one of a variety of types of dynamic random access memory (DRAM). A memory controller generates a tag request and a separate data request based on a same, single received memory request. The sending of the tag request is prioritized over sending the data request. A partial tag comparison is performed during processing of the tag request. If a tag miss is detected for the partial tag comparison, then the data request is cancelled, and the memory request is sent to main memory. If one or more tag hits are detected for the partial tag comparison, then processing of the data request is dependent upon the result of the full tag comparison.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Inventors: Ravindra N. Bhargava, Ganesh Balakrishnan
  • Publication number: 20190179758
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
  • Publication number: 20190155516
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Application
    Filed: November 20, 2017
    Publication date: May 23, 2019
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
  • Patent number: 10223124
    Abstract: A processor employs one or more branch predictors to issue branch predictions for each thread executing at an instruction pipeline. Based on the branch predictions, the processor determines a branch prediction confidence for each of the executing threads, whereby a lower confidence level indicates a smaller likelihood that the corresponding thread will actually take the predicted branch. Because speculative execution of an untaken branch wastes resources of the instruction pipeline, the processor prioritizes threads associated with a higher confidence level for selection at the stages of the instruction pipeline.
    Type: Grant
    Filed: January 11, 2013
    Date of Patent: March 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ramkumar Jayaseelan, Ravindra N Bhargava
  • Patent number: 9529720
    Abstract: The present application describes embodiments of techniques for picking a data array lookup request for execution in a data array pipeline a variable number of cycles behind a corresponding tag array lookup request that is concurrently executing in a tag array pipeline. Some embodiments of a method for picking the data array lookup request include picking the data array lookup request for execution in a data array pipeline of a cache concurrently with execution of a tag array lookup request in a tag array pipeline of the cache. The data array lookup request is picked for execution in response to resources of the data array pipeline becoming available after picking the tag array lookup request for execution. Some embodiments of the method may be implemented in a cache.
    Type: Grant
    Filed: June 7, 2013
    Date of Patent: December 27, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, John Kalamatianos, Carl D. Dietz, Richard E. Klass, Ravindra N. Bhargava
  • Publication number: 20140365729
    Abstract: The present application describes embodiments of techniques for picking a data array lookup request for execution in a data array pipeline a variable number of cycles behind a corresponding tag array lookup request that is concurrently executing in a tag array pipeline. Some embodiments of a method for picking the data array lookup request include picking the data array lookup request for execution in a data array pipeline of a cache concurrently with execution of a tag array lookup request in a tag array pipeline of the cache. The data array lookup request is picked for execution in response to resources of the data array pipeline becoming available after picking the tag array lookup request for execution. Some embodiments of the method may be implemented in a cache.
    Type: Application
    Filed: June 7, 2013
    Publication date: December 11, 2014
    Inventors: Marius Evers, John Kalamatianos, Carl D. Dietz, Richard E. Klass, Ravindra N. Bhargava
  • Publication number: 20140201507
    Abstract: A processor employs one or more branch predictors to issue branch predictions for each thread executing at an instruction pipeline. Based on the branch predictions, the processor determines a branch prediction confidence for each of the executing threads, whereby a lower confidence level indicates a smaller likelihood that the corresponding thread will actually take the predicted branch. Because speculative execution of an untaken branch wastes resources of the instruction pipeline, the processor prioritizes threads associated with a higher confidence level for selection at the stages of the instruction pipeline.
    Type: Application
    Filed: January 11, 2013
    Publication date: July 17, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Ramkumar Jayaseelan, Ravindra N. Bhargava
  • Patent number: 8782384
    Abstract: A system and method for efficient improvement of branch prediction in a microprocessor with negligible impact on die-area, power consumption, and clock cycle period. It is determined if a program counter (PC) register contains a polymorphic indirect unconditional branch (PIUB) instruction. One determination may be searching a table with a portion or all of a PC of past PIUB instructions. If a hit occurs in this table, the global shift register (GSR) is updated by shifting a portion of the branch target address into the GSR, rather than updating the GSR with a taken/not-taken prediction bit. The stored value in the GSR is input into a hashing function along with the PC in order to index prediction tables such as a pattern history table (PHT), a branch target buffer (BTB), an indirect target array, or other. The updated value due to the PIUB instruction improves the accuracy of the prediction tables.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: July 15, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David Suggs, Ravindra N. Bhargava
  • Patent number: 8667257
    Abstract: Techniques are disclosed relating to improving the performance of branch prediction in processors. In one embodiment, a processor is disclosed that includes a branch prediction unit configured to predict a sequence of instructions to be issued by the processor for execution. The processor also includes a pattern detection unit configured to detect a pattern in the predicted sequence of instructions, where the pattern includes a plurality of predicted instructions. In response to the pattern detection unit detecting the pattern, the processor is configured to switch from issuing instructions predicted by the branch prediction unit to issuing the plurality of instructions. In some embodiments, the processor includes a replay unit that is configured to replay fetch addresses to an instruction fetch unit to cause the plurality of predicted instructions to be issued.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: March 4, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, David Suggs, Anthony X. Jarvis
  • Publication number: 20120117362
    Abstract: Techniques are disclosed relating to improving the performance of branch prediction in processors. In one embodiment, a processor is disclosed that includes a branch prediction unit configured to predict a sequence of instructions to be issued by the processor for execution. The processor also includes a pattern detection unit configured to detect a pattern in the predicted sequence of instructions, where the pattern includes a plurality of predicted instructions. In response to the pattern detection unit detecting the pattern, the processor is configured to switch from issuing instructions predicted by the branch prediction unit to issuing the plurality of instructions. In some embodiments, the processor includes a replay unit that is configured to replay fetch addresses to an instruction fetch unit to cause the plurality of predicted instructions to be issued.
    Type: Application
    Filed: November 10, 2010
    Publication date: May 10, 2012
    Inventors: Ravindra N. Bhargava, David Suggs, Anthony X. Jarvis
  • Publication number: 20090164766
    Abstract: A system and method for efficient improvement of branch prediction in a microprocessor with negligible impact on die-area, power consumption, and clock cycle period. It is determined if a program counter (PC) register contains a polymorphic indirect unconditional branch (PIUB) instruction. One determination may be searching a table with a portion or all of a PC of past PIUB instructions. If a hit occurs in this table, the global shift register (GSR) is updated by shifting a portion of the branch target address into the GSR, rather than updating the GSR with a taken/not-taken prediction bit. The stored value in the GSR is input into a hashing function along with the PC in order to index prediction tables such as a pattern history table (PHT), a branch target buffer (BTB), an indirect target array, or other. The updated value due to the PIUB instruction improves the accuracy of the prediction tables.
    Type: Application
    Filed: December 20, 2007
    Publication date: June 25, 2009
    Inventors: David Suggs, Ravindra N. Bhargava
  • Publication number: 20080209190
    Abstract: A branch history value associated with a first branch instruction of a first set of instructions is determined. The branch history value represents a branch history of a program flow prior to the first branch instruction. A first branch prediction of the first branch instruction is determined based on the branch history value of the first branch instruction and a first identifier associated with first branch instruction. A second branch prediction of a second branch instruction of the first set of instructions based on the branch history value associated with the first branch instruction and a second identifier associated with the second branch instruction. The second branch instruction occurs subsequent to the first branch instruction in the program flow. A second set of instructions is fetched at the processing device based on at least one of the first branch prediction and the second branch prediction.
    Type: Application
    Filed: February 28, 2007
    Publication date: August 28, 2008
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Ravindra N. Bhargava, Brian Raf
  • Publication number: 20080141002
    Abstract: In accordance with a specific embodiment of the present disclosure, hardware periodically monitors a fetch cycle that fetches data associated with an address to determine performance parameters associated with the fetch cycle. Information related to the duration of a fetch cycle is maintained as well as information indicating the occurrence of various states and data values related to the fetch cycle. For example, the virtual address being processed during the fetch cycle is saved at the integrated circuit containing the fetch engine. Other performance-related parameters associated with execution of instructions at an execution engine of the pipeline are also monitored periodically. However, monitoring performance of the fetch engine is decoupled from monitoring performance-related events of the execution engine.
    Type: Application
    Filed: December 8, 2006
    Publication date: June 12, 2008
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Ravindra N. Bhargava, Benjamin T. Sander
  • Publication number: 20080140993
    Abstract: In accordance with a specific embodiment of the present disclosure, hardware periodically monitors a fetch cycle that fetches data associated with an address to determine performance parameters associated with the fetch cycle. Information related to the duration of a fetch cycle is maintained as well as information indicating the occurrence of various states and data values related to the fetch cycle. For example, the virtual address being processed during the fetch cycle is saved at the integrated circuit containing the fetch engine. Other performance-related parameters associated with execution of instructions at an execution engine of the pipeline are also monitored periodically. However, monitoring performance of the fetch engine is decoupled from monitoring performance-related events of the execution engine.
    Type: Application
    Filed: December 8, 2006
    Publication date: June 12, 2008
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Ravindra N. Bhargava, Benjamin T. Sander, David Neal Suggs
  • Publication number: 20080141008
    Abstract: In accordance with a specific embodiment of the present disclosure, hardware periodically monitors a fetch cycle that fetches data associated with an address to determine performance parameters associated with the fetch cycle. Information related to the duration of a fetch cycle is maintained as well as information indicating the occurrence of various states and data values related to the fetch cycle. For example, the virtual address being processed during the fetch cycle is saved at the integrated circuit containing the fetch engine. Other performance-related parameters associated with execution of instructions at an execution engine of the pipeline are also monitored periodically. However, monitoring performance of the fetch engine is decoupled from monitoring performance-related events of the execution engine.
    Type: Application
    Filed: December 8, 2006
    Publication date: June 12, 2008
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Benjamin T. Sander, Michael Edward Tuuk, Ravindra N. Bhargava