Patents by Inventor Ravindra N. Bhargava

Ravindra N. Bhargava has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11119926
    Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: September 14, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
  • Publication number: 20210200695
    Abstract: Staging memory access requests includes receiving a memory access request directed to Dynamic Random Access Memory; storing the memory access request in a staging buffer; and moving the memory access request from the staging buffer to a command queue.
    Type: Application
    Filed: December 27, 2019
    Publication date: July 1, 2021
    Inventors: JAMES R. MAGRO, KEDARNATH BALAKRISHNAN, RAVINDRA N. BHARGAVA, GUANHAO SHEN
  • Publication number: 20210200694
    Abstract: Staging buffer arbitration includes: storing a plurality of memory access requests in a staging buffer; selecting a memory access request of the plurality of memory access requests from the staging buffer based on one or more arbitration rules; and moving the memory access request from the staging buffer to a command queue.
    Type: Application
    Filed: December 27, 2019
    Publication date: July 1, 2021
    Inventors: JAMES R. MAGRO, KEDARNATH BALAKRISHNAN, RAVINDRA N. BHARGAVA, GUANHAO SHEN
  • Publication number: 20210073152
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.
    Type: Application
    Filed: November 20, 2020
    Publication date: March 11, 2021
    Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan
  • Publication number: 20200401321
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Application
    Filed: April 6, 2020
    Publication date: December 24, 2020
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
  • Patent number: 10846253
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: November 24, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan
  • Patent number: 10613764
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: April 7, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
  • Patent number: 10572389
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. External system memory is used as a last-level cache and includes one of a variety of types of dynamic random access memory (DRAM). A memory controller generates a tag request and a separate data request based on a same, single received memory request. The sending of the tag request is prioritized over sending the data request. A partial tag comparison is performed during processing of the tag request. If a tag miss is detected for the partial tag comparison, then the data request is cancelled, and the memory request is sent to main memory. If one or more tag hits are detected for the partial tag comparison, then processing of the data request is dependent upon the result of the full tag comparison.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: February 25, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, Ganesh Balakrishnan
  • Patent number: 10545875
    Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: January 28, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
  • Patent number: 10503648
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
  • Patent number: 10503670
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as an amount of time the pending queue in the memory controller for storing outstanding memory requests does not store any memory requests targeting the given rank. The memory controller determines the total bank latency as an amount of time for refreshing a number of banks which have not yet been refreshed in the given rank with per-bank refresh operations. If there are no pending requests targeting the given rank, each of the predicted latency and the total bank latency is used to select between per-bank and all-bank refresh operations.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
  • Publication number: 20190196720
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes one or more computing resources and a memory controller coupled to a memory device. The memory controller determines a memory access request targets a given bank of multiple banks. An access history is updated for the given bank based on whether the memory access request hits on an open page within the given bank and a page hit rate for the given bank is determined. The memory controller sets an idle cycle limit based on the page hit rate. The idle cycle limit is a maximum amount of time the given bank will be held open before closing the given bank while the bank is idle. The idle cycle limit is based at least in part on a page hit rate for the bank.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Kevin M. Brandl
  • Publication number: 20190196974
    Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
    Type: Application
    Filed: December 27, 2017
    Publication date: June 27, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
  • Publication number: 20190196995
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan
  • Publication number: 20190196996
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory read requests have been sent to a memory device in a read mode of a data bus, the memory controller determines a threshold number of memory write requests to send to the memory device in an upcoming write mode is a number of outstanding memory write requests. Alternatively, the memory controller determines the threshold number of memory write requests to send to the memory device in an upcoming write mode is a maximum value of the number of outstanding memory write requests and a programmable value of the write burst length stored in a control register. Therefore, the write burst length is determined dynamically. Similarly, the read burst length is determined dynamically when the write mode ends.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Kedarnath Balakrishnan, Ravindra N. Bhargava, Guanhao Shen, James Raymond Magro, Kevin M. Brandl
  • Publication number: 20190196987
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as an amount of time the pending queue in the memory controller for storing outstanding memory requests does not store any memory requests targeting the given rank. The memory controller determines the total bank latency as an amount of time for refreshing a number of banks which have not yet been refreshed in the given rank with per-bank refresh operations. If there are no pending requests targeting the given rank, each of the predicted latency and the total bank latency is used to select between per-bank and all-bank refresh operations.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
  • Publication number: 20190188137
    Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
    Type: Application
    Filed: December 18, 2017
    Publication date: June 20, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
  • Publication number: 20190179758
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
  • Publication number: 20190179760
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. External system memory is used as a last-level cache and includes one of a variety of types of dynamic random access memory (DRAM). A memory controller generates a tag request and a separate data request based on a same, single received memory request. The sending of the tag request is prioritized over sending the data request. A partial tag comparison is performed during processing of the tag request. If a tag miss is detected for the partial tag comparison, then the data request is cancelled, and the memory request is sent to main memory. If one or more tag hits are detected for the partial tag comparison, then processing of the data request is dependent upon the result of the full tag comparison.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Inventors: Ravindra N. Bhargava, Ganesh Balakrishnan
  • Publication number: 20190155516
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Application
    Filed: November 20, 2017
    Publication date: May 23, 2019
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro