Patents by Inventor Amit P. Apte

Amit P. Apte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Region based directory scheme to adapt to large cache sizes

Patent number: 11119926

Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.

Type: Grant

Filed: December 18, 2017

Date of Patent: September 14, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
ZERO VALUE MEMORY COMPRESSION

Publication number: 20210191865

Abstract: A coherency management device receives requests to read data from or write data to an address in a main memory. On a write, if the data includes zero data, an entry corresponding to the memory address is created in a cache directory if it does not already exist, is set to an invalid state, and indicates that the data includes zero data. The zero data is not written to main memory or a cache. On a read, the cache directory is checked for an entry corresponding to the memory address. If the entry exists in the cache directory, is invalid, and includes an indication that data corresponding to the memory address includes zero data, the coherency management device returns zero data in response to the request without fetching the data from main memory or a cache.

Type: Application

Filed: December 20, 2019

Publication date: June 24, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte
HOME AGENT BASED CACHE TRANSFER ACCELERATION SCHEME

Publication number: 20210064545

Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

Type: Application

Filed: September 14, 2020

Publication date: March 4, 2021

Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
Accelerating accesses to private regions in a region-based cache directory scheme

Patent number: 10922237

Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.

Type: Grant

Filed: September 12, 2018

Date of Patent: February 16, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20200401519

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Application

Filed: July 2, 2020

Publication date: December 24, 2020

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
Home agent based cache transfer acceleration scheme

Patent number: 10776282

Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

Type: Grant

Filed: December 15, 2017

Date of Patent: September 15, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
Region based split-directory scheme to adapt to large cache sizes

Patent number: 10705959

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Grant

Filed: August 31, 2018

Date of Patent: July 7, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
Method to reduce write responses to improve bandwidth and efficiency

Patent number: 10684965

Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.

Type: Grant

Filed: November 8, 2017

Date of Patent: June 16, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
ACCELERATING ACCESSES TO PRIVATE REGIONS IN A REGION-BASED CACHE DIRECTORY SCHEME

Publication number: 20200081844

Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.

Type: Application

Filed: September 12, 2018

Publication date: March 12, 2020

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20200073801

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Application

Filed: August 31, 2018

Publication date: March 5, 2020

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
Cancel and replay protocol scheme to improve ordered bandwidth

Patent number: 10540316

Abstract: Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and then sends an indication that the write request is globally visible to the ordering master when all cached copies of the data targeted by the write request have been invalidated. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.

Type: Grant

Filed: December 28, 2017

Date of Patent: January 21, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Chen-Ping Yang, Amit P. Apte, Elizabeth M. Cooper
Cache to cache data transfer acceleration techniques

Patent number: 10503648

Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.

Type: Grant

Filed: December 12, 2017

Date of Patent: December 10, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
CANCEL AND REPLAY PROTOCOL SCHEME TO IMPROVE ORDERED BANDWIDTH

Publication number: 20190205280

Abstract: Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and then sends an indication that the write request is globally visible to the ordering master when all cached copies of the data targeted by the write request have been invalidated. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.

Type: Application

Filed: December 28, 2017

Publication date: July 4, 2019

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Chen-Ping Yang, Amit P. Apte, Elizabeth M. Cooper
HOME AGENT BASED CACHE TRANSFER ACCELERATION SCHEME

Publication number: 20190188155

Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

Type: Application

Filed: December 15, 2017

Publication date: June 20, 2019

Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
REGION BASED DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20190188137

Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.

Type: Application

Filed: December 18, 2017

Publication date: June 20, 2019

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
CACHE TO CACHE DATA TRANSFER ACCELERATION TECHNIQUES

Publication number: 20190179758

Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.

Type: Application

Filed: December 12, 2017

Publication date: June 13, 2019

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
METHOD TO REDUCE WRITE RESPONSES TO IMPROVE BANDWIDTH AND EFFICIENCY

Publication number: 20190138465

Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.

Type: Application

Filed: November 8, 2017

Publication date: May 9, 2019

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
Contended lock request elision scheme

Patent number: 10248564

Abstract: A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines a number of nodes requesting read access for a given data block assigned to the home node exceeds a threshold and a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.

Type: Grant

Filed: June 24, 2016

Date of Patent: April 2, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Amit P. Apte, Elizabeth M. Cooper
CONTENDED LOCK REQUEST ELISION SCHEME

Publication number: 20170371787

Abstract: A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines a number of nodes requesting read access for a given data block assigned to the home node exceeds a threshold and a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.

Type: Application

Filed: June 24, 2016

Publication date: December 28, 2017

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Amit P. Apte, Elizabeth M. Cooper
Method and apparatus for encoding erroneous data in an error correction code protected memory

Patent number: 9354970

Abstract: A method and device are described for encoding erroneous data in an error correction code (ECC) protected memory. In one embodiment, incoming data including a plurality of data symbols and a data integrity marker is received. At least one extra symbol is used to mark the incoming data as error-free data or erroneous data (i.e., poison) based on the data integrity marker. ECC may be created to protect the data symbols. The ECC may include a plurality of check symbols, a plurality of unused symbols and the at least one extra symbol. In another embodiment, an error marker may be propagated from a single ECC word to all ECC words of data block (e.g., a cache line, a page, and the like) to prevent errors due to corruption of the error marker caused by faulty memory in the erroneous ECC word.

Type: Grant

Filed: March 31, 2014

Date of Patent: May 31, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Ross V. La Fetra, Vilas K. Sridharan, Vydhyanathan Kalyanasundharam, Dean A. Liberty, Amit P. Apte

prev 1 2 3 next