Patents by Inventor Vydhyanathan Kalyanasundharam

Vydhyanathan Kalyanasundharam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11210246
    Abstract: Systems, apparatuses, and methods for routing interrupts on a coherency probe network are disclosed. A computing system includes a plurality of processing nodes, a coherency probe network, and one or more control units. The coherency probe network carries coherency probe messages between coherent agents. Interrupts that are detected by a control unit are converted into messages that are compatible with coherency probe messages and then routed to a target destination via the coherency probe network. Interrupts are generated with a first encoding while coherency probe messages have a second encoding. Cache subsystems determine whether a message received via the coherency probe network is an interrupt message or a coherency probe message based on an encoding embedded in the received message. Interrupt messages are routed to interrupt controller(s) while coherency probe messages are processed in accordance with a coherence probe action field embedded in the message.
    Type: Grant
    Filed: August 24, 2018
    Date of Patent: December 28, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Bryan P. Broussard, Paul James Moyer, William Louie Walker
  • Patent number: 11210248
    Abstract: Systems, devices, and methods for direct memory access. A system direct memory access (SDMA) device disposed on a processor die sends a message which includes physical addresses of a source buffer and a destination buffer, and a size of a data transfer, to a data fabric device. The data fabric device sends an instruction which includes the physical addresses of the source and destination buffer, and the size of the data transfer, to first agent devices. Each of the first agent devices reads a portion of the source buffer from a memory device at the physical address of the source buffer. Each of the first agent devices sends the portion of the source buffer to one of second agent devices. Each of the second agent devices writes the portion of the source buffer to the destination buffer.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: December 28, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Narendra Kamat
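The transfer described in the abstract of patent 11210248 can be pictured as splitting one copy into portions, each handled by a pair of agents. The sketch below is only an illustration: the even split, the function names `split_transfer` and `run_transfer`, and the use of a `bytearray` as memory are assumptions, since the abstract does not say how portions are chosen.

```python
def split_transfer(src, dst, size, num_agents):
    """Return one (src_addr, dst_addr, length) portion per agent pair.

    Assumption: the transfer is divided as evenly as possible.
    """
    base, extra = divmod(size, num_agents)
    portions = []
    offset = 0
    for i in range(num_agents):
        length = base + (1 if i < extra else 0)
        if length:
            portions.append((src + offset, dst + offset, length))
        offset += length
    return portions


def run_transfer(memory, src, dst, size, num_agents):
    """Model the copy: a first agent reads a portion, a second agent writes it."""
    for s, d, n in split_transfer(src, dst, size, num_agents):
        chunk = bytes(memory[s:s + n])   # first agent reads from the source buffer
        memory[d:d + n] = chunk          # second agent writes to the destination buffer


memory = bytearray(64)
memory[0:10] = bytes(range(10))
run_transfer(memory, src=0, dst=32, size=10, num_agents=4)
```

After the call, bytes 32 through 41 of `memory` hold a copy of the source buffer, assembled from four independently copied portions.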
  • Patent number: 11196657
    Abstract: A system for automatically discovering fabric topology includes at least one or more processing units, one or more memory devices, a security processor, and a communication fabric with an unknown topology coupled to the processing unit(s), memory device(s), and security processor. The security processor queries each component of the fabric to retrieve various attributes associated with the component. The security processor utilizes the retrieved attributes to create a network graph of the topology of the components within the fabric. The security processor generates routing tables from the network graph and programs the routing tables into the fabric components. Then, the fabric components utilize the routing tables to determine how to route incoming packets.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 7, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Alan Dodson Smith, Joe G. Cruz
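The abstract of patent 11196657 describes building a network graph from discovered components and then deriving routing tables from it. A minimal sketch of that second step follows, assuming shortest-path routing computed by breadth-first search; the abstract does not name a routing algorithm, and `build_routing_tables` and the component names are hypothetical.

```python
from collections import deque


def build_routing_tables(links):
    """Derive a next-hop table for every component of a discovered graph.

    links: dict mapping each component to the list of its direct neighbors.
    Returns: dict mapping component -> {destination: next hop}.
    Assumption: routes follow shortest paths found by breadth-first search.
    """
    tables = {}
    for src in links:
        next_hop = {}
        visited = {src}
        queue = deque()
        # Direct neighbors are reached through themselves.
        for neighbor in links[src]:
            if neighbor not in visited:
                visited.add(neighbor)
                next_hop[neighbor] = neighbor
                queue.append(neighbor)
        # Farther components inherit the first hop of the node that found them.
        while queue:
            node = queue.popleft()
            for neighbor in links[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_hop[neighbor] = next_hop[node]
                    queue.append(neighbor)
        tables[src] = next_hop
    return tables


links = {
    "cpu": ["sw0"],
    "sw0": ["cpu", "sw1"],
    "sw1": ["sw0", "mem"],
    "mem": ["sw1"],
}
tables = build_routing_tables(links)
```

Here `tables["cpu"]["mem"]` is `"sw0"`: a packet from `cpu` to `mem` is first handed to `sw0`, which consults its own table for the next hop.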
  • Patent number: 11119926
    Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: September 14, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
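The reference-counting behavior in the abstract of patent 11119926 can be sketched as follows. This is an illustration under stated assumptions, not the patented design: the class name `RegionDirectory`, the region size of four cache lines, and the use of line indices instead of byte addresses are all hypothetical.

```python
class RegionDirectory:
    """Toy region-based directory: one entry per region, with a reference count."""

    def __init__(self, lines_per_region=4):
        self.lines_per_region = lines_per_region  # assumed region size
        self.entries = {}  # region number -> number of cached lines in the region

    def region_of(self, line_index):
        return line_index // self.lines_per_region

    def line_cached(self, line_index):
        """A cache subsystem cached a line: bump the region's reference count."""
        region = self.region_of(line_index)
        self.entries[region] = self.entries.get(region, 0) + 1

    def line_evicted(self, line_index):
        """A cached line was evicted: decrement, and reclaim the entry at zero."""
        region = self.region_of(line_index)
        if region not in self.entries:
            raise KeyError(f"region {region} is not tracked")
        self.entries[region] -= 1
        if self.entries[region] == 0:
            del self.entries[region]


directory = RegionDirectory(lines_per_region=4)
directory.line_cached(0)
directory.line_cached(1)
directory.line_cached(9)
directory.line_evicted(0)
directory.line_evicted(1)
```

Three cached lines occupy only two directory entries, and evicting both lines of region 0 frees its entry while region 2 stays tracked.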
  • Publication number: 20210191890
    Abstract: Systems, devices, and methods for direct memory access. A system direct memory access (SDMA) device disposed on a processor die sends a message which includes physical addresses of a source buffer and a destination buffer, and a size of a data transfer, to a data fabric device. The data fabric device sends an instruction which includes the physical addresses of the source and destination buffer, and the size of the data transfer, to first agent devices. Each of the first agent devices reads a portion of the source buffer from a memory device at the physical address of the source buffer. Each of the first agent devices sends the portion of the source buffer to one of second agent devices. Each of the second agent devices writes the portion of the source buffer to the destination buffer.
    Type: Application
    Filed: December 20, 2019
    Publication date: June 24, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Narendra Kamat
  • Publication number: 20210194827
    Abstract: Systems, apparatuses, and methods for efficient data transfer in a computing system are disclosed. A source generates packets to send across a communication fabric (or fabric) to a destination. The source generates partition enable signals for the partitions of payload data. The source negates an enable signal for a particular partition when the source determines the packet type indicates the particular partition should have an associated asserted enable signal in the packet, but the source also determines the particular partition includes a particular data pattern. Routing components of the fabric disable clock signals to storage elements assigned to store the particular partition. The destination inserts the particular data pattern for the particular partition in the payload data.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 24, 2021
    Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Mark A. Silla, Ashwin Chincholi
  • Publication number: 20210191865
    Abstract: A coherency management device receives requests to read data from or write data to an address in a main memory. On a write, if the data includes zero data, an entry corresponding to the memory address is created in a cache directory if it does not already exist, is set to an invalid state, and indicates that the data includes zero data. The zero data is not written to main memory or a cache. On a read, the cache directory is checked for an entry corresponding to the memory address. If the entry exists in the cache directory, is invalid, and includes an indication that data corresponding to the memory address includes zero data, the coherency management device returns zero data in response to the request without fetching the data from main memory or a cache.
    Type: Application
    Filed: December 20, 2019
    Publication date: June 24, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte
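The read and write paths in the abstract of publication 20210191865 can be sketched as below. The class name `ZeroAwareDirectory`, the dictionary-based directory, and the `memory_reads` counter are hypothetical, and the handling of non-zero writes is an assumption because the abstract does not describe it.

```python
class ZeroAwareDirectory:
    """Toy coherency manager that records all-zero writes in the directory only."""

    def __init__(self):
        self.directory = {}    # address -> {"valid": bool, "zero": bool}
        self.memory = {}       # address -> bytes actually stored in main memory
        self.memory_reads = 0  # counts fetches that had to go to main memory

    def write(self, addr, data):
        if not any(data):
            # Zero data: create the entry if needed, mark it invalid and zero,
            # and skip the write to main memory.
            entry = self.directory.setdefault(addr, {})
            entry["valid"] = False
            entry["zero"] = True
            return
        # Assumption: a non-zero write clears the zero indication and goes to memory.
        if addr in self.directory:
            self.directory[addr]["zero"] = False
        self.memory[addr] = bytes(data)

    def read(self, addr, size):
        entry = self.directory.get(addr)
        if entry is not None and not entry["valid"] and entry["zero"]:
            return bytes(size)  # answer with zeros, no memory fetch
        self.memory_reads += 1
        return self.memory.get(addr, bytes(size))


manager = ZeroAwareDirectory()
manager.write(0x40, bytes(64))
zero_line = manager.read(0x40, 64)
reads_after_zero_read = manager.memory_reads
manager.write(0x40, b"\x01" * 64)
data_line = manager.read(0x40, 64)
```

The first read is served from the directory alone, so `reads_after_zero_read` stays at 0; only the read after the non-zero write reaches main memory.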
  • Patent number: 11036658
    Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transaction types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: June 15, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
  • Publication number: 20210064545
    Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
    Type: Application
    Filed: September 14, 2020
    Publication date: March 4, 2021
    Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
  • Patent number: 10922237
    Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: February 16, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
  • Publication number: 20200401321
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Application
    Filed: April 6, 2020
    Publication date: December 24, 2020
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
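The hint-handling rules in the abstract of publication 20200401321 amount to a small decision procedure at the memory controller. A minimal sketch follows; the function name `process_hint` and its data structures are hypothetical, and dropping a hint when a different page is open in the bank is an assumption, since the abstract covers only the three cases noted in the comments.

```python
def process_hint(open_pages, pending, bank, page):
    """Decide what a memory controller does with a page-open hint.

    open_pages: dict mapping bank -> currently open page (None or absent if closed).
    pending: set of (bank, page) pairs targeted by requests already queued.
    Returns "opened" or "dropped".
    """
    if (bank, page) in pending:
        return "dropped"          # another pending request targets the same page
    if open_pages.get(bank) == page:
        return "dropped"          # the target page is already open
    if open_pages.get(bank) is None:
        open_pages[bank] = page   # open the page without retrieving data
        return "opened"
    return "dropped"              # assumption: a different page is open in the bank


open_pages = {0: None, 1: 7}
pending = {(2, 5)}
results = [
    process_hint(open_pages, pending, bank=0, page=3),  # bank idle
    process_hint(open_pages, pending, bank=0, page=3),  # page now open
    process_hint(open_pages, pending, bank=2, page=5),  # already pending
    process_hint(open_pages, pending, bank=1, page=9),  # other page open
]
```

Only the first hint opens a page; the remaining three are dropped for the reasons given in the comments.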
  • Publication number: 20200401519
    Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
    Type: Application
    Filed: July 2, 2020
    Publication date: December 24, 2020
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
  • Patent number: 10776282
    Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: September 15, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
  • Publication number: 20200259747
    Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.
    Type: Application
    Filed: February 19, 2020
    Publication date: August 13, 2020
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
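The token arithmetic in the abstract of publication 20200259747 can be made concrete with a short sketch. The class name `TokenRouter`, the method names, and the cap that stops a client from holding more than the saturating number of tokens are assumptions for illustration; the abstract specifies only the three token counts and their relationship.

```python
class TokenRouter:
    """Toy router: each client gets `first` tokens, plus a shared free pool."""

    def __init__(self, clients, first, second):
        if not first < second:
            raise ValueError("first must be less than the saturating count")
        self.first = first                # static tokens per active client
        self.second = second              # tokens needed to saturate one client
        self.tokens = {c: first for c in clients}
        self.free_pool = second - first   # the "third number" of tokens

    def total_buffer_entries(self):
        """Buffer space the router must provide, one entry per token."""
        return sum(self.tokens.values()) + self.free_pool

    def borrow(self, client, n=1):
        """Move up to n tokens from the free pool to a client."""
        granted = min(n, self.free_pool, self.second - self.tokens[client])
        self.free_pool -= granted
        self.tokens[client] += granted
        return granted

    def give_back(self, client, n=1):
        """Return up to n borrowed tokens from a client to the free pool."""
        returned = min(n, self.tokens[client] - self.first)
        self.tokens[client] -= returned
        self.free_pool += returned
        return returned


router = TokenRouter(["a", "b", "c"], first=2, second=8)
static_cost = 3 * 8                       # buffer entries if every client got 8
pooled_cost = router.total_buffer_entries()
granted_a = router.borrow("a", 10)
granted_b = router.borrow("b")
```

With three clients, statically giving each one 8 tokens would need 24 buffer entries, while the pooled scheme needs 3 × 2 + 6 = 12. Client `a` can still reach the saturating count of 8 by borrowing the whole pool, at which point `b` has to wait.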
  • Publication number: 20200226081
    Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transaction types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.
    Type: Application
    Filed: January 16, 2019
    Publication date: July 16, 2020
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
  • Patent number: 10705959
    Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: July 7, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
  • Publication number: 20200192842
    Abstract: Bus protocol features are provided for chaining memory access requests on a high speed interconnect bus, allowing for reduced signaling overhead. Multiple memory request messages are received over a bus. A first message has a source identifier, a target identifier, a first address, and first payload data. The first payload data is stored in a memory at locations indicated by the first address. Within a selected second one of the request messages, a chaining indicator is received associated with the first request message and second payload data. The second request message does not include an address. Based on the chaining indicator, a second address for which memory access is requested is calculated based on the first address. The second payload data is stored in the memory at locations indicated by the second address.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Philip Ng, Vydhyanathan Kalyanasundharam
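The chaining idea in the abstract of publication 20200192842 is that a follow-up request carries no address, and the receiver derives one from the previous request. The sketch below assumes the derived address is the next sequential location after the previous payload; the abstract says only that the second address is calculated based on the first. The function name `apply_requests` and the dictionary message format are hypothetical, and source and target identifiers are omitted for brevity.

```python
def apply_requests(memory, requests):
    """Store request payloads, deriving addresses for chained requests.

    Each request is a dict with "payload" and either "address" or "chained": True.
    Assumption: a chained request targets the byte right after the previous payload.
    """
    next_address = None
    for request in requests:
        if request.get("chained"):
            if next_address is None:
                raise ValueError("chained request has no preceding request")
            address = next_address
        else:
            address = request["address"]
        payload = request["payload"]
        memory[address:address + len(payload)] = payload
        next_address = address + len(payload)


memory = bytearray(16)
apply_requests(memory, [
    {"address": 4, "payload": b"\xaa\xbb"},
    {"chained": True, "payload": b"\xcc\xdd"},  # no address field is sent
])
```

The second message lands at address 6 without carrying an address field, which is where the reduction in signaling overhead comes from.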
  • Patent number: 10684965
    Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.
    Type: Grant
    Filed: November 8, 2017
    Date of Patent: June 16, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
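The grouping and acknowledgment flow in the abstract of patent 10684965 can be sketched in three steps. The function names, the tuple layout of a write request, and the `"group_done"` response token are hypothetical stand-ins for whatever the hardware actually exchanges.

```python
from collections import defaultdict


def group_writes(requests):
    """Master unit: group write requests that share a target.

    requests: iterable of (client, target, data) tuples.
    """
    groups = defaultdict(list)
    for client, target, data in requests:
        groups[target].append((client, data))
    return dict(groups)


def slave_service(group, memory_log):
    """Slave unit: service every write in the group, then send one response."""
    for _client, data in group:
        memory_log.append(data)
    return "group_done"


def master_ack(group, response):
    """Master unit: fan the single response out as one acknowledgment per write."""
    if response != "group_done":
        raise ValueError("unexpected response")
    return [client for client, _data in group]


requests = [("gpu", "mc0", 1), ("cpu", "mc0", 2), ("cpu", "mc1", 3)]
groups = group_writes(requests)
memory_log = []
response = slave_service(groups["mc0"], memory_log)
acks = master_ack(groups["mc0"], response)
```

Two writes to `mc0` produce a single response across the fabric, yet each originating client still receives its own acknowledgment.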
  • Patent number: 10671148
    Abstract: Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system including multiple nodes utilizes a non-uniform memory access (NUMA) architecture. A first node receives a broadcast probe from a second node. The first node spoofs a miss response for a powered down third node, which prevents the third node from waking up to respond to the broadcast probe. Prior to powering down, the third node flushed its probe filter and caches, and updated its system memory with the received dirty cache lines. The computing system includes a master node for storing interrupt priorities of the multiple cores in the computing system for arbitrated interrupts. The cores store indications of fixed interrupt identifiers for each core in the computing system. Arbitrated and fixed interrupts are handled by cores with point-to-point unicast messages, rather than broadcast messages.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: June 2, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Tsien, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
  • Patent number: 10613764
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: April 7, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro