Patents by Inventor Vydhyanathan Kalyanasundharam
Vydhyanathan Kalyanasundharam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11210246Abstract: Systems, apparatuses, and methods for routing interrupts on a coherency probe network are disclosed. A computing system includes a plurality of processing nodes, a coherency probe network, and one or more control units. The coherency probe network carries coherency probe messages between coherent agents. Interrupts that are detected by a control unit are converted into messages that are compatible with coherency probe messages and then routed to a target destination via the coherency probe network. Interrupts are generated with a first encoding while coherency probe messages have a second encoding. Cache subsystems determine whether a message received via the coherency probe network is an interrupt message or a coherency probe message based on an encoding embedded in the received message. Interrupt messages are routed to interrupt controller(s) while coherency probe messages are processed in accordance with a coherence probe action field embedded in the message.Type: GrantFiled: August 24, 2018Date of Patent: December 28, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Bryan P. Broussard, Paul James Moyer, William Louie Walker
-
Patent number: 11210248Abstract: Systems, devices, and methods for direct memory access. A system direct memory access (SDMA) device disposed on a processor die sends a message which includes physical addresses of a source buffer and a destination buffer, and a size of a data transfer, to a data fabric device. The data fabric device sends an instruction which includes the physical addresses of the source and destination buffer, and the size of the data transfer, to first agent devices. Each of the first agent devices reads a portion of the source buffer from a memory device at the physical address of the source buffer. Each of the first agent devices sends the portion of the source buffer to one of second agent devices. Each of the second agent devices writes the portion of the source buffer to the destination buffer.Type: GrantFiled: December 20, 2019Date of Patent: December 28, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Narendra Kamat
-
Patent number: 11196657Abstract: A system for automatically discovering fabric topology includes at least one or more processing units, one or more memory devices, a security processor, and a communication fabric with an unknown topology coupled to the processing unit(s), memory device(s), and security processor. The security processor queries each component of the fabric to retrieve various attributes associated with the component. The security processor utilizes the retrieved attributes to create a network graph of the topology of the components within the fabric. The security processor generates routing tables from the network graph and programs the routing tables into the fabric components. Then, the fabric components utilize the routing tables to determine how to route incoming packets.Type: GrantFiled: December 21, 2017Date of Patent: December 7, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Alan Dodson Smith, Joe G. Cruz
-
Patent number: 11119926Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.Type: GrantFiled: December 18, 2017Date of Patent: September 14, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan, Eric Christopher Morton, Elizabeth M. Cooper, Ravindra N. Bhargava
-
Publication number: 20210191890Abstract: Systems, devices, and methods for direct memory access. A system direct memory access (SDMA) device disposed on a processor die sends a message which includes physical addresses of a source buffer and a destination buffer, and a size of a data transfer, to a data fabric device. The data fabric device sends an instruction which includes the physical addresses of the source and destination buffer, and the size of the data transfer, to first agent devices. Each of the first agent devices reads a portion of the source buffer from a memory device at the physical address of the source buffer. Each of the first agent devices sends the portion of the source buffer to one of second agent devices. Each of the second agent devices writes the portion of the source buffer to the destination buffer.Type: ApplicationFiled: December 20, 2019Publication date: June 24, 2021Applicant: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Narendra Kamat
-
Publication number: 20210194827Abstract: Systems, apparatuses, and methods for efficient data transfer in a computing system are disclosed. A source generates packets to send across a communication fabric (or fabric) to a destination. The source generates partition enable signals for the partitions of payload data. The source negates an enable signal for a particular partition when the source determines the packet type indicates the particular partition should have an associated asserted enable signal in the packet, but the source also determines the particular partition includes a particular data pattern. Routing components of the fabric disable clock signals to storage elements assigned to store the particular partition. The destination inserts the particular data pattern for the particular partition in the payload data.Type: ApplicationFiled: December 23, 2019Publication date: June 24, 2021Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Mark A. Silla, Ashwin Chincholi
-
Publication number: 20210191865Abstract: A coherency management device receives requests to read data from or write data to an address in a main memory. On a write, if the data includes zero data, an entry corresponding to the memory address is created in a cache directory if it does not already exist, is set to an invalid state, and indicates that the data includes zero data. The zero data is not written to main memory or a cache. On a read, the cache directory is checked for an entry corresponding to the memory address. If the entry exists in the cache directory, is invalid, and includes an indication that data corresponding to the memory address includes zero data, the coherency management device returns zero data in response to the request without fetching the data from main memory or a cache.Type: ApplicationFiled: December 20, 2019Publication date: June 24, 2021Applicant: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte
-
Patent number: 11036658Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.Type: GrantFiled: January 16, 2019Date of Patent: June 15, 2021Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J Branover, Kevin M. Lepak
-
Publication number: 20210064545Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.Type: ApplicationFiled: September 14, 2020Publication date: March 4, 2021Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Patent number: 10922237Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.Type: GrantFiled: September 12, 2018Date of Patent: February 16, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
-
Publication number: 20200401321Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.Type: ApplicationFiled: April 6, 2020Publication date: December 24, 2020Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
-
Publication number: 20200401519Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.Type: ApplicationFiled: July 2, 2020Publication date: December 24, 2020Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
-
Patent number: 10776282Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.Type: GrantFiled: December 15, 2017Date of Patent: September 15, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
-
Publication number: 20200259747Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.Type: ApplicationFiled: February 19, 2020Publication date: August 13, 2020Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
-
Publication number: 20200226081Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.Type: ApplicationFiled: January 16, 2019Publication date: July 16, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
-
Patent number: 10705959Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.Type: GrantFiled: August 31, 2018Date of Patent: July 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
-
Publication number: 20200192842Abstract: Bus protocol features are provided for chaining memory access requests on a high speed interconnect bus, allowing for reduced signaling overhead. Multiple memory request messages are received over a bus. A first message has a source identifier, a target identifier, a first address, and first payload data. The first payload data is stored in a memory at locations indicated by the first address. Within a selected second one of the request messages, a chaining indicator is received associated with the first request message and second payload data. The second request message does not include an address. Based on the chaining indicator, a second address for which memory access is requested is calculated based on the first address. The second payload data is stored in the memory at locations indicated by the second address.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Philip Ng, Vydhyanathan Kalyanasundharam
-
Patent number: 10684965Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.Type: GrantFiled: November 8, 2017Date of Patent: June 16, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
-
Patent number: 10671148Abstract: Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system including multiple nodes utilizes a non-uniform memory access (NUMA) architecture. A first node receives a broadcast probe from a second node. The first node spoofs a miss response for a powered down third node, which prevents the third node from waking up to respond to the broadcast probe. Prior to powering down, the third node flushed its probe filter and caches, and updated its system memory with the received dirty cache lines. The computing system includes a master node for storing interrupt priorities of the multiple cores in the computing system for arbitrated interrupts. The cores store indications of fixed interrupt identifiers for each core in the computing system. Arbitrated and fixed interrupts are handled by cores with point-to-point unicast messages, rather than broadcast messages.Type: GrantFiled: December 21, 2017Date of Patent: June 2, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Benjamin Tsien, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
-
Patent number: 10613764Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.Type: GrantFiled: November 20, 2017Date of Patent: April 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro