Patents by Inventor Vydhyanathan Kalyanasundharam

Vydhyanathan Kalyanasundharam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10776282
    Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: September 15, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
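
The early-probe decision in the abstract above (patent 10776282) can be sketched roughly as follows. This is only an illustration, not the patented implementation: the region size, the dictionary-based early probe cache, and the function name `early_probe_target` are all assumptions made for the example.

```python
REGION_SIZE = 4096  # assumed region size in bytes; the abstract gives no value


def early_probe_target(early_probe_cache, address, threshold):
    """Return the node to send an early probe to, or None.

    early_probe_cache maps a region index to (owner_node, confidence).
    An early probe is sent only on a hit whose confidence exceeds threshold.
    """
    entry = early_probe_cache.get(address // REGION_SIZE)
    if entry is None:
        return None  # miss: wait for the probe filter lookup instead
    owner, confidence = entry
    return owner if confidence > threshold else None


cache = {1: ("node0", 3)}
print(early_probe_target(cache, 4096 + 64, threshold=2))
```

In this sketch a miss, or a hit with low confidence, simply means no early probe is sent and the normal probe filter result is used.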
  • Publication number: 20200259747
    Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.
    Type: Application
    Filed: February 19, 2020
    Publication date: August 13, 2020
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
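
The token arithmetic in the abstract above (publication 20200259747) can be illustrated with a small sketch. The class `TokenRouter`, its method names, and the policy of returning borrowed tokens to the free pool first are assumptions for illustration; the abstract only fixes the three token counts and their relationship.

```python
class TokenRouter:
    """Token accounting for one router; all names here are illustrative."""

    def __init__(self, clients, saturate_tokens, base_tokens):
        # The first number (base) must be below the second number (saturate).
        if not 0 < base_tokens < saturate_tokens:
            raise ValueError("base_tokens must be positive and below saturate_tokens")
        self.base = base_tokens
        self.dedicated = {c: base_tokens for c in clients}
        self.borrowed = {c: 0 for c in clients}
        # The third number: the free pool holds the difference.
        self.pool_size = saturate_tokens - base_tokens
        self.free_pool = self.pool_size

    def total_buffer_entries(self):
        """Buffer entries the router needs to back every token."""
        return self.base * len(self.dedicated) + self.pool_size

    def acquire(self, client):
        """Take one token for client, dedicated first, then from the free pool."""
        if self.dedicated[client] > 0:
            self.dedicated[client] -= 1
            return True
        if self.free_pool > 0:
            self.free_pool -= 1
            self.borrowed[client] += 1
            return True
        return False

    def release(self, client):
        """Return one token; borrowed tokens go back to the free pool first."""
        if self.borrowed[client] > 0:
            self.borrowed[client] -= 1
            self.free_pool += 1
        elif self.dedicated[client] < self.base:
            self.dedicated[client] += 1
        else:
            raise ValueError("client holds no outstanding tokens")


router = TokenRouter(["a", "b", "c", "d"], saturate_tokens=8, base_tokens=2)
print(router.total_buffer_entries())
```

With four clients, 8 tokens to saturate, and 2 dedicated tokens each, the sketch needs 4 × 2 + 6 = 14 buffer entries instead of the 4 × 8 = 32 that a fully static allocation would need, while a single busy client can still reach 8 tokens.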
  • Publication number: 20200226081
    Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.
    Type: Application
    Filed: January 16, 2019
    Publication date: July 16, 2020
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
  • Patent number: 10705959
    Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: July 7, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
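
The per-region reference count described in the abstract above (patent 10705959) can be sketched as follows. The class name, the 4096-byte region size, and the choice to drop an entry once its count reaches zero are assumptions for illustration; the sketch models only the node-based directory, not the memory-based one.

```python
class NodeRegionDirectory:
    """Tracks, per region, how many cache lines the node currently caches."""

    def __init__(self, region_size=4096):
        self.region_size = region_size  # assumed; the abstract gives no size
        self.ref_counts = {}            # region index -> cached line count

    def _region(self, address):
        return address // self.region_size

    def line_cached(self, address):
        """Record that a cache line at this address was brought into the node."""
        region = self._region(address)
        self.ref_counts[region] = self.ref_counts.get(region, 0) + 1

    def line_evicted(self, address):
        """Record an eviction; drop the entry when no lines remain (assumed)."""
        region = self._region(address)
        if region not in self.ref_counts:
            raise KeyError("region is not tracked")
        self.ref_counts[region] -= 1
        if self.ref_counts[region] == 0:
            del self.ref_counts[region]

    def tracked_regions(self):
        return set(self.ref_counts)


directory = NodeRegionDirectory()
directory.line_cached(0)
directory.line_cached(64)
print(directory.ref_counts)
```

One entry covers every line in the region, which is how region-based tracking keeps the directory smaller than a per-line directory.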
  • Publication number: 20200192842
    Abstract: Bus protocol features are provided for chaining memory access requests on a high speed interconnect bus, allowing for reduced signaling overhead. Multiple memory request messages are received over a bus. A first message has a source identifier, a target identifier, a first address, and first payload data. The first payload data is stored in a memory at locations indicated by the first address. Within a selected second one of the request messages, a chaining indicator is received associated with the first request message and second payload data. The second request message does not include an address. Based on the chaining indicator, a second address for which memory access is requested is calculated based on the first address. The second payload data is stored in the memory at locations indicated by the second address.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Philip Ng, Vydhyanathan Kalyanasundharam
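
The chaining idea in the abstract above (publication 20200192842) can be sketched as follows. The abstract says only that the second address is calculated based on the first; this example assumes the simplest rule, that a chained request writes immediately after the previous payload. The dictionary message layout and the function name `apply_requests` are hypothetical.

```python
def apply_requests(memory, requests):
    """Write each request's payload into memory (a bytearray).

    A request is a dict with "payload" and either "address" or "chained": True.
    A chained request carries no address; its address is derived from the
    previous request (assumed here: previous address + previous payload length).
    """
    last_addr = None
    last_len = 0
    for req in requests:
        if req.get("chained"):
            if last_addr is None:
                raise ValueError("chained request has no preceding request")
            addr = last_addr + last_len
        else:
            addr = req["address"]
        payload = req["payload"]
        if addr < 0 or addr + len(payload) > len(memory):
            raise ValueError("write falls outside memory")
        memory[addr:addr + len(payload)] = payload
        last_addr, last_len = addr, len(payload)


mem = bytearray(16)
apply_requests(mem, [
    {"address": 4, "payload": b"ab"},
    {"chained": True, "payload": b"cd"},
])
print(mem[4:8])
```

Omitting the address from the chained message is where the reduced signaling overhead comes from: only the first message in a chain pays for the address field.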
  • Patent number: 10684965
    Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.
    Type: Grant
    Filed: November 8, 2017
    Date of Patent: June 16, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
  • Patent number: 10671148
    Abstract: Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system including multiple nodes utilizes a non-uniform memory access (NUMA) architecture. A first node receives a broadcast probe from a second node. The first node spoofs a miss response for a powered down third node, which prevents the third node from waking up to respond to the broadcast probe. Prior to powering down, the third node flushed its probe filter and caches, and updated its system memory with the received dirty cache lines. The computing system includes a master node for storing interrupt priorities of the multiple cores in the computing system for arbitrated interrupts. The cores store indications of fixed interrupt identifiers for each core in the computing system. Arbitrated and fixed interrupts are handled by cores with point-to-point unicast messages, rather than broadcast messages.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: June 2, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Tsien, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
  • Patent number: 10613764
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: April 7, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
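
The hint handling described in the abstract above (patent 10613764) can be sketched as a small decision function. The data structures and the name `handle_hint` are assumptions for illustration. The abstract does not say what happens when a different page is already open in the targeted bank; this sketch drops the hint in that case, which is an assumption.

```python
def handle_hint(open_pages, pending_pages, bank, page):
    """Decide what the memory controller does with a hint.

    open_pages maps bank -> currently open page (bank absent if none is open).
    pending_pages is a set of (bank, page) pairs targeted by pending requests.
    Returns "open" if the target page is opened, otherwise "drop".
    """
    if (bank, page) in pending_pages:
        return "drop"  # a pending request already targets the same page
    current = open_pages.get(bank)
    if current == page:
        return "drop"  # the target page is already open
    if current is None:
        open_pages[bank] = page  # open the page without retrieving data
        return "open"
    return "drop"  # assumed: a different page is open in this bank


open_pages = {}
print(handle_hint(open_pages, set(), bank=0, page=7))
```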
  • Patent number: 10608943
    Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.
    Type: Grant
    Filed: October 27, 2017
    Date of Patent: March 31, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
  • Publication number: 20200099993
    Abstract: Systems, apparatuses, and methods for processing multi-cast messages are disclosed. A system includes at least one or more processing units, one or more memory controllers, and a communication fabric coupled to the processing unit(s) and the memory controller(s). The communication fabric includes a plurality of crossbars which connect various agents within the system. When a multi-cast message is received by a crossbar, the crossbar extracts a message type indicator and a recipient type indicator from the message. The crossbar uses the message type indicator to determine which set of masks to look up using the recipient type indicator. Then, the crossbar determines which one or more masks to extract from the selected set of masks based on values of the recipient type indicator. The crossbar combines the one or more masks with a multi-cast route to create a port vector for determining on which ports to forward the multi-cast message.
    Type: Application
    Filed: September 21, 2018
    Publication date: March 26, 2020
    Inventors: Vydhyanathan Kalyanasundharam, Joe G. Cruz, Eric Christopher Morton, Alan Dodson Smith
  • Patent number: 10601723
    Abstract: A computing system uses a memory for storing data, one or more clients for generating network traffic and a communication fabric with network switches. The network switches include centralized storage structures, rather than separate input and output storage structures. The network switches store particular metadata corresponding to received packets in a single, centralized collapsing queue where the age of the packets corresponds to a queue entry position. The payload data of the packets are stored in a separate memory, so the relatively large amount of data is not shifted during the lifetime of the packet in the network switch. The network switches select sparse queue entries in the collapsible queue, deallocate the selected queue entries, and shift remaining allocated queue entries toward a first end of the queue with a delay proportional to the radix of the network switches.
    Type: Grant
    Filed: April 12, 2018
    Date of Patent: March 24, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alan Dodson Smith, Vydhyanathan Kalyanasundharam, Bryan P. Broussard, Greggory D. Donley, Chintan S. Patel
  • Publication number: 20200081844
    Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.
    Type: Application
    Filed: September 12, 2018
    Publication date: March 12, 2020
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
  • Publication number: 20200073801
    Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
    Type: Application
    Filed: August 31, 2018
    Publication date: March 5, 2020
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
  • Publication number: 20200065275
    Abstract: Systems, apparatuses, and methods for routing interrupts on a coherency probe network are disclosed. A computing system includes a plurality of processing nodes, a coherency probe network, and one or more control units. The coherency probe network carries coherency probe messages between coherent agents. Interrupts that are detected by a control unit are converted into messages that are compatible with coherency probe messages and then routed to a target destination via the coherency probe network. Interrupts are generated with a first encoding while coherency probe messages have a second encoding. Cache subsystems determine whether a message received via the coherency probe network is an interrupt message or a coherency probe message based on an encoding embedded in the received message. Interrupt messages are routed to interrupt controller(s) while coherency probe messages are processed in accordance with a coherence probe action field embedded in the message.
    Type: Application
    Filed: August 24, 2018
    Publication date: February 27, 2020
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Bryan P. Broussard, Paul James Moyer, William Louie Walker
  • Publication number: 20200059437
    Abstract: Systems, apparatuses, and methods for performing efficient data transfer in a computing system are disclosed. A computing system includes multiple fabric interfaces in clients and a fabric. A packet transmitter in the fabric interface includes multiple queues, each for storing packets of a respective type. The packet transmitter includes multiple queue arbiters, each for selecting a candidate packet from a respective one of the multiple queues. The packet transmitter includes a buffer for storing a link packet, which includes data storage space for storing multiple candidate packets. The packet transmitter selects qualified candidate packets from the multiple queues and inserts these candidate packets into the link packet. The packing arbiter avoids data collisions at the receiver by taking into consideration mismatches between the rate of inserting candidate packets into the link packet and the rate of creating available data storage space in a receiving queue in the receiver.
    Type: Application
    Filed: August 20, 2018
    Publication date: February 20, 2020
    Inventors: Greggory D. Donley, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
  • Patent number: 10558591
    Abstract: Systems, apparatuses, and methods for implementing priority adjustment forwarding are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric coupled to the processing unit(s) and the memory. The communication fabric includes a plurality of arbitration points. When a client determines that its bandwidth requirements are not being met, the client generates and sends an in-band priority adjustment request to the nearest arbitration point. This arbitration point receives the in-band priority adjustment request and then identifies any pending requests which are buffered at the arbitration point which meet the criteria specified by the in-band priority adjustment request. The arbitration point adjusts the priority of any identified requests, and then the arbitration point forwards the in-band priority adjustment request on the fabric to the next upstream arbitration point which processes the in-band priority adjustment request in the same manner.
    Type: Grant
    Filed: October 9, 2017
    Date of Patent: February 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alan Dodson Smith, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Joe G. Cruz
  • Patent number: 10545875
    Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first look up the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: January 28, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
  • Patent number: 10540316
    Abstract: Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and then sends an indication that the write request is globally visible to the ordering master when all cached copies of the data targeted by the write request have been invalidated. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.
    Type: Grant
    Filed: December 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Chen-Ping Yang, Amit P. Apte, Elizabeth M. Cooper
  • Patent number: 10503648
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
  • Patent number: 10491524
    Abstract: A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of streams, from all input ports, that are targeting the first output port. Then, the switch calculates a throttle factor for the first input port by dividing the first number of streams by the second number of streams. The switch applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch also calculates throttle factors for the other ports and applies the throttle factors when regulating bandwidth on the other ports.
    Type: Grant
    Filed: November 7, 2017
    Date of Patent: November 26, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
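
The throttle-factor calculation in the abstract above (patent 10491524) comes down to one division per input port. The sketch below is a minimal illustration that assumes the per-port counts are already known; the function name `throttle_factors` and the dictionary layout are hypothetical, and how the factor is then applied to regulate bandwidth is not modeled.

```python
from collections import defaultdict


def throttle_factors(stream_counts):
    """Compute a throttle factor per (input_port, output_port) pair.

    stream_counts maps (input_port, output_port) to the number of streams on
    that input port targeting that output port. Each factor is that count
    divided by the total, from all input ports, targeting the same output port.
    """
    totals = defaultdict(int)
    for (_, out_port), count in stream_counts.items():
        totals[out_port] += count
    return {
        (in_port, out_port): count / totals[out_port]
        for (in_port, out_port), count in stream_counts.items()
        if totals[out_port] > 0
    }


counts = {(0, 2): 3, (1, 2): 1, (1, 3): 2}
print(throttle_factors(counts))
```

In this example, input port 0 carries three of the four streams targeting output port 2, so its factor is 0.75 and input port 1 gets 0.25 for that output port. Input port 1 is the only source for output port 3, so that factor is 1.0.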