Patents by Inventor Vydhyanathan Kalyanasundharam

Vydhyanathan Kalyanasundharam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DYNAMIC BUFFER MANAGEMENT IN MULTI-CLIENT TOKEN FLOW CONTROL ROUTERS

Publication number: 20200259747

Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.

Type: Application

Filed: February 19, 2020

Publication date: August 13, 2020

Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
LIGHT-WEIGHT MEMORY EXPANSION IN A COHERENT MEMORY SYSTEM

Publication number: 20200226081

Abstract: Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transactions types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.

Type: Application

Filed: January 16, 2019

Publication date: July 16, 2020

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Alexander J. Branover, Kevin M. Lepak
Region based split-directory scheme to adapt to large cache sizes

Patent number: 10705959

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Grant

Filed: August 31, 2018

Date of Patent: July 7, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
MEMORY REQUEST CHAINING ON BUS

Publication number: 20200192842

Abstract: Bus protocol features are provided for chaining memory access requests on a high speed interconnect bus, allowing for reduced signaling overhead. Multiple memory request messages are received over a bus. A first message has a source identifier, a target identifier, a first address, and first payload data. The first payload data is stored in a memory at locations indicated by the first address. Within a selected second one of the request messages, a chaining indicator is received associated with the first request message and second payload data. The second request message does not include an address. Based on the chaining indicator, a second address for which memory access is requested is calculated based on the first address. The second payload data is stored in the memory at locations indicated by the second address.

Type: Application

Filed: December 14, 2018

Publication date: June 18, 2020

Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.

Inventors: Philip Ng, Vydhyanathan Kalyanasundharam
Method to reduce write responses to improve bandwidth and efficiency

Patent number: 10684965

Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.

Type: Grant

Filed: November 8, 2017

Date of Patent: June 16, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
Multi-node system low power management

Patent number: 10671148

Abstract: Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system including multiple nodes utilizes a non-uniform memory access (NUMA) architecture. A first node receives a broadcast probe from a second node. The first node spoofs a miss response for a powered down third node, which prevents the third node from waking up to respond to the broadcast probe. Prior to powering down, the third node flushed its probe filter and caches, and updated its system memory with the received dirty cache lines. The computing system includes a master node for storing interrupt priorities of the multiple cores in the computing system for arbitrated interrupts. The cores store indications of fixed interrupt identifiers for each core in the computing system. Arbitrated and fixed interrupts are handled by cores with point-to-point unicast messages, rather than broadcast messages.

Type: Grant

Filed: December 21, 2017

Date of Patent: June 2, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Benjamin Tsien, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
Speculative hint-triggered activation of pages in memory

Patent number: 10613764

Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.

Type: Grant

Filed: November 20, 2017

Date of Patent: April 7, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
Dynamic buffer management in multi-client token flow control routers

Patent number: 10608943

Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.

Type: Grant

Filed: October 27, 2017

Date of Patent: March 31, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
MULTICAST IN THE PROBE CHANNEL

Publication number: 20200099993

Abstract: Systems, apparatuses, and methods for processing multi-cast messages are disclosed. A system includes at least one or more processing units, one or more memory controllers, and a communication fabric coupled to the processing unit(s) and the memory controller(s). The communication fabric includes a plurality of crossbars which connect various agents within the system. When a multi-cast message is received by a crossbar, the crossbar extracts a message type indicator and a recipient type indicator from the message. The crossbar uses the message type indicator to determine which set of masks to lookup using the recipient type indicator. Then, the crossbar determines which one or more masks to extract from the selected set of masks based on values of the recipient type indicator. The crossbar combines the one or more masks with a multi-cast route to create a port vector for determining on which ports to forward the multi-cast message.

Type: Application

Filed: September 21, 2018

Publication date: March 26, 2020

Inventors: Vydhyanathan Kalyanasundharam, Joe G. Cruz, Eric Christopher Morton, Alan Dodson Smith
Bandwidth matched scheduler

Patent number: 10601723

Abstract: A computing system uses a memory for storing data, one or more clients for generating network traffic and a communication fabric with network switches. The network switches include centralized storage structures, rather than separate input and output storage structures. The network switches store particular metadata corresponding to received packets in a single, centralized collapsing queue where the age of the packets corresponds to a queue entry position. The payload data of the packets are stored in a separate memory, so the relatively large amount of data is not shifted during the lifetime of the packet in the network switch. The network switches select sparse queue entries in the collapsible queue, deallocate the selected queue entries, and shift remaining allocated queue entries toward a first end of the queue with a delay proportional to the radix of the network switches.

Type: Grant

Filed: April 12, 2018

Date of Patent: March 24, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Alan Dodson Smith, Vydhyanathan Kalyanasundharam, Bryan P. Broussard, Greggory D. Donley, Chintan S. Patel
ACCELERATING ACCESSES TO PRIVATE REGIONS IN A REGION-BASED CACHE DIRECTORY SCHEME

Publication number: 20200081844

Abstract: Systems, apparatuses, and methods for accelerating accesses to private regions in a region-based cache directory scheme are disclosed. A system includes multiple processing nodes, one or more memory devices, and one or more region-based cache directories to manage cache coherence among the nodes' cache subsystems. Region-based cache directories track coherence on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. The cache directory entries for regions that are only accessed by a single node are cached locally at the node. Updates to the reference count for these entries are made locally rather than sending updates to the cache directory. When a second node accesses a first node's private region, the region is now considered shared, and the entry for this region is transferred from the first node back to the cache directory.

Type: Application

Filed: September 12, 2018

Publication date: March 12, 2020

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan
REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Publication number: 20200073801

Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Type: Application

Filed: August 31, 2018

Publication date: March 5, 2020

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
PROBE INTERRUPT DELIVERY

Publication number: 20200065275

Abstract: Systems, apparatuses, and methods for routing interrupts on a coherency probe network are disclosed. A computing system includes a plurality of processing nodes, a coherency probe network, and one or more control units. The coherency probe network carries coherency probe messages between coherent agents. Interrupts that are detected by a control unit are converted into messages that are compatible with coherency probe messages and then routed to a target destination via the coherency probe network. Interrupts are generated with a first encoding while coherency probe messages have a second encoding. Cache subsystems determine whether a message received via the coherency probe network is an interrupt message or a coherency probe message based on an encoding embedded in the received message. Interrupt messages are routed to interrupt controller(s) while coherency probe messages are processed in accordance with a coherence probe action field embedded in the message.

Type: Application

Filed: August 24, 2018

Publication date: February 27, 2020

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Bryan P. Broussard, Paul James Moyer, William Louie Walker
LINK LAYER DATA PACKING AND PACKET FLOW CONTROL SCHEME

Publication number: 20200059437

Abstract: Systems, apparatuses, and methods for performing efficient data transfer in a computing system are disclosed. A computing system includes multiple fabric interfaces in clients and a fabric. A packet transmitter in the fabric interface includes multiple queues, each for storing packets of a respective type. The packet transmitter includes multiple queue arbiters, each for selecting a candidate packet from a respective one of the multiple queues. The packet transmitter includes a buffer for storing a link packet, which includes data storage space for storing multiple candidate packets. The packet transmitter selects qualified candidate packets from the multiple queues and inserts these candidate packets into the link packet. The packing arbiter avoids data collisions at the receiver by taking into consideration mismatches between the rate of inserting candidate packets into the link packet and the rate of creating available data storage space in a receiving queue in the receiver.

Type: Application

Filed: August 20, 2018

Publication date: February 20, 2020

Inventors: Greggory D. Donley, Bryan P. Broussard, Vydhyanathan Kalyanasundharam
Method and apparatus for in-band priority adjustment forwarding in a communication fabric

Patent number: 10558591

Abstract: Systems, apparatuses, and methods for implementing priority adjustment forwarding are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric coupled to the processing unit(s) and the memory. The communication fabric includes a plurality of arbitration points. When a client determines that its bandwidth requirements are not being met, the client generates and sends an in-band priority adjustment request to the nearest arbitration point. This arbitration point receives the in-band priority adjustment request and then identifies any pending requests which are buffered at the arbitration point which meet the criteria specified by the in-band priority adjustment request. The arbitration point adjusts the priority of any identified requests, and then the arbitration point forwards the in-band priority adjustment request on the fabric to the next upstream arbitration point which processes the in-band priority adjustment request in the same manner.

Type: Grant

Filed: October 9, 2017

Date of Patent: February 11, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Alan Dodson Smith, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Joe G. Cruz
Tag accelerator for low latency DRAM cache

Patent number: 10545875

Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.

Type: Grant

Filed: December 27, 2017

Date of Patent: January 28, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
Cancel and replay protocol scheme to improve ordered bandwidth

Patent number: 10540316

Abstract: Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and then sends an indication that the write request is globally visible to the ordering master when all cached copies of the data targeted by the write request have been invalidated. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.

Type: Grant

Filed: December 28, 2017

Date of Patent: January 21, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Chen-Ping Yang, Amit P. Apte, Elizabeth M. Cooper
Cache to cache data transfer acceleration techniques

Patent number: 10503648

Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.

Type: Grant

Filed: December 12, 2017

Date of Patent: December 10, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
Load balancing scheme

Patent number: 10491524

Abstract: A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of requestors, from all input ports, that are targeting the first output port. Then, the switch calculates a throttle factor for the first input port by dividing the first number of streams by the second number of streams. The switch applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch also calculates throttle factors for the other ports and applies the throttle factors when regulating bandwidth on the other ports.

Type: Grant

Filed: November 7, 2017

Date of Patent: November 26, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
Caching policies for processing units on multiple sockets

Patent number: 10467138

Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.

Type: Grant

Filed: December 28, 2015

Date of Patent: November 5, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam

prev 1 2 3 4 5 6 7 8 9 next