Patents by Inventor Vydhyanathan Kalyanasundharam

Vydhyanathan Kalyanasundharam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190188155
    Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
    Type: Application
    Filed: December 15, 2017
    Publication date: June 20, 2019
    Inventors: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
  • Publication number: 20190179758
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
  • Patent number: 10305509
    Abstract: Systems, apparatuses, and methods for compression of frequent data values across narrow links are disclosed. In one embodiment, a system includes a processor, a link interface unit, and a communication link. The link interface unit is configured to receive a data stream for transmission over the communication link, wherein the data stream is generated by the processor. The link interface unit determines if blocks of data of a first size from the data stream match one or more first data patterns and the link interface unit determines if blocks of data of a second size from the data stream match one or more second data patterns. The link interface unit sends, over the communication link, only blocks of data which do not match the first or second data patterns.
    Type: Grant
    Filed: October 16, 2017
    Date of Patent: May 28, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Bryan P. Broussard
  • Publication number: 20190155516
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.
    Type: Application
    Filed: November 20, 2017
    Publication date: May 23, 2019
    Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
  • Publication number: 20190138465
    Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes system memory and one or more clients, each capable of generating memory access requests. The computing system also includes a communication fabric for transferring traffic between the clients and the system memory. The fabric includes master units for interfacing with clients and grouping write requests with a same target together. The fabric also includes slave units for interfacing with memory controllers and for sending a single write response when each write request in a group has been serviced. When the master unit receives the single write response for the group, it sends a respective acknowledgment response for each of the multiple write requests in the group to clients that generated the multiple write requests.
    Type: Application
    Filed: November 8, 2017
    Publication date: May 9, 2019
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Chen-Ping Yang
  • Publication number: 20190140954
    Abstract: A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of requestors, from all input ports, that are targeting the first output port. Then, the switch calculates a throttle factor for the first input port by dividing the first number of streams by the second number of streams. The switch applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch also calculates throttle factors for the other ports and applies the throttle factors when regulating bandwidth on the other ports.
    Type: Application
    Filed: November 7, 2017
    Publication date: May 9, 2019
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
  • Publication number: 20190132249
    Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.
    Type: Application
    Filed: October 27, 2017
    Publication date: May 2, 2019
    Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
  • Publication number: 20190108143
    Abstract: Systems, apparatuses, and methods for implementing priority adjustment forwarding are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric coupled to the processing unit(s) and the memory. The communication fabric includes a plurality of arbitration points. When a client determines that its bandwidth requirements are not being met, the client generates and sends an in-band priority adjustment request to the nearest arbitration point. This arbitration point receives the in-band priority adjustment request and then identifies any pending requests which are buffered at the arbitration point which meet the criteria specified by the in-band priority adjustment request. The arbitration point adjusts the priority of any identified requests, and then the arbitration point forwards the in-band priority adjustment request on the fabric to the next upstream arbitration point which processes the in-band priority adjustment request in the same manner.
    Type: Application
    Filed: October 9, 2017
    Publication date: April 11, 2019
    Inventors: Alan Dodson Smith, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Joe G. Cruz
  • Patent number: 10248564
    Abstract: A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines a number of nodes requesting read access for a given data block assigned to the home node exceeds a threshold and a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: April 2, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Amit P. Apte, Elizabeth M. Cooper
  • Patent number: 10223280
    Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.
    Type: Grant
    Filed: July 2, 2018
    Date of Patent: March 5, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Yaniv Adiri, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien
  • Publication number: 20180307619
    Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.
    Type: Application
    Filed: July 2, 2018
    Publication date: October 25, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
  • Patent number: 10042576
    Abstract: A method and apparatus of compressing addresses for transmission includes receiving a transaction at a first device from a source that includes a memory address request for a memory location on a second device. It is determined if a first part of the memory address is stored in a cache located on the first device. If the first part of the memory address is not stored in the cache, the first part of the memory address is stored in the cache and the entire memory address and information relating to the storage of the first part is transmitted to the second device. If the first part of the memory address is stored in the cache, only a second part of the memory address and an identifier that indicates a way in which the first part of the address is stored in the cache is transmitted to the second device.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: August 7, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Greggory D. Donley
  • Patent number: 10025721
    Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions includes a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: July 17, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
  • Publication number: 20180167082
    Abstract: Systems, apparatuses, and methods for compression of frequent data values across narrow links are disclosed. In one embodiment, a system includes a processor, a link interface unit, and a communication link. The link interface unit is configured to receive a data stream for transmission over the communication link, wherein the data stream is generated by the processor. The link interface unit determines if blocks of data of a first size from the data stream match one or more first data patterns and the link interface unit determines if blocks of data of a second size from the data stream match one or more second data patterns. The link interface unit sends, over the communication link, only blocks of data which do not match the first or second data patterns.
    Type: Application
    Filed: October 16, 2017
    Publication date: June 14, 2018
    Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Bryan P. Broussard
  • Publication number: 20180165202
    Abstract: A data processing system includes a processor and a cache controller coupled to the processor, and adapted to be coupled to a memory. The cache controller uses the memory to form a pseudo direct mapped cache having a plurality of groups of pages. The memory forms a first number of selected pages, including a first page for storing a plurality of sets of tags and a plurality of remaining pages for storing data. Each tag, of the plurality of sets of tags, stores tags for respective entries in a corresponding one of the plurality of remaining pages.
    Type: Application
    Filed: December 12, 2016
    Publication date: June 14, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
  • Publication number: 20180052631
    Abstract: A method and apparatus of compressing addresses for transmission includes receiving a transaction at a first device from a source that includes a memory address request for a memory location on a second device. It is determined if a first part of the memory address is stored in a cache located on the first device. If the first part of the memory address is not stored in the cache, the first part of the memory address is stored in the cache and the entire memory address and information relating to the storage of the first part is transmitted to the second device. If the first part of the memory address is stored in the cache, only a second part of the memory address and an identifier that indicates a way in which the first part of the address is stored in the cache is transmitted to the second device.
    Type: Application
    Filed: November 8, 2016
    Publication date: February 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Greggory D. Donley
  • Publication number: 20170371787
    Abstract: A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines a number of nodes requesting read access for a given data block assigned to the home node exceeds a threshold and a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.
    Type: Application
    Filed: June 24, 2016
    Publication date: December 28, 2017
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Amit P. Apte, Elizabeth M. Cooper
  • Patent number: 9793919
    Abstract: Systems, apparatuses, and methods for compression of frequent data values across narrow links are disclosed. In one embodiment, a system includes a processor, a link interface unit, and a communication link. The link interface unit is configured to receive a data stream for transmission over the communication link, wherein the data stream is generated by the processor. The link interface unit determines if blocks of data of a first size from the data stream match one or more first data patterns and the link interface unit determines if blocks of data of a second size from the data stream match one or more second data patterns. The link interface unit sends, over the communication link, only blocks of data which do not match the first or second data patterns.
    Type: Grant
    Filed: December 8, 2016
    Date of Patent: October 17, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Bryan P. Broussard
  • Patent number: 9697146
    Abstract: A processor uses a token scheme to govern the maximum number of memory access requests each of a set of processor cores can have pending at a northbridge of the processor. To implement the scheme, the northbridge issues a minimum number of tokens to each of the processor cores and keeps a number of tokens in reserve. In response to determining that a given processor core is generating a high level of memory access activity the northbridge issues some of the reserve tokens to the processor core. The processor core returns the reserve tokens to the northbridge in response to determining that it is not likely to continue to generate the high number of memory access requests, so that the reserve tokens are available to issue to another processor core.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: July 4, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Douglas R. Williams, Vydhyanathan Kalyanasundharam, Marius Evers, Michael K. Fertig
  • Publication number: 20170185514
    Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.
    Type: Application
    Filed: December 28, 2015
    Publication date: June 29, 2017
    Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam