Patents by Inventor Akhil ARUNKUMAR

Akhil ARUNKUMAR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12182028
    Abstract: A transformer compute apparatus and method of operation therefor. The apparatus receives matrix inputs in a first format and generates projection tokens from these inputs. Among other elements, the apparatus includes a first cache device configured for processing first projection tokens and a second cache device configured for processing second projection tokens. The first cache device stores the first projection tokens in a first cache region and stores these tokens, converted to a second format, in a second cache region. The second cache device stores the second projection tokens, converted to the second format, in a first cache region, and also stores the converted second projection tokens after transposition. A compute device then performs various matrix computations with the converted first projection tokens and the transposed second projection tokens. This caching process avoids re-processing data as well as expensive padding and de-padding operations for transposed storage and byte alignment.
    Type: Grant
    Filed: September 28, 2023
    Date of Patent: December 31, 2024
    Assignee: d-MATRIX CORPORATION
    Inventors: Akhil Arunkumar, Satyam Srivastava, Aayush Ankit
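    The dual-format, dual-region caching described above can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption rather than a detail from the patent: the class and method names, float16 as the "second format", and the mapping of key tokens to the first cache device and value tokens to the second.

        import numpy as np

        class DualProjectionCache:
            """Toy model of the two cache devices: projection tokens are
            format-converted (and, in the second cache, transposed) once at
            store time, so every later step reuses the stored forms instead
            of re-converting, re-transposing, or padding."""

            def __init__(self, d_model):
                self.d = d_model
                self.k_first = []    # first cache, region 1: tokens in the first format
                self.k_second = []   # first cache, region 2: tokens in the second format
                self.v_t = np.empty((d_model, 0), dtype=np.float16)  # second cache: converted + transposed

            def append(self, k_token, v_token):
                self.k_first.append(k_token)                       # keep original (e.g., fp32)
                self.k_second.append(k_token.astype(np.float16))   # convert once
                v = v_token.astype(np.float16).reshape(self.d, 1)  # convert and transpose once
                self.v_t = np.concatenate([self.v_t, v], axis=1)

            def attend(self, q_token):
                K = np.stack(self.k_second)              # converted first projection tokens
                scores = K @ q_token.astype(np.float16)
                w = np.exp(scores - scores.max())
                w = w / w.sum()
                return self.v_t @ w                      # transposed second projection tokens

        cache = DualProjectionCache(4)
        rng = np.random.default_rng(0)
        for _ in range(3):
            cache.append(rng.standard_normal(4), rng.standard_normal(4))
        print(cache.attend(rng.standard_normal(4)))
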
  • Patent number: 12153524
    Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates the start of a throttling region. In response to the throttling instruction, the computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region. Various other apparatuses, systems, and methods are also disclosed.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: November 26, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
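    A minimal Python sketch of the throttling-region behavior. The trace format, the THROTTLE_START/THROTTLE_END markers, and the stride prefetcher are all assumptions made for illustration; the patent describes a hardware mechanism, not this software model.

        class StridePrefetcher:
            """Toy stride prefetcher whose training can be gated."""
            def __init__(self):
                self.last_addr = None
                self.stride = 0
                self.prefetches = []

            def train(self, addr):
                if self.last_addr is not None:
                    self.stride = addr - self.last_addr
                self.last_addr = addr
                if self.stride:
                    self.prefetches.append(addr + self.stride)

        def run(trace, prefetcher):
            throttled = False
            for op, addr in trace:
                if op == "THROTTLE_START":      # the throttling instruction
                    throttled = True
                elif op == "THROTTLE_END":
                    throttled = False
                elif op == "LOAD" and not throttled:
                    prefetcher.train(addr)      # loads inside the region never train it

        trace = [("LOAD", 0x100), ("LOAD", 0x140),
                 ("THROTTLE_START", None), ("LOAD", 0x9000), ("THROTTLE_END", None),
                 ("LOAD", 0x180)]
        pf = StridePrefetcher()
        run(trace, pf)
        print([hex(a) for a in pf.prefetches])  # ['0x180', '0x1c0']: 0x9000 left no trace
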
  • Patent number: 11960399
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: April 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
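    A toy Python model of the probe-deferral mechanism. The dictionary-based directory and the method names are assumptions for illustration; the shadow tag memory in the patent is a hardware structure.

        class ShadowTagDirectory:
            """Probes triggered by a write are buffered per core and only
            released when that core performs a scope-acquire operation."""
            def __init__(self, num_cores):
                self.owners = {}                             # line -> cores caching it
                self.pending = {c: [] for c in range(num_cores)}

            def record_fill(self, core, line):
                self.owners.setdefault(line, set()).add(core)

            def on_write(self, writer, line):
                for core in self.owners.get(line, set()) - {writer}:
                    self.pending[core].append(line)          # generate probe, but hold it

            def scope_acquire(self, core):
                released, self.pending[core] = self.pending[core], []
                return released                              # probes delivered only now

        d = ShadowTagDirectory(num_cores=2)
        d.record_fill(1, 0xA0)
        d.on_write(0, 0xA0)                          # probe created but not transmitted
        print([hex(a) for a in d.scope_acquire(1)])  # ['0xa0']: released at acquire
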
  • Publication number: 20240111676
    Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates the start of a throttling region. In response to the throttling instruction, the computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region. Various other apparatuses, systems, and methods are also disclosed.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
  • Publication number: 20240111677
    Abstract: A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Marko Scrbak, Akhil Arunkumar, John Kalamatianos
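    A minimal Python sketch of the record-and-replay idea, assuming 4 KiB regions and 64-byte-aligned accesses (both assumptions, not details from the application):

        REGION_BITS = 12  # assumed 4 KiB regions

        class RegionPatternPrefetcher:
            """Remember which offsets were touched in a region; on a later
            access to that region, prefetch the recorded offsets."""
            def __init__(self):
                self.patterns = {}   # region base -> set of recorded offsets

            def access(self, addr):
                base, off = addr >> REGION_BITS, addr & ((1 << REGION_BITS) - 1)
                prefetches = []
                if base in self.patterns:                       # region seen before:
                    prefetches = [(base << REGION_BITS) | o     # replay its pattern
                                  for o in sorted(self.patterns[base]) if o != off]
                self.patterns.setdefault(base, set()).add(off)  # keep recording
                return prefetches

        pf = RegionPatternPrefetcher()
        for a in (0x1000, 0x1040, 0x10C0):
            pf.access(a)                             # record the region's pattern
        print([hex(a) for a in pf.access(0x1000)])   # ['0x1040', '0x10c0']
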
  • Publication number: 20240090181
    Abstract: An immersion cooling server system with AI accelerator apparatuses using in-memory compute chiplet devices. This system includes one or more immersion tanks filled with heat transfer fluid and configured with at least a condenser device. A plurality of AI accelerator servers is immersed in the heat transfer fluid in a bottom portion of the tanks and is configured to process transformer workloads while cooled by the immersion cooling configuration. Each of the servers includes a plurality of multiprocessors, each having at least a first server central processing unit (CPU) and a second server CPU, both of which are coupled to a plurality of switch devices. Each switch device is coupled to a plurality of AI accelerator apparatuses. Each apparatus includes one or more chiplets, each of which includes a plurality of digital in-memory compute (DIMC) devices configured to perform high-throughput matrix computations for transformer-based models.
    Type: Application
    Filed: November 16, 2023
    Publication date: March 14, 2024
    Inventors: Jayaprakash BALACHANDRAN, Akhil ARUNKUMAR, Aayush ANKIT, Nithesh Kurella, Sudeep Bhoja
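    The hierarchy in the abstract (tank -> servers -> CPUs and switches -> accelerators -> chiplets with DIMC devices) can be sketched as a Python configuration model. All fan-out counts below are placeholders, not figures from the application:

        from dataclasses import dataclass, field

        @dataclass
        class Accelerator:
            dimc_chiplets: int = 8    # chiplets with DIMC devices

        @dataclass
        class Switch:
            accelerators: list = field(default_factory=lambda: [Accelerator() for _ in range(4)])

        @dataclass
        class Server:
            cpus: int = 2             # first and second server CPU
            switches: list = field(default_factory=lambda: [Switch() for _ in range(2)])

        @dataclass
        class ImmersionTank:
            condensers: int = 1
            servers: list = field(default_factory=lambda: [Server() for _ in range(8)])

        tank = ImmersionTank()
        accels = [a for s in tank.servers for sw in s.switches for a in sw.accelerators]
        print(len(accels), "accelerators,", sum(a.dimc_chiplets for a in accels), "chiplets")
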
  • Publication number: 20240037379
    Abstract: A server system with AI accelerator apparatuses using in-memory compute chiplet devices. The system includes a plurality of multiprocessors, each having at least a first server central processing unit (CPU) and a second server CPU, both of which are coupled to a plurality of switch devices. Each switch device is coupled to a plurality of AI accelerator apparatuses. Each apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a CPU, and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single instruction, multiple data (SIMD) device is configured to further process the DIMC output and to compute softmax functions for the attention functions.
    Type: Application
    Filed: October 13, 2023
    Publication date: February 1, 2024
    Inventors: Jayaprakash BALACHANDRAN, Akhil ARUNKUMAR, Aayush ANKIT, Nithesh Kurella, Sudeep Bhoja
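    The division of labor the abstract describes, matrix products on DIMC devices and softmax on a SIMD device, maps onto the stages of standard scaled dot-product attention. A minimal NumPy sketch under that assumption (the function name and shapes are illustrative):

        import numpy as np

        def attention(Q, K, V):
            """Toy attention split along the lines of the abstract: the two
            matrix products stand in for the DIMC devices, the softmax
            stage stands in for the SIMD device."""
            scores = Q @ K.T / np.sqrt(K.shape[1])  # DIMC: high-throughput matmul
            m = scores.max(axis=-1, keepdims=True)  # SIMD: numerically stable softmax
            w = np.exp(scores - m)
            w = w / w.sum(axis=-1, keepdims=True)
            return w @ V                            # DIMC again: weighted values

        rng = np.random.default_rng(1)
        Q, K, V = (rng.standard_normal((n, 8)) for n in (3, 5, 5))
        print(attention(Q, K, V).shape)  # (3, 8)
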
  • Patent number: 11847062
    Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
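    A toy Python model of the clean-victim path; the two-level dictionary "hierarchy" and its method names are assumptions for illustration only:

        class Hierarchy:
            """On eviction of a clean block from the mid-level cache, its
            address is used to fetch the same block from system memory into
            the last-level cache, so a later access hits in the LLC."""
            def __init__(self):
                self.l2 = {}    # addr -> "clean" or "dirty"
                self.llc = {}   # addr -> data
                self.mem = {}   # backing store

            def l2_evict(self, addr):
                state = self.l2.pop(addr)
                if state == "dirty":
                    self.llc[addr] = "written-back data"   # normal writeback path
                else:
                    # clean victim: instead of silently dropping it, use its
                    # address to fetch the block from memory into the LLC
                    self.llc[addr] = self.mem.get(addr)

        h = Hierarchy()
        h.mem[0x40] = "block"
        h.l2[0x40] = "clean"
        h.l2_evict(0x40)
        print(h.llc[0x40])  # "block": the next access hits in the LLC, not DRAM
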
  • Publication number: 20230205700
    Abstract: In response to generating one or more speculative prefetch requests for a last-level cache, a processor determines prefetch analytics for the generated speculative prefetch requests and compares the determined prefetch analytics of the speculative prefetch requests to selection thresholds. In response to a speculative prefetch request meeting or exceeding a selection threshold, the processor selects the speculative prefetch request for issuance to a memory-side cache controller. When one or more system conditions meet one or more condition thresholds, the processor issues the selected speculative prefetch request to the memory-side cache controller. The memory-side cache controller then retrieves the data indicated in the selected speculative prefetch request from a memory and stores it in a memory-side cache in the data fabric coupled to the last-level cache.
    Type: Application
    Filed: December 28, 2021
    Publication date: June 29, 2023
    Inventors: Tarun NAKRA, Akhil ARUNKUMAR, Paul MOYER, Jay FLEISCHMAN
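    A minimal Python sketch of the select-then-issue filtering. The confidence score, the two thresholds, and the single "memory busy" condition are illustrative stand-ins for the prefetch analytics and system conditions the abstract mentions:

        def issue_speculative_prefetches(requests, accuracy_floor, busy_limit, mem_busy):
            """Select requests at or above the selection threshold, then
            issue them only while system conditions permit."""
            selected = [r for r in requests if r["confidence"] >= accuracy_floor]
            if mem_busy <= busy_limit:   # condition threshold met
                return selected          # handed to the memory-side cache controller
            return []                    # hold back: memory is too busy

        reqs = [{"addr": 0x100, "confidence": 0.9},
                {"addr": 0x200, "confidence": 0.3}]
        print(issue_speculative_prefetches(reqs, accuracy_floor=0.5,
                                           busy_limit=0.7, mem_busy=0.4))
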
  • Publication number: 20230195628
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Application
    Filed: December 21, 2021
    Publication date: June 22, 2023
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
  • Publication number: 20230195643
    Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
    Type: Application
    Filed: December 16, 2021
    Publication date: June 22, 2023
    Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
  • Patent number: 11580025
    Abstract: Systems and methods for coordinated memory-side cache prefetching and dynamic interleaving configuration modification involve modifying one or both of the prefetch distance and the prefetch degree used by prefetcher modules of one or more memory-side caches. This is done by modifying interleaving configuration data following detection of an interleaving reconfiguration trigger condition indicative, for example, of low prefetch accuracy, low prefetch coverage, high prefetch lateness, or a combination of these. In response to an interleaving reconfiguration trigger condition, a processor modifies the interleaving configuration data for the processing system based on the prefetch performance characteristics associated with the trigger condition. In some embodiments, the interleaving configuration data is modified by changing which physical memory address indices are used to determine the bits that define the channel identification number to which that physical memory address is mapped.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: February 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tarun Nakra, Akhil Arunkumar, Vydhyanathan Kalyanasundharam, Chintan S. Patel, Nithesh Kurella Lakshmi Narayanamurthy
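    The last sentence of the abstract, channel selection from chosen physical-address bits, can be sketched in Python. Assuming a 4-channel system and 64-byte lines (both assumptions): selecting low bits spreads consecutive lines round-robin across channels, while selecting higher bits keeps each 4 KiB region on one channel, which changes how far a fixed memory-side prefetcher effectively reaches along a stream.

        def channel_of(addr, bit_indices):
            """Channel id formed from the selected physical-address bits;
            changing bit_indices is the interleaving reconfiguration."""
            ch = 0
            for i, b in enumerate(bit_indices):
                ch |= ((addr >> b) & 1) << i
            return ch

        fine = [6, 7]      # low bits: consecutive 64 B lines alternate channels
        coarse = [12, 13]  # higher bits: whole 4 KiB regions stay on one channel
        addrs = range(0, 0x200, 0x40)
        print([channel_of(a, fine) for a in addrs])    # [0, 1, 2, 3, 0, 1, 2, 3]
        print([channel_of(a, coarse) for a in addrs])  # [0, 0, 0, 0, 0, 0, 0, 0]
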
  • Patent number: 10360158
    Abstract: Embodiments of the present system and method provide cache replacement in a victim exclusive cache using a snoop filter, such that replacement information is not lost during a re-reference back to the CPU. Replacement information is stored in the snoop filter, meaning that historical access data is fully preserved, allowing more flexibility in the LLC re-insertion points without storing additional bits in an L2 cache. The present system and method further include a snoop filter replacement technique: replacement information is passed between the snoop filter and a victim exclusive cache (e.g., the LLC) when transactions move cachelines to and from a master CPU. This maintains and advances existing replacement information for a cacheline that is removed from the victim exclusive cache on a read, and intelligently replaces and ages cachelines in the snoop filter.
    Type: Grant
    Filed: June 7, 2017
    Date of Patent: July 23, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Eric C. Quinnell, Kevin C. Heuer, Tarun Nakra, Akhil Arunkumar
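    A toy Python model of the replacement-information handoff. The age encoding and the method names are assumptions, loosely in the spirit of re-reference interval prediction rather than details from the patent:

        class SnoopFilterLLC:
            """Age bits travel with a cacheline between the snoop filter
            (line held in a CPU cache) and the victim exclusive LLC, so
            replacement history survives the round trip."""
            def __init__(self):
                self.snoop_filter = {}   # addr -> age while the CPU holds the line
                self.llc = {}            # addr -> age of the victim copy

            def cpu_read(self, addr):
                # line leaves the LLC for the CPU; its age moves to the snoop
                # filter and is promoted because this is a re-reference
                age = self.llc.pop(addr, 2)          # 2 = assumed insertion age
                self.snoop_filter[addr] = max(age - 1, 0)

            def cpu_evict(self, addr):
                # victim returns to the LLC carrying its preserved age
                self.llc[addr] = self.snoop_filter.pop(addr)

        c = SnoopFilterLLC()
        c.cpu_read(0x80)             # first touch, tracked in the snoop filter
        c.cpu_evict(0x80)            # victim inserted with preserved age
        c.cpu_read(0x80)             # re-reference: age advances, not reset
        print(c.snoop_filter[0x80])  # 0: hottest, history was never lost
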
  • Publication number: 20180276140
    Abstract: Embodiments of the present system and method provide cache replacement in a victim exclusive cache using a snoop filter, such that replacement information is not lost during a re-reference back to the CPU. Replacement information is stored in the snoop filter, meaning that historical access data is fully preserved, allowing more flexibility in the LLC re-insertion points without storing additional bits in an L2 cache. The present system and method further include a snoop filter replacement technique: replacement information is passed between the snoop filter and a victim exclusive cache (e.g., the LLC) when transactions move cachelines to and from a master CPU. This maintains and advances existing replacement information for a cacheline that is removed from the victim exclusive cache on a read, and intelligently replaces and ages cachelines in the snoop filter.
    Type: Application
    Filed: June 7, 2017
    Publication date: September 27, 2018
    Inventors: Eric C. QUINNELL, Kevin C. HEUER, Tarun NAKRA, Akhil ARUNKUMAR