Patents by Inventor Akhil ARUNKUMAR

Akhil ARUNKUMAR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12182028
    Abstract: A transformer compute apparatus and method of operation therefor. The apparatus receives matrix inputs in a first format and generates projection tokens from these inputs. Among other elements, the apparatus includes a first cache device configured for processing first projection tokens and a second cache device configured for processing second projection tokens. The first cache device stores the first projection tokens in a first cache region and stores these tokens, converted to a second format, in a second cache region. The second cache device stores the second projection tokens, converted to the second format, in a first cache region, and also stores the converted second projection tokens after transposition. A compute device then performs various matrix computations with the converted first projection tokens and the transposed second projection tokens. This caching process avoids re-processing data as well as expensive padding and de-padding operations for transposed storage and byte alignment.
    Type: Grant
    Filed: September 28, 2023
    Date of Patent: December 31, 2024
    Assignee: d-MATRIX CORPORATION
    Inventors: Akhil Arunkumar, Satyam Srivastava, Aayush Ankit
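    The dual-format, dual-region caching described above can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption rather than a detail from the patent: the class and method names, float16 as the "second format", and the mapping of key tokens to the first cache device and value tokens to the second.

        import numpy as np

        class DualProjectionCache:
            """Toy model of the two cache devices: projection tokens are
            format-converted (and, in the second cache, transposed) once at
            store time, so every later step reuses the stored forms instead
            of re-converting, re-transposing, or padding."""

            def __init__(self, d_model):
                self.d = d_model
                self.k_first = []    # first cache, region 1: tokens in the first format
                self.k_second = []   # first cache, region 2: tokens in the second format
                self.v_t = np.empty((d_model, 0), dtype=np.float16)  # second cache: converted + transposed

            def append(self, k_token, v_token):
                self.k_first.append(k_token)                       # keep original (e.g., fp32)
                self.k_second.append(k_token.astype(np.float16))   # convert once
                v = v_token.astype(np.float16).reshape(self.d, 1)  # convert and transpose once
                self.v_t = np.concatenate([self.v_t, v], axis=1)

            def attend(self, q_token):
                K = np.stack(self.k_second)              # converted first projection tokens
                scores = K @ q_token.astype(np.float16)
                w = np.exp(scores - scores.max())
                w = w / w.sum()
                return self.v_t @ w                      # transposed second projection tokens

        cache = DualProjectionCache(4)
        rng = np.random.default_rng(0)
        for _ in range(3):
            cache.append(rng.standard_normal(4), rng.standard_normal(4))
        print(cache.attend(rng.standard_normal(4)))
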
  • Patent number: 12153524
    Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates the start of a throttling region. In response to the throttling instruction, the computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region. Various other apparatuses, systems, and methods are also disclosed.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: November 26, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
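    A minimal Python sketch of the throttling-region behavior. The trace format, the THROTTLE_START/THROTTLE_END markers, and the stride prefetcher are all assumptions made for illustration; the patent describes a hardware mechanism, not this software model.

        class StridePrefetcher:
            """Toy stride prefetcher whose training can be gated."""
            def __init__(self):
                self.last_addr = None
                self.stride = 0
                self.prefetches = []

            def train(self, addr):
                if self.last_addr is not None:
                    self.stride = addr - self.last_addr
                self.last_addr = addr
                if self.stride:
                    self.prefetches.append(addr + self.stride)

        def run(trace, prefetcher):
            throttled = False
            for op, addr in trace:
                if op == "THROTTLE_START":      # the throttling instruction
                    throttled = True
                elif op == "THROTTLE_END":
                    throttled = False
                elif op == "LOAD" and not throttled:
                    prefetcher.train(addr)      # loads inside the region never train it

        trace = [("LOAD", 0x100), ("LOAD", 0x140),
                 ("THROTTLE_START", None), ("LOAD", 0x9000), ("THROTTLE_END", None),
                 ("LOAD", 0x180)]
        pf = StridePrefetcher()
        run(trace, pf)
        print([hex(a) for a in pf.prefetches])  # ['0x180', '0x1c0']: 0x9000 left no trace
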
  • Patent number: 11960399
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: April 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
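    A toy Python model of the probe-deferral mechanism. The dictionary-based directory and the method names are assumptions for illustration; the shadow tag memory in the patent is a hardware structure.

        class ShadowTagDirectory:
            """Probes triggered by a write are buffered per core and only
            released when that core performs a scope-acquire operation."""
            def __init__(self, num_cores):
                self.owners = {}                             # line -> cores caching it
                self.pending = {c: [] for c in range(num_cores)}

            def record_fill(self, core, line):
                self.owners.setdefault(line, set()).add(core)

            def on_write(self, writer, line):
                for core in self.owners.get(line, set()) - {writer}:
                    self.pending[core].append(line)          # generate probe, but hold it

            def scope_acquire(self, core):
                released, self.pending[core] = self.pending[core], []
                return released                              # probes delivered only now

        d = ShadowTagDirectory(num_cores=2)
        d.record_fill(1, 0xA0)
        d.on_write(0, 0xA0)                          # probe created but not transmitted
        print([hex(a) for a in d.scope_acquire(1)])  # ['0xa0']: released at acquire
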
  • Publication number: 20240111676
    Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates the start of a throttling region. In response to the throttling instruction, the computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region. Various other apparatuses, systems, and methods are also disclosed.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
  • Publication number: 20240111677
    Abstract: A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Marko Scrbak, Akhil Arunkumar, John Kalamatianos
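    A minimal Python sketch of the record-and-replay idea, assuming 4 KiB regions and 64-byte-aligned accesses (both assumptions, not details from the application):

        REGION_BITS = 12  # assumed 4 KiB regions

        class RegionPatternPrefetcher:
            """Remember which offsets were touched in a region; on a later
            access to that region, prefetch the recorded offsets."""
            def __init__(self):
                self.patterns = {}   # region base -> set of recorded offsets

            def access(self, addr):
                base, off = addr >> REGION_BITS, addr & ((1 << REGION_BITS) - 1)
                prefetches = []
                if base in self.patterns:                       # region seen before:
                    prefetches = [(base << REGION_BITS) | o     # replay its pattern
                                  for o in sorted(self.patterns[base]) if o != off]
                self.patterns.setdefault(base, set()).add(off)  # keep recording
                return prefetches

        pf = RegionPatternPrefetcher()
        for a in (0x1000, 0x1040, 0x10C0):
            pf.access(a)                             # record the region's pattern
        print([hex(a) for a in pf.access(0x1000)])   # ['0x1040', '0x10c0']
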
  • Publication number: 20240090181
    Abstract: An immersion cooling server system with AI accelerator apparatuses using in-memory compute chiplet devices. This system includes one or more immersion tanks filled with heat transfer fluid and configured with at least a condenser device. A plurality of AI accelerator servers is immersed in the heat transfer fluid in a bottom portion of the tanks and is configured to process transformer workloads while cooled by the immersion cooling configuration. Each of the servers includes a plurality of multiprocessors, each having at least a first server central processing unit (CPU) and a second server CPU, both of which are coupled to a plurality of switch devices. Each switch device is coupled to a plurality of AI accelerator apparatuses. Each apparatus includes one or more chiplets, each of which includes a plurality of digital in-memory compute (DIMC) devices configured to perform high-throughput matrix computations for transformer-based models.
    Type: Application
    Filed: November 16, 2023
    Publication date: March 14, 2024
    Inventors: Jayaprakash BALACHANDRAN, Akhil ARUNKUMAR, Aayush ANKIT, Nithesh Kurella, Sudeep Bhoja
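    The hierarchy in the abstract (tank -> servers -> CPUs and switches -> accelerators -> chiplets with DIMC devices) can be sketched as a Python configuration model. All fan-out counts below are placeholders, not figures from the application:

        from dataclasses import dataclass, field

        @dataclass
        class Accelerator:
            dimc_chiplets: int = 8    # chiplets with DIMC devices

        @dataclass
        class Switch:
            accelerators: list = field(default_factory=lambda: [Accelerator() for _ in range(4)])

        @dataclass
        class Server:
            cpus: int = 2             # first and second server CPU
            switches: list = field(default_factory=lambda: [Switch() for _ in range(2)])

        @dataclass
        class ImmersionTank:
            condensers: int = 1
            servers: list = field(default_factory=lambda: [Server() for _ in range(8)])

        tank = ImmersionTank()
        accels = [a for s in tank.servers for sw in s.switches for a in sw.accelerators]
        print(len(accels), "accelerators,", sum(a.dimc_chiplets for a in accels), "chiplets")
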
  • Publication number: 20240037379
    Abstract: A server system with AI accelerator apparatuses using in-memory compute chiplet devices. The system includes a plurality of multiprocessors, each having at least a first server central processing unit (CPU) and a second server CPU, both of which are coupled to a plurality of switch devices. Each switch device is coupled to a plurality of AI accelerator apparatuses. Each apparatus includes one or more chiplets, each of which includes a plurality of tiles. Each tile includes a plurality of slices, a CPU, and a hardware dispatch device. Each slice can include a digital in-memory compute (DIMC) device configured to perform high-throughput computations. In particular, the DIMC device can be configured to accelerate the computations of attention functions for transformer-based models (a.k.a. transformers) applied to machine learning applications. A single instruction, multiple data (SIMD) device is configured to further process the DIMC output and to compute softmax functions for the attention functions.
    Type: Application
    Filed: October 13, 2023
    Publication date: February 1, 2024
    Inventors: Jayaprakash BALACHANDRAN, Akhil ARUNKUMAR, Aayush ANKIT, Nithesh Kurella, Sudeep Bhoja
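    The division of labor the abstract describes, matrix products on DIMC devices and softmax on a SIMD device, maps onto the stages of standard scaled dot-product attention. A minimal NumPy sketch under that assumption (the function name and shapes are illustrative):

        import numpy as np

        def attention(Q, K, V):
            """Toy attention split along the lines of the abstract: the two
            matrix products stand in for the DIMC devices, the softmax
            stage stands in for the SIMD device."""
            scores = Q @ K.T / np.sqrt(K.shape[1])  # DIMC: high-throughput matmul
            m = scores.max(axis=-1, keepdims=True)  # SIMD: numerically stable softmax
            w = np.exp(scores - m)
            w = w / w.sum(axis=-1, keepdims=True)
            return w @ V                            # DIMC again: weighted values

        rng = np.random.default_rng(1)
        Q, K, V = (rng.standard_normal((n, 8)) for n in (3, 5, 5))
        print(attention(Q, K, V).shape)  # (3, 8)
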
  • Patent number: 11847062
    Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
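    A toy Python model of the clean-victim path; the two-level dictionary "hierarchy" and its method names are assumptions for illustration only:

        class Hierarchy:
            """On eviction of a clean block from the mid-level cache, its
            address is used to fetch the same block from system memory into
            the last-level cache, so a later access hits in the LLC."""
            def __init__(self):
                self.l2 = {}    # addr -> "clean" or "dirty"
                self.llc = {}   # addr -> data
                self.mem = {}   # backing store

            def l2_evict(self, addr):
                state = self.l2.pop(addr)
                if state == "dirty":
                    self.llc[addr] = "written-back data"   # normal writeback path
                else:
                    # clean victim: instead of silently dropping it, use its
                    # address to fetch the block from memory into the LLC
                    self.llc[addr] = self.mem.get(addr)

        h = Hierarchy()
        h.mem[0x40] = "block"
        h.l2[0x40] = "clean"
        h.l2_evict(0x40)
        print(h.llc[0x40])  # "block": the next access hits in the LLC, not DRAM
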
  • Publication number: 20230205700
    Abstract: In response to generating one or more speculative prefetch requests for a last-level cache, a processor determines prefetch analytics for the generated speculative prefetch requests and compares the determined prefetch analytics of the speculative prefetch requests to selection thresholds. In response to a speculative prefetch request meeting or exceeding a selection threshold, the processor selects the speculative prefetch request for issuance to a memory-side cache controller. When one or more system conditions meet one or more condition thresholds, the processor issues the selected speculative prefetch request to the memory-side cache controller. The memory-side cache controller then retrieves the data indicated in the selected speculative prefetch request from a memory and stores it in a memory-side cache in the data fabric coupled to the last-level cache.
    Type: Application
    Filed: December 28, 2021
    Publication date: June 29, 2023
    Inventors: Tarun NAKRA, Akhil ARUNKUMAR, Paul MOYER, Jay FLEISCHMAN
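    A minimal Python sketch of the select-then-issue filtering. The confidence score, the two thresholds, and the single "memory busy" condition are illustrative stand-ins for the prefetch analytics and system conditions the abstract mentions:

        def issue_speculative_prefetches(requests, accuracy_floor, busy_limit, mem_busy):
            """Select requests at or above the selection threshold, then
            issue them only while system conditions permit."""
            selected = [r for r in requests if r["confidence"] >= accuracy_floor]
            if mem_busy <= busy_limit:   # condition threshold met
                return selected          # handed to the memory-side cache controller
            return []                    # hold back: memory is too busy

        reqs = [{"addr": 0x100, "confidence": 0.9},
                {"addr": 0x200, "confidence": 0.3}]
        print(issue_speculative_prefetches(reqs, accuracy_floor=0.5,
                                           busy_limit=0.7, mem_busy=0.4))
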
  • Publication number: 20230195628
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Application
    Filed: December 21, 2021
    Publication date: June 22, 2023
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
  • Publication number: 20230195643
    Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
    Type: Application
    Filed: December 16, 2021
    Publication date: June 22, 2023
    Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
  • Patent number: 11580025
    Abstract: Systems and methods for coordinated memory-side cache prefetching and dynamic interleaving configuration modification involve modifying one or both of the prefetch distance and the prefetch degree used by prefetcher modules of one or more memory-side caches. This is done by modifying interleaving configuration data following detection of an interleaving reconfiguration trigger condition indicative, for example, of low prefetch accuracy, low prefetch coverage, high prefetch lateness, or a combination of these. In response to an interleaving reconfiguration trigger condition, a processor modifies the interleaving configuration data for the processing system based on the prefetch performance characteristics associated with the trigger condition. In some embodiments, the interleaving configuration data is modified by changing which physical memory address indices are used to determine the bits that define the channel identification number to which that physical memory address is mapped.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: February 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tarun Nakra, Akhil Arunkumar, Vydhyanathan Kalyanasundharam, Chintan S. Patel, Nithesh Kurella Lakshmi Narayanamurthy
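    The last sentence of the abstract, channel selection from chosen physical-address bits, can be sketched in Python. Assuming a 4-channel system and 64-byte lines (both assumptions): selecting low bits spreads consecutive lines round-robin across channels, while selecting higher bits keeps each 4 KiB region on one channel, which changes how far a fixed memory-side prefetcher effectively reaches along a stream.

        def channel_of(addr, bit_indices):
            """Channel id formed from the selected physical-address bits;
            changing bit_indices is the interleaving reconfiguration."""
            ch = 0
            for i, b in enumerate(bit_indices):
                ch |= ((addr >> b) & 1) << i
            return ch

        fine = [6, 7]      # low bits: consecutive 64 B lines alternate channels
        coarse = [12, 13]  # higher bits: whole 4 KiB regions stay on one channel
        addrs = range(0, 0x200, 0x40)
        print([channel_of(a, fine) for a in addrs])    # [0, 1, 2, 3, 0, 1, 2, 3]
        print([channel_of(a, coarse) for a in addrs])  # [0, 0, 0, 0, 0, 0, 0, 0]
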
  • Patent number: 10360158
    Abstract: Embodiments of the present system and method provide cache replacement in a victim exclusive cache using a snoop filter, such that replacement information is not lost during a re-reference back to the CPU. Replacement information is stored in the snoop filter, meaning that historical access data is fully preserved, allowing more flexibility in the LLC re-insertion points without storing additional bits in an L2 cache. The present system and method further include a snoop filter replacement technique: replacement information is passed between the snoop filter and a victim exclusive cache (e.g., the LLC) when transactions move cachelines to and from a master CPU. This maintains and advances existing replacement information for a cacheline that is removed from the victim exclusive cache on a read, and intelligently replaces and ages cachelines in the snoop filter.
    Type: Grant
    Filed: June 7, 2017
    Date of Patent: July 23, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Eric C. Quinnell, Kevin C. Heuer, Tarun Nakra, Akhil Arunkumar
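    A toy Python model of the replacement-information handoff. The age encoding and the method names are assumptions, loosely in the spirit of re-reference interval prediction rather than details from the patent:

        class SnoopFilterLLC:
            """Age bits travel with a cacheline between the snoop filter
            (line held in a CPU cache) and the victim exclusive LLC, so
            replacement history survives the round trip."""
            def __init__(self):
                self.snoop_filter = {}   # addr -> age while the CPU holds the line
                self.llc = {}            # addr -> age of the victim copy

            def cpu_read(self, addr):
                # line leaves the LLC for the CPU; its age moves to the snoop
                # filter and is promoted because this is a re-reference
                age = self.llc.pop(addr, 2)          # 2 = assumed insertion age
                self.snoop_filter[addr] = max(age - 1, 0)

            def cpu_evict(self, addr):
                # victim returns to the LLC carrying its preserved age
                self.llc[addr] = self.snoop_filter.pop(addr)

        c = SnoopFilterLLC()
        c.cpu_read(0x80)             # first touch, tracked in the snoop filter
        c.cpu_evict(0x80)            # victim inserted with preserved age
        c.cpu_read(0x80)             # re-reference: age advances, not reset
        print(c.snoop_filter[0x80])  # 0: hottest, history was never lost
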
  • Publication number: 20180276140
    Abstract: Embodiments of the present system and method provide cache replacement in a victim exclusive cache using a snoop filter, such that replacement information is not lost during a re-reference back to the CPU. Replacement information is stored in the snoop filter, meaning that historical access data is fully preserved, allowing more flexibility in the LLC re-insertion points without storing additional bits in an L2 cache. The present system and method further include a snoop filter replacement technique: replacement information is passed between the snoop filter and a victim exclusive cache (e.g., the LLC) when transactions move cachelines to and from a master CPU. This maintains and advances existing replacement information for a cacheline that is removed from the victim exclusive cache on a read, and intelligently replaces and ages cachelines in the snoop filter.
    Type: Application
    Filed: June 7, 2017
    Publication date: September 27, 2018
    Inventors: Eric C. QUINNELL, Kevin C. HEUER, Tarun NAKRA, Akhil ARUNKUMAR