Patents by Inventor John Kalamatianos

John Kalamatianos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Accessing a cache based on an address translation buffer result

Patent number: 12287739

Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or memory request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.

Type: Grant

Filed: December 9, 2022

Date of Patent: April 29, 2025

Assignee: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, John Kalamatianos

Speculative DRAM request enabling and disabling

Patent number: 12189953

Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.

Type: Grant

Filed: September 29, 2022

Date of Patent: January 7, 2025

Inventors: Jagadish B. Kotra, John Kalamatianos
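
The prediction step in the abstract above (patent 12189953) can be pictured with a minimal sketch. The abstract does not say how the miss history is kept, so the saturating counter, the threshold, and every name below are assumptions made for illustration only.

```python
class VictimCacheMissPredictor:
    """Toy predictor: a saturating counter over private-cache outcomes.

    The counter scheme, threshold, and all names are assumptions; the
    abstract only says the prediction uses a history of private-cache misses.
    """

    def __init__(self, threshold=2, max_count=3):
        self.counter = 0
        self.threshold = threshold
        self.max_count = max_count

    def record_private_cache_lookup(self, missed):
        # Count up on a miss, down on a hit, saturating at both ends.
        if missed:
            self.counter = min(self.counter + 1, self.max_count)
        else:
            self.counter = max(self.counter - 1, 0)

    def speculative_dram_enabled(self):
        # Predict a shared-victim-cache miss once recent misses accumulate.
        return self.counter >= self.threshold
```

With the default parameters, two consecutive private-cache misses enable the speculative request, and a following hit disables it again.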

SEMICONDUCTOR DEVICE FOR PERFORMING DATA REDUCTION FOR PROCESSING ARRAYS

Publication number: 20250004963

Abstract: A semiconductor device, referred to herein as a Globally Interconnected Operations (GIO) layer, provides global operations in the form of global data reduction for one or more PE arrays. The GIO layer includes processing elements that perform global data reduction on processing results from one or more PE arrays. The GIO layer includes connectors that allow it to be arranged in a 3D stack with one or more PE arrays, for example, on top of or beneath a PE array. This allows reduction operations to be implemented across PE arrays using an efficient topology with superior flexibility, scalability, latency and/or power characteristics that is customizable for particular use cases at assembly time, without requiring costly and time-consuming redesign of PE arrays, and without being constrained by particular PE array designs.

Type: Application

Filed: June 30, 2023

Publication date: January 2, 2025

Inventors: William Peter Ehrett, Anthony Gutierrez, Vedula Venkata Srikant Bharadwaj, Karthik Ramu Sangaiah, Prachi Shukla, Sriseshan Srikanth, Ganesh Dasika, John Kalamatianos

Reusing remote registers in processing in memory

Patent number: 12175073

Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. The memory controller converts the operation into N PIM commands if the optimization is not applicable. Otherwise, the memory controller converts the operation into N-1 PIM commands if the optimization is applicable. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.

Type: Grant

Filed: December 31, 2020

Date of Patent: December 24, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: John Kalamatianos, Varun Agrawal, Niti Madan
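
The N versus N-1 command conversion described in the abstract above (patent 12175073) can be sketched as follows. This is a simplified model with N = 2; the command names `COPY`, `MUL`, and `ADD`, the register numbering, and the class itself are hypothetical and not taken from the patent.

```python
class PimCommandGenerator:
    """Toy memory-controller front end with a constant-value cache."""

    def __init__(self):
        # Assumed structure: constant value -> remote register holding it.
        self.constant_cache = {}
        self.next_register = 0

    def convert(self, op, constant):
        """Return the PIM commands issued for `op` applied with `constant`."""
        commands = []
        reg = self.constant_cache.get(constant)
        if reg is None:
            # Cache miss: the constant must first be copied to a remote register.
            reg = self.next_register
            self.next_register += 1
            self.constant_cache[constant] = reg
            commands.append(("COPY", reg, constant))
        # Cache hit: the copy is skipped and the register is reused.
        commands.append((op, reg))
        return commands
```

The first `convert("MUL", 3)` call yields two commands, while a later `convert("ADD", 3)` yields only one because the copy is omitted.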

Processor-guided execution of offloaded instructions using fixed function operations

Patent number: 12153926

Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

Type: Grant

Filed: December 21, 2023

Date of Patent: November 26, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra

Apparatus, system, and method for throttling prefetchers to prevent training on irregular memory accesses

Patent number: 12153524

Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.

Type: Grant

Filed: September 30, 2022

Date of Patent: November 26, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
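
The throttling behavior in the abstract above (patent 12153524) can be illustrated with a small model. The abstract only mentions an instruction that starts a throttling region; the `THROTTLE_END` marker, the instruction tuples, and the class name are assumptions added so the sketch is complete.

```python
class ThrottledPrefetcher:
    """Toy prefetcher that skips training inside a throttling region."""

    def __init__(self):
        self.trained_on = []    # addresses the prefetcher was trained on
        self.throttled = False

    def execute(self, instr):
        kind, arg = instr
        if kind == "THROTTLE_START":
            self.throttled = True
        elif kind == "THROTTLE_END":    # assumed counterpart, not in the abstract
            self.throttled = False
        elif kind == "LOAD" and not self.throttled:
            # Only loads outside the region update prefetcher state.
            self.trained_on.append(arg)
```

Loads issued between the two markers still execute in this model; they simply leave `trained_on` unchanged.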

Flexible dictionary sharing for compressed caches

Patent number: 12135653

Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.

Type: Grant

Filed: January 23, 2023

Date of Patent: November 5, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexander D. Breslow, John Kalamatianos
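
The dictionary selection step in the abstract above (patent 12135653) can be sketched in a few lines. The compression metric used here (number of line words found in a dictionary), the group size, and the function name are assumptions; the abstract does not specify how compression is measured.

```python
def choose_dictionary(line_words, base_index, dictionaries, group_size=4):
    """Pick the neighboring set whose dictionary covers the most words.

    Returns (full_index, offset), where full_index = base_index + offset.
    `dictionaries[i]` is the set of words in the dictionary of cache set i.
    """
    def matches(offset):
        dictionary = dictionaries[base_index + offset]
        # Assumed metric: each word already in the dictionary compresses well.
        return sum(1 for word in line_words if word in dictionary)

    # max() keeps the lowest offset when several dictionaries tie.
    best_offset = max(range(group_size), key=matches)
    return base_index + best_offset, best_offset
```

The returned offset corresponds to the value the abstract says is stored in the tag array entry, so a later lookup can reconstruct the full index.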

Method and apparatus for a page-local delta-based prefetcher

Patent number: 12111767

Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.

Type: Grant

Filed: April 19, 2023

Date of Patent: October 8, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Susumu Mashimo, John Kalamatianos
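
The train-then-predict flow in the abstract above (patent 12111767) can be sketched as below. The table layout (a history of two deltas mapped to counts of the following delta), the class name, and the omission of page boundaries are all simplifying assumptions, not details from the patent.

```python
from collections import Counter, defaultdict


class DeltaPrefetcher:
    """Toy delta-based prefetcher (illustrative only)."""

    def __init__(self, history_len=2):
        self.history_len = history_len
        # Training table: tuple of recent deltas -> counts of the next delta.
        self.table = defaultdict(Counter)

    @staticmethod
    def _deltas(addresses):
        return [b - a for a, b in zip(addresses, addresses[1:])]

    def train(self, addresses):
        """Update the training table from a stream of accessed addresses."""
        deltas = self._deltas(addresses)
        for i in range(len(deltas) - self.history_len):
            key = tuple(deltas[i:i + self.history_len])
            self.table[key][deltas[i + self.history_len]] += 1

    def predict(self, addresses):
        """Return the predicted next address, or None if no pattern matches."""
        key = tuple(self._deltas(addresses)[-self.history_len:])
        if len(key) < self.history_len or key not in self.table:
            return None
        next_delta, _ = self.table[key].most_common(1)[0]
        return addresses[-1] + next_delta
```

After training on a stream whose deltas alternate 1, 2, 1, 2, the sketch predicts address 104 for the recent accesses 100, 101, 103.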

Accelerating relaxed remote atomics on multiple writer operations

Patent number: 12105957

Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.

Type: Grant

Filed: December 23, 2022

Date of Patent: October 1, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez
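
Because the scheduler in the abstract above (patent 12105957) targets atomic operations that are commutative and associative, operations on the same address can be merged before they are applied. A minimal sketch, assuming integer atomic adds and using hypothetical function names:

```python
from collections import defaultdict


def combine_atomic_adds(ops):
    """Merge scattered atomic adds, given as (address, value) pairs.

    Addition is commutative and associative, so the order in which the
    values for one address are summed does not change the result.
    """
    combined = defaultdict(int)
    for addr, value in ops:
        combined[addr] += value
    return dict(combined)


def apply_combined(memory, combined):
    """Apply one merged update per address to a dict-based memory model."""
    for addr, total in combined.items():
        memory[addr] = memory.get(addr, 0) + total
```

Three scattered adds that touch two addresses collapse into two memory updates in this model.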

Offloading computations from a processor to remote execution logic

Patent number: 12073251

Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.

Type: Grant

Filed: December 29, 2020

Date of Patent: August 27, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Nagadastagiri Reddy Challapalle, Jagadish B. Kotra, John Kalamatianos

Approach for managing near-memory processing commands and non-near-memory processing commands in a memory controller

Patent number: 12066950

Abstract: An approach is provided for managing PIM commands and non-PIM commands at a memory controller. A memory controller enqueues PIM commands and non-PIM commands and selects the next command to process based upon various selection criteria. The memory controller maintains and uses a page table to properly configure memory elements, such as banks in a memory module, for the next memory command, whether a PIM command or a non-PIM command. The page table tracks the status of memory elements as of the most recent memory command that was issued. The page table includes an “All Bank” entry that indicates the status of banks after processing the most recent PIM command. For example, the All Banks entry indicates whether all the banks have a row open and if so, specifies the open row for all the banks.

Type: Grant

Filed: December 23, 2021

Date of Patent: August 20, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Niti Madan, John Kalamatianos

Hardware configuration selection using machine learning model

Patent number: 12045169

Abstract: Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.

Type: Grant

Filed: December 23, 2020

Date of Patent: July 23, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Furkan Eris, Paul S. Keltcher, John Kalamatianos, Mayank Chhablani, Alok Garg

DRAM row management for processing in memory

Patent number: 12026401

Abstract: In accordance with described techniques for DRAM row management for processing in memory, a plurality of instructions are obtained for execution by a processing in memory component embedded in a dynamic random access memory. An instruction is identified that last accesses a row of the dynamic random access memory, and a subsequent instruction is identified that first accesses an additional row of the dynamic random access memory. A first command is issued to close the row and a second command is issued to open the additional row after the row is last accessed by the instruction.

Type: Grant

Filed: June 30, 2022

Date of Patent: July 2, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Niti Madan, Yasuko Eckert, Varun Agrawal, John Kalamatianos
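
The row close and open sequencing in the abstract above (patent 12026401) can be sketched for a single bank. The command names `ACTIVATE`, `PRECHARGE`, and `EXEC` are conventional DRAM terms chosen for illustration, and the one-open-row-per-bank model is an assumption rather than a detail from the patent.

```python
def schedule_row_commands(rows):
    """Insert row open/close commands around a list of PIM instructions.

    rows[i] is the DRAM row accessed by instruction i (single-bank model).
    """
    commands = []
    for i, row in enumerate(rows):
        if i == 0 or rows[i - 1] != row:
            # This instruction is the first to access `row`: open it.
            commands.append(("ACTIVATE", row))
        commands.append(("EXEC", i))
        if i + 1 < len(rows) and rows[i + 1] != row:
            # This instruction is the last to access `row`: close it right away.
            commands.append(("PRECHARGE", row))
    return commands
```

For the access pattern `[5, 5, 9]`, row 5 is closed immediately after instruction 1 and row 9 is opened before instruction 2.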

ACCELERATING RELAXED REMOTE ATOMICS ON MULTIPLE WRITER OPERATIONS

Publication number: 20240211134

Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.

Type: Application

Filed: December 23, 2022

Publication date: June 27, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez

Managing a Cache Using Per Memory Region Reuse Distance Estimation

Publication number: 20240211407

Abstract: A memory request issue counter (MRIC) is maintained that is incremented for every memory access a central processing unit core makes. A region reuse distance table is also maintained that includes multiple entries each of which stores the region reuse distance for a corresponding region. When a memory access request for a physical address is received, a reuse distance for the physical address is calculated. This reuse distance is the difference between the current MRIC value and a previous MRIC value for the physical address. The previous MRIC value for the physical address is the MRIC value the MRIC had when a memory access request for the physical address was last received. A region reuse distance for a region that includes the physical address is generated based on the reuse distance for the physical address and used to manage the cache.

Type: Application

Filed: December 27, 2022

Publication date: June 27, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: John Kalamatianos, Jagadish B. Kotra, Asmita Pal
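
The reuse distance calculation in the abstract above (publication 20240211407) reduces to a subtraction of counter values. In the sketch below, the 4 KiB region size, the running-average aggregation per region, and all names are assumptions; the abstract does not say how per-address distances are combined into a region value.

```python
class RegionReuseTracker:
    """Toy model of the MRIC and the region reuse distance table."""

    def __init__(self, region_bits=12):
        self.region_bits = region_bits  # assumed region size: 4 KiB
        self.mric = 0                   # memory request issue counter
        self.last_mric = {}             # physical address -> MRIC at last access
        self.region_distance = {}       # region id -> estimated reuse distance

    def access(self, paddr):
        """Record one access; return its reuse distance (None on first touch)."""
        self.mric += 1
        prev = self.last_mric.get(paddr)
        self.last_mric[paddr] = self.mric
        if prev is None:
            return None
        distance = self.mric - prev     # current MRIC minus previous MRIC
        region = paddr >> self.region_bits
        old = self.region_distance.get(region)
        # Assumed aggregation: average the old estimate with the new distance.
        self.region_distance[region] = distance if old is None else (old + distance) / 2
        return distance
```

A cache policy could then consult `region_distance` when choosing what to insert or evict, which is the management step the abstract leaves open.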

Dispatch bandwidth of memory-centric requests by bypassing storage array address checking

Patent number: 12019547

Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.

Type: Grant

Filed: July 27, 2021

Date of Patent: June 25, 2024

Inventors: Jagadish B. Kotra, John Kalamatianos, Gagandeep Panwar

Method and Apparatus for Increasing Memory Level Parallelism by Reducing Miss Status Holding Register Allocation in Caches

Publication number: 20240202116

Abstract: An entry of a last level cache shadow tag array is used to track pending last level cache misses to private data in a previous level cache (e.g., an L2 cache) that are also misses to an exclusive last level cache (e.g., an L3 cache) and to the last level cache shadow tag array. Accordingly, last level cache miss status holding registers need not be expended to track cache misses to private data that are already being tracked by a previous level cache miss status holding register. Additionally or alternatively, up to a threshold number of last level cache pending misses to the same shared data from different processor cores are tracked in the last level cache shadow tag array, and any additional last level cache pending misses are tracked in a last level cache miss status holding register.

Type: Application

Filed: December 20, 2022

Publication date: June 20, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, John Kalamatianos, Paul James Moyer, Nicholas Dean Lance, Sriram Srinivasan, Patrick James Shyvers, William Louie Walker

Accessing a Cache Based on an Address Translation Buffer Result

Publication number: 20240193097

Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or memory request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.

Type: Application

Filed: December 9, 2022

Publication date: June 13, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, John Kalamatianos

PROCESSOR-GUIDED EXECUTION OF OFFLOADED INSTRUCTIONS USING FIXED FUNCTION OPERATIONS

Publication number: 20240126552

Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

Type: Application

Filed: December 21, 2023

Publication date: April 18, 2024

Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra

Method and apparatus for reducing the latency of long latency memory requests

Patent number: 11960404

Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines a memory request becomes a long-latency request based on detecting a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.

Type: Grant

Filed: September 23, 2020

Date of Patent: April 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, John Kalamatianos
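
The abstract above (patent 11960404) keeps the long-latency indication separate from the weighted sum of other scheduling criteria. One way to picture this is a two-level sort key, sketched below. The specific criteria (age and load/store type), the weights, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemRequest:
    name: str
    age: int
    is_load: bool
    long_latency: bool = False  # set on a TLB miss, misprediction, or exception


def weighted_score(req, w_age=1.0, w_load=2.0):
    # Assumed criteria and weights; the long-latency flag is deliberately absent.
    return w_age * req.age + w_load * req.is_load


def pick_next(requests):
    """Choose the next request: the flag first, the weighted score second."""
    return max(requests, key=lambda r: (r.long_latency, weighted_score(r)))
```

Because the flag is the first element of the key, a marked request wins over any unmarked one regardless of score, and the weighted sum only breaks ties among requests with the same marking.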
