Patents by Inventor John Kalamatianos

John Kalamatianos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12287739
    Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or store request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.
    Type: Grant
    Filed: December 9, 2022
    Date of Patent: April 29, 2025
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, John Kalamatianos
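    Illustrative sketch (hypothetical, not from the patent text): a minimal C++ model of the TLB-miss policy described in the entry above. The TLB, Cache, and handle_request names, the toy lookup structures, and the fixed page/line sizes are assumptions made for illustration; the page-table walk itself is elided.

      #include <cstdint>
      #include <cstdio>
      #include <optional>
      #include <unordered_map>
      #include <unordered_set>

      // Toy stand-ins, invented for illustration only.
      struct TLB {
          std::unordered_map<uint64_t, uint64_t> entries;           // VA page -> PA page
          std::optional<uint64_t> translate(uint64_t va) {
              auto it = entries.find(va >> 12);
              if (it == entries.end()) return std::nullopt;         // TLB miss
              return (it->second << 12) | (va & 0xFFF);
          }
      };
      struct Cache {
          std::unordered_set<uint64_t> lines;
          bool holds(uint64_t pa) { return lines.count(pa >> 6) != 0; }
      };

      // On a TLB hit the request takes the normal L1 -> L2 path.  On a TLB miss the
      // request either bypasses L1/L2, or is sent to L2 speculatively while the L1
      // lookup proceeds in parallel, as described in the abstract.
      void handle_request(uint64_t va, TLB& tlb, Cache& l1, Cache& l2, bool bypass_on_miss) {
          auto pa = tlb.translate(va);
          if (pa) {
              if (l1.holds(*pa)) std::puts("L1 hit");
              else if (l2.holds(*pa)) std::puts("L2 hit");
              else std::puts("fetch from memory");
              return;
          }
          if (bypass_on_miss) {
              std::puts("TLB miss: bypass L1/L2, go straight to memory");
          } else {
              uint64_t walked = va;                 // pretend the page walker resolved it
              std::puts("TLB miss: send to L2 speculatively");
              if (l1.holds(walked)) std::puts("...L1 satisfied it, squash the L2 request");
          }
      }

      int main() {
          TLB tlb; Cache l1, l2;
          handle_request(0x1000, tlb, l1, l2, /*bypass_on_miss=*/true);
      }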
  • Patent number: 12189953
    Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
    Type: Grant
    Filed: September 29, 2022
    Date of Patent: January 7, 2025
    Inventors: Jagadish B. Kotra, John Kalamatianos
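    Illustrative sketch (hypothetical): a toy C++ predictor for the entry above that is trained on the private cache's miss history and, when confident, enables a speculative memory request in parallel with the shared victim cache probe. The saturating-counter organization and 4 KiB indexing granularity are assumptions.

      #include <algorithm>
      #include <cstdint>
      #include <cstdio>
      #include <unordered_map>

      // Hypothetical miss predictor: a per-region saturating counter trained on the
      // history of misses observed at the private cache.
      struct VictimMissPredictor {
          std::unordered_map<uint64_t, int> counter;                // region -> confidence
          bool predict_miss(uint64_t addr) { return counter[addr >> 12] >= 2; }
          void train(uint64_t addr, bool missed_victim) {
              int& c = counter[addr >> 12];
              c = missed_victim ? std::min(c + 1, 3) : std::max(c - 1, 0);
          }
      };

      // If the lookup is predicted to miss the shared victim cache, a speculative
      // memory request is enabled and sent in parallel with the victim-cache probe.
      void access(uint64_t addr, VictimMissPredictor& p, bool victim_hit) {
          bool speculative = p.predict_miss(addr);
          if (speculative) std::puts("speculative memory request issued");
          if (!victim_hit)
              std::puts(speculative ? "miss confirmed: data already in flight"
                                    : "miss: demand memory request issued now");
          p.train(addr, !victim_hit);
      }

      int main() {
          VictimMissPredictor p;
          access(0x4000, p, /*victim_hit=*/false);
          access(0x4000, p, false);
          access(0x4000, p, false);   // predictor now confident -> speculative request
      }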
  • Publication number: 20250004963
    Abstract: A semiconductor device, referred to herein as a Globally Interconnected Operations (GIO) layer, provides global operations in the form of global data reduction for one or more PE arrays. The GIO layer includes processing elements that perform global data reduction on processing results from one or more PE arrays. The GIO layer includes connectors that allow it to be arranged in a 3D stack with one or more PE arrays, for example, on top of or beneath a PE array. This allows reduction operations to be implemented across PE arrays using an efficient topology with superior flexibility, scalability, latency and/or power characteristics that is customizable for particular use cases at assembly time, without requiring costly and time-consuming redesign of PE arrays, and without being constrained by particular PE array designs.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 2, 2025
    Inventors: William Peter Ehrett, Anthony Gutierrez, Vedula Venkata Srikant Bharadwaj, Karthik Ramu Sangaiah, Prachi Shukla, Sriseshan Srikanth, Ganesh Dasika, John Kalamatianos
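    Illustrative sketch (hypothetical): a minimal C++ model of the global data reduction performed by the GIO layer for the entry above, reducing element-wise across the outputs of multiple PE arrays. The vector shapes and the use of summation as the reduction operator are assumptions.

      #include <cstdio>
      #include <vector>

      // Toy model: each PE array produces a vector of partial results, and the
      // stacked GIO layer performs an element-wise global reduction (here, a sum)
      // across all arrays.
      std::vector<float> gio_reduce(const std::vector<std::vector<float>>& pe_array_outputs) {
          std::vector<float> reduced(pe_array_outputs.front().size(), 0.0f);
          for (const auto& partial : pe_array_outputs)      // one vector per PE array
              for (size_t i = 0; i < reduced.size(); ++i)
                  reduced[i] += partial[i];                 // global data reduction
          return reduced;
      }

      int main() {
          std::vector<std::vector<float>> outputs = {{1, 2, 3}, {4, 5, 6}};  // two PE arrays
          for (float v : gio_reduce(outputs)) std::printf("%.1f ", v);       // 5.0 7.0 9.0
      }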
  • Patent number: 12175073
    Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. The memory controller converts the operation into N PIM commands if the optimization is not applicable. Otherwise, the memory controller converts the operation into N−1 PIM commands if the optimization is applicable. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.
    Type: Grant
    Filed: December 31, 2020
    Date of Patent: December 24, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Varun Agrawal, Niti Madan
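    Illustrative sketch (hypothetical): a small C++ model of the constant-value reuse optimization in the entry above. The ConstantValueCache type and the specific PIM command names are invented for illustration; the point is that a resident constant lets the controller skip the copy command, emitting N−1 commands instead of N.

      #include <cstdint>
      #include <cstdio>
      #include <string>
      #include <unordered_set>
      #include <vector>

      // Hypothetical constant-value cache in the memory controller.
      struct ConstantValueCache {
          std::unordered_set<uint64_t> resident;
          bool holds(uint64_t v) { return resident.count(v) != 0; }
          void insert(uint64_t v) { resident.insert(v); }
      };

      // If the constant needed by a PIM operation is already resident in a remote
      // register, the copy command is omitted and the operation expands to N-1
      // commands instead of N.
      std::vector<std::string> convert(uint64_t constant, ConstantValueCache& cvc) {
          std::vector<std::string> cmds;
          if (!cvc.holds(constant)) {                     // optimization not applicable
              cmds.push_back("PIM_COPY constant -> remote reg");
              cvc.insert(constant);
          }                                               // otherwise: reuse, skip the copy
          cmds.push_back("PIM_COMPUTE using remote reg");
          cmds.push_back("PIM_WRITEBACK result");
          return cmds;
      }

      int main() {
          ConstantValueCache cvc;
          std::printf("first op:  %zu commands\n", convert(42, cvc).size());  // 3 (N)
          std::printf("second op: %zu commands\n", convert(42, cvc).size());  // 2 (N-1)
      }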
  • Patent number: 12153926
    Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Grant
    Filed: December 21, 2023
    Date of Patent: November 26, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
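    Illustrative sketch (hypothetical): a C++ model of processor-guided offload for the entry above, in which each offload instruction carries a target-device register as an operand and offload requests are transmitted in the order the instructions were received. The struct fields and command names are assumptions.

      #include <cstdint>
      #include <cstdio>
      #include <deque>
      #include <string>

      // Hypothetical offload instruction: the opcode plus, as an operand, an
      // architected virtual register that lives in the target (e.g., PIM) device.
      struct OffloadInstruction {
          std::string opcode;       // e.g. "pim_add"
          int target_register;      // register in the remote device, not the CPU
          uint64_t memory_operand;
      };

      // The processor emits one offload request per instruction, in the order the
      // instructions were received, preserving program order at the target device.
      void transmit_in_order(const std::deque<OffloadInstruction>& received) {
          for (const auto& inst : received)
              std::printf("offload request: %s r%d, [0x%llx]\n",
                          inst.opcode.c_str(), inst.target_register,
                          (unsigned long long)inst.memory_operand);
      }

      int main() {
          transmit_in_order({{"pim_load", 0, 0x1000},
                             {"pim_add", 0, 0x2000},
                             {"pim_store", 0, 0x3000}});
      }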
  • Patent number: 12153524
    Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: November 26, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
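    Illustrative sketch (hypothetical): a C++ model of the throttling region for the entry above. The abstract specifies an instruction that marks the start of a region; the end marker, the enum names, and the Prefetcher interface here are assumptions. Memory accesses inside the region still execute but are withheld from prefetcher training.

      #include <cstdint>
      #include <cstdio>
      #include <vector>

      enum class Kind { ThrottleStart, ThrottleEnd, MemoryAccess, Other };
      struct Instruction { Kind kind; uint64_t address; };

      struct Prefetcher {
          void train(uint64_t addr) {
              std::printf("prefetcher trained on 0x%llx\n", (unsigned long long)addr);
          }
      };

      // Memory instructions inside a throttling region execute normally but are
      // withheld from prefetcher training.
      void run(const std::vector<Instruction>& stream, Prefetcher& pf) {
          bool throttling = false;
          for (const auto& inst : stream) {
              if (inst.kind == Kind::ThrottleStart) { throttling = true;  continue; }
              if (inst.kind == Kind::ThrottleEnd)   { throttling = false; continue; }
              if (inst.kind == Kind::MemoryAccess && !throttling)
                  pf.train(inst.address);            // train only outside the region
          }
      }

      int main() {
          Prefetcher pf;
          run({{Kind::MemoryAccess, 0x100}, {Kind::ThrottleStart, 0},
               {Kind::MemoryAccess, 0x200},          // inside the region: not trained
               {Kind::ThrottleEnd, 0}, {Kind::MemoryAccess, 0x300}}, pf);
      }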
  • Patent number: 12135653
    Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: November 5, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander D. Breslow, John Kalamatianos
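    Illustrative sketch (hypothetical): a C++ model of dictionary selection for the entry above. The compressed_size function is a placeholder for the real dictionary compressor, and the neighborhood size is an assumption; the chosen offset is added to the base index to form the full set index and is stored in the tag entry.

      #include <cstdint>
      #include <cstdio>
      #include <vector>

      // Toy model: each set owns one dictionary; a cache line may borrow a dictionary
      // from a neighbouring set if it compresses better there.
      struct Dictionary { int id; };
      int compressed_size(const Dictionary& d, uint64_t line_addr) {
          return 16 + static_cast<int>((line_addr ^ d.id) % 48);     // stand-in metric
      }

      // Returns the set offset (relative to the base index) of the dictionary that
      // compresses the line best; the offset is what gets stored in the tag entry.
      int choose_dictionary_offset(uint64_t line_addr, int base_index,
                                   const std::vector<Dictionary>& dicts, int neighborhood) {
          int best_offset = 0, best_size = 1 << 30;
          for (int off = 0; off < neighborhood; ++off) {
              const Dictionary& d = dicts[static_cast<size_t>(base_index + off) % dicts.size()];
              int sz = compressed_size(d, line_addr);
              if (sz < best_size) { best_size = sz; best_offset = off; }
          }
          return best_offset;      // full index = base_index + best_offset
      }

      int main() {
          std::vector<Dictionary> dicts;
          for (int i = 0; i < 8; ++i) dicts.push_back({i});
          int off = choose_dictionary_offset(0xABCD40, /*base_index=*/3, dicts, /*neighborhood=*/4);
          std::printf("line placed in set %d (offset %d stored in tag)\n", 3 + off, off);
      }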
  • Patent number: 12111767
    Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
    Type: Grant
    Filed: April 19, 2023
    Date of Patent: October 8, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Susumu Mashimo, John Kalamatianos
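    Illustrative sketch (hypothetical): a C++ delta prefetcher in the spirit of the entry above, mapping a pair of consecutive memory access deltas to the delta that followed them. The history depth of two and the table organization are assumptions; the abstract only specifies sets of consecutive deltas.

      #include <cstdint>
      #include <cstdio>
      #include <deque>
      #include <map>
      #include <utility>

      struct DeltaPrefetcher {
          std::map<std::pair<int64_t, int64_t>, int64_t> table;   // (d1, d2) -> predicted next delta
          std::deque<int64_t> history;                            // recent consecutive deltas
          int64_t last_addr = -1;

          void access(int64_t addr) {
              if (last_addr >= 0) {
                  int64_t delta = addr - last_addr;
                  if (history.size() == 2)                        // train on the preceding pair
                      table[{history[0], history[1]}] = delta;
                  history.push_back(delta);
                  if (history.size() > 2) history.pop_front();
              }
              last_addr = addr;
              if (history.size() == 2) {                          // predict from the recent deltas
                  auto it = table.find({history[0], history[1]});
                  if (it != table.end())
                      std::printf("prefetch 0x%llx\n", (unsigned long long)(addr + it->second));
              }
          }
      };

      int main() {
          DeltaPrefetcher p;
          for (int64_t a : {0x100, 0x140, 0x180, 0x1C0, 0x200}) p.access(a);   // stride pattern
      }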
  • Patent number: 12105957
    Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.
    Type: Grant
    Filed: December 23, 2022
    Date of Patent: October 1, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez
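    Illustrative sketch (hypothetical): a C++ model of the scheduling idea in the entry above. Because atomic adds are commutative and associative, pending scattered atomics to the same array element can be combined before the VALU applies them; the AtomicAdd type and the use of integer addition are assumptions.

      #include <cstdint>
      #include <cstdio>
      #include <unordered_map>
      #include <vector>

      // Hypothetical scattered atomic-add requests destined for the memory-side VALU.
      struct AtomicAdd { uint64_t address; int32_t value; };

      // Combining commutative, associative atomics to the same element reduces the
      // number of read-modify-write operations the VALU performs on the array.
      std::vector<AtomicAdd> schedule(const std::vector<AtomicAdd>& pending) {
          std::unordered_map<uint64_t, int32_t> combined;
          for (const auto& op : pending) combined[op.address] += op.value;
          std::vector<AtomicAdd> out;
          for (const auto& [addr, val] : combined) out.push_back({addr, val});
          return out;
      }

      int main() {
          auto ops = schedule({{0x1000, 1}, {0x1000, 2}, {0x2000, 5}, {0x1000, 3}});
          for (const auto& op : ops)
              std::printf("VALU atomic add %d at 0x%llx\n",
                          op.value, (unsigned long long)op.address);
      }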
  • Patent number: 12073251
    Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: August 27, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nagadastagiri Reddy Challapalle, Jagadish B. Kotra, John Kalamatianos
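    Illustrative sketch (hypothetical): a C++ model of the offload queue in the entry above. Entries are inserted at dispatch, metadata is attached later, and the offload request is transmitted only after the instruction retires, in queue order. The metadata fields and method names are assumptions.

      #include <cstdint>
      #include <cstdio>
      #include <deque>
      #include <string>

      // Hypothetical offload-queue entry; the metadata fields are assumptions.
      struct OffloadEntry {
          std::string opcode;
          uint64_t metadata_pa = 0;     // e.g. a translated physical address
          bool retired = false;
      };

      struct OffloadQueue {
          std::deque<OffloadEntry> q;
          void dispatch(const std::string& op) { q.push_back({op}); }          // inserted at dispatch
          void add_metadata(size_t i, uint64_t pa) { q[i].metadata_pa = pa; }  // filled in later
          void retire(size_t i) { q[i].retired = true; }
          // Offload requests are transmitted only after the instruction retires,
          // and in queue (program) order.
          void drain() {
              while (!q.empty() && q.front().retired) {
                  std::printf("offload request %s (pa=0x%llx)\n", q.front().opcode.c_str(),
                              (unsigned long long)q.front().metadata_pa);
                  q.pop_front();
              }
          }
      };

      int main() {
          OffloadQueue oq;
          oq.dispatch("pim_add");
          oq.add_metadata(0, 0xBEEF000);
          oq.retire(0);
          oq.drain();
      }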
  • Patent number: 12066950
    Abstract: An approach is provided for managing PIM commands and non-PIM commands at a memory controller. A memory controller enqueues PIM commands and non-PIM commands and selects the next command to process based upon various selection criteria. The memory controller maintains and uses a page table to properly configure memory elements, such as banks in a memory module, for the next memory command, whether a PIM command or a non-PIM command. The page table tracks the status of memory elements as of the most recent memory command that was issued. The page table includes an “All Bank” entry that indicates the status of banks after processing the most recent PIM command. For example, the All Banks entry indicates whether all the banks have a row open and if so, specifies the open row for all the banks.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: August 20, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Niti Madan, John Kalamatianos
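    Illustrative sketch (hypothetical): a C++ model of the controller-side page table with an "All Banks" entry for the entry above. A PIM command that opens the same row in every bank sets the All Banks entry; a later per-bank (non-PIM) activate invalidates it. Bank counts and method names are assumptions.

      #include <cstdio>
      #include <optional>
      #include <vector>

      // Toy page table in the memory controller: one open-row entry per bank plus an
      // "All Banks" entry recording the row opened everywhere by the last PIM command.
      struct PageTable {
          std::vector<std::optional<int>> bank_open_row;
          std::optional<int> all_banks_open_row;
          explicit PageTable(int banks) : bank_open_row(banks) {}

          void on_non_pim(int bank, int row) {           // per-bank activate
              bank_open_row[bank] = row;
              all_banks_open_row.reset();                // banks are no longer uniform
          }
          void on_pim_all_banks(int row) {               // PIM command opens the same row everywhere
              for (auto& r : bank_open_row) r = row;
              all_banks_open_row = row;
          }
          bool needs_activate(int bank, int row) const {
              if (all_banks_open_row) return *all_banks_open_row != row;
              return !bank_open_row[bank] || *bank_open_row[bank] != row;
          }
      };

      int main() {
          PageTable pt(4);
          pt.on_pim_all_banks(7);
          std::printf("non-PIM to bank 2, row 7 needs activate? %d\n", pt.needs_activate(2, 7));  // 0
          pt.on_non_pim(2, 9);
          std::printf("next access to bank 2, row 7 needs activate? %d\n", pt.needs_activate(2, 7)); // 1
      }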
  • Patent number: 12045169
    Abstract: Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: July 23, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Furkan Eris, Paul S. Keltcher, John Kalamatianos, Mayank Chhablani, Alok Garg
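    Illustrative sketch (hypothetical): a C++ model of the selection flow in the entry above, where feature measurements are applied to a trained model that emits one output value per candidate hardware configuration and the device operates according to the best value. The placeholder scoring function and configuration names are assumptions.

      #include <algorithm>
      #include <cstdio>
      #include <string>
      #include <vector>

      struct Config { std::string name; };

      // Stand-in for the trained model: maps feature measurements to one output
      // value per candidate configuration (a regression score here; the abstract
      // also allows certainty scores or rankings).
      std::vector<float> model_outputs(const std::vector<float>& features, size_t num_configs) {
          std::vector<float> out(num_configs, 0.0f);
          for (size_t c = 0; c < num_configs; ++c)
              for (float f : features) out[c] += f * (c + 1) * 0.1f;   // placeholder scoring
          return out;
      }

      int main() {
          std::vector<Config> configs = {{"low-power"}, {"balanced"}, {"max-frequency"}};
          std::vector<float> features = {0.7f, 1.3f, 0.2f};            // measured features
          auto scores = model_outputs(features, configs.size());
          size_t best = std::max_element(scores.begin(), scores.end()) - scores.begin();
          std::printf("operate with configuration: %s\n", configs[best].name.c_str());
      }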
  • Patent number: 12026401
    Abstract: In accordance with described techniques for DRAM row management for processing in memory, a plurality of instructions are obtained for execution by a processing in memory component embedded in a dynamic random access memory. An instruction is identified that last accesses a row of the dynamic random access memory, and a subsequent instruction is identified that first accesses an additional row of the dynamic random access memory. A first command is issued to close the row and a second command is issued to open the additional row after the row is last accessed by the instruction.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: July 2, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Niti Madan, Yasuko Eckert, Varun Agrawal, John Kalamatianos
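    Illustrative sketch (hypothetical): a C++ model of the row management for the entry above. The stream of PIM instructions is scanned for the point at which one row is last accessed and a different row is first accessed, and a close (precharge) and open (activate) command are issued around that point. The instruction and command representations are assumptions.

      #include <cstdio>
      #include <vector>

      // Toy PIM instruction: which DRAM row of the bank it touches.
      struct PimInstruction { int id; int row; };

      // When the next instruction moves to a different row, the current row is being
      // accessed for the last time, so close it and open the new row.
      void schedule_rows(const std::vector<PimInstruction>& stream) {
          for (size_t i = 0; i + 1 < stream.size(); ++i) {
              if (stream[i].row != stream[i + 1].row) {
                  std::printf("after inst %d: close row %d\n", stream[i].id, stream[i].row);
                  std::printf("before inst %d: open row %d\n", stream[i + 1].id, stream[i + 1].row);
              }
          }
      }

      int main() {
          schedule_rows({{0, 5}, {1, 5}, {2, 5}, {3, 8}, {4, 8}});   // row switch between inst 2 and 3
      }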
  • Publication number: 20240211134
    Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.
    Type: Application
    Filed: December 23, 2022
    Publication date: June 27, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez
  • Publication number: 20240211407
    Abstract: A memory request issue counter (MRIC) is maintained that is incremented for every memory access a central processing unit core makes. A region reuse distance table is also maintained that includes multiple entries each of which stores the region reuse distance for a corresponding region. When a memory access request for a physical address is received, a reuse distance for the physical address is calculated. This reuse distance is the difference between the current MRIC value and a previous MRIC value for the physical address. The previous MRIC value for the physical address is the MRIC value the MRIC had when a memory access request for the physical address was last received. A region reuse distance for a region that includes the physical address is generated based on the reuse distance for the physical address and used to manage the cache.
    Type: Application
    Filed: December 27, 2022
    Publication date: June 27, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Jagadish B. Kotra, Asmita Pal
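    Illustrative sketch (hypothetical): a C++ model of the MRIC-based tracking in the entry above. The counter is incremented on every access, a per-address table stores the MRIC value of the previous access, and the difference gives the reuse distance; the 64 KiB region size and the running average used to fold a line's distance into its region are assumptions.

      #include <cstdint>
      #include <cstdio>
      #include <unordered_map>

      struct ReuseDistanceTracker {
          uint64_t mric = 0;                                         // memory request issue counter
          std::unordered_map<uint64_t, uint64_t> last_mric;          // physical address -> prior MRIC
          std::unordered_map<uint64_t, uint64_t> region_distance;    // region -> reuse distance

          void access(uint64_t pa) {
              ++mric;                                                // incremented on every access
              auto it = last_mric.find(pa);
              if (it != last_mric.end()) {
                  uint64_t distance = mric - it->second;             // reuse distance for this address
                  uint64_t region = pa >> 16;                        // 64 KiB regions (assumed size)
                  uint64_t& rd = region_distance[region];
                  rd = rd ? (rd + distance) / 2 : distance;          // simple running average
              }
              last_mric[pa] = mric;
          }
      };

      int main() {
          ReuseDistanceTracker t;
          for (uint64_t pa : {0x10000, 0x20000, 0x10040, 0x10000}) t.access(pa);
          for (auto& [region, rd] : t.region_distance)
              std::printf("region 0x%llx reuse distance %llu\n",
                          (unsigned long long)region, (unsigned long long)rd);
      }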
  • Patent number: 12019547
    Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.
    Type: Grant
    Filed: July 27, 2021
    Date of Patent: June 25, 2024
    Inventors: Jagadish B. Kotra, John Kalamatianos, Gagandeep Panwar
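    Illustrative sketch (hypothetical): a C++ model of the ACB bit for the entry above. Address checking is performed only for commands whose ACB bit is clear and that reference memory; the PimCommand type and the address_check placeholder are assumptions.

      #include <cstdint>
      #include <cstdio>
      #include <optional>

      // Hypothetical memory-centric command carrying an Address Check Bypass (ACB)
      // bit; the bit could equally be set by a compiler or by dispatch hardware.
      struct PimCommand {
          bool acb;                            // true -> skip address checking at dispatch
          std::optional<uint64_t> address;     // absent if the command touches no memory
      };

      void address_check(uint64_t pa) {
          std::printf("checking/flushing caches for 0x%llx\n", (unsigned long long)pa);
      }

      void dispatch(const PimCommand& cmd) {
          if (!cmd.acb && cmd.address) address_check(*cmd.address);  // normal path
          std::puts("command dispatched");
      }

      int main() {
          dispatch({false, 0x1000});        // explicit memory access: check performed
          dispatch({true, std::nullopt});   // no memory access: check bypassed
          dispatch({true, 0x1000});         // same physical address as the prior checked command: bypass
      }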
  • Publication number: 20240202116
    Abstract: An entry of a last level cache shadow tag array is used to track pending last level cache misses to private data in a previous level cache (e.g., an L2 cache) that are also misses to an exclusive last level cache (e.g., an L3 cache) and to the last level cache shadow tag array. Accordingly, last level cache miss status holding registers need not be expended to track cache misses to private data that are already being tracked by a previous level cache miss status holding register. Additionally or alternatively, up to a threshold number of last level cache pending misses to the same shared data from different processor cores are tracked in the last level cache shadow tag array, and any additional last level cache pending misses are tracked in a last level cache miss status holding register.
    Type: Application
    Filed: December 20, 2022
    Publication date: June 20, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, John Kalamatianos, Paul James Moyer, Nicholas Dean Lance, Sriram Srinivasan, Patrick James Shyvers, William Louie Walker
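    Illustrative sketch (hypothetical): a C++ model of the tracking decision in the entry above. A pending LLC miss to private data is recorded in the shadow tag array (it is already tracked by the L2 MSHR); pending misses to shared data are recorded in the shadow tag array up to a threshold, after which LLC MSHRs are used. The threshold value of two is an assumption.

      #include <cstdint>
      #include <cstdio>
      #include <unordered_map>

      struct LlcMissTracker {
          static constexpr int kSharedThreshold = 2;          // assumed threshold
          std::unordered_map<uint64_t, int> shadow_pending;   // line -> pending misses in shadow tags
          int mshrs_used = 0;

          void miss(uint64_t line, bool private_data) {
              if (private_data) {
                  // Already tracked by the previous level (L2) MSHR; the shadow tag
                  // entry records the pending LLC miss, no LLC MSHR is spent.
                  shadow_pending[line] = 1;
                  std::puts("private miss: tracked in shadow tag array");
              } else if (shadow_pending[line] < kSharedThreshold) {
                  ++shadow_pending[line];
                  std::puts("shared miss: tracked in shadow tag array");
              } else {
                  ++mshrs_used;                               // overflow beyond the threshold
                  std::puts("shared miss: tracked in LLC MSHR");
              }
          }
      };

      int main() {
          LlcMissTracker t;
          t.miss(0x40, true);                                        // private data
          for (int core = 0; core < 3; ++core) t.miss(0x80, false);  // three cores share one line
          std::printf("LLC MSHRs used: %d\n", t.mshrs_used);         // 1
      }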
  • Publication number: 20240193097
    Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or store request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 13, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, John Kalamatianos
  • Publication number: 20240126552
    Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Application
    Filed: December 21, 2023
    Publication date: April 18, 2024
    Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
  • Patent number: 11960404
    Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines a memory request becomes a long-latency request based on detecting a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: April 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, John Kalamatianos
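    Illustrative sketch (hypothetical): a C++ model of the prioritization in the entry above. Ordinary criteria are combined into a weighted score, while the long-latency flag is kept as a separate value that forms a higher scheduling tier rather than being folded into the weighted sum. The score values and tie-breaking are assumptions.

      #include <algorithm>
      #include <cstdio>
      #include <vector>

      // Each pending memory request carries a weighted score built from ordinary
      // criteria (age, criticality, ...) plus a separate long-latency flag set on a
      // TLB miss, branch misprediction, memory dependence misprediction, or precise
      // exception.
      struct MemRequest { int id; bool long_latency; int weighted_score; };

      MemRequest pick_next(std::vector<MemRequest>& pending) {
          auto better = [](const MemRequest& a, const MemRequest& b) {
              if (a.long_latency != b.long_latency) return a.long_latency;   // flag forms its own tier
              return a.weighted_score > b.weighted_score;                    // then the weighted criteria
          };
          auto it = std::max_element(pending.begin(), pending.end(),
                                     [&](const MemRequest& a, const MemRequest& b) { return better(b, a); });
          MemRequest chosen = *it;
          pending.erase(it);
          return chosen;
      }

      int main() {
          std::vector<MemRequest> q = {{0, false, 90}, {1, true, 10}, {2, false, 50}};
          std::printf("issue request %d first\n", pick_next(q).id);   // request 1: long-latency wins
      }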