Patents by Inventor John Kalamatianos
John Kalamatianos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12287739Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or memory request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.Type: GrantFiled: December 9, 2022Date of Patent: April 29, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Jagadish B Kotra, John Kalamatianos
-
Patent number: 12189953Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.Type: GrantFiled: September 29, 2022Date of Patent: January 7, 2025Inventors: Jagadish B. Kotra, John Kalamatianos
-
Publication number: 20250004963Abstract: A semiconductor device, referred to herein as a Globally Interconnected Operations (GIO) layer, provides global operations in the form of global data reduction for one or more PE arrays. The GIO layer includes processing elements that perform global data reduction on processing results from one or more PE arrays. The GIO layer includes connectors that allow it to be arranged in a 3D stack with one or more PE arrays, for example, on top of or beneath a PE array. This allows reduction operations to be implemented across PE arrays using an efficient topology with superior flexibility, scalability, latency and/or power characteristics that is customizable for particular use cases at assembly time, without requiring costly and time-consuming redesign of PE arrays, and without being constrained by particular PE array designs.Type: ApplicationFiled: June 30, 2023Publication date: January 2, 2025Inventors: William Peter Ehrett, Anthony Gutierrez, Vedula Venkata Srikant Bharadwaj, Karthik Ramu Sangaiah, Prachi Shukla, Sriseshan Srikanth, Ganesh Dasika, John Kalamatianos
-
Patent number: 12175073Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. The memory controller converts the operation into N PIM commands if the optimization is not applicable. Otherwise, the memory controller converts the operation into a N?1 PIM commands if the optimization is applicable. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.Type: GrantFiled: December 31, 2020Date of Patent: December 24, 2024Assignee: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Varun Agrawal, Niti Madan
-
Patent number: 12153926Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: GrantFiled: December 21, 2023Date of Patent: November 26, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
-
Patent number: 12153524Abstract: A disclosed computing device includes at least one prefetcher and a processing device communicatively coupled to the prefetcher. The processing device is configured to detect a throttling instruction that indicates a start of a throttling region. The computing device is further configured to prevent the prefetcher from being trained on one or more memory instructions included in the throttling region in response to the throttling instruction. Various other apparatuses, systems, and methods are also disclosed.Type: GrantFiled: September 30, 2022Date of Patent: November 26, 2024Assignee: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Marko Scrbak, Gabriel H. Loh, Akhil Arunkumar
-
Patent number: 12135653Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.Type: GrantFiled: January 23, 2023Date of Patent: November 5, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Alexander D. Breslow, John Kalamatianos
-
Patent number: 12111767Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.Type: GrantFiled: April 19, 2023Date of Patent: October 8, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Susumu Mashimo, John Kalamatianos
-
Patent number: 12105957Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.Type: GrantFiled: December 23, 2022Date of Patent: October 1, 2024Assignee: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez
-
Patent number: 12073251Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.Type: GrantFiled: December 29, 2020Date of Patent: August 27, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Nagadastagiri Reddy Challapalle, Jagadish B. Kotra, John Kalamatianos
-
Patent number: 12066950Abstract: An approach is provided for managing PIM commands and non-PIM commands at a memory controller. A memory controller enqueues PIM commands and non-PIM commands and selects the next command to process based upon various selection criteria. The memory controller maintains and uses a page table to properly configure memory elements, such as banks in a memory module, for the next memory command, whether a PIM command or a non-PIM command. The page table tracks the status of memory elements as of the most recent memory command that was issued. The page table includes an “All Bank” entry that indicates the status of banks after processing the most recent PIM command. For example, the All Banks entry indicates whether all the banks have a row open and if so, specifies the open row for all the banks.Type: GrantFiled: December 23, 2021Date of Patent: August 20, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Niti Madan, John Kalamatianos
-
Patent number: 12045169Abstract: Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.Type: GrantFiled: December 23, 2020Date of Patent: July 23, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Furkan Eris, Paul S. Keltcher, John Kalamatianos, Mayank Chhablani, Alok Garg
-
Patent number: 12026401Abstract: In accordance with described techniques for DRAM row management for processing in memory, a plurality of instructions are obtained for execution by a processing in memory component embedded in a dynamic random access memory. An instruction is identified that last accesses a row of the dynamic random access memory, and a subsequent instruction is identified that first accesses an additional row of the dynamic random access memory. A first command is issued to close the row and a second command is issued to open the additional row after the row is last accessed by the instruction.Type: GrantFiled: June 30, 2022Date of Patent: July 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Niti Madan, Yasuko Eckert, Varun Agrawal, John Kalamatianos
-
Publication number: 20240211134Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.Type: ApplicationFiled: December 23, 2022Publication date: June 27, 2024Applicant: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez
-
Publication number: 20240211407Abstract: A memory request issue counter (MRIC) is maintained that is incremented for every memory access a central processing unit core makes. A region reuse distance table is also maintained that includes multiple entries each of which stores the region reuse distance for a corresponding region. When a memory access request for a physical address is received, a reuse distance for the physical address is calculated. This reuse distance is the difference between the current MRIC value and a previous MRIC value for the physical address. The previous MRIC value for the physical address is the MRIC value the MRIC had when a memory access request for the physical address was last received. A region reuse distance for a region that includes the physical address is generated based on the reuse distance for the physical address and used to manage the cache.Type: ApplicationFiled: December 27, 2022Publication date: June 27, 2024Applicant: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Jagadish B. Kotra, Asmita Pal
-
Patent number: 12019547Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.Type: GrantFiled: July 27, 2021Date of Patent: June 25, 2024Inventors: Jagadish B. Kotra, John Kalamatianos, Gagandeep Panwar
-
Publication number: 20240202116Abstract: An entry of a last level cache shadow tag array to track pending last level cache misses to private data in a previous level cache (e.g., an L2 cache), that also are misses to an exclusive last level cache (e.g., an L3 cache) and to the last level cache shadow tag array. Accordingly, last level cache miss status holding registers need not be expended to track cache misses to private data that are already being tracked by a previous level cache miss status holding register. Additionally or alternatively, up to a threshold number of last level cache pending misses to the same shared data from different processor cores are tracked in the last level cache shadow tag array, and any additional last level cache pending misses are tracked in a last level cache miss status holding register.Type: ApplicationFiled: December 20, 2022Publication date: June 20, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Jagadish B. Kotra, John Kalamatianos, Paul James Moyer, Nicholas Dean Lance, Sriram Srinivasan, Patrick James Shyvers, William Louie Walker
-
Publication number: 20240193097Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or memory request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.Type: ApplicationFiled: December 9, 2022Publication date: June 13, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Jagadish B. Kotra, John Kalamatianos
-
Publication number: 20240126552Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: ApplicationFiled: December 21, 2023Publication date: April 18, 2024Inventors: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
-
Patent number: 11960404Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines a memory request becomes a long-latency request based on detecting a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.Type: GrantFiled: September 23, 2020Date of Patent: April 16, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Jagadish B. Kotra, John Kalamatianos