Patents by Inventor John Kalamatianos

John Kalamatianos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11509333
    Abstract: Systems, apparatuses, and methods for implementing masked fault detection for reliable low voltage cache operation are disclosed. A processor includes a cache that can operate at a relatively low voltage level to conserve power. However, at low voltage levels, the cache is more likely to suffer from bit errors. To mitigate the bit errors occurring in cache lines at low voltage levels, the cache employs a strategy to uncover masked faults during runtime accesses to data by actual software applications. For example, on the first read of a given cache line, the data of the given cache line is inverted and written back to the same data array entry. Also, the error correction bits are regenerated for the inverted data. On a second read of the given cache line, if the fault population of the given cache line changes, then the given cache line's error protection level is updated.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: November 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shrikanth Ganapathy, John Kalamatianos
  • Patent number: 11487671
    Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: November 1, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Xianwei Zhang, John Kalamatianos, Bradford Beckmann
  • Patent number: 11481331
    Abstract: An electronic device includes a processor having a cache memory, a plurality of physical registers, and a promotion logic functional block. The promotion logic functional block promotes prefetched data from a portion of a cache block in the cache memory into a given physical register, the promoting including storing the prefetched data in the given physical register. Upon encountering a load micro-operation that loads data from the portion of the cache block into a destination physical register, the promotion logic functional block sets the processor so that the prefetched data stored in the given physical register is provided to micro-operations that depend on the load micro-operation.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: October 25, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish Kotra, John Kalamatianos
  • Patent number: 11455252
    Abstract: Techniques for generating a model for predicting when different hybrid prefetcher configurations should be used are disclosed. Techniques for using the model to predict when different hybrid prefetcher configurations should be used are also disclosed. The techniques for generating the model include obtaining a set of input data, and generating trees based on the training data. Each tree is associated with a different hybrid prefetcher configuration and the trees output certainty scores for the associated hybrid prefetcher configuration based on hardware feature measurements. To decide on a hybrid prefetcher configuration to use, a prefetcher traverses multiple trees to obtain certainty scores for different hybrid prefetcher configurations and identifies a hybrid prefetcher configuration to used based on a comparison of the certainty scores.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: September 27, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Paul S. Keltcher, Mayank Chhablani, Alok Garg, Furkan Eris
  • Patent number: 11442727
    Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: September 13, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Varun Agrawal, John Kalamatianos
  • Publication number: 20220261350
    Abstract: An electronic device includes a processor, the processor having a cache memory, a set of physical registers, and a promotion logic functional block. When one or more promotion conditions are met, the promotion logic functional block promotes prefetched data from a portion of a cache block in the cache memory to a physical register among the set of physical registers.
    Type: Application
    Filed: April 27, 2022
    Publication date: August 18, 2022
    Inventors: Jagadish Kotra, John Kalamatianos
  • Patent number: 11409608
    Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified of block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: August 9, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Shrikanth Ganapathy, Ross V. La Fetra, John Kalamatianos, Sudhanva Gurumurthi, Shaizeen Aga, Vilas Sridharan, Michael Ignatowski, Nuwan Jayasena
  • Publication number: 20220239315
    Abstract: A data processing platform, method, and program product perform compression and decompression of a set of data items. Suffix data and a prefix are selected for each respective data item in the set of data items based on data content of the respective data item. The set of data items is sorted based on the prefixes. The prefixes are encoded by querying multiple encoding tables to create a code word containing compressed information representing values of all prefixes for the set of data items. The code word and suffix data for each of the data items are stored in memory. The code word is decompressed to recover the prefixes. The recovered prefixes are paired with their respective suffix data.
    Type: Application
    Filed: April 18, 2022
    Publication date: July 28, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexander D. Breslow, Nuwan Jayasena, John Kalamatianos
  • Patent number: 11397691
    Abstract: A technique for accessing a memory having a high latency portion and a low latency portion is provided. The technique includes detecting a promotion trigger to promote data from the high latency portion to the low latency portion, in response to the promotion trigger, copying cache lines associated with the promotion trigger from the high latency portion to the low latency portion, and in response to a read request, providing data from either or both of the high latency portion or the low latency portion, based on a state associated with data in the high latency portion and the low latency portion.
    Type: Grant
    Filed: November 13, 2019
    Date of Patent: July 26, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Apostolos Kokolis, Shrikanth Ganapathy
  • Publication number: 20220206817
    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: JAGADISH B. KOTRA, JOHN KALAMATIANOS
  • Publication number: 20220206855
    Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: NAGADASTAGIRI REDDY CHALLAPALLE, JAGADISH B. KOTRA, JOHN KALAMATIANOS
  • Publication number: 20220206901
    Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified of block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: SHRIKANTH GANAPATHY, ROSS V. LA FETRA, JOHN KALAMATIANOS, SUDHANVA GURUMURTHI, SHAIZEEN AGA, VILAS SRIDHARAN, MICHAEL IGNATOWSKI, NUWAN JAYASENA
  • Publication number: 20220206899
    Abstract: Methods and processing devices are provided for error protection to support instruction replay for executing idempotent instructions at a processing in memory PIM device. The processing apparatus includes a PIM device configured to execute an idempotent instruction. The processing apparatus also includes a processor, in communication with the PIM device, configured to issue the idempotent instruction to the PIM device for execution at the PIM device and reissue the idempotent instruction to the PIM device when one of execution of the idempotent instruction at the PIM device results in an error and a predetermined latency period expires from when the idempotent instruction is issued.
    Type: Application
    Filed: December 24, 2020
    Publication date: June 30, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Nuwan Jayasena, Sudhanva Gurumurthi, Shaizeen Aga, Shrikanth Ganapathy
  • Publication number: 20220206685
    Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. The memory controller converts the operation into N PIM commands if the optimization is not applicable. Otherwise, the memory controller converts the operation into a N?1 PIM commands if the optimization is applicable. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.
    Type: Application
    Filed: December 31, 2020
    Publication date: June 30, 2022
    Inventors: John Kalamatianos, Varun Agrawal, Niti Madan
  • Publication number: 20220197809
    Abstract: Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Furkan Eris, Paul S. Keltcher, John Kalamatianos, Mayank Chhablani, Alok Garg
  • Publication number: 20220188233
    Abstract: A system-on-chip configured for eager invalidation and flushing of cached data used by PIM (Processing-in-Memory) instructions includes: one or more processor cores; one or more caches and an I/O (input/output) die comprising logic to: receive a cache probe request, wherein the cache probe request including a physical memory address associated with a PIM instruction, and the PIM instruction is to be offloaded to a PIM device for execution; and issue, based on the physical memory address, a cache probe to one or more of the caches prior to receiving the PIM instruction for dispatch to the PIM device.
    Type: Application
    Filed: September 13, 2021
    Publication date: June 16, 2022
    Inventors: JOHN KALAMATIANOS, JAGADISH B. KOTRA, GAGANDEEP PANWAR
  • Publication number: 20220188117
    Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Application
    Filed: December 16, 2020
    Publication date: June 16, 2022
    Inventors: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
  • Patent number: 11309911
    Abstract: A data processing platform, method, and program product perform compression and decompression of a set of data items. Suffix data and a prefix are selected for each respective data item in the set of data items based on data content of the respective data item. The set of data items is sorted based on the prefixes. The prefixes are encoded by querying multiple encoding tables to create a code word containing compressed information representing values of all prefixes for the set of data items. The code word and suffix data for each of the data items are stored in memory. The code word is decompressed to recover the prefixes. The recovered prefixes are paired with their respective suffix data.
    Type: Grant
    Filed: August 16, 2019
    Date of Patent: April 19, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander D. Breslow, Nuwan Jayasena, John Kalamatianos
  • Publication number: 20220100501
    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
    Type: Application
    Filed: September 27, 2020
    Publication date: March 31, 2022
    Inventors: Michael W. Boyer, John Kalamatianos, Pritam Majumder
  • Publication number: 20220100606
    Abstract: A memory module includes one or more programmable ECC engines that may be programed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: SUDHANVA GURUMURTHI, VILAS SRIDHARAN, SHAIZEEN AGA, NUWAN JAYASENA, MICHAEL IGNATOWSKI, SHRIKANTH GANAPATHY, JOHN KALAMATIANOS