Patents by Inventor Jagadish B. Kotra

Jagadish B. Kotra has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230205693
    Abstract: Leveraging processing-in-memory (PIM) resources to expedite non-PIM instructions executed on a host is disclosed. In an implementation, a memory controller identifies a first write instruction to write first data to a first memory location, where the first write instruction is not a processing-in-memory (PIM) instruction. The memory controller then writes the first data to a first PIM register. Opportunistically, the memory controller moves the first data from the first PIM register to the first memory location. In another implementation, a memory controller identifies a first memory location associated with a first read instruction, where the first read instruction is not a processing-in-memory (PIM) instruction. The memory controller identifies that a PIM register is associated with the first memory location. The memory controller then reads, in response to the first read instruction, first data from the PIM register.
    Type: Application
    Filed: December 28, 2021
    Publication date: June 29, 2023
    Inventors: JAGADISH B. KOTRA, JOHN KALAMATIANOS, YASUKO ECKERT, YONGHAE KIM
  • Publication number: 20230186976
    Abstract: A fine-grained dynamic random-access memory (DRAM) includes a first memory bank, a second memory bank, and a dual mode I/O circuit. The first memory bank includes a memory array divided into a plurality of grains, each grain including a row buffer and input/output (I/O) circuitry. The dual-mode I/O circuit is coupled to the I/O circuitry of each grain in the first memory bank, and operates in a first mode in which commands having a first data width are routed to and fulfilled individually at each grain, and a second mode in which commands having a second data width different from the first data width are fulfilled by at least two of the grains in parallel.
    Type: Application
    Filed: December 13, 2021
    Publication date: June 15, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sriseshan Srikanth, Vignesh Adhinarayanan, Jagadish B. Kotra, Sergey Blagodurov
  • Publication number: 20230169015
    Abstract: A method includes storing a function representing a set of data elements stored in a backing memory and, in response to a first memory read request for a first data element of the set of data elements, calculating a function result representing the first data element based on the function.
    Type: Application
    Filed: November 30, 2021
    Publication date: June 1, 2023
    Inventors: Kishore Punniyamurthy, SeyedMohammad SeyedzadehDelcheh, Sergey Blagodurov, Ganesh Dasika, Jagadish B. Kotra
  • Patent number: 11625249
    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: April 11, 2023
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Jagadish B. Kotra, John Kalamatianos
  • Patent number: 11586441
    Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: February 21, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Jagadish B. Kotra
  • Patent number: 11586539
    Abstract: A processing system selectively allocates space to store a group of one or more cache lines at a cache level of a cache hierarchy having a plurality of cache levels based on memory access patterns of a software application executing at the processing system. The processing system generates bit vectors indicating which cache levels are to allocate space to store groups of one or more cache lines based on the memory access patterns, which are derived from data granularity and movement information. Based on the bit vectors, the processing system provides hints to the cache hierarchy indicating the lowest cache level that can exploit the reuse potential for a particular data.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: February 21, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Weon Taek Na, Jagadish B. Kotra, Yasuko Eckert, Steven Raasch, Sergey Blagodurov
  • Publication number: 20230030679
    Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.
    Type: Application
    Filed: July 27, 2021
    Publication date: February 2, 2023
    Inventors: Jagadish B. Kotra, John Kalamatianos, Gagandeep Panwar
  • Patent number: 11556250
    Abstract: A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: January 17, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, Karthik Rao, Joseph L. Greathouse
  • Patent number: 11507519
    Abstract: A processing system selectively compresses cache lines at a cache or at a memory or encrypts cache lines at the memory based on evictions of entries mapping virtual-to-physical address translations from a translation lookaside buffer (TLB). Upon eviction of a TLB entry, the processing system identifies cache lines corresponding to the physical addresses of the evicted TLB entry and selectively compresses the cache lines to increase the effective storage capacity of the processing system or encrypts the cache lines to protect against vulnerabilities.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: November 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, Gabriel H. Loh, Matthew R. Poremba
  • Patent number: 11487447
    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: November 1, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
  • Publication number: 20220318151
    Abstract: Devices and methods for cache prefetching are provided. A device is provided which comprises memory and a processor. The memory comprises a DRAM cache, a cache dedicated to the processor and one or more intermediate caches between the dedicated cache and the DRAM cache. The processor is configured to issue prefetch requests to prefetch data, issue data access requests to fetch the data and when one or more previously issued prefetch requests are determined to be inaccurate, issue a prefetch request to prefetch a tag, corresponding to the memory address of requested data in the DRAM cache. A tag look-up is performed at the DRAM cache without performing tag look-ups at the dedicated cache or the intermediate caches. The tag is prefetched from the DRAM cache without prefetching the requested data.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, Marko Scrbak, Matthew Raymond Poremba
  • Publication number: 20220276795
    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
    Type: Application
    Filed: May 16, 2022
    Publication date: September 1, 2022
    Inventors: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
  • Patent number: 11398831
    Abstract: Temporal link encoding, including: identifying a data type of a data value to be transmitted; determining that the data type is included in one or more data types for temporal encoding; and transmitting the data value using temporal encoding.
    Type: Grant
    Filed: May 7, 2020
    Date of Patent: July 26, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Onur Kayiran, Steven Raasch, Sergey Blagodurov, Jagadish B. Kotra
  • Publication number: 20220206817
    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: JAGADISH B. KOTRA, JOHN KALAMATIANOS
  • Publication number: 20220206855
    Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: NAGADASTAGIRI REDDY CHALLAPALLE, JAGADISH B. KOTRA, JOHN KALAMATIANOS
  • Publication number: 20220188208
    Abstract: A method may include, in response to a change in an operating parameter of a processing unit, modifying a signal pathway to a processing circuit component of the processing unit, and communicating with the processing circuit component via the signal pathway.
    Type: Application
    Filed: December 10, 2020
    Publication date: June 16, 2022
    Inventors: Anthony Gutierrez, Yasuko Eckert, Sergey Blagodurov, Jagadish B. Kotra
  • Publication number: 20220188117
    Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Application
    Filed: December 16, 2020
    Publication date: June 16, 2022
    Inventors: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
  • Publication number: 20220188233
    Abstract: A system-on-chip configured for eager invalidation and flushing of cached data used by PIM (Processing-in-Memory) instructions includes: one or more processor cores; one or more caches and an I/O (input/output) die comprising logic to: receive a cache probe request, wherein the cache probe request including a physical memory address associated with a PIM instruction, and the PIM instruction is to be offloaded to a PIM device for execution; and issue, based on the physical memory address, a cache probe to one or more of the caches prior to receiving the PIM instruction for dispatch to the PIM device.
    Type: Application
    Filed: September 13, 2021
    Publication date: June 16, 2022
    Inventors: JOHN KALAMATIANOS, JAGADISH B. KOTRA, GAGANDEEP PANWAR
  • Patent number: 11321241
    Abstract: Techniques are disclosed for processing address translations. The techniques include detecting a first miss for a first address translation request for a first address translation in a first translation lookaside buffer, in response to the first miss, fetching the first address translation into the first translation lookaside buffer and evicting a second address translation from the translation lookaside buffer into an instruction cache or local data share memory, detecting a second miss for a second address translation request referencing the second address translation, in the first translation lookaside buffer, and in response to the second miss, fetching the second address translation from the instruction cache or the local data share memory.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: May 3, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B. Kotra, Michael W. LeBeane
  • Publication number: 20220091986
    Abstract: Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines a memory request becomes a long-latency request based on detecting a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request such as storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.
    Type: Application
    Filed: September 23, 2020
    Publication date: March 24, 2022
    Inventors: Jagadish B. Kotra, John Kalamatianos