Patents by Inventor Shaizeen AGA

Shaizeen AGA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11656945
    Abstract: Methods and processing devices are provided for error protection to support instruction replay for executing idempotent instructions at a processing in memory PIM device. The processing apparatus includes a PIM device configured to execute an idempotent instruction. The processing apparatus also includes a processor, in communication with the PIM device, configured to issue the idempotent instruction to the PIM device for execution at the PIM device and reissue the idempotent instruction to the PIM device when one of execution of the idempotent instruction at the PIM device results in an error and a predetermined latency period expires from when the idempotent instruction is issued.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: May 23, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Nuwan Jayasena, Sudhanva Gurumurthi, Shaizeen Aga, Shrikanth Ganapathy
  • Patent number: 11640444
    Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
    Type: Grant
    Filed: March 22, 2021
    Date of Patent: May 2, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski
  • Publication number: 20230098421
    Abstract: Methods and apparatuses include a processing unit which helps control the speed and computational resources required for arithmetic operations of two numbers in a first format. The control unit of the processing unit approximates the arithmetic operations using a plurality of decomposed numbers in a second format that facilitates faster calculations than the first format, such that performing arithmetic operations using the decomposed numbers is capable of approximating the results of the arithmetic operations of the two numbers in the first format.
    Type: Application
    Filed: September 30, 2021
    Publication date: March 30, 2023
    Inventors: Onur Kayiran, Mohamed Assem Abd ElMohsen Ibrahim, Shaizeen Aga
  • Publication number: 20230065546
    Abstract: An electronic device includes a plurality of nodes, each node having a processor that performs operations for processing instances of input data through a model, a local memory that stores a separate portion of model data for the model, and a controller. The controller identifies model data that meets one or more predetermined conditions in the separate portion of the model data in the local memory in some or all of the nodes that is accessible by the processors when processing the instances of input data through the model. The controller then copies the model data that meets the one or more predetermined conditions from the separate portion of the model data in the local memory in the some or all of the nodes to local memories in other nodes. In this way, the controller distributes model data that meets the one or more predetermined conditions among the nodes, making the model data that meets the one or more predetermined conditions available to the nodes without performing remote memory accesses.
    Type: Application
    Filed: September 29, 2021
    Publication date: March 2, 2023
    Inventors: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Aga
  • Publication number: 20230021492
    Abstract: A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
    Type: Application
    Filed: July 26, 2021
    Publication date: January 26, 2023
    Inventors: Shaizeen Aga, Nuwan Jayasena, John Kalamatianos
  • Publication number: 20230004491
    Abstract: A technical solution to the technical problem of how to reduce the undesirable side effects of offloading computations to memory uses read hints to preload results of memory-side processing into a processor-side cache. A cache controller, in response to identifying a read hint in a memory-side processing instruction, causes results of the memory-side processing to be preloaded into a processor-side cache. Implementations include, without limitation, enabling or disabling the preloading based upon cache thrashing levels, preloading results, or portions of results, of memory-side processing to particular destination caches, preloading results based upon priority and/or degree of confidence, and/or during periods of low data bus and/or command bus utilization, last stores considerations, and enforcing an ordering constraint to ensure that preloading occurs after memory-side processing results are complete.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Shaizeen Aga, Nuwan Jayasena
  • Publication number: 20220414013
    Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Inventors: JOHNATHAN ALSOP, ALEXANDRU DUTU, SHAIZEEN AGA, NUWAN JAYASENA
  • Patent number: 11487447
    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: November 1, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
  • Publication number: 20220317926
    Abstract: Ordering between memory-centric memory operations, referred to hereinafter as “MC-Mem-Ops,” and core-centric memory operations, referred to hereinafter as “CC-Mem-Ops,” is enforced using inter-centric fences, referred to hereinafter as an “IC-fences.” IC-fences are implemented by an ordering primitive or ordering instruction, that cause a memory controller, a cache controller, etc., to enforce ordering of MC-Mem-Ops and CC-Mem-Ops throughout the memory pipeline and at the memory controller by not reordering MC-Mem-Ops (or sometimes CC-Mem-Ops) that arrive before the IC-fence to after the IC-fence. Processing of an IC-fence also causes the memory controller to issue an ordering acknowledgment to the thread that issued the IC-fence instruction. IC-fences are tracked at the core and designated as complete when the ordering acknowledgment is received.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Inventors: Shaizeen Aga, Nuwan Jayasena, Johnathan Alsop
  • Publication number: 20220318085
    Abstract: Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.
    Type: Application
    Filed: November 29, 2021
    Publication date: October 6, 2022
    Inventors: JOHNATHAN ALSOP, SHAIZEEN AGA
  • Publication number: 20220317876
    Abstract: Methods and apparatuses to control digital data transfer via a memory channel between a memory module and a processor are disclosed. At least one of the memory module or the processor coalesces a plurality of short data words into multicast coalesced block data comprising a single data block for transfer via the memory channel. Each of the plurality of short data words pertains to one of at least two partitioned memory submodules in the memory module. The multicast coalesced block data is communicated over the memory channel.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Inventors: Johnathan Alsop, Nuwan Jayasena, Shaizeen Aga, Andrew McCrabb
  • Publication number: 20220318015
    Abstract: Enforcing data placement requirements via address bit swapping, including: receiving an instruction comprising a first memory address associated with a first address bit mapping; generating a remapped instruction by rearranging a plurality of bits of the first memory address according to a second address bit mapping; and issuing the remapped instruction to memory.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Inventors: JOHNATHAN ALSOP, SHAIZEEN AGA
  • Publication number: 20220276795
    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
    Type: Application
    Filed: May 16, 2022
    Publication date: September 1, 2022
    Inventors: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
  • Patent number: 11409608
    Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified of block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: August 9, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Shrikanth Ganapathy, Ross V. La Fetra, John Kalamatianos, Sudhanva Gurumurthi, Shaizeen Aga, Vilas Sridharan, Michael Ignatowski, Nuwan Jayasena
  • Publication number: 20220206899
    Abstract: Methods and processing devices are provided for error protection to support instruction replay for executing idempotent instructions at a processing in memory PIM device. The processing apparatus includes a PIM device configured to execute an idempotent instruction. The processing apparatus also includes a processor, in communication with the PIM device, configured to issue the idempotent instruction to the PIM device for execution at the PIM device and reissue the idempotent instruction to the PIM device when one of execution of the idempotent instruction at the PIM device results in an error and a predetermined latency period expires from when the idempotent instruction is issued.
    Type: Application
    Filed: December 24, 2020
    Publication date: June 30, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Nuwan Jayasena, Sudhanva Gurumurthi, Shaizeen Aga, Shrikanth Ganapathy
  • Publication number: 20220206839
    Abstract: An Address Mapping-Aware Tasking (AMAT) mechanism manages compute task data and issues compute tasks on behalf of threads that created the compute task data. The AMAT mechanism stores compute task data generated by host threads in a set of partitions, where each partition is designated for a particular memory module. The AMAT mechanism maintains address mapping data that maps address information to partitions. Threads push compute task data to the AMAT mechanism instead of generating and issuing their own compute tasks. The AMAT mechanism uses address information included in the compute task data and the address mapping data to determine partitions in which to store the compute task data. The AMAT mechanism then issues compute tasks to be executed near the corresponding memory modules (i.e., in PIM execution units or NUMA compute nodes) based upon the compute task data stored in the partitions.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: Jonathan Alsop, Shaizeen Aga, Nuwan Jayasena
  • Publication number: 20220206901
    Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified of block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: SHRIKANTH GANAPATHY, ROSS V. LA FETRA, JOHN KALAMATIANOS, SUDHANVA GURUMURTHI, SHAIZEEN AGA, VILAS SRIDHARAN, MICHAEL IGNATOWSKI, NUWAN JAYASENA
  • Publication number: 20220197647
    Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
    Type: Application
    Filed: December 18, 2020
    Publication date: June 23, 2022
    Inventors: Onur Kayiran, Mohamed Assem Ibrahim, Shaizeen Aga
  • Publication number: 20220100606
    Abstract: A memory module includes one or more programmable ECC engines that may be programed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: SUDHANVA GURUMURTHI, VILAS SRIDHARAN, SHAIZEEN AGA, NUWAN JAYASENA, MICHAEL IGNATOWSKI, SHRIKANTH GANAPATHY, JOHN KALAMATIANOS
  • Publication number: 20220091974
    Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan Jayasena, Shaizeen Aga