Patents by Inventor Shaizeen AGA

Shaizeen AGA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12124531
    Abstract: A processing device including a plurality of clusters of processor cores and a method for use in the processing device is disclosed. Each processor core in a cluster of processor cores is in communication with the other processor cores in the cluster and at least one processor core of each cluster is in communication with at least a processor core of a different cluster of processor cores. Each processor core is configured to store a product of a portion of a first matrix and a first portion of a second matrix in the memory, and store a product of the portion of the first matrix and a second portion of the second matrix in the memory, where the second portion of the second matrix is received from a processor core in the cluster of processor cores.
    Type: Grant
    Filed: April 7, 2023
    Date of Patent: October 22, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski
  • Patent number: 12099866
    Abstract: An Address Mapping-Aware Tasking (AMAT) mechanism manages compute task data and issues compute tasks on behalf of threads that created the compute task data. The AMAT mechanism stores compute task data generated by host threads in a set of partitions, where each partition is designated for a particular memory module. The AMAT mechanism maintains address mapping data that maps address information to partitions. Threads push compute task data to the AMAT mechanism instead of generating and issuing their own compute tasks. The AMAT mechanism uses address information included in the compute task data and the address mapping data to determine partitions in which to store the compute task data. The AMAT mechanism then issues compute tasks to be executed near the corresponding memory modules (i.e., in PIM execution units or NUMA compute nodes) based upon the compute task data stored in the partitions.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: September 24, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jonathan Alsop, Shaizeen Aga, Nuwan Jayasena
  • Publication number: 20240220315
    Abstract: A processing system includes a scheduling mechanism for producing data for fine-grained reordering of workgroups of a kernel to produce blocks of data, such as for communication across devices to enable overlapping of a producer computation with an all-reduce communication across the network. This scheduling mechanism enables a first parallel processor to schedule and execute a set of workgroups of a producer operation to generate data for transmission to a second parallel processor in a desired traffic pattern. At the same time, the second parallel processor schedules and executes a different set of workgroups of the producer operation to generate data for transmission in a desired traffic pattern to a third parallel processor or back to the first parallel processor.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Inventors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew David Sinclair
  • Publication number: 20240168639
    Abstract: An apparatus for performing distributed reduction operations using near-memory computation includes memory and a first near-memory compute node. The first near-memory compute node is coupled to a plurality of near-memory compute nodes. The first near-memory compute node comprises logic to store first data loaded from a second near-memory compute node, perform a reduction operation on the first data and second data to compute a result, and store the result within the first near-memory compute node. In some aspects, the near-memory compute node includes a PIM execution unit and carries out the reduction operation utilizing PIM commands.
    Type: Application
    Filed: November 18, 2022
    Publication date: May 23, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Johnathan Alsop, Nuwan Jayasena
  • Patent number: 11977782
    Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: May 7, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mohamed Assem Abd ElMohsen Ibrahim, Meysam Taassori, Mahzabeen Islam, Shaizeen Aga
  • Patent number: 11966328
    Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: April 23, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Onur Kayiran, Mohamed Assem Ibrahim, Shaizeen Aga
  • Patent number: 11900161
    Abstract: Memory allocation for processing-in-memory operations, including: receiving, by an allocation module, a memory allocation request indicating a plurality of data structure operands for a processing-in-memory operation; determining a memory allocation pattern for the plurality of data structure operands, wherein the memory allocation pattern interleaves a plurality of component pages of a memory page across the plurality of data structure operands; and allocating the memory page based on the determined memory allocation pattern.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: February 13, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anirban Nag, Nuwan Jayasena, Shaizeen Aga
  • Publication number: 20240045606
    Abstract: Methods and apparatuses to control digital data transfer via a memory channel between a memory module and a processor are disclosed. At least one of the memory module or the processor coalesces a plurality of short data words into multicast coalesced block data comprising a single data block for transfer via the memory channel. Each of the plurality of short data words pertains to one of at least two partitioned memory submodules in the memory module. The multicast coalesced block data is communicated over the memory channel.
    Type: Application
    Filed: October 23, 2023
    Publication date: February 8, 2024
    Inventors: Johnathan Alsop, Nuwan Jayasena, Shaizeen Aga, Andrew M. McCrabb
  • Patent number: 11874739
    Abstract: A memory module includes one or more programmable ECC engines that may be programmed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sudhanva Gurumurthi, Vilas Sridharan, Shaizeen Aga, Nuwan Jayasena, Michael Ignatowski, Shrikanth Ganapathy, John Kalamatianos
  • Publication number: 20240004585
    Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.
    Type: Application
    Filed: June 30, 2022
    Publication date: January 4, 2024
    Inventors: Mohamed Assem Abd ElMohsen Ibrahim, Meysam Taassori, Mahzabeen Islam, Shaizeen Aga
  • Publication number: 20240004653
    Abstract: An approach is provided for managing near-memory processing commands (“PIM commands”) from multiple processor threads in a manner to prevent interference and maintain correctness at near-memory processing elements. A memory controller uses thread identification information and last command information to issue a PIM command sequence from a first processor thread, directed to a PIM-enabled memory element, while deferring the issuance of PIM command sequences from other processor threads, directed to the same PIM-enabled memory element. After the last PIM command in the PIM command sequence for the first processor thread has been issued, a PIM command sequence for another processor thread is issued, and so on. The approach allows multiple processor threads to concurrently issue fine-grained PIM commands to the same PIM-enabled memory element without having to be aware of address-to-memory element mapping, and without having to coordinate with other threads.
    Type: Application
    Filed: June 29, 2022
    Publication date: January 4, 2024
    Inventors: Johnathan Alsop, Laurent S. White, Shaizeen Aga
  • Publication number: 20230409238
    Abstract: An approach is provided for processing near-memory processing commands, e.g., PIM commands, using PIM register definition data that defines multiple combinations of source and/or destination registers to be used to process PIM commands. A particular combination of source and/or destination registers to be used to process a PIM command is specified by the PIM command or determined by a near-memory processing element processing the PIM command. According to another implementation, the PIM register definition data specifies an initial combination of source and/or destination registers and one or more update functions for each PIM command. A near-memory processing element processes a PIM command using the initial combination of source and/or destination registers and uses the one or more update functions to update the combination of source and/or destination registers to be used the next time the PIM command is processed.
    Type: Application
    Filed: June 21, 2022
    Publication date: December 21, 2023
    Inventors: Shaizeen Aga, Nuwan Jayasena
  • Patent number: 11847048
    Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nuwan Jayasena, Shaizeen Aga
  • Patent number: 11847055
    Abstract: A technical solution to the technical problem of how to reduce the undesirable side effects of offloading computations to memory uses read hints to preload results of memory-side processing into a processor-side cache. A cache controller, in response to identifying a read hint in a memory-side processing instruction, causes results of the memory-side processing to be preloaded into a processor-side cache. Implementations include, without limitation, enabling or disabling the preloading based upon cache thrashing levels, preloading results, or portions of results, of memory-side processing to particular destination caches, preloading results based upon priority and/or degree of confidence, and/or during periods of low data bus and/or command bus utilization, last stores considerations, and enforcing an ordering constraint to ensure that preloading occurs after memory-side processing results are complete.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Nuwan Jayasena
  • Patent number: 11847061
    Abstract: A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
    Type: Grant
    Filed: July 26, 2021
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Nuwan Jayasena, John Kalamatianos
  • Publication number: 20230359558
    Abstract: An approach is provided for skipping, i.e., not processing and/or deleting, near-memory processing commands when one or more skip criteria are satisfied. Examples of skip criteria include, without limitation, specific operations, specific operands, and combinations of specific operations and specific operands. The approach is implemented at one or more memory command processing elements in the memory pipeline of a processor, such as memory controllers, caches, queues, and buffers, etc. Implementations include exceptions to skipping in certain situations and software support for configuring skip criteria, including particular operations and operands for which skip checking is performed. The approach provides the benefits of reducing command bus traffic and power consumption while maintaining functional correctness.
    Type: Application
    Filed: May 9, 2022
    Publication date: November 9, 2023
    Inventors: Shaizeen Aga, Mohamed Assem Abd ElMohsen Ibrahim
  • Patent number: 11803311
    Abstract: Methods and apparatuses to control digital data transfer via a memory channel between a memory module and a processor are disclosed. At least one of the memory module or the processor coalesces a plurality of short data words into multicast coalesced block data comprising a single data block for transfer via the memory channel. Each of the plurality of short data words pertains to one of at least two partitioned memory submodules in the memory module. The multicast coalesced block data is communicated over the memory channel.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: October 31, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Johnathan Alsop, Nuwan Jayasena, Shaizeen Aga, Andrew McCrabb
  • Patent number: 11797201
    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
    Type: Grant
    Filed: May 16, 2022
    Date of Patent: October 24, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
  • Patent number: 11726918
    Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: August 15, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Johnathan Alsop, Alexandru Dutu, Shaizeen Aga, Nuwan Jayasena
  • Publication number: 20230244751
    Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
    Type: Application
    Filed: April 7, 2023
    Publication date: August 3, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski
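
Illustrative sketch (not taken from the patent text): the matrix-multiplication abstracts above (patent 12124531 and publication 20230244751) describe each processor core multiplying its portion of a first matrix by one portion of a second matrix, then by another portion received from a different core. The Python simulation below shows one way such a scheme could work. The ring-style exchange order, the row/column partitioning, and the names `ring_matmul`, `matmul`, and `num_cores` are assumptions made for illustration only; the abstracts do not specify them.

```python
def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ring_matmul(A, B, num_cores):
    """Simulate num_cores cores computing C = A @ B by exchanging portions of B.

    Assumption: core p keeps a fixed row block of A and, at each step,
    forwards the column block of B it holds to the next core in a ring.
    """
    n_rows, n_cols = len(A), len(B[0])
    assert n_rows % num_cores == 0 and n_cols % num_cores == 0
    rows_per, cols_per = n_rows // num_cores, n_cols // num_cores

    # Core p owns row block p of A and starts with column block p of B.
    a_blocks = [A[p * rows_per:(p + 1) * rows_per] for p in range(num_cores)]
    b_blocks = [[row[p * cols_per:(p + 1) * cols_per] for row in B]
                for p in range(num_cores)]
    held = list(range(num_cores))  # index of the B block each core holds now
    C = [[0] * n_cols for _ in range(n_rows)]

    for _ in range(num_cores):
        for p in range(num_cores):
            q = held[p]
            # Product of this core's portion of A and the held portion of B.
            partial = matmul(a_blocks[p], b_blocks[q])
            for i, row in enumerate(partial):
                C[p * rows_per + i][q * cols_per:(q + 1) * cols_per] = row
        # Each core receives the B block previously held by its ring neighbor.
        held = [held[(p - 1) % num_cores] for p in range(num_cores)]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(ring_matmul(A, B, num_cores=2))  # [[19, 22], [43, 50]]
```

In this sketch each core touches every portion of the second matrix exactly once after `num_cores` steps, so no core ever needs the whole second matrix in its local memory at one time.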