Patents by Inventor Alexandru Dutu

Alexandru Dutu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240103745
    Abstract: A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.
    Type: Application
    Filed: September 28, 2022
    Publication date: March 28, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Niti Madan, Johnathan Robert Alsop, Alexandru Dutu, Mahzabeen Islam, Yasuko Eckert, Nuwan S Jayasena
  • Publication number: 20240045718
    Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
    Type: Application
    Filed: October 17, 2023
    Publication date: February 8, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
  • Patent number: 11809902
    Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: November 7, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
  • Patent number: 11726918
    Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: August 15, 2023
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Johnathan Alsop, Alexandru Dutu, Shaizeen Aga, Nuwan Jayasena
  • Publication number: 20230205705
    Abstract: An approach provides indirect addressing support for PIM. Indirect PIM commands include address translation information that allows memory modules to perform indirect addressing. Processing logic in a memory module processes an indirect PIM command and retrieves, from a first memory location, a virtual address of a second memory location. The processing logic calculates a corresponding physical address for the virtual address using the address translation information included with the indirect PIM command and retrieves, from the second memory location, a virtual address of a third memory location. This process is repeated any number of times until one or more indirection stop criteria are satisfied. The indirection stop criteria stop the process when work has been completed normally or to prevent errors. Implementations include the processing logic in the memory module working in cooperation with a memory controller to perform indirect addressing.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Matthew R. Poremba, Alexandru Dutu, Sooraj Puthoor
  • Publication number: 20230130969
    Abstract: A memory includes two or more portions of memory circuitry and two or more processor in memory (PIM) functional blocks, each PIM functional block associated with a respective portion of the memory circuitry. In operation, at least one other PIM functional block other than a particular PIM functional block copies data from a source location accessible to the other PIM functional block. The other PIM functional block then provides the data to the particular PIM functional block. The particular acquires and stores the data in a destination location accessible to the particular PIM functional block. The particular PIM functional block next performs one or more PIM operations using the data.
    Type: Application
    Filed: October 27, 2021
    Publication date: April 27, 2023
    Inventors: Alexandru Dutu, Vaibhav Ramakrishnan Ramachandran, Michael W. Boyer
  • Publication number: 20220414013
    Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Inventors: JOHNATHAN ALSOP, ALEXANDRU DUTU, SHAIZEEN AGA, NUWAN JAYASENA
  • Patent number: 11481250
    Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: October 25, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
  • Publication number: 20220206869
    Abstract: Virtualizing resources of a memory-based execution device is disclosed. A host processing system orchestrates the execution of two or more offload tasks on a remote execution device. The remote execution device includes a memory array coupled to a processing unit that is shared by concurrent processes on the host processing system. The host processing system provides time-multiplexed access to the processing unit by each concurrent process for completing offload tasks on the processing unit. The host processing system initiates a context switch on the remote execution device from a first offload task to a second offload task. The context state of the first offload task is saved on the remote execution device.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: VAIBHAV RAMAKRISHNAN RAMACHANDRAN, ALEXANDRU DUTU, BRADFORD BECKMANN
  • Publication number: 20220206851
    Abstract: A method and processing apparatus are provided for executing a program. The processing apparatus comprises memory and a processor. The processor is configured to dispatch a parent work group of a program to be executed and execute a spawn work group instruction to enable a child work group of the parent work group to be executed. The processor is also configured to dispatch the child work group for execution when a sufficient amount of resources are determined to be available to execute the child work group and execute the child work group on one or more compute units. The spawn work group instruction comprises a pointer to a synchronization variable, and the processor is also configured to execute a join workgroup instruction which comprises the pointer to the synchronization variable in the spawn work group instruction.
    Type: Application
    Filed: December 30, 2020
    Publication date: June 30, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Alexandru Dutu
  • Patent number: 11288095
    Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 29, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
  • Publication number: 20220091880
    Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
  • Publication number: 20210373975
    Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.
    Type: Application
    Filed: September 23, 2020
    Publication date: December 2, 2021
    Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD
  • Publication number: 20210096909
    Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
    Type: Application
    Filed: September 30, 2019
    Publication date: April 1, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
  • Publication number: 20200379820
    Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.
    Type: Application
    Filed: May 29, 2019
    Publication date: December 3, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann
  • Patent number: 10558418
    Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: February 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Bradford M. Beckmann
  • Publication number: 20200004586
    Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 2, 2020
    Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD
  • Publication number: 20190034151
    Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
    Type: Application
    Filed: July 27, 2017
    Publication date: January 31, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Bradford M. Beckmann