Patents by Inventor Alexandru Dutu
Alexandru Dutu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240103745Abstract: A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.Type: ApplicationFiled: September 28, 2022Publication date: March 28, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Niti Madan, Johnathan Robert Alsop, Alexandru Dutu, Mahzabeen Islam, Yasuko Eckert, Nuwan S Jayasena
-
Publication number: 20240045718Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.Type: ApplicationFiled: October 17, 2023Publication date: February 8, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Patent number: 11809902Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.Type: GrantFiled: September 24, 2020Date of Patent: November 7, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Patent number: 11726918Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.Type: GrantFiled: June 28, 2021Date of Patent: August 15, 2023Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Johnathan Alsop, Alexandru Dutu, Shaizeen Aga, Nuwan Jayasena
-
Publication number: 20230205705Abstract: An approach provides indirect addressing support for PIM. Indirect PIM commands include address translation information that allows memory modules to perform indirect addressing. Processing logic in a memory module processes an indirect PIM command and retrieves, from a first memory location, a virtual address of a second memory location. The processing logic calculates a corresponding physical address for the virtual address using the address translation information included with the indirect PIM command and retrieves, from the second memory location, a virtual address of a third memory location. This process is repeated any number of times until one or more indirection stop criteria are satisfied. The indirection stop criteria stop the process when work has been completed normally or to prevent errors. Implementations include the processing logic in the memory module working in cooperation with a memory controller to perform indirect addressing.Type: ApplicationFiled: December 23, 2021Publication date: June 29, 2023Inventors: Matthew R. Poremba, Alexandru Dutu, Sooraj Puthoor
-
Publication number: 20230130969Abstract: A memory includes two or more portions of memory circuitry and two or more processor in memory (PIM) functional blocks, each PIM functional block associated with a respective portion of the memory circuitry. In operation, at least one other PIM functional block other than a particular PIM functional block copies data from a source location accessible to the other PIM functional block. The other PIM functional block then provides the data to the particular PIM functional block. The particular acquires and stores the data in a destination location accessible to the particular PIM functional block. The particular PIM functional block next performs one or more PIM operations using the data.Type: ApplicationFiled: October 27, 2021Publication date: April 27, 2023Inventors: Alexandru Dutu, Vaibhav Ramakrishnan Ramachandran, Michael W. Boyer
-
Publication number: 20220414013Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.Type: ApplicationFiled: June 28, 2021Publication date: December 29, 2022Inventors: JOHNATHAN ALSOP, ALEXANDRU DUTU, SHAIZEEN AGA, NUWAN JAYASENA
-
Patent number: 11481250Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.Type: GrantFiled: June 29, 2018Date of Patent: October 25, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
-
Publication number: 20220206869Abstract: Virtualizing resources of a memory-based execution device is disclosed. A host processing system orchestrates the execution of two or more offload tasks on a remote execution device. The remote execution device includes a memory array coupled to a processing unit that is shared by concurrent processes on the host processing system. The host processing system provides time-multiplexed access to the processing unit by each concurrent process for completing offload tasks on the processing unit. The host processing system initiates a context switch on the remote execution device from a first offload task to a second offload task. The context state of the first offload task is saved on the remote execution device.Type: ApplicationFiled: December 28, 2020Publication date: June 30, 2022Inventors: VAIBHAV RAMAKRISHNAN RAMACHANDRAN, ALEXANDRU DUTU, BRADFORD BECKMANN
-
Publication number: 20220206851Abstract: A method and processing apparatus are provided for executing a program. The processing apparatus comprises memory and a processor. The processor is configured to dispatch a parent work group of a program to be executed and execute a spawn work group instruction to enable a child work group of the parent work group to be executed. The processor is also configured to dispatch the child work group for execution when a sufficient amount of resources are determined to be available to execute the child work group and execute the child work group on one or more compute units. The spawn work group instruction comprises a pointer to a synchronization variable, and the processor is also configured to execute a join workgroup instruction which comprises the pointer to the synchronization variable in the spawn work group instruction.Type: ApplicationFiled: December 30, 2020Publication date: June 30, 2022Applicant: Advanced Micro Devices, Inc.Inventor: Alexandru Dutu
-
Patent number: 11288095Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.Type: GrantFiled: September 30, 2019Date of Patent: March 29, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20220091880Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.Type: ApplicationFiled: September 24, 2020Publication date: March 24, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20210373975Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.Type: ApplicationFiled: September 23, 2020Publication date: December 2, 2021Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD
-
Publication number: 20210096909Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.Type: ApplicationFiled: September 30, 2019Publication date: April 1, 2021Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20200379820Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.Type: ApplicationFiled: May 29, 2019Publication date: December 3, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann
-
Patent number: 10558418Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.Type: GrantFiled: July 27, 2017Date of Patent: February 11, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Bradford M. Beckmann
-
Publication number: 20200004586Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.Type: ApplicationFiled: June 29, 2018Publication date: January 2, 2020Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD
-
Publication number: 20190034151Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.Type: ApplicationFiled: July 27, 2017Publication date: January 31, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Bradford M. Beckmann