Patents by Inventor Bradford M. Beckmann
Bradford M. Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240045718
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
Type: Application
Filed: October 17, 2023
Publication date: February 8, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Patent number: 11809902
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
Type: Grant
Filed: September 24, 2020
Date of Patent: November 7, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20230102296
Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.
Type: Application
Filed: September 30, 2021
Publication date: March 30, 2023
Inventors: Michael W. Boyer, Ashish Gondimalla, Bradford M. Beckmann
-
Patent number: 11436016
Abstract: A technique for determining whether a register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache is provided. The technique includes executing an instruction that accesses an operand that comprises the register value, performing one or both of a lookahead technique and a prediction technique to determine whether the register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache, and based on the determining, updating the operand cache.
Type: Grant
Filed: December 4, 2019
Date of Patent: September 6, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Anthony T. Gutierrez, Bradford M. Beckmann, Marcus Nathaniel Chow
-
Publication number: 20220197696
Abstract: Methods, devices, and systems for launching a compute kernel. A reference kernel dispatch packet is received by a kernel agent. The reference kernel dispatch packet is processed by the kernel agent to determine kernel dispatch information. The kernel dispatch information is stored by the kernel agent. A kernel is dispatched by the kernel agent, based on the kernel dispatch information. In some implementations, a condensed kernel dispatch packet is received by the kernel agent, the condensed kernel dispatch packet is processed by the kernel agent to retrieve the stored kernel dispatch information, and a kernel is dispatched by the kernel agent based on the retrieved kernel dispatch information.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Sooraj Puthoor, Bradford M. Beckmann
-
Publication number: 20220114097
Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
Type: Application
Filed: December 20, 2021
Publication date: April 14, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
-
Patent number: 11288095
Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
Type: Grant
Filed: September 30, 2019
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20220091880
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Patent number: 11204871
Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
Type: Grant
Filed: June 30, 2015
Date of Patent: December 21, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
-
Publication number: 20210173650
Abstract: A technique for determining whether a register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache is provided. The technique includes executing an instruction that accesses an operand that comprises the register value, performing one or both of a lookahead technique and a prediction technique to determine whether the register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache, and based on the determining, updating the operand cache.
Type: Application
Filed: December 4, 2019
Publication date: June 10, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Anthony T. Gutierrez, Bradford M. Beckmann, Marcus Nathaniel Chow
-
Publication number: 20210096909
Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
Type: Application
Filed: September 30, 2019
Publication date: April 1, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Publication number: 20200379820
Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.
Type: Application
Filed: May 29, 2019
Publication date: December 3, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann
-
Patent number: 10838727
Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
Type: Grant
Filed: December 14, 2018
Date of Patent: November 17, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
-
Publication number: 20200192671
Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
Type: Application
Filed: December 14, 2018
Publication date: June 18, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
-
Patent number: 10558418
Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
Type: Grant
Filed: July 27, 2017
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Bradford M. Beckmann
-
Publication number: 20200034195
Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
Type: Application
Filed: July 30, 2018
Publication date: January 30, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Michael W. LeBeane, Khaled Hamidouche, Bradford M. Beckmann
-
Patent number: 10522193
Abstract: A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.
Type: Grant
Filed: September 12, 2018
Date of Patent: December 31, 2019
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Nuwan S. Jayasena, Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Lisa R. Hsu
-
Patent number: 10360652
Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
Type: Grant
Filed: June 13, 2014
Date of Patent: July 23, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
-
Patent number: 10320695
Abstract: A system and method for efficient management of network traffic in highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.
Type: Grant
Filed: May 26, 2016
Date of Patent: June 11, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
-
Patent number: 10209990
Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
Type: Grant
Filed: June 2, 2015
Date of Patent: February 19, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
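-
Illustrative note on patent 10209990: the abstract describes one thread of a wavefront issuing a single compare-and-swap (CAS) on behalf of all the others. The Python sketch below is a sequential simulation of that idea only, not the patented implementation. The names `Memory`, `compare_and_swap`, and `wavefront_conditional_fetch_and_phi`, the lowest-lane election rule, and the choice to broadcast both the old value and a success flag are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Toy memory model (hypothetical); counts how many CAS operations it serves."""
    cells: dict
    cas_count: int = 0

    def compare_and_swap(self, addr, expected, new):
        """Store `new` at `addr` only if it currently holds `expected`; return the old value."""
        self.cas_count += 1
        old = self.cells[addr]
        if old == expected:
            self.cells[addr] = new
        return old


def wavefront_conditional_fetch_and_phi(mem, addr, expected, new, active_lanes):
    """Simulate a wavefront in which one elected lane issues the CAS for all lanes.

    Returns (leader, results), where results maps each active lane to
    (old_value, succeeded). Electing the lowest lane id is an assumption.
    """
    if not active_lanes:
        return None, {}
    leader = min(active_lanes)                       # assumed election rule
    old = mem.compare_and_swap(addr, expected, new)  # only the leader touches memory
    succeeded = old == expected
    # Broadcast the leader's result to every lane that was waiting on it.
    return leader, {lane: (old, succeeded) for lane in active_lanes}
```

In this sketch a four-lane wavefront produces exactly one memory operation, which is the point of electing a single thread: contention on the memory location does not grow with the number of lanes.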