Patents by Inventor Bradford M. Beckmann

Bradford M. Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

FINE-GRAINED CONDITIONAL DISPATCHING

Publication number: 20240045718

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

Type: Application

Filed: October 17, 2023

Publication date: February 8, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
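
The dispatch ordering described in this abstract can be illustrated with a small software model. This is only a sketch: the `Dispatcher` class, its method names, and the workgroup labels are hypothetical and are not taken from the filing, which describes a hardware mechanism.

```python
from collections import deque


class Dispatcher:
    """Toy model of the pending workgroups of one kernel dispatch (hypothetical)."""

    def __init__(self, workgroup_ids):
        self.pending = deque(workgroup_ids)  # default dispatch order
        self.prioritized = deque()           # named by a dependency instruction

    def dependency_instruction(self, workgroup_id):
        # Executed on behalf of a workgroup of an earlier dispatch: mark one
        # workgroup of this dispatch so that it is dispatched first.
        if workgroup_id in self.pending:
            self.pending.remove(workgroup_id)
            self.prioritized.append(workgroup_id)

    def dispatch_next(self):
        # Prioritized workgroups go before any workgroup that was never named.
        if self.prioritized:
            return self.prioritized.popleft()
        if self.pending:
            return self.pending.popleft()
        return None


second_dispatch = Dispatcher(["wg0", "wg1", "wg2"])
second_dispatch.dependency_instruction("wg2")
order = [second_dispatch.dispatch_next() for _ in range(3)]
print(order)  # ['wg2', 'wg0', 'wg1']
```

Here "wg2" plays the role of the second workgroup in the abstract, and "wg0" and "wg1" are workgroups for which no dependency instruction was executed.
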
Fine-grained conditional dispatching

Patent number: 11809902

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

Type: Grant

Filed: September 24, 2020

Date of Patent: November 7, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
DECOMPOSING MATRICES FOR PROCESSING AT A PROCESSOR-IN-MEMORY

Publication number: 20230102296

Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.

Type: Application

Filed: September 30, 2021

Publication date: March 30, 2023

Inventors: Michael W. Boyer, Ashish Gondimalla, Bradford M. Beckmann
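
The arithmetic behind this split can be checked with a short sketch. The abstract does not say how the matrix is divided, so the row-based rule below (rows with few nonzero entries go to the sparse component) is an assumption made only for illustration, as are the function names. The point shown is that when the two components sum to the original matrix, the two partial products sum to the full product.

```python
def decompose(matrix, max_nonzeros_per_row=1):
    """Split matrix into (sparse, dense) so that sparse + dense == matrix.

    Assumed rule (not from the filing): a row with at most
    max_nonzeros_per_row nonzero entries goes to the sparse component.
    """
    sparse, dense = [], []
    for row in matrix:
        zeros = [0] * len(row)
        nonzeros = sum(1 for value in row if value != 0)
        if nonzeros <= max_nonzeros_per_row:
            sparse.append(list(row))
            dense.append(zeros)
        else:
            sparse.append(zeros)
            dense.append(list(row))
    return sparse, dense


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[1, 2, 3], [0, 0, 4], [5, 0, 0]]
x = [1, 1, 1]
S, D = decompose(A)
host_part = matvec(D, x)  # stands in for the processing unit
pim_part = matvec(S, x)   # stands in for the PIM device
result = [h + p for h, p in zip(host_part, pim_part)]
print(result)  # [6, 4, 5]
```
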
Techniques for improving operand caching

Patent number: 11436016

Abstract: A technique for determining whether a register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache is provided. The technique includes executing an instruction that accesses an operand that comprises the register value, performing one or both of a lookahead technique and a prediction technique to determine whether the register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache, and based on the determining, updating the operand cache.

Type: Grant

Filed: December 4, 2019

Date of Patent: September 6, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Anthony T. Gutierrez, Bradford M. Beckmann, Marcus Nathaniel Chow
CONDENSED COMMAND PACKET FOR HIGH THROUGHPUT AND LOW OVERHEAD KERNEL LAUNCH

Publication number: 20220197696

Abstract: Methods, devices, and systems for launching a compute kernel. A reference kernel dispatch packet is received by a kernel agent. The reference kernel dispatch packet is processed by the kernel agent to determine kernel dispatch information. The kernel dispatch information is stored by the kernel agent. A kernel is dispatched by the kernel agent, based on the kernel dispatch information. In some implementations, a condensed kernel dispatch packet is received by the kernel agent, the condensed kernel dispatch packet is processed by the kernel agent to retrieve the stored kernel dispatch information, and a kernel is dispatched by the kernel agent based on the retrieved kernel dispatch information.

Type: Application

Filed: December 23, 2020

Publication date: June 23, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Sooraj Puthoor, Bradford M. Beckmann
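
A minimal sketch of the two packet paths in this abstract follows. The `KernelAgent` class, the packet fields, and the use of an integer reference ID as the lookup key are assumptions for illustration; the filing does not specify them here.

```python
class KernelAgent:
    """Toy kernel agent that caches dispatch information (hypothetical API)."""

    def __init__(self):
        self.stored = {}    # reference id -> kernel dispatch information
        self.launched = []  # record of dispatched kernels, for inspection

    def _dispatch(self, info):
        self.launched.append((info["kernel"], info["grid_size"]))

    def receive_reference_packet(self, ref_id, packet):
        # Process the full packet, store the dispatch information, dispatch.
        info = {"kernel": packet["kernel"], "grid_size": packet["grid_size"]}
        self.stored[ref_id] = info
        self._dispatch(info)

    def receive_condensed_packet(self, ref_id):
        # The condensed packet carries only the key; the rest is retrieved.
        self._dispatch(self.stored[ref_id])


agent = KernelAgent()
agent.receive_reference_packet(7, {"kernel": "saxpy", "grid_size": 1024})
agent.receive_condensed_packet(7)
agent.receive_condensed_packet(7)
print(agent.launched)
```

In this model the full description is sent once and every later launch reuses it, which is the source of the lower per-launch overhead the title refers to.
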
SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

Publication number: 20220114097

Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.

Type: Application

Filed: December 20, 2021

Publication date: April 14, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
Enhanced atomics for workgroup synchronization

Patent number: 11288095

Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.

Type: Grant

Filed: September 30, 2019

Date of Patent: March 29, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
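
The queueing scheme in this abstract can be sketched as follows. The status names and their relative priorities are hypothetical; the abstract states only that workgroups are placed into ready queues by synchronization status and that queues are served in priority order.

```python
from collections import deque

# Hypothetical synchronization statuses; a lower number is served first.
QUEUE_PRIORITY = {"critical_section": 0, "barrier_released": 1, "normal": 2}


class ReadyQueues:
    def __init__(self):
        self.queues = {status: deque() for status in QUEUE_PRIORITY}

    def mark_ready(self, workgroup, status):
        # A non-executing workgroup became ready; file it by its status.
        self.queues[status].append(workgroup)

    def schedule(self, free_slots):
        # Fill the available resources from the highest-priority queue down.
        scheduled = []
        for status in sorted(self.queues, key=QUEUE_PRIORITY.get):
            queue = self.queues[status]
            while queue and len(scheduled) < free_slots:
                scheduled.append(queue.popleft())
        return scheduled


ready = ReadyQueues()
ready.mark_ready("wg_a", "normal")
ready.mark_ready("wg_b", "critical_section")
ready.mark_ready("wg_c", "barrier_released")
first_batch = ready.schedule(free_slots=2)
second_batch = ready.schedule(free_slots=2)
print(first_batch, second_batch)  # ['wg_b', 'wg_c'] ['wg_a']
```
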
FINE-GRAINED CONDITIONAL DISPATCHING

Publication number: 20220091880

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

Type: Application

Filed: September 24, 2020

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
System performance management using prioritized compute units

Patent number: 11204871

Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.

Type: Grant

Filed: June 30, 2015

Date of Patent: December 21, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
TECHNIQUES FOR IMPROVING OPERAND CACHING

Publication number: 20210173650

Abstract: A technique for determining whether a register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache is provided. The technique includes executing an instruction that accesses an operand that comprises the register value, performing one or both of a lookahead technique and a prediction technique to determine whether the register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache, and based on the determining, updating the operand cache.

Type: Application

Filed: December 4, 2019

Publication date: June 10, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Anthony T. Gutierrez, Bradford M. Beckmann, Marcus Nathaniel Chow
ENHANCED ATOMICS FOR WORKGROUP SYNCHRONIZATION

Publication number: 20210096909

Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.

Type: Application

Filed: September 30, 2019

Publication date: April 1, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
SYNCHRONIZATION MECHANISM FOR WORKGROUPS

Publication number: 20200379820

Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.

Type: Application

Filed: May 29, 2019

Publication date: December 3, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann
Device and method for cache utilization aware data compression

Patent number: 10838727

Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.

Type: Grant

Filed: December 14, 2018

Date of Patent: November 17, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
DEVICE AND METHOD FOR CACHE UTILIZATION AWARE DATA COMPRESSION

Publication number: 20200192671

Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.

Type: Application

Filed: December 14, 2018

Publication date: June 18, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
Monitor support on accelerated processing device

Patent number: 10558418

Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.

Type: Grant

Filed: July 27, 2017

Date of Patent: February 11, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Bradford M. Beckmann
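
The queue transitions in this abstract (entry queue, then condition queue, then ready queue) can be traced with a small state-machine sketch. The class, its method names, and the use of a Python callable to represent the wait condition are assumptions for illustration only.

```python
class MonitorCoordinator:
    """Toy model of the three monitor queues (hypothetical interface)."""

    def __init__(self):
        self.entry_queue = []
        self.condition_queue = []
        self.ready_queue = []
        self.conditions = {}  # workgroup -> predicate over memory

    def wait(self, workgroup, condition):
        # Called once all wavefronts of the workgroup have executed the wait.
        self.entry_queue.append(workgroup)
        self.conditions[workgroup] = condition

    def state_saved(self, workgroup):
        # The workgroup finished saving its state to memory.
        self.entry_queue.remove(workgroup)
        self.condition_queue.append(workgroup)

    def check_conditions(self, memory):
        for workgroup in list(self.condition_queue):
            if self.conditions[workgroup](memory):
                self.condition_queue.remove(workgroup)
                self.ready_queue.append(workgroup)

    def schedule(self, free_compute_units):
        # Resume as many ready workgroups as there are free compute units.
        count = min(free_compute_units, len(self.ready_queue))
        scheduled = self.ready_queue[:count]
        self.ready_queue = self.ready_queue[count:]
        return scheduled


monitor = MonitorCoordinator()
memory = {"flag": 0}
monitor.wait("wg0", lambda mem: mem["flag"] == 1)
monitor.state_saved("wg0")
monitor.check_conditions(memory)   # condition not met: wg0 stays waiting
memory["flag"] = 1
monitor.check_conditions(memory)   # condition met: wg0 becomes ready
resumed = monitor.schedule(free_compute_units=1)
print(resumed)  # ['wg0']
```
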
NETWORK-RELATED PERFORMANCE FOR GPUS

Publication number: 20200034195

Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.

Type: Application

Filed: July 30, 2018

Publication date: January 30, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Bradford M. Beckmann
Processor with host and slave operating modes stacked with memory

Patent number: 10522193

Abstract: A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.

Type: Grant

Filed: September 12, 2018

Date of Patent: December 31, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Nuwan S. Jayasena, Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Lisa R. Hsu
Wavefront resource virtualization

Patent number: 10360652

Abstract: A processor comprises hardware logic configured to execute a first wavefront in a hardware resource and to stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Grant

Filed: June 13, 2014

Date of Patent: July 23, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
Message aggregation, combining and compression for efficient data communications in GPU-based clusters

Patent number: 10320695

Abstract: A system and method for efficient management of network traffic in highly data-parallel computing are provided. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

Type: Grant

Filed: May 26, 2016

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
Conditional atomic operations in single instruction multiple data processors

Patent number: 10209990

Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

Type: Grant

Filed: June 2, 2015

Date of Patent: February 19, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
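
The leader-plus-broadcast pattern in this abstract can be modeled sequentially. This is a sketch under stated assumptions: the function names are hypothetical, a dictionary stands in for memory, and returning `None` to every thread on failure is a choice made here, since the abstract only describes what happens on success.

```python
def compare_and_swap(memory, address, expected, new_value):
    """Sequential stand-in for an atomic CAS; returns the value it observed."""
    observed = memory[address]
    if observed == expected:
        memory[address] = new_value
    return observed


def conditional_fetch_and_phi(memory, address, expected, phi, num_threads):
    """One selected thread performs the CAS; the outcome goes to all threads."""
    # The selected thread applies phi to the expected value and attempts the CAS.
    observed = compare_and_swap(memory, address, expected, phi(expected))
    succeeded = observed == expected
    # Broadcast: every thread receives the fetched value on success.
    per_thread = observed if succeeded else None
    return [per_thread] * num_threads


memory = {0x10: 5}
first = conditional_fetch_and_phi(memory, 0x10, 5, lambda v: v + 1, num_threads=4)
second = conditional_fetch_and_phi(memory, 0x10, 5, lambda v: v + 1, num_threads=4)
print(first, second, memory[0x10])  # [5, 5, 5, 5] [None, None, None, None] 6
```

The second call fails because the location now holds 6 rather than the expected 5, so memory is left unchanged.
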
