Patents by Inventor Michael W. LeBeane

Michael W. LeBeane has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient memory-semantic networking using scoped memory models

Patent number: 12086422

Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.

Type: Grant

Filed: May 19, 2023

Date of Patent: September 10, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
Systems and methods for reducing instruction code memory footprint for multiple processes executed at a coprocessor

Patent number: 12086447

Abstract: A processing system includes a first processor couplable to a first memory and a second memory. In response to a page migration trigger for a page in the first memory, the first processor is configured to, responsive to the page being a read-only page storing code for execution, initiate migration of the page to a code cache portion of a second memory associated with a second processor and shared by multiple processes executing at the second processor, and to configure each process of a set of processes executing at the second processor to access and execute the code from the code cache portion.

Type: Grant

Filed: December 18, 2019

Date of Patent: September 10, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Khaled Hamidouche, Michael W. LeBeane, Hari S. Thangirala
Network command coalescing on GPUs

Patent number: 11922207

Abstract: An approach is provided for coalescing network commands in a GPU that implements a SIMT architecture. Compatible next network operations from different threads are coalesced into a single network command packet. This reduces the number of network command packets generated and issued by threads, thereby increasing efficiency, and improving throughput. The approach is applicable to any number of threads and any thread organization methodology, such as wavefronts, warps, etc.

Type: Grant

Filed: August 13, 2020

Date of Patent: March 5, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Brandon K. Potter
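Illustrative sketch (not taken from the patent): the coalescing idea in the abstract above can be modeled on the host in a few lines of Python. The `NetOp` record, the `coalesce` function, and the compatibility rule used here (same opcode and same destination node) are hypothetical assumptions chosen for illustration; the abstract does not define what makes two operations compatible.

```python
from collections import defaultdict
from typing import NamedTuple


class NetOp(NamedTuple):
    """Hypothetical per-thread network operation."""
    thread_id: int
    opcode: str
    dest_node: int
    payload: bytes


def coalesce(ops):
    """Merge compatible per-thread operations into one packet per group.

    Assumption: two operations are compatible when they share an opcode
    and a destination node.
    """
    groups = defaultdict(list)
    for op in ops:
        groups[(op.opcode, op.dest_node)].append(op)

    packets = []
    for (opcode, dest_node), members in groups.items():
        packets.append({
            "opcode": opcode,
            "dest_node": dest_node,
            "thread_ids": [m.thread_id for m in members],
            "payload": b"".join(m.payload for m in members),
        })
    return packets
```

With four threads, three of which target the same node, this sketch emits two packets instead of four.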
EFFICIENT MEMORY-SEMANTIC NETWORKING USING SCOPED MEMORY MODELS

Publication number: 20230289070

Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.

Type: Application

Filed: May 19, 2023

Publication date: September 14, 2023

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
Efficient memory-semantic networking using scoped memory models

Patent number: 11714559

Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.

Type: Grant

Filed: September 25, 2020

Date of Patent: August 1, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
Network cache injection for coherent GPUs

Patent number: 11687460

Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.

Type: Grant

Filed: April 26, 2017

Date of Patent: June 27, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
Optimized asynchronous training of neural networks using a distributed parameter server with eager updates

Patent number: 11630994

Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.

Type: Grant

Filed: February 17, 2018

Date of Patent: April 18, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
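Illustrative sketch (not taken from the patent): the eager-update idea in the abstract above, where updated parameters are transmitted before the backward pass finishes, might look roughly like the following. The function name, the `send` callback, and the use of precomputed scalar gradients in place of a real backward pass are all simplifying assumptions.

```python
def backward_with_eager_updates(layers, grads, lr, send):
    """Update layers from output to input, sending each one as soon as it is ready.

    layers: dict mapping layer name -> scalar weight, ordered input to output.
    grads:  dict mapping layer name -> scalar gradient (assumed precomputed).
    send:   hypothetical callback standing in for transmission to remote nodes.
    """
    for name in reversed(list(layers)):
        layers[name] -= lr * grads[name]
        # Transmit this layer's parameters immediately, while earlier
        # layers have not yet been processed by the backward pass.
        send(name, layers[name])
```

The first call to `send` happens while earlier layers are still pending, which is the property the abstract describes.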
Techniques to improve translation lookaside buffer reach by leveraging idle resources

Patent number: 11321241

Abstract: Techniques are disclosed for processing address translations. The techniques include detecting a first miss for a first address translation request for a first address translation in a first translation lookaside buffer, in response to the first miss, fetching the first address translation into the first translation lookaside buffer and evicting a second address translation from the translation lookaside buffer into an instruction cache or local data share memory, detecting a second miss for a second address translation request referencing the second address translation, in the first translation lookaside buffer, and in response to the second miss, fetching the second address translation from the instruction cache or the local data share memory.

Type: Grant

Filed: August 31, 2020

Date of Patent: May 3, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, Michael W. LeBeane
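Illustrative sketch (not taken from the patent): the flow in the abstract above, where an evicted translation is parked in otherwise idle storage and recovered on a later miss, can be modeled in software. The class name, the LRU eviction policy, and the unbounded `victims` dictionary (standing in for the instruction cache or local data share memory) are assumptions made for illustration.

```python
from collections import OrderedDict


class VictimBackedTLB:
    """Toy TLB that spills evicted translations into spare storage."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.tlb = OrderedDict()      # virtual page -> physical page, LRU order
        self.victims = {}             # stands in for idle I-cache / LDS space
        self.page_table = page_table  # backing translations
        self.page_walks = 0

    def translate(self, vpn):
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)  # TLB hit: refresh LRU position
            return self.tlb[vpn]

        if vpn in self.victims:
            ppn = self.victims.pop(vpn)  # recovered without a page walk
        else:
            self.page_walks += 1
            ppn = self.page_table[vpn]

        if len(self.tlb) >= self.capacity:
            old_vpn, old_ppn = self.tlb.popitem(last=False)
            self.victims[old_vpn] = old_ppn  # evict into spare storage

        self.tlb[vpn] = ppn
        return ppn
```

In this model, a translation that was evicted and is requested again costs a lookup in the spare storage rather than a second page walk.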
EFFICIENT MEMORY-SEMANTIC NETWORKING USING SCOPED MEMORY MODELS

Publication number: 20220100391

Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.

Type: Application

Filed: September 25, 2020

Publication date: March 31, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
TECHNIQUES TO IMPROVE TRANSLATION LOOKASIDE BUFFER REACH BY LEVERAGING IDLE RESOURCES

Publication number: 20220066946

Abstract: Techniques are disclosed for processing address translations. The techniques include detecting a first miss for a first address translation request for a first address translation in a first translation lookaside buffer, in response to the first miss, fetching the first address translation into the first translation lookaside buffer and evicting a second address translation from the translation lookaside buffer into an instruction cache or local data share memory, detecting a second miss for a second address translation request referencing the second address translation, in the first translation lookaside buffer, and in response to the second miss, fetching the second address translation from the instruction cache or the local data share memory.

Type: Application

Filed: August 31, 2020

Publication date: March 3, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, Michael W. LeBeane
NETWORK COMMAND COALESCING ON GPUs

Publication number: 20220050707

Abstract: An approach is provided for coalescing network commands in a GPU that implements a SIMT architecture. Compatible next network operations from different threads are coalesced into a single network command packet. This reduces the number of network command packets generated and issued by threads, thereby increasing efficiency, and improving throughput. The approach is applicable to any number of threads and any thread organization methodology, such as wavefronts, warps, etc.

Type: Application

Filed: August 13, 2020

Publication date: February 17, 2022

Inventors: Michael W. LeBeane, Khaled Hamidouche, Brandon K. Potter
Network interface controller-based scheduling of processing tasks in a distributed computing system

Patent number: 10963309

Abstract: Techniques for scheduling processing tasks in a device having multiple computing elements are disclosed. A network interface controller of the device receives processing tasks, for execution on the computing elements, from a network that is external to the device. The network interface controller schedules the tasks for execution on the computing elements based on policy data available to the network interface controller. A scheduler within the network interface controller, which can be implemented as a standalone processing unit (such as a microcontroller, a programmable processing core, or an application specific integrated circuit), performs such scheduling, thereby freeing the central processing unit of the device from the burden of performing scheduling operations. The scheduler schedules the tasks according to any technically feasible scheduling technique.

Type: Grant

Filed: September 16, 2016

Date of Patent: March 30, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Abhisek Pan, Steven K. Reinhardt
Optimized and scalable sparse triangular linear systems on networks of accelerators

Patent number: 10936697

Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.

Type: Grant

Filed: July 24, 2018

Date of Patent: March 2, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse
GPU remote communication with triggered operations

Patent number: 10936533

Abstract: Methods, devices, and systems for transmitting data over a computer communications network are disclosed. A queue of communications commands can be pre-generated using a central processing unit (CPU) and stored in a device memory of a network interface controller (NIC). Thereafter, if a graphics processing unit (GPU) has data to communicate to a remote GPU, it can store the data in a send buffer, where the location in the buffer is pointed to by a pre-generated command. The GPU can then signal to the interface device that the data is ready, triggering execution of the pre-generated command to send the data.

Type: Grant

Filed: October 18, 2016

Date of Patent: March 2, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Steven K. Reinhardt
Apparatus and method for neighborhood-aware virtual to physical address translations

Patent number: 10684957

Abstract: An apparatus and method perform neighborhood-aware virtual to physical address translations. A coalescing opportunity for a first virtual address is determined, based on completing a memory access corresponding to a page walk for a second virtual address. Metadata corresponding to the first virtual address is provided to a page table walk buffer based on the coalescing opportunity and a page walk for the first virtual address is performed based on the metadata corresponding to the first virtual address.

Type: Grant

Filed: August 23, 2018

Date of Patent: June 16, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Seunghee Shin
NETWORK-RELATED PERFORMANCE FOR GPUS

Publication number: 20200034195

Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.

Type: Application

Filed: July 30, 2018

Publication date: January 30, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Khaled Hamidouche, Bradford M. Beckmann
OPTIMIZED AND SCALABLE SPARSE TRIANGULAR LINEAR SYSTEMS ON NETWORKS OF ACCELERATORS

Publication number: 20200034405

Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.

Type: Application

Filed: July 24, 2018

Publication date: January 30, 2020

Inventors: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse
OPTIMIZED ASYNCHRONOUS TRAINING OF NEURAL NETWORKS USING A DISTRIBUTED PARAMETER SERVER WITH EAGER UPDATES

Publication number: 20190258924

Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.

Type: Application

Filed: February 17, 2018

Publication date: August 22, 2019

Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
NETWORK CACHE INJECTION FOR COHERENT GPUS

Publication number: 20180314638

Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.

Type: Application

Filed: April 26, 2017

Publication date: November 1, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
Power aware work stealing

Patent number: 10089155

Abstract: First and second processor cores are configured to concurrently execute tasks. A scheduler is configured to schedule tasks for execution by the first and second processor cores. The first processor core is configured to selectively steal a task that was previously scheduled for execution by the second processor core based on additional power consumption incurred by migrating the task from the second processor core to the first processor core.

Type: Grant

Filed: September 22, 2015

Date of Patent: October 2, 2018

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Deepak Majeti, Mauricio Breternitz
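Illustrative sketch (not taken from the patent): the selective stealing decision in the abstract above can be expressed as a guard on an ordinary work-stealing step. The `Task` and `Core` types, the per-task `migration_power_mw` estimate, and the fixed `power_budget_mw` threshold are hypothetical; the abstract says only that the decision is based on the additional power consumed by migrating the task.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    migration_power_mw: float  # assumed estimate of extra power to migrate


@dataclass
class Core:
    name: str
    queue: deque = field(default_factory=deque)


def try_steal(thief, victim, power_budget_mw):
    """Steal the victim's oldest queued task only if migration is cheap enough.

    Returns the stolen Task, or None if nothing was stolen.
    """
    if not victim.queue:
        return None
    candidate = victim.queue[0]
    if candidate.migration_power_mw > power_budget_mw:
        return None  # migrating would cost more power than allowed
    victim.queue.popleft()
    thief.queue.append(candidate)
    return candidate
```

A real scheduler would presumably weigh the migration cost against the expected benefit of keeping the idle core busy; a fixed budget is used here only to keep the example small.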
