Patents by Inventor Walter B. Benton

Walter B. Benton has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11687460
    Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, and NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: June 27, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
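    The data path described in this abstract (the NIC writes incoming data to main memory, publishes coherence information, and the GPU injects the data into its cache only when a heuristic is satisfied) can be modeled in a few lines. The sketch below is a hypothetical software analogue: the class names, the capacity-based heuristic, and the direct notification call are illustrative assumptions, not the patented hardware design.

    ```python
    # Minimal software model of NIC-to-GPU cache injection (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class CoherenceInfo:
        address: int            # where the NIC wrote the payload in main memory
        size: int
        destined_for_gpu: bool  # hint derived from the incoming data

    class GPU:
        def __init__(self, cache_capacity):
            self.cache = {}
            self.cache_capacity = cache_capacity

        def satisfies_heuristic(self, info):
            # Example heuristic: inject only data addressed to the GPU that
            # still fits in the cache. The real heuristic is not specified here.
            return info.destined_for_gpu and len(self.cache) < self.cache_capacity

        def on_coherence_update(self, info, main_memory):
            if self.satisfies_heuristic(info):
                # Load from main memory into the GPU cache ("injection").
                self.cache[info.address] = main_memory[info.address]

    class NIC:
        def receive(self, address, payload, main_memory, directory, gpu):
            main_memory[address] = payload                  # write to main memory
            info = CoherenceInfo(address, len(payload), destined_for_gpu=True)
            directory.append(info)                          # update the directory
            gpu.on_coherence_update(info, main_memory)      # notify the GPU

    memory, directory = {}, []
    gpu, nic = GPU(cache_capacity=4), NIC()
    nic.receive(0x1000, b"packet-data", memory, directory, gpu)
    print(gpu.cache)   # {4096: b'packet-data'}
    ```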
  • Publication number: 20230120934
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 20, 2023
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
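    The mechanism is concrete enough to sketch: a kernel thread stores a message and a ready indication in the cache, and the command processor detects the indication and forwards the message to the network interface without involving the general purpose processor. In the model below, the flag name, the polling function, and the queue standing in for the network interface unit are all assumptions made for illustration.

    ```python
    # Sketch of GPU-side message generation with command-processor forwarding.
    import queue

    cache = {}                    # stands in for the parallel processor's cache
    MSG_READY = "msg_ready"       # the "corresponding indication" in the cache
    nic_queue = queue.Queue()     # stands in for the network interface unit

    def kernel_thread(payload):
        # A thread within the kernel builds the network message and stores the
        # message together with a ready indication in the cache.
        cache["msg"] = payload
        cache[MSG_READY] = True

    def command_processor_poll():
        # The command processor watches the cache; on seeing the indication it
        # conveys the message to the NIC, with no general purpose CPU involved.
        if cache.get(MSG_READY):
            nic_queue.put(cache.pop("msg"))
            cache[MSG_READY] = False

    kernel_thread(b"hello-from-gpu")
    command_processor_poll()
    print(nic_queue.get())   # b'hello-from-gpu'
    ```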
  • Patent number: 11630994
    Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
    Type: Grant
    Filed: February 17, 2018
    Date of Patent: April 18, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
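    The key idea in this abstract is overlapping communication with computation: updated parameters begin flowing to remote nodes before the backward pass finishes. A minimal sketch follows, assuming a per-layer send policy and simple parameter averaging; both are illustrative choices, not details taken from the patent.

    ```python
    # Sketch of overlapping parameter exchange with the backward pass.
    def train_step(local_params, remote_params, send):
        # Fold the received remote parameters into the local model
        # (simple averaging here; the patent does not prescribe this).
        params = [(a + b) / 2 for a, b in zip(local_params, remote_params)]

        # Forward pass to a final output (the real math is elided).
        _final_output = sum(params)

        # Backward pass visits layers last-to-first; each layer's updated
        # parameters are transmitted before the pass as a whole completes.
        updated = list(params)
        for layer in reversed(range(len(params))):
            updated[layer] = params[layer] - 0.1   # stand-in gradient step
            send(layer, updated[layer])            # overlaps comm with compute
        return updated

    sent = []
    train_step([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
               send=lambda layer, p: sent.append((layer, p)))
    print(sent)   # the last layer's updated parameters are sent first
    ```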
  • Patent number: 11544121
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Grant
    Filed: November 16, 2017
    Date of Patent: January 3, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
  • Patent number: 10740163
    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: August 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton
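    The division of labor here (the CPU pre-populates the static fields of a templated packet, the GPU fills in the runtime fields from inside a kernel, and the NIC sends the combined result once notified) maps naturally onto a short sketch. The field names and the notification callback below are hypothetical.

    ```python
    # Sketch of CPU/GPU network-packet templating (illustrative only).
    def cpu_create_template(memory):
        # The CPU populates the static subset of fields ahead of time.
        memory["packet"] = {"src": "node-a", "proto": "rdma",   # static fields
                            "dst": None, "payload": None}       # runtime fields

    def gpu_kernel(memory, notify):
        # Inside the kernel, a communication request is detected before the
        # kernel finishes; the GPU fills in the runtime subset of fields.
        pkt = memory["packet"]
        pkt["dst"], pkt["payload"] = "node-b", b"result"
        notify()   # signal the NIC that the packet is ready to be processed

    def nic_process(memory):
        # The NIC reads both the static and the runtime fields.
        pkt = memory["packet"]
        return f"send {pkt['payload']!r} {pkt['src']} -> {pkt['dst']} via {pkt['proto']}"

    mem, ready = {}, []
    cpu_create_template(mem)
    gpu_kernel(mem, notify=lambda: ready.append(True))
    if ready:
        print(nic_process(mem))
    ```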
  • Publication number: 20200004610
    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.
    Type: Application
    Filed: June 28, 2018
    Publication date: January 2, 2020
    Inventors: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton
  • Publication number: 20190258924
    Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
    Type: Application
    Filed: February 17, 2018
    Publication date: August 22, 2019
    Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
  • Publication number: 20190146857
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Application
    Filed: November 16, 2017
    Publication date: May 16, 2019
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
  • Patent number: 10198349
    Abstract: Systems, apparatuses, and methods for utilizing in-memory accelerators to perform data conversion operations are disclosed. A system includes one or more main processors coupled to one or more memory modules. Each memory module includes one or more memory devices coupled to a processing in memory (PIM) device. The main processors are configured to generate an executable for a PIM device to accelerate data conversion tasks of data stored in the local memory devices. In one embodiment, the system detects a read request for data stored in a given memory module. In order to process the read request, the system determines that a conversion from a first format to a second format is required. In response to detecting the read request, the given memory module's PIM device performs the conversion of the data from the first format to the second format and then provides the data to a consumer application.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: February 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Walter B. Benton
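    The read path in this abstract can be sketched directly: detect that the requested data needs a format conversion, run the conversion on the memory module's PIM device rather than on a main processor, then return the converted data. The row-to-column conversion and the way the conversion executable is installed are illustrative assumptions.

    ```python
    # Sketch of a processing-in-memory (PIM) device converting data on a read.
    class MemoryModule:
        """One memory module: plain storage plus an attached PIM device."""
        def __init__(self):
            self.store = {}            # address -> (format, data)
            self.pim_program = None    # executable generated by the main CPU

        def read(self, addr, wanted_format):
            fmt, data = self.store[addr]
            if fmt != wanted_format:
                # The conversion happens in the PIM device, next to the
                # data, instead of on the main processor.
                data = self.pim_program(data)
                self.store[addr] = (wanted_format, data)
            return data

    module = MemoryModule()
    # The main processor generates and installs the conversion "executable":
    module.pim_program = lambda rows: list(zip(*rows))   # row- to column-major
    module.store[0x10] = ("rows", [(1, 2), (3, 4)])

    print(module.read(0x10, "columns"))   # [(1, 3), (2, 4)], converted by PIM
    ```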
  • Publication number: 20180314638
    Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, and NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.
    Type: Application
    Filed: April 26, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
  • Publication number: 20180081583
    Abstract: Systems, apparatuses, and methods for utilizing in-memory accelerators to perform data conversion operations are disclosed. A system includes one or more main processors coupled to one or more memory modules. Each memory module includes one or more memory devices coupled to a processing in memory (PIM) device. The main processors are configured to generate an executable for a PIM device to accelerate data conversion tasks of data stored in the local memory devices. In one embodiment, the system detects a read request for data stored in a given memory module. In order to process the read request, the system determines that a conversion from a first format to a second format is required. In response to detecting the read request, the given memory module's PIM device performs the conversion of the data from the first format to the second format and then provides the data to a consumer application.
    Type: Application
    Filed: September 19, 2016
    Publication date: March 22, 2018
    Inventors: Mauricio Breternitz, Walter B. Benton
  • Publication number: 20170161114
    Abstract: A computing device is disclosed. The computing device includes an Accelerated Processing Unit (APU) including at least a first Heterogeneous System Architecture (HSA) computing device and at least a second HSA computing device, the second computing device being of a different type than the first computing device, and an HSA Memory Management Unit (HMMU) allowing the APU to communicate with at least one memory. The computing task is enqueued on an HSA-managed queue that is set to run on the at least first HSA computing device or the at least second HSA computing device. The computing task is re-enqueued on the HSA-managed queue based on a repetition field that controls the number of times the computing task is re-enqueued. The repetition field is decremented each time the computing task is re-enqueued. The repetition field may include a special value (e.g., −1) to allow re-enqueuing of the computing task indefinitely.
    Type: Application
    Filed: December 8, 2015
    Publication date: June 8, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Walter B. Benton, Steven K. Reinhardt
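    The re-enqueueing logic is simple enough to sketch: each task carries a repetition field that is decremented on every re-enqueue, and a special value (-1, per the abstract) repeats the task indefinitely. Everything else in the model below, including the step bound used to keep the demo finite, is an assumption.

    ```python
    # Sketch of repetition-controlled re-enqueueing on a managed queue.
    from collections import deque

    INDEFINITE = -1   # special value: re-enqueue forever

    def run_queue(tasks, max_steps=10):
        q = deque(tasks)   # each task: (callable, repetition_field)
        steps = 0
        while q and steps < max_steps:      # max_steps only bounds the demo
            work, reps = q.popleft()
            work()
            steps += 1
            if reps == INDEFINITE:
                q.append((work, reps))          # re-enqueue indefinitely
            elif reps > 0:
                q.append((work, reps - 1))      # decrement, then re-enqueue

    run_queue([(lambda: print("tick"), 2),          # re-enqueued twice
               (lambda: print("heartbeat"), INDEFINITE)])
    ```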
  • Patent number: 9582402
    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
    Type: Grant
    Filed: January 26, 2014
    Date of Patent: February 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Michael L. Chu, Vinod Tipparaju, Walter B. Benton
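    The point of this design is that the networking subsystem writes task information into the task queue itself, so the processing subsystem does no work to process the message and simply drains the queue later. A minimal sketch, with a hypothetical message format and handler table:

    ```python
    # Sketch of a NIC placing task information directly into a task queue.
    from collections import deque

    task_queue = deque()   # shared by the networking and processing subsystems
    HANDLERS = {"scale": lambda xs: [2 * x for x in xs]}   # hypothetical tasks

    def nic_receive(task_message):
        # The networking subsystem updates the task queue entry itself; the
        # processing subsystem performs no operations to process the message.
        task_queue.append(task_message)

    def processor_drain():
        # Later, the processing subsystem retrieves the task information
        # and performs the corresponding tasks.
        while task_queue:
            msg = task_queue.popleft()
            print(HANDLERS[msg["op"]](msg["args"]))

    nic_receive({"op": "scale", "args": [1, 2, 3]})
    processor_drain()   # [2, 4, 6]
    ```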
  • Publication number: 20140331230
    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
    Type: Application
    Filed: January 26, 2014
    Publication date: November 6, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Michael L. Chu, Vinod Tipparaju, Walter B. Benton