Patents by Inventor Walter B. Benton

Walter B. Benton has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11687460
    Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, and NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: June 27, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
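    The data path described in this abstract (the NIC writes incoming data to main memory, publishes coherence information, and the GPU injects the data into its cache only when a heuristic is satisfied) can be modeled in a few lines. The sketch below is a hypothetical software analogue: the class names, the capacity-based heuristic, and the direct notification call are illustrative assumptions, not the patented hardware design.

    ```python
    # Minimal software model of NIC-to-GPU cache injection (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class CoherenceInfo:
        address: int            # where the NIC wrote the payload in main memory
        size: int
        destined_for_gpu: bool  # hint derived from the incoming data

    class GPU:
        def __init__(self, cache_capacity):
            self.cache = {}
            self.cache_capacity = cache_capacity

        def satisfies_heuristic(self, info):
            # Example heuristic: inject only data addressed to the GPU that
            # still fits in the cache. The real heuristic is not specified here.
            return info.destined_for_gpu and len(self.cache) < self.cache_capacity

        def on_coherence_update(self, info, main_memory):
            if self.satisfies_heuristic(info):
                # Load from main memory into the GPU cache ("injection").
                self.cache[info.address] = main_memory[info.address]

    class NIC:
        def receive(self, address, payload, main_memory, directory, gpu):
            main_memory[address] = payload                  # write to main memory
            info = CoherenceInfo(address, len(payload), destined_for_gpu=True)
            directory.append(info)                          # update the directory
            gpu.on_coherence_update(info, main_memory)      # notify the GPU

    memory, directory = {}, []
    gpu, nic = GPU(cache_capacity=4), NIC()
    nic.receive(0x1000, b"packet-data", memory, directory, gpu)
    print(gpu.cache)   # {4096: b'packet-data'}
    ```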
  • Publication number: 20230120934
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 20, 2023
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
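    The mechanism is concrete enough to sketch: a kernel thread stores a message and a ready indication in the cache, and the command processor detects the indication and forwards the message to the network interface without involving the general purpose processor. In the model below, the flag name, the polling function, and the queue standing in for the network interface unit are all assumptions made for illustration.

    ```python
    # Sketch of GPU-side message generation with command-processor forwarding.
    import queue

    cache = {}                    # stands in for the parallel processor's cache
    MSG_READY = "msg_ready"       # the "corresponding indication" in the cache
    nic_queue = queue.Queue()     # stands in for the network interface unit

    def kernel_thread(payload):
        # A thread within the kernel builds the network message and stores the
        # message together with a ready indication in the cache.
        cache["msg"] = payload
        cache[MSG_READY] = True

    def command_processor_poll():
        # The command processor watches the cache; on seeing the indication it
        # conveys the message to the NIC, with no general purpose CPU involved.
        if cache.get(MSG_READY):
            nic_queue.put(cache.pop("msg"))
            cache[MSG_READY] = False

    kernel_thread(b"hello-from-gpu")
    command_processor_poll()
    print(nic_queue.get())   # b'hello-from-gpu'
    ```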
  • Patent number: 11630994
    Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
    Type: Grant
    Filed: February 17, 2018
    Date of Patent: April 18, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
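    The key idea in this abstract is overlapping communication with computation: updated parameters begin flowing to remote nodes before the backward pass finishes. A minimal sketch follows, assuming a per-layer send policy and simple parameter averaging; both are illustrative choices, not details taken from the patent.

    ```python
    # Sketch of overlapping parameter exchange with the backward pass.
    def train_step(local_params, remote_params, send):
        # Fold the received remote parameters into the local model
        # (simple averaging here; the patent does not prescribe this).
        params = [(a + b) / 2 for a, b in zip(local_params, remote_params)]

        # Forward pass to a final output (the real math is elided).
        _final_output = sum(params)

        # Backward pass visits layers last-to-first; each layer's updated
        # parameters are transmitted before the pass as a whole completes.
        updated = list(params)
        for layer in reversed(range(len(params))):
            updated[layer] = params[layer] - 0.1   # stand-in gradient step
            send(layer, updated[layer])            # overlaps comm with compute
        return updated

    sent = []
    train_step([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
               send=lambda layer, p: sent.append((layer, p)))
    print(sent)   # the last layer's updated parameters are sent first
    ```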
  • Patent number: 11544121
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Grant
    Filed: November 16, 2017
    Date of Patent: January 3, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
  • Patent number: 10740163
    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: August 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton
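    The division of labor here (the CPU pre-populates the static fields of a templated packet, the GPU fills in the runtime fields from inside a kernel, and the NIC sends the combined result once notified) maps naturally onto a short sketch. The field names and the notification callback below are hypothetical.

    ```python
    # Sketch of CPU/GPU network-packet templating (illustrative only).
    def cpu_create_template(memory):
        # The CPU populates the static subset of fields ahead of time.
        memory["packet"] = {"src": "node-a", "proto": "rdma",   # static fields
                            "dst": None, "payload": None}       # runtime fields

    def gpu_kernel(memory, notify):
        # Inside the kernel, a communication request is detected before the
        # kernel finishes; the GPU fills in the runtime subset of fields.
        pkt = memory["packet"]
        pkt["dst"], pkt["payload"] = "node-b", b"result"
        notify()   # signal the NIC that the packet is ready to be processed

    def nic_process(memory):
        # The NIC reads both the static and the runtime fields.
        pkt = memory["packet"]
        return f"send {pkt['payload']!r} {pkt['src']} -> {pkt['dst']} via {pkt['proto']}"

    mem, ready = {}, []
    cpu_create_template(mem)
    gpu_kernel(mem, notify=lambda: ready.append(True))
    if ready:
        print(nic_process(mem))
    ```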
  • Publication number: 20200004610
    Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.
    Type: Application
    Filed: June 28, 2018
    Publication date: January 2, 2020
    Inventors: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton
  • Publication number: 20190258924
    Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
    Type: Application
    Filed: February 17, 2018
    Publication date: August 22, 2019
    Inventors: Khaled Hamidouche, Michael W. LeBeane, Walter B. Benton, Michael L. Chu
  • Publication number: 20190146857
    Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
    Type: Application
    Filed: November 16, 2017
    Publication date: May 16, 2019
    Inventors: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton
  • Patent number: 10198349
    Abstract: Systems, apparatuses, and methods for utilizing in-memory accelerators to perform data conversion operations are disclosed. A system includes one or more main processors coupled to one or more memory modules. Each memory module includes one or more memory devices coupled to a processing in memory (PIM) device. The main processors are configured to generate an executable for a PIM device to accelerate data conversion tasks of data stored in the local memory devices. In one embodiment, the system detects a read request for data stored in a given memory module. In order to process the read request, the system determines that a conversion from a first format to a second format is required. In response to detecting the read request, the given memory module's PIM device performs the conversion of the data from the first format to the second format and then provides the data to a consumer application.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: February 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Walter B. Benton
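    The read path in this abstract can be sketched directly: detect that the requested data needs a format conversion, run the conversion on the memory module's PIM device rather than on a main processor, then return the converted data. The row-to-column conversion and the way the conversion executable is installed are illustrative assumptions.

    ```python
    # Sketch of a processing-in-memory (PIM) device converting data on a read.
    class MemoryModule:
        """One memory module: plain storage plus an attached PIM device."""
        def __init__(self):
            self.store = {}            # address -> (format, data)
            self.pim_program = None    # executable generated by the main CPU

        def read(self, addr, wanted_format):
            fmt, data = self.store[addr]
            if fmt != wanted_format:
                # The conversion happens in the PIM device, next to the
                # data, instead of on the main processor.
                data = self.pim_program(data)
                self.store[addr] = (wanted_format, data)
            return data

    module = MemoryModule()
    # The main processor generates and installs the conversion "executable":
    module.pim_program = lambda rows: list(zip(*rows))   # row- to column-major
    module.store[0x10] = ("rows", [(1, 2), (3, 4)])

    print(module.read(0x10, "columns"))   # [(1, 3), (2, 4)], converted by PIM
    ```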
  • Publication number: 20180314638
    Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, and NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.
    Type: Application
    Filed: April 26, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
  • Publication number: 20180081583
    Abstract: Systems, apparatuses, and methods for utilizing in-memory accelerators to perform data conversion operations are disclosed. A system includes one or more main processors coupled to one or more memory modules. Each memory module includes one or more memory devices coupled to a processing in memory (PIM) device. The main processors are configured to generate an executable for a PIM device to accelerate data conversion tasks of data stored in the local memory devices. In one embodiment, the system detects a read request for data stored in a given memory module. In order to process the read request, the system determines that a conversion from a first format to a second format is required. In response to detecting the read request, the given memory module's PIM device performs the conversion of the data from the first format to the second format and then provides the data to a consumer application.
    Type: Application
    Filed: September 19, 2016
    Publication date: March 22, 2018
    Inventors: Mauricio Breternitz, Walter B. Benton
  • Publication number: 20170161114
    Abstract: A computing device is disclosed. The computing device includes an Accelerated Processing Unit (APU) including at least a first Heterogeneous System Architecture (HSA) computing device and at least a second HSA computing device, the second computing device being of a different type than the first computing device, and an HSA Memory Management Unit (HMMU) allowing the APU to communicate with at least one memory. The computing task is enqueued on an HSA-managed queue that is set to run on the at least first HSA computing device or the at least second HSA computing device. The computing task is re-enqueued on the HSA-managed queue based on a repetition field that controls the number of times the computing task is re-enqueued. The repetition field is decremented each time the computing task is re-enqueued. The repetition field may include a special value (e.g., −1) to allow re-enqueuing of the computing task indefinitely.
    Type: Application
    Filed: December 8, 2015
    Publication date: June 8, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Walter B. Benton, Steven K. Reinhardt
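    The re-enqueueing logic is simple enough to sketch: each task carries a repetition field that is decremented on every re-enqueue, and a special value (-1, per the abstract) repeats the task indefinitely. Everything else in the model below, including the step bound used to keep the demo finite, is an assumption.

    ```python
    # Sketch of repetition-controlled re-enqueueing on a managed queue.
    from collections import deque

    INDEFINITE = -1   # special value: re-enqueue forever

    def run_queue(tasks, max_steps=10):
        q = deque(tasks)   # each task: (callable, repetition_field)
        steps = 0
        while q and steps < max_steps:      # max_steps only bounds the demo
            work, reps = q.popleft()
            work()
            steps += 1
            if reps == INDEFINITE:
                q.append((work, reps))          # re-enqueue indefinitely
            elif reps > 0:
                q.append((work, reps - 1))      # decrement, then re-enqueue

    run_queue([(lambda: print("tick"), 2),          # re-enqueued twice
               (lambda: print("heartbeat"), INDEFINITE)])
    ```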
  • Patent number: 9582402
    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
    Type: Grant
    Filed: January 26, 2014
    Date of Patent: February 28, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Michael L. Chu, Vinod Tipparaju, Walter B. Benton
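    The point of this design is that the networking subsystem writes task information into the task queue itself, so the processing subsystem does no work to process the message and simply drains the queue later. A minimal sketch, with a hypothetical message format and handler table:

    ```python
    # Sketch of a NIC placing task information directly into a task queue.
    from collections import deque

    task_queue = deque()   # shared by the networking and processing subsystems
    HANDLERS = {"scale": lambda xs: [2 * x for x in xs]}   # hypothetical tasks

    def nic_receive(task_message):
        # The networking subsystem updates the task queue entry itself; the
        # processing subsystem performs no operations to process the message.
        task_queue.append(task_message)

    def processor_drain():
        # Later, the processing subsystem retrieves the task information
        # and performs the corresponding tasks.
        while task_queue:
            msg = task_queue.popleft()
            print(HANDLERS[msg["op"]](msg["args"]))

    nic_receive({"op": "scale", "args": [1, 2, 3]})
    processor_drain()   # [2, 4, 6]
    ```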
  • Publication number: 20140331230
    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
    Type: Application
    Filed: January 26, 2014
    Publication date: November 6, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Michael L. Chu, Vinod Tipparaju, Walter B. Benton