Patents by Inventor David Constantine Patrick Warde-Farley

David Constantine Patrick Warde-Farley has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CONTROLLING AGENTS USING AMORTIZED Q LEARNING

Publication number: 20240160901

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.

Type: Application

Filed: January 8, 2024

Publication date: May 16, 2024

Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley
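
The action-selection steps in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `toy_proposal_logits` and `toy_q` are hypothetical stand-ins for the proposal and Q neural networks, the sample counts are arbitrary, and picking the highest-Q candidate is an assumption (the abstract only says an action is selected "using the Q values").

```python
import math
import random


def softmax(logits):
    """Turn raw proposal scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(observation, proposal_logits_fn, q_fn, num_actions,
                  n_proposal=4, n_uniform=4, rng=random):
    """Score a small sampled set of candidate actions and pick one.

    Assumption: the candidate with the highest Q value is chosen.
    """
    probs = softmax(proposal_logits_fn(observation))
    # (i) candidates sampled in accordance with the proposal distribution
    candidates = rng.choices(range(num_actions), weights=probs, k=n_proposal)
    # (ii) candidates sampled uniformly at random from the action set
    candidates += [rng.randrange(num_actions) for _ in range(n_uniform)]
    # Evaluate Q only on the distinct sampled candidates
    q_values = {a: q_fn(observation, a) for a in set(candidates)}
    best = max(q_values, key=q_values.get)
    return best, q_values


# Hypothetical stand-ins for the two networks (not from the patent).
def toy_proposal_logits(observation):
    return [0.0, 0.0, 0.0, 0.0, 0.0]


def toy_q(observation, action):
    return -(action - 3) ** 2


rng = random.Random(0)
action, q_values = select_action(None, toy_proposal_logits, toy_q,
                                 num_actions=5, rng=rng)
print(action, q_values)
```

Because Q is evaluated only on the sampled candidates, the cost per step depends on the number of samples rather than on the size of the action set.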

Controlling agents using amortized Q learning

Patent number: 11868866

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.

Type: Grant

Filed: November 18, 2019

Date of Patent: January 9, 2024

Assignee: DeepMind Technologies Limited

Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley

CONTROLLING AGENTS USING RELATIVE VARIATIONAL INTRINSIC CONTROL

Publication number: 20230325635

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling an agent using relative variational intrinsic control. In one aspect, a method includes: selecting a skill from a set of skills; generating a trajectory by controlling the agent using the policy neural network while the policy neural network is conditioned on the selected skill; processing an initial observation and a last observation using a relative discriminator neural network to generate a relative score; processing the last observation using an absolute discriminator neural network to generate an absolute score; generating a reward for the trajectory from the absolute score corresponding to the selected skill and the relative score corresponding to the selected skill; and training the policy neural network on the reward for the trajectory.

Type: Application

Filed: September 10, 2021

Publication date: October 12, 2023

Inventors: David Constantine Patrick Warde-Farley, Steven Stenberg Hansen, Volodymyr Mnih, Kate Alexandra Baumli
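
The reward step in the abstract above combines a relative score and an absolute score for the selected skill, but the abstract does not say how. The sketch below assumes, purely for illustration, that each discriminator outputs one unnormalized score per skill and that the reward is the difference of the two log-probabilities for the selected skill. The function names and the example scores are hypothetical.

```python
import math


def log_softmax(scores):
    """Log-probabilities over skills from unnormalized scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def trajectory_reward(skill, relative_scores, absolute_scores):
    """Reward for one trajectory under the assumed combination rule.

    relative_scores: per-skill scores from the relative discriminator,
        computed from the initial and last observations.
    absolute_scores: per-skill scores from the absolute discriminator,
        computed from the last observation only.
    Assumption: reward = relative log-prob minus absolute log-prob.
    """
    rel = log_softmax(relative_scores)[skill]
    abs_ = log_softmax(absolute_scores)[skill]
    return rel - abs_


# Skill 0 is easy to infer from the change between first and last
# observation, but not from the last observation alone.
reward = trajectory_reward(0, relative_scores=[2.0, 0.0],
                           absolute_scores=[0.0, 0.0])
print(reward)
```

Under this assumed rule, a skill earns a positive reward only when it is easier to identify from how the state changed than from where the agent ended up.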

Unsupervised control using learned rewards

Patent number: 11727281

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

Type: Grant

Filed: January 27, 2022

Date of Patent: August 15, 2023

Assignee: DeepMind Technologies Limited

Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih
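
The reward subsystem in the abstract above generates a reward from embedded representations of the current observation and the goal observation. The abstract does not specify the comparison, so the sketch below assumes cosine similarity between the two embeddings. `embed_fn` is a hypothetical placeholder for the learned embedding network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def goal_reward(embed_fn, current_observation, goal_observation):
    """Reward from embeddings of the current and goal observations.

    Assumption: similarity in embedding space is used as the reward.
    """
    return cosine_similarity(embed_fn(current_observation),
                             embed_fn(goal_observation))


# Hypothetical embedding: the identity function on small vectors.
identity = lambda obs: obs
print(goal_reward(identity, [1.0, 0.0], [1.0, 0.0]))
print(goal_reward(identity, [1.0, 0.0], [0.0, 1.0]))
```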

UNSUPERVISED CONTROL USING LEARNED REWARDS

Publication number: 20220164673

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

Type: Application

Filed: January 27, 2022

Publication date: May 26, 2022

Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih

Unsupervised control using learned rewards

Patent number: 11263531

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

Type: Grant

Filed: May 20, 2019

Date of Patent: March 1, 2022

Assignee: DeepMind Technologies Limited

Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih

CONTROLLING AGENTS USING AMORTIZED Q LEARNING

Publication number: 20210357731

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.

Type: Application

Filed: November 18, 2019

Publication date: November 18, 2021

Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley

UNSUPERVISED CONTROL USING LEARNED REWARDS

Publication number: 20190354869

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.

Type: Application

Filed: May 20, 2019

Publication date: November 21, 2019

Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih