Patents by Inventor Tom Schaul
Tom Schaul has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240144015
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Application
Filed: November 3, 2023
Publication date: May 2, 2024
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
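The architecture this abstract describes — a policy network whose intermediate outputs feed auxiliary control heads and a reward prediction head — can be sketched roughly as follows. All dimensions and the single auxiliary head are illustrative assumptions, and random weights stand in for trained ones; this is not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the abstract does not specify any of these.
OBS_DIM, HIDDEN, N_ACTIONS = 8, 16, 4

# Shared torso of the action selection policy network.
W_torso = rng.normal(size=(OBS_DIM, HIDDEN)) * 0.3

# Heads: the main policy, one auxiliary control policy, and reward prediction.
W_policy = rng.normal(size=(HIDDEN, N_ACTIONS)) * 0.3
W_aux = rng.normal(size=(HIDDEN, N_ACTIONS)) * 0.3
W_reward = rng.normal(size=(HIDDEN, 1)) * 0.3

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def forward(obs):
    # Intermediate output of the policy network, shared by all heads.
    h = np.tanh(obs @ W_torso)
    policy = softmax(h @ W_policy)            # main-task action probabilities
    aux_policy = softmax(h @ W_aux)           # auxiliary control task policy output
    predicted_reward = (h @ W_reward).item()  # reward prediction head
    return policy, aux_policy, predicted_reward

obs = rng.normal(size=OBS_DIM)
policy, aux_policy, r_hat = forward(obs)
```

During training, losses from the auxiliary heads would be added to the main policy loss so the shared torso learns richer features.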
-
Publication number: 20240127071
Abstract: There is provided a computer-implemented method for updating a search distribution of an evolutionary strategies optimizer using an optimizer neural network comprising one or more attention blocks. The method comprises receiving a plurality of candidate solutions, one or more parameters defining the search distribution that the plurality of candidate solutions are sampled from, and fitness score data indicating a fitness of each respective candidate solution of the plurality of candidate solutions. The method further comprises processing, by the one or more attention neural network blocks, the fitness score data using an attention mechanism to generate respective recombination weights corresponding to each respective candidate solution. The method further comprises updating the one or more parameters defining the search distribution based upon the recombination weights applied to the plurality of candidate solutions.
Type: Application
Filed: September 27, 2023
Publication date: April 18, 2024
Inventors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Ben Zion Zahavy, Valentin Clement Dalibard, Christopher Yenchuan Lu, Satinder Singh Baveja, Johan Sebastian Flennerhag
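A minimal sketch of the flow in this abstract: sample candidates from a search distribution, turn their fitness scores into recombination weights via an attention mechanism, and update the distribution's mean. The single random-weight attention head, the isotropic Gaussian, and the toy fitness function are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, POP, D_K = 5, 8, 4

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

# Search distribution: an isotropic Gaussian for simplicity.
mean, sigma = np.zeros(DIM), 1.0

# Sample candidate solutions and score them on a toy fitness function.
target = np.ones(DIM)
candidates = mean + sigma * rng.normal(size=(POP, DIM))
fitness = -np.sum((candidates - target) ** 2, axis=1)

# Standardize fitness scores and treat each as a one-dimensional token.
tokens = ((fitness - fitness.mean()) / (fitness.std() + 1e-8))[:, None]

# A single random-weight attention block stands in for the optimizer network.
Wq, Wk, Wv = (rng.normal(size=(1, D_K)) for _ in range(3))
w_out = rng.normal(size=D_K)

attn = softmax((tokens @ Wq) @ (tokens @ Wk).T / np.sqrt(D_K), axis=1)
out = attn @ (tokens @ Wv)     # (POP, D_K) attended fitness features
recomb = softmax(out @ w_out)  # one recombination weight per candidate

# Update the search distribution parameters via weighted recombination.
mean = recomb @ candidates
```

In the patented method, the attention block's weights would themselves be trained so that the recombination weights it emits improve the optimizer.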
-
Publication number: 20240104388
Abstract: A reinforcement learning neural network system configured to manage rewards on scales that can vary significantly. The system determines the value of a scale factor that is applied to a temporal difference error used for reinforcement learning. The scale factor depends at least upon a variance of the rewards received during the reinforcement learning.
Type: Application
Filed: February 4, 2022
Publication date: March 28, 2024
Inventor: Tom Schaul
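One simple way to realize a variance-dependent scale factor is to track a running variance of the rewards and divide TD errors by the resulting standard deviation. The specific statistic (Welford's online update) and the epsilon are illustrative assumptions, not the claimed method.

```python
class TDScaler:
    """Tracks a running variance of observed rewards and uses it to rescale
    temporal-difference errors, making learning invariant to reward scale."""

    def __init__(self, eps=1e-8):
        self.n, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def update(self, reward):
        # Welford's online mean/variance update.
        self.n += 1
        d = reward - self.mean
        self.mean += d / self.n
        self.m2 += d * (reward - self.mean)

    @property
    def std(self):
        var = self.m2 / max(self.n, 1)
        return (var + self.eps) ** 0.5

    def scale(self, td_error):
        # Apply the scale factor: divide by the reward standard deviation.
        return td_error / self.std
```

With this scaling, an environment whose rewards are multiplied by 10 produces (approximately) the same scaled TD errors as the original.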
-
Patent number: 11842281
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Grant
Filed: February 24, 2021
Date of Patent: December 12, 2023
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20230376771
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
Type: Application
Filed: March 8, 2023
Publication date: November 23, 2023
Inventors: Misha Man Ray Denil, Tom Schaul, Marcin Andrychowicz, Joao Ferdinando Gomes de Freitas, Sergio Gomez Colmenarejo, Matthew William Hoffman, David Benjamin Pfau
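The core idea — an RNN that maps gradients to parameter updates in place of a hand-designed rule — can be sketched with a toy optimizer RNN. The random (untrained) weights, the single-parameter optimizee, and all sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
H = 8  # hidden size of the toy optimizer RNN

# Random weights stand in for a trained optimizer RNN.
W_in = rng.normal(size=(1, H)) * 0.1
W_h = rng.normal(size=(H, H)) * 0.1
W_out = rng.normal(size=(H, 1)) * 0.1

def rnn_update(grad, h):
    """One step of the learned update rule: map a gradient (plus the RNN
    state) to a parameter update."""
    h = np.tanh(grad * W_in + h @ W_h)
    return (h @ W_out).item(), h

# Optimizee: minimize f(theta) = (theta - 3)^2 using the RNN's updates.
theta, h = 0.0, np.zeros((1, H))
for _ in range(10):
    grad = 2.0 * (theta - 3.0)
    update, h = rnn_update(grad, h)
    theta += update  # apply the update rule produced at this time step
```

In the patented method the optimizer RNN would itself be trained (e.g. on the optimizee's losses over the sequence of time steps) so its updates actually minimize the objective.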
-
Publication number: 20230244933
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
Type: Application
Filed: January 30, 2023
Publication date: August 3, 2023
Inventors: Tom Schaul, John Quan, David Silver
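A compact sketch of the replay scheme this abstract describes, using the absolute TD error as the expected-learning-progress proxy (one common choice; the exponent `alpha` and epsilon are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

class PrioritizedReplay:
    """Replay memory that samples experience in proportion to a proxy for
    expected learning progress (here, the absolute TD error)."""

    def __init__(self, alpha=0.6, eps=1e-6):
        self.data, self.priorities = [], []
        self.alpha, self.eps = alpha, eps

    def add(self, transition, td_error):
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self):
        # Higher-priority experience is selected more often.
        p = np.array(self.priorities)
        p /= p.sum()
        i = rng.choice(len(self.data), p=p)
        return i, self.data[i]

    def update_priority(self, i, td_error):
        # Re-prioritize after the transition has been used for training.
        self.priorities[i] = (abs(td_error) + self.eps) ** self.alpha

memory = PrioritizedReplay()
memory.add(("s0", "a0", 0.0, "s1"), td_error=0.1)
memory.add(("s1", "a1", 1.0, "s2"), td_error=5.0)
memory.add(("s2", "a2", 0.0, "s3"), td_error=0.2)
```

The transition with the largest TD error dominates sampling until training shrinks its error and `update_priority` demotes it.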
-
Patent number: 11676035
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. The neural network has a plurality of differentiable weights and a plurality of non-differentiable weights. One of the methods includes determining trained values of the plurality of differentiable weights and the non-differentiable weights by repeatedly performing operations that include determining an update to the current values of the plurality of differentiable weights using a machine learning gradient-based training technique and determining, using an evolution strategies (ES) technique, an update to the current values of a plurality of distribution parameters.
Type: Grant
Filed: January 23, 2020
Date of Patent: June 13, 2023
Assignee: DeepMind Technologies Limited
Inventors: Karel Lenc, Karen Simonyan, Tom Schaul, Erich Konrad Elsen
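The hybrid scheme — gradient steps for differentiable weights, ES-style steps for the distribution parameters governing non-differentiable ones — can be illustrated on a toy model. The categorical choice, the score-function-style ES estimator, and all constants are assumptions for the sketch, not the patented technique.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy model: prediction = w * x + b, where w is a differentiable weight and
# b is a non-differentiable choice from CHOICES governed by categorical logits.
CHOICES = np.array([0.0, 1.0, 3.0])
x, y_target = 2.0, 7.0

w = 0.0
logits = np.zeros(3)  # distribution parameters, updated by the ES-style step

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(w, b):
    return (w * x + b - y_target) ** 2

for _ in range(300):
    probs = softmax(logits)
    # Gradient-based update of the differentiable weight at a sampled b.
    b = CHOICES[rng.choice(3, p=probs)]
    w -= 0.02 * 2.0 * (w * x + b - y_target) * x
    # ES-style update: sample a population of choices, score their fitness,
    # and move the distribution parameters toward higher-fitness choices.
    idx = rng.choice(3, size=8, p=probs)
    fit = np.array([-loss(w, CHOICES[i]) for i in idx])
    adv = fit - fit.mean()
    grad = np.zeros(3)
    for i, a in zip(idx, adv):
        grad += a * (np.eye(3)[i] - probs)
    logits += 0.05 * grad / 8.0
```

Both loops run jointly, so the differentiable weight adapts to whatever distribution over the non-differentiable choice the ES step is converging to.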
-
Patent number: 11615310
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
Type: Grant
Filed: May 19, 2017
Date of Patent: March 28, 2023
Assignee: DeepMind Technologies Limited
Inventors: Misha Man Ray Denil, Tom Schaul, Marcin Andrychowicz, Joao Ferdinando Gomes de Freitas, Sergio Gomez Colmenarejo, Matthew William Hoffman, David Benjamin Pfau
-
Patent number: 11568250
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
Type: Grant
Filed: May 4, 2020
Date of Patent: January 31, 2023
Assignee: DeepMind Technologies Limited
Inventors: Tom Schaul, John Quan, David Silver
-
Publication number: 20210182688
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Application
Filed: February 24, 2021
Publication date: June 17, 2021
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20210089908
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.
Type: Application
Filed: September 25, 2020
Publication date: March 25, 2021
Inventors: Tom Schaul, Diana Luiza Borsa, Fengning Ding, David Szepesvari, Georg Ostrovski, Simon Osindero, William Clinton Dabney
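A toy sketch of the loop in this abstract: sample a behavior modulation from a distribution, use it to modify action scores, measure fitness, and update the distribution. Treating the modulation as a softmax temperature, using a multiplicative-weights update, and the synthetic fitness signal are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Candidate behavior modulations; softmax temperatures are just one guess at
# what a modulation could be.
modulations = np.array([0.1, 0.5, 1.0, 2.0])
probs = np.full(4, 0.25)  # current probability distribution over modulations

def act(scores, temperature):
    # Modify the action scores with the sampled modulation, then sample an action.
    z = scores / temperature
    p = np.exp(z - z.max())
    return rng.choice(len(scores), p=p / p.sum())

for episode in range(50):
    i = rng.choice(4, p=probs)
    # A toy fitness measure standing in for the episode return: pretend
    # moderate temperatures perform best.
    fitness = -abs(modulations[i] - 0.5) + 0.1 * rng.normal()
    # Multiplicative-weights update of the distribution using the fitness.
    probs[i] *= np.exp(0.5 * fitness)
    probs /= probs.sum()

action = act(np.array([1.0, 2.0, 3.0]), modulations[rng.choice(4, p=probs)])
```

Over episodes, probability mass drifts toward modulations with higher fitness, so the agent's exploration behavior adapts without changing the underlying action selection network.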
-
Patent number: 10956820
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Grant
Filed: May 3, 2019
Date of Patent: March 23, 2021
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20200327399
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
Type: Application
Filed: June 25, 2020
Publication date: October 15, 2020
Inventors: David Silver, Tom Schaul, Matteo Hessel, Hado Philip van Hasselt
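The three networks in this abstract compose naturally into a k-step "imagined" rollout: encode the observation into an internal state, repeatedly predict rewards and subsequent states, and bootstrap with the value prediction. In this sketch, random linear maps stand in for the trained networks, and all sizes and the discount are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
OBS, STATE = 6, 8

# Random weights stand in for the trained networks of the abstract.
W_repr = rng.normal(size=(OBS, STATE)) * 0.3    # state representation network
W_next = rng.normal(size=(STATE, STATE)) * 0.3  # prediction network: next state
W_rew = rng.normal(size=(STATE, 1)) * 0.3       # prediction network: reward
W_val = rng.normal(size=(STATE, 1)) * 0.3       # value prediction network

def rollout_value(obs, k, gamma=0.99):
    """k-step imagined rollout: sum predicted rewards, then bootstrap with
    the value prediction of the final internal state."""
    s = np.tanh(obs @ W_repr)  # internal state representation
    total, discount = 0.0, 1.0
    for _ in range(k):
        total += discount * (s @ W_rew).item()  # predicted reward
        discount *= gamma
        s = np.tanh(s @ W_next)                 # predicted subsequent state
    return total + discount * (s @ W_val).item()

obs = rng.normal(size=OBS)
value_estimate = rollout_value(obs, k=3)
```

With k = 0 the rollout reduces to the value prediction of the current internal state, so the rollout depth trades off model usage against pure value estimation.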
-
Publication number: 20200265312
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
Type: Application
Filed: May 4, 2020
Publication date: August 20, 2020
Inventors: Tom Schaul, John Quan, David Silver
-
Patent number: 10733501
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
Type: Grant
Filed: May 3, 2019
Date of Patent: August 4, 2020
Assignee: DeepMind Technologies Limited
Inventors: David Silver, Tom Schaul, Matteo Hessel, Hado Philip van Hasselt
-
Publication number: 20200234142
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. The neural network has a plurality of differentiable weights and a plurality of non-differentiable weights. One of the methods includes determining trained values of the plurality of differentiable weights and the non-differentiable weights by repeatedly performing operations that include determining an update to the current values of the plurality of differentiable weights using a machine learning gradient-based training technique and determining, using an evolution strategies (ES) technique, an update to the current values of a plurality of distribution parameters.
Type: Application
Filed: January 23, 2020
Publication date: July 23, 2020
Inventors: Karel Lenc, Karen Simonyan, Tom Schaul, Erich Konrad Elsen
-
Patent number: 10650310
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
Type: Grant
Filed: November 11, 2016
Date of Patent: May 12, 2020
Assignee: DeepMind Technologies Limited
Inventors: Tom Schaul, John Quan, David Silver
-
Patent number: 10628733
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using goals and observations. One of the methods includes receiving an observation characterizing a current state of the environment; receiving a goal characterizing a target state from a set of target states of the environment; processing the observation using an observation neural network to generate a numeric representation of the observation; processing the goal using a goal neural network to generate a numeric representation of the goal; combining the numeric representation of the observation and the numeric representation of the goal to generate a combined representation; processing the combined representation using an action score neural network to generate a respective score for each action in the predetermined set of actions; and selecting the action to be performed using the respective scores for the actions in the predetermined set of actions.
Type: Grant
Filed: April 6, 2016
Date of Patent: April 21, 2020
Assignee: DeepMind Technologies Limited
Inventors: Tom Schaul, Daniel George Horgan, Karol Gregor, David Silver
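The pipeline in this abstract — separate observation and goal networks, a combined representation, then an action score network — maps directly onto a few matrix products. The elementwise combination, the random stand-in weights, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
OBS, GOAL, EMB, N_ACTIONS = 6, 6, 8, 4

# Random weights stand in for the trained networks of the abstract.
W_obs = rng.normal(size=(OBS, EMB)) * 0.3        # observation neural network
W_goal = rng.normal(size=(GOAL, EMB)) * 0.3      # goal neural network
W_act = rng.normal(size=(EMB, N_ACTIONS)) * 0.3  # action score neural network

def action_scores(obs, goal):
    phi = np.tanh(obs @ W_obs)    # numeric representation of the observation
    psi = np.tanh(goal @ W_goal)  # numeric representation of the goal
    combined = phi * psi          # combined representation (elementwise is one simple choice)
    return combined @ W_act       # a score for each action in the set

obs = rng.normal(size=OBS)
goal = rng.normal(size=GOAL)
scores = action_scores(obs, goal)
action = int(np.argmax(scores))   # select the action using the scores
```

Because the goal is an input rather than baked into the network, the same weights can score actions for any target state in the goal set.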
-
Publication number: 20190258938
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Application
Filed: May 3, 2019
Publication date: August 22, 2019
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20190259051
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
Type: Application
Filed: May 3, 2019
Publication date: August 22, 2019
Inventors: David Silver, Tom Schaul, Matteo Hessel, Hado Philip van Hasselt