Patents by Inventor Volodymyr Mnih
Volodymyr Mnih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240160901
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.
Type: Application
Filed: January 8, 2024
Publication date: May 16, 2024
Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley
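The sample-then-score procedure in this abstract can be sketched as follows. The proposal and Q networks here are stand-in functions, and the action-set size and sample counts are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 10  # illustrative size of the discrete action set

def proposal_network(obs):
    """Stand-in proposal network: maps an observation to a probability
    distribution over the set of possible actions."""
    logits = np.tanh(obs).sum() * np.arange(NUM_ACTIONS)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def q_network(obs, action):
    """Stand-in Q network: scores a single (observation, action) pair."""
    return float(np.sin(obs.sum() + action))

def select_action(obs, n_proposal=4, n_uniform=4):
    probs = proposal_network(obs)
    # (i) sample actions in accordance with the proposal distribution ...
    proposed = rng.choice(NUM_ACTIONS, size=n_proposal, p=probs)
    # (ii) ... and sample actions uniformly at random, so actions the
    # proposal currently neglects can still be evaluated.
    uniform = rng.integers(0, NUM_ACTIONS, size=n_uniform)
    candidates = np.concatenate([proposed, uniform])
    # Score every sampled action with the Q network and act greedily.
    q_values = [q_network(obs, a) for a in candidates]
    return int(candidates[int(np.argmax(q_values))])

action = select_action(np.ones(3))
```

The point of the scheme is that only the sampled candidates need Q evaluations, which keeps action selection tractable when the action set is large.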
-
Patent number: 11977983
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.
Type: Grant
Filed: September 14, 2020
Date of Patent: May 7, 2024
Assignee: DeepMind Technologies Limited
Inventors: Mohammad Gheshlaghi Azar, Meire Fortunato, Bilal Piot, Olivier Claude Pietquin, Jacob Lee Menick, Volodymyr Mnih, Charles Blundell, Remi Munos
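A minimal sketch of a noisy layer of the kind this abstract describes: each layer parameter has a corresponding learned noise parameter, and the "noisy current value" is parameter plus noise-parameter times a freshly drawn noise value. The layer sizes and the 0.017 initial noise scale are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

class NoisyLinear:
    """Linear layer whose effective weights are
    mu + sigma * eps, with eps resampled on every forward pass."""

    def __init__(self, in_dim, out_dim):
        self.w_mu = rng.normal(scale=0.1, size=(out_dim, in_dim))
        self.w_sigma = np.full((out_dim, in_dim), 0.017)  # noise parameters
        self.b_mu = np.zeros(out_dim)
        self.b_sigma = np.full(out_dim, 0.017)

    def __call__(self, x):
        # Determine a respective noise value for every layer parameter,
        # then combine it with the parameter's current value and its
        # noise parameter to obtain the noisy current value.
        w_eps = rng.standard_normal(self.w_mu.shape)
        b_eps = rng.standard_normal(self.b_mu.shape)
        w = self.w_mu + self.w_sigma * w_eps
        b = self.b_mu + self.b_sigma * b_eps
        return w @ x + b

layer = NoisyLinear(4, 3)
q_values = layer(np.ones(4))            # network output for the input
action = int(np.argmax(q_values))       # greedy action under noisy values
```

Because the noise parameters are trained along with the weights, the network can learn how much perturbation (and hence exploration) each parameter should receive.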
-
Publication number: 20240144015
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Application
Filed: November 3, 2023
Publication date: May 2, 2024
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
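The wiring described above, with auxiliary heads reading the main network's intermediate output, can be sketched like this. All architectures and dimensions are illustrative assumptions; the real system trains these heads jointly with the main policy:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ActionSelectionPolicyNet:
    """Main policy network; exposes its intermediate output so that
    auxiliary networks can be trained on top of it."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        self.w1 = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.w2 = rng.normal(scale=0.1, size=(n_actions, hidden_dim))

    def forward(self, obs):
        hidden = np.tanh(self.w1 @ obs)       # intermediate output
        return softmax(self.w2 @ hidden), hidden

class AuxiliaryControlHead:
    """Receives the intermediate output and generates a policy output
    for a corresponding auxiliary control task."""
    def __init__(self, hidden_dim, n_aux_actions):
        self.w = rng.normal(scale=0.1, size=(n_aux_actions, hidden_dim))
    def forward(self, hidden):
        return softmax(self.w @ hidden)

class RewardPredictionHead:
    """Receives intermediate outputs and generates a predicted reward."""
    def __init__(self, hidden_dim):
        self.w = rng.normal(scale=0.1, size=hidden_dim)
    def forward(self, hidden):
        return float(self.w @ hidden)

net = ActionSelectionPolicyNet(obs_dim=8, hidden_dim=16, n_actions=4)
aux = AuxiliaryControlHead(hidden_dim=16, n_aux_actions=4)
rp = RewardPredictionHead(hidden_dim=16)

policy, hidden = net.forward(np.ones(8))
aux_policy = aux.forward(hidden)
predicted_reward = rp.forward(hidden)
```

Gradients from the auxiliary losses flow back into the shared trunk, shaping the main network's representation even when environment rewards are sparse.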
-
Publication number: 20240127060
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
Type: Application
Filed: October 16, 2023
Publication date: April 18, 2024
Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
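The actor/learner split can be sketched in a few lines. The environment, policy, and learner update below are toy placeholders (a real learner would apply an off-policy actor-critic update with importance-weighted corrections, and actors and learners would run on separate computing units); only the data flow, actors producing experience tuple trajectories that learners consume, follows the abstract:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

TRAJECTORY_LEN = 5  # illustrative rollout length

def actor_rollout(params, obs):
    """One actor computing unit: runs a local copy of the policy and
    emits a trajectory of experience tuples
    (observation, action, reward, next_observation)."""
    trajectory = []
    for _ in range(TRAJECTORY_LEN):
        action = int(np.argmax(params @ obs))   # greedy local policy
        reward = float(obs.sum())               # toy environment reward
        next_obs = np.tanh(obs + action)        # toy environment dynamics
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
    return trajectory

def learner_update(params, trajectory, lr=0.01):
    """One learner computing unit: consumes a trajectory and updates the
    learner action selection parameters (placeholder gradient step)."""
    for obs, action, reward, _ in trajectory:
        params[action] += lr * reward * obs
    return params

queue = deque()                                 # actor -> learner channel
params = rng.normal(scale=0.1, size=(4, 3))
for _ in range(3):                              # actors fill the queue
    queue.append(actor_rollout(params, rng.normal(size=3)))
while queue:                                    # the learner drains it
    params = learner_update(params, queue.popleft())
```

Decoupling acting from learning lets many cheap actors keep a small number of accelerator-bound learners saturated with data.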
-
Publication number: 20240104379
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using an action selection neural network that performs in-context reinforcement learning when controlling an agent on a new task.
Type: Application
Filed: September 28, 2023
Publication date: March 28, 2024
Inventors: Michael Laskin, Volodymyr Mnih, Luyu Wang, Satinder Singh Baveja
-
Publication number: 20240104389
Abstract: In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.
Type: Application
Filed: February 4, 2022
Publication date: March 28, 2024
Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Andre da Motta Salles Barreto, Johan Sebastian Flennerhag, Volodymyr Mnih, Satinder Singh Baveja
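The constrained objective, maximize diversity relative to the policy set subject to a minimum expected return, can be illustrated with a toy search. The return model, the distance-based diversity measure, and the random search standing in for the abstract's optimization steps are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_return(policy):
    """Placeholder evaluation: a policy here is just a distribution
    over four actions with fixed per-action values."""
    return float(policy @ np.array([1.0, 0.5, 0.2, 0.1]))

def diversity(policy, policy_set):
    """Diversity of a candidate relative to the policy set, measured as
    its minimum distance to any existing policy (an assumption)."""
    return min(float(np.linalg.norm(policy - p)) for p in policy_set)

def find_diverse_policy(policy_set, min_return, n_candidates=200):
    """Pick the candidate that maximizes diversity relative to the
    policy set, subject to the minimum performance criterion."""
    best, best_div = None, -np.inf
    for _ in range(n_candidates):
        cand = rng.dirichlet(np.ones(4))       # random stochastic policy
        if expected_return(cand) < min_return:
            continue                           # fails the return constraint
        d = diversity(cand, policy_set)
        if d > best_div:
            best, best_div = cand, d
    return best

policy_set = [rng.dirichlet(np.ones(4)) for _ in range(3)]
new_policy = find_diverse_policy(policy_set, min_return=0.3)
```

Iterating this procedure, adding each new policy back into the set, grows a collection of behaviors that all meet the performance bar while differing from one another.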
-
Patent number: 11941088
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
Type: Grant
Filed: May 5, 2022
Date of Patent: March 26, 2024
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Koray Kavukcuoglu
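One step of the glimpse-update-select loop can be sketched as below. The glimpse size, hidden dimension, and the simple tanh recurrence are illustrative assumptions standing in for the trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

GLIMPSE = 3  # side length of the square glimpse window (assumption)

def extract_glimpse(image, loc):
    """Crop a small patch of the image at `loc`, clipped to bounds."""
    r = min(max(loc[0], 0), image.shape[0] - GLIMPSE)
    c = min(max(loc[1], 0), image.shape[1] - GLIMPSE)
    return image[r:r + GLIMPSE, c:c + GLIMPSE]

class RecurrentAttention:
    def __init__(self, hidden_dim=8, n_actions=4):
        self.w_g = rng.normal(scale=0.1, size=(hidden_dim, GLIMPSE * GLIMPSE))
        self.w_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.w_loc = rng.normal(scale=0.1, size=(2, hidden_dim))
        self.w_act = rng.normal(scale=0.1, size=(n_actions, hidden_dim))
        self.h = np.zeros(hidden_dim)        # current internal state

    def step(self, image, loc):
        glimpse = extract_glimpse(image, loc)
        # Glimpse representation, then recurrent update of the state.
        g = np.tanh(self.w_g @ glimpse.ravel())
        self.h = np.tanh(self.w_h @ self.h + g)
        # The new internal state selects both the next location and an
        # action from the predetermined action set.
        next_loc = (self.w_loc @ self.h).astype(int)
        action = int(np.argmax(self.w_act @ self.h))
        return next_loc, action

model = RecurrentAttention()
loc, action = model.step(rng.random((10, 10)), loc=(4, 4))
```

Because only a small glimpse is processed per step, the per-step cost is independent of the full image size; the recurrent state accumulates what has been seen so far.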
-
Patent number: 11868894
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
Type: Grant
Filed: January 4, 2023
Date of Patent: January 9, 2024
Assignee: DeepMind Technologies Limited
Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
-
Patent number: 11868866
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.
Type: Grant
Filed: November 18, 2019
Date of Patent: January 9, 2024
Assignee: DeepMind Technologies Limited
Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley
-
Patent number: 11842281
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.
Type: Grant
Filed: February 24, 2021
Date of Patent: December 12, 2023
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20230325635
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network for use in controlling an agent using relative variational intrinsic control. In one aspect, a method includes: selecting a skill from a set of skills; generating a trajectory by controlling the agent using the policy neural network while the policy neural network is conditioned on the selected skill; processing an initial observation and a last observation using a relative discriminator neural network to generate a relative score; processing the last observation using an absolute discriminator neural network to generate an absolute score; generating a reward for the trajectory from the absolute score corresponding to the selected skill and the relative score corresponding to the selected skill; and training the policy neural network on the reward for the trajectory.
Type: Application
Filed: September 10, 2021
Publication date: October 12, 2023
Inventors: David Constantine Patrick Warde-Farley, Steven Stenberg Hansen, Volodymyr Mnih, Kate Alexandra Baumli
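The two-discriminator reward can be illustrated as below. Both discriminators are toy stand-ins, and combining the two scores as a log-ratio is an assumption; the abstract only says the reward is generated from the absolute and relative scores for the selected skill:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_SKILLS = 4  # illustrative size of the skill set

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relative_discriminator(first_obs, last_obs):
    """Scores each skill from the *change* between the initial and last
    observations of the trajectory (toy stand-in network)."""
    feats = np.outer(np.arange(1, NUM_SKILLS + 1), last_obs - first_obs)
    return softmax(feats.sum(axis=1))

def absolute_discriminator(last_obs):
    """Scores each skill from the last observation alone."""
    feats = np.outer(np.arange(1, NUM_SKILLS + 1), last_obs)
    return softmax(feats.sum(axis=1))

def trajectory_reward(skill, first_obs, last_obs):
    rel = relative_discriminator(first_obs, last_obs)[skill]
    abs_ = absolute_discriminator(last_obs)[skill]
    # Log-ratio of the two scores for the selected skill (assumed form):
    # the skill is rewarded for being identifiable from what it changed,
    # not merely from where it ended up.
    return float(np.log(rel) - np.log(abs_))

skill = int(rng.integers(NUM_SKILLS))
r = trajectory_reward(skill, rng.normal(size=5), rng.normal(size=5))
```

The policy network conditioned on the skill is then trained on this intrinsic reward, with no environment reward required.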
-
Patent number: 11783182
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
Type: Grant
Filed: February 8, 2021
Date of Patent: October 10, 2023
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
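The worker structure, independent workers, each with its own environment replica, all updating one shared network, can be sketched with threads. The environment and the gradient are toy placeholders; only the asynchronous-update pattern follows the abstract:

```python
import threading
import numpy as np

rng = np.random.default_rng(0)

shared_params = np.zeros(4)      # parameters of the shared deep network
lock = threading.Lock()

class EnvReplica:
    """Each worker interacts with its own replica of the environment."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
    def step(self):
        return self.rng.normal(size=4), self.rng.normal()  # obs, reward

def worker(seed, steps=50):
    env = EnvReplica(seed)       # the actor tied to this worker
    for _ in range(steps):
        obs, reward = env.step()
        grad = reward * obs      # placeholder gradient computation
        with lock:               # apply asynchronously to shared params
            shared_params[:] = shared_params + 0.01 * grad

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each worker sees a differently evolving environment replica, their updates are decorrelated, which is what lets this scheme dispense with a replay memory.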
-
Patent number: 11727281
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.
Type: Grant
Filed: January 27, 2022
Date of Patent: August 15, 2023
Assignee: DeepMind Technologies Limited
Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih
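The reward subsystem can be sketched as an embedding comparison. The embedding network is a random stand-in, and cosine similarity is an illustrative choice of comparison, not necessarily the one the patent claims:

```python
import numpy as np

rng = np.random.default_rng(0)

W_EMBED = rng.normal(scale=0.1, size=(8, 16))  # stand-in embedding network

def embed(observation):
    """Embedded representation of an observation (placeholder network)."""
    return np.tanh(W_EMBED @ observation)

def reward(current_obs, goal_obs):
    """Reward subsystem: generate a reward from the embedded current
    state and the embedded goal state (cosine similarity here)."""
    e_cur, e_goal = embed(current_obs), embed(goal_obs)
    return float(e_cur @ e_goal /
                 (np.linalg.norm(e_cur) * np.linalg.norm(e_goal)))

goal = rng.normal(size=16)
r_same = reward(goal, goal)            # identical states: maximal reward
r_other = reward(rng.normal(size=16), goal)
```

Rewarding similarity in a learned embedding space, rather than raw observation space, lets the agent be credited for reaching states that are functionally close to the goal even when pixels differ.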
-
Publication number: 20230153617
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
Type: Application
Filed: January 4, 2023
Publication date: May 18, 2023
Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
-
Patent number: 11593646
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.
Type: Grant
Filed: February 5, 2019
Date of Patent: February 28, 2023
Assignee: DeepMind Technologies Limited
Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
-
Publication number: 20220392206
Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
Type: Application
Filed: November 13, 2020
Publication date: December 8, 2022
Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
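The per-input encode-or-skip control loop can be sketched as follows. Both networks are toy linear stand-ins with assumed dimensions; the essential pattern is that the RL network's decision gates whether the task network spends computation on each input:

```python
import numpy as np

rng = np.random.default_rng(0)

W_RL = rng.normal(scale=0.1, size=6)         # RL (decision) network weights
W_TASK = rng.normal(scale=0.1, size=(3, 6))  # task network weights

def rl_decide(task_input):
    """RL network: generate a decision that determines whether to
    encode this task input or to skip it."""
    return float(W_RL @ task_input) > 0.0

def task_network(sequence):
    """Task network: encodes only the inputs the RL network selected,
    accumulating them into a running state."""
    state = np.zeros(3)
    decisions = []
    for x in sequence:
        encode = rl_decide(x)
        decisions.append(encode)
        if encode:                  # skipped inputs cost no computation
            state = np.tanh(state + W_TASK @ x)
    return state, decisions

seq = [rng.normal(size=6) for _ in range(10)]
output, decisions = task_network(seq)
```

Training the decision network with RL (rather than a fixed skipping rule) lets the system learn which inputs, e.g. which video frames, actually matter for the downstream task.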
-
Patent number: 11507827
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.
Type: Grant
Filed: October 14, 2019
Date of Patent: November 22, 2022
Assignee: DeepMind Technologies Limited
Inventors: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
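The learner loop in this abstract (pull parameters, sample an experience, compute a gradient against the target replica, push the gradient back) can be sketched as below. The Q networks are linear placeholders and the TD-style gradient is a simplification; only the message flow follows the abstract:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

class ParameterServer:
    """Holds the authoritative Q-network parameters, applies gradients."""
    def __init__(self, dim):
        self.params = np.zeros(dim)
    def get(self):
        return self.params.copy()
    def apply_gradient(self, grad, lr=0.01):
        self.params -= lr * grad

class Learner:
    """Maintains learner and target Q-network replicas plus a replay
    memory of experience tuples (here just observation and reward)."""
    def __init__(self, dim, seed):
        self.learner_q = np.zeros(dim)
        self.target_q = np.zeros(dim)
        self.replay = deque(
            (rng.normal(size=dim), rng.normal()) for _ in range(20))
        self.rng = np.random.default_rng(seed)

    def step(self, server):
        # 1. Receive current parameter values; update the learner replica.
        self.learner_q = server.get()
        # 2. Select an experience tuple from the replay memory.
        obs, reward = self.replay[self.rng.integers(len(self.replay))]
        # 3. Compute a gradient using the learner replica and the
        #    (periodically refreshed) target replica - a TD placeholder.
        td_error = (self.learner_q @ obs) - (reward + self.target_q @ obs)
        grad = 2.0 * td_error * obs
        # 4. Provide the computed gradient back to the parameter server.
        server.apply_gradient(grad)

server = ParameterServer(dim=4)
learners = [Learner(dim=4, seed=s) for s in range(3)]
for _ in range(10):
    for learner in learners:
        learner.step(server)
```

Shipping gradients rather than experience keeps the parameter server's work cheap and lets many learners train the same Q network concurrently.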
-
Publication number: 20220261647
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
Type: Application
Filed: April 29, 2022
Publication date: August 18, 2022
Inventors: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
-
Patent number: 11354548
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.
Type: Grant
Filed: July 13, 2020
Date of Patent: June 7, 2022
Assignee: DeepMind Technologies Limited
Inventors: Volodymyr Mnih, Koray Kavukcuoglu
-
Publication number: 20220164673
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.
Type: Application
Filed: January 27, 2022
Publication date: May 26, 2022
Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih