Patents by Inventor Volodymyr Mnih
Volodymyr Mnih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11334792Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.Type: GrantFiled: May 3, 2019Date of Patent: May 17, 2022Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Adria Puigdomenech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
-
Patent number: 11263531Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.Type: GrantFiled: May 20, 2019Date of Patent: March 1, 2022Assignee: DeepMind Technologies LimitedInventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih
-
Publication number: 20210374538Abstract: We describe a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Training data is generated by operating on the system with a succession of actions and used to train a second neural network. Target values for training the second neural network are derived from a first neural network which is generated by copying weights of the second neural network at intervals.Type: ApplicationFiled: June 25, 2021Publication date: December 2, 2021Inventors: Volodymyr Mnih, Koray Kavukcuoglu
-
Publication number: 20210357731Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.Type: ApplicationFiled: November 18, 2019Publication date: November 18, 2021Inventors: Tom Van de Wiele, Volodymyr Mnih, Andriy Mnih, David Constantine Patrick Warde-Farley
-
Patent number: 11049008Abstract: We describe a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Training data is generated by operating on the system with a succession of actions and used to train a second neural network. Target values for training the second neural network are derived from a first neural network which is generated by copying weights of the second neural network at intervals.Type: GrantFiled: June 9, 2017Date of Patent: June 29, 2021Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Koray Kavukcuoglu
-
Publication number: 20210182688Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.Type: ApplicationFiled: February 24, 2021Publication date: June 17, 2021Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20210166127Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.Type: ApplicationFiled: February 8, 2021Publication date: June 3, 2021Inventors: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
-
Patent number: 10956820Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.Type: GrantFiled: May 3, 2019Date of Patent: March 23, 2021Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20210065012Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.Type: ApplicationFiled: September 14, 2020Publication date: March 4, 2021Inventors: Mohammad Gheshlaghi Azar, Meire Fortunato, Bilal Piot, Olivier Claude Pietquin, Jacob Lee Menick, Volodymyr Mnih, Charles Blundell, Remi Munos
-
Patent number: 10936946Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.Type: GrantFiled: November 11, 2016Date of Patent: March 2, 2021Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
-
Publication number: 20210034970Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.Type: ApplicationFiled: February 5, 2019Publication date: February 4, 2021Inventors: Hubert Josef Soyer, Lasse Espeholt, Karen Simonyan, Yotam Doron, Vlad Firoiu, Volodymyr Mnih, Koray Kavukcuoglu, Remi Munos, Thomas Ward, Timothy James Alexander Harley, Iain Robert Dunning
-
Patent number: 10839293Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.Type: GrantFiled: June 12, 2019Date of Patent: November 17, 2020Assignee: DeepMind Technologies LimitedInventors: Mohammad Gheshlaghi Azar, Meire Fortunato, Bilal Piot, Olivier Claude Pietquin, Jacob Lee Menick, Volodymyr Mnih, Charles Blundell, Remi Munos
-
Patent number: 10748041Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using recurrent attention. One of the methods includes determining a location in the first image; extracting a glimpse from the first image using the location; generating a glimpse representation of the extracted glimpse; processing the glimpse representation using a recurrent neural network to update a current internal state of the recurrent neural network to generate a new internal state; processing the new internal state to select a location in a next image in the image sequence after the first image; and processing the new internal state to select an action from a predetermined set of possible actions.Type: GrantFiled: January 17, 2019Date of Patent: August 18, 2020Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Koray Kavukcuoglu
-
Publication number: 20200117992Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.Type: ApplicationFiled: October 14, 2019Publication date: April 16, 2020Inventors: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
-
Publication number: 20190362238Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.Type: ApplicationFiled: June 12, 2019Publication date: November 28, 2019Inventors: Olivier Pietquin, Jacob Lee Menick, Mohammad Gheshlaghi Azar, Bilal Piot, Volodymyr Mnih, Charles Blundell, Meire Fortunato, Remi Munos
-
Publication number: 20190354869Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent that interacts with an environment. In one aspect, a system comprises: an action selection subsystem that selects actions to be performed by the agent using an action selection policy generated using an action selection neural network; a reward subsystem that is configured to: receive an observation characterizing a current state of the environment and an observation characterizing a goal state of the environment; generate a reward using an embedded representation of the observation characterizing the current state of the environment and an embedded representation of the observation characterizing the goal state of the environment; and a training subsystem that is configured to train the action selection neural network based on the rewards generated by the reward subsystem using reinforcement learning techniques.Type: ApplicationFiled: May 20, 2019Publication date: November 21, 2019Inventors: David Constantine Patrick Warde-Farley, Volodymyr Mnih
-
Patent number: 10445641Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributed training of reinforcement learning systems. One of the methods includes receiving, by a learner, current values of the parameters of the Q network from a parameter server, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica; updating, by the learner, the parameters of the learner Q network replica maintained by the learner using the current values; selecting, by the learner, an experience tuple from a respective replay memory; computing, by the learner, a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and providing, by the learner, the computed gradient to the parameter server.Type: GrantFiled: February 4, 2016Date of Patent: October 15, 2019Assignee: Deepmind Technologies LimitedInventors: Praveen Deepak Srinivasan, Rory Fearon, Cagdas Alcicek, Arun Sarath Nair, Samuel Blackwell, Vedavyas Panneershelvam, Alessandro De Maria, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Mustafa Suleyman
-
Publication number: 20190258938Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward.Type: ApplicationFiled: May 3, 2019Publication date: August 22, 2019Inventors: Volodymyr Mnih, Wojciech Czarnecki, Maxwell Elliot Jaderberg, Tom Schaul, David Silver, Koray Kavukcuoglu
-
Publication number: 20190258929Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.Type: ApplicationFiled: May 3, 2019Publication date: August 22, 2019Inventors: Volodymyr Mnih, Adria Puigdomenech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu
-
Patent number: 10346741Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.Type: GrantFiled: May 11, 2018Date of Patent: July 9, 2019Assignee: DeepMind Technologies LimitedInventors: Volodymyr Mnih, Adrià Puigdomènech Badia, Alexander Benjamin Graves, Timothy James Alexander Harley, David Silver, Koray Kavukcuoglu