Patents by Inventor Victor Constant Bapst

Victor Constant Bapst has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11983634
    Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy. (A minimal illustrative code sketch follows this entry.)
    Type: Grant
    Filed: September 27, 2021
    Date of Patent: May 14, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
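To make the objective in the entry above concrete, here is a minimal PyTorch sketch of a per-task loss combining a policy-gradient reward term with an entropy term that regularizes the task policy toward the shared multitask policy. All names, tensor shapes, and the coefficients `kl_weight` and `ent_weight` are illustrative assumptions, not the claimed method.

```python
import torch
import torch.nn.functional as F

def multitask_policy_loss(task_logits, shared_logits, actions, returns,
                          kl_weight=0.1, ent_weight=0.01):
    """Per-task loss: a policy-gradient reward term, a KL term pulling the
    worker's task policy toward the shared multitask policy, and a plain
    entropy bonus."""
    log_pi = F.log_softmax(task_logits, dim=-1)     # worker (task) policy
    log_pi0 = F.log_softmax(shared_logits, dim=-1)  # shared multitask policy
    pi = log_pi.exp()

    # Reward term: REINFORCE-style surrogate for the expected task reward.
    logp_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    reward_term = -(logp_a * returns).mean()

    # Entropy term regularizing the task policy toward the shared policy.
    kl_to_shared = (pi * (log_pi - log_pi0.detach())).sum(-1).mean()

    # Standard entropy bonus keeping the task policy exploratory.
    entropy = -(pi * log_pi).sum(-1).mean()

    return reward_term + kl_weight * kl_to_shared - ent_weight * entropy

# Toy usage: 8 timesteps, 4 actions.
T, A = 8, 4
loss = multitask_policy_loss(torch.randn(T, A, requires_grad=True),
                             torch.randn(T, A),
                             torch.randint(0, A, (T,)), torch.randn(T))
loss.backward()
```

In a full system the shared policy would also be trained, for example by distilling the workers into it; the sketch shows only one worker's side of the objective.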
  • Publication number: 20230196146
    Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive operations on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action. (A minimal illustrative code sketch follows this entry.)
    Type: Application
    Filed: February 13, 2023
    Publication date: June 22, 2023
    Inventors: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro
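As a rough illustration of the attention-block architecture described above, the sketch below updates each entity's vector from the other entities via standard multi-head self-attention. The dimensions, residual connections, and normalization choices are assumptions made for the example, not the claimed design.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One relational block: every entity's vector is updated from the
    other entities via multi-head self-attention, then a shared MLP."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, entities):                  # entities: [batch, N, dim]
        h, _ = self.attn(entities, entities, entities)
        entities = self.norm1(entities + h)       # modified entity data
        return self.norm2(entities + self.mlp(entities))

# Blocks can be stacked to perform successive relational operations.
relational_net = nn.Sequential(AttentionBlock(64), AttentionBlock(64))
entities = torch.randn(2, 10, 64)   # 2 states, 10 entities, 64-dim each
out = relational_net(entities)      # same shape; fed to an output network
```

Stacking blocks lets information propagate across entities over successive rounds, after which an output network can map the entity representations to an action selection.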
  • Patent number: 11580429
    Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive operations on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: February 14, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro
  • Publication number: 20220366247
    Abstract: A reinforcement learning system and method that selects actions to be performed by an agent interacting with an environment. The system uses a combination of reinforcement learning and a look-ahead search: reinforcement learning Q-values are used to guide the look-ahead search, and the search is used in turn to improve the Q-values. The system learns from a combination of real experience and simulated, model-based experience. (A minimal illustrative code sketch follows this entry.)
    Type: Application
    Filed: September 23, 2020
    Publication date: November 17, 2022
    Inventors: Jessica Blake Chandler Hamrick, Victor Constant Bapst, Alvaro Sanchez, Tobias Pfaff, Theophane Guillaume Weber, Lars Buesing, Peter William Battaglia
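A toy sketch of the idea above: learned Q-values rank actions so that only the most promising branches are expanded, leaf nodes bootstrap from the Q-network, and the backed-up root value could in turn serve as a training target that improves the Q-network. The Q-network, environment model, and search shape here are all hypothetical placeholders.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3)                        # toy: 4-dim states, 3 actions
q_net = lambda s: torch.tanh(s @ W)          # stand-in for a learned Q-network
model = lambda s, a: (s * 0.9 + 0.1 * a,     # toy deterministic transition
                      -float(s.abs().sum())) # and reward model

def guided_search(state, depth=2, width=2, discount=0.99):
    """Look-ahead search guided by learned Q-values: only the `width`
    actions ranked best by the Q-network are expanded, and leaf nodes
    bootstrap from Q. The backed-up value can later serve as a target
    for the Q-network (learning from simulated, model-based experience)."""
    q = q_net(state)
    if depth == 0:
        return q.max().item()                # bootstrap from learned Q
    best = -float('inf')
    for a in q.argsort(descending=True)[:width].tolist():
        next_state, reward = model(state, a)
        best = max(best, reward + discount *
                   guided_search(next_state, depth - 1, width, discount))
    return best

root_value = guided_search(torch.randn(4))   # improved estimate for the root
```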
  • Publication number: 20220083869
    Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
    Type: Application
    Filed: September 27, 2021
    Publication date: March 17, 2022
    Inventors: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
  • Publication number: 20210334655
    Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting one or more properties of a material. One of the methods includes maintaining data specifying a set of known materials each having a respective known physical structure; receiving data specifying a new material; identifying a plurality of known materials in the set of known materials that are similar to the new material; determining a predicted embedding of the new material from at least respective embeddings corresponding to each of the similar known materials; and processing the predicted embedding of the new material using an experimental prediction neural network to predict one or more properties of the new material. (A minimal illustrative code sketch follows this entry.)
    Type: Application
    Filed: April 26, 2021
    Publication date: October 28, 2021
    Inventors: Annette Ada Nkechinyere Obika, Tian Xie, Victor Constant Bapst, Alexander Lloyd Gaunt, James Kirkpatrick
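A minimal sketch of the prediction pipeline in the entry above: identify the k known materials most similar to a new one, average their embeddings into a predicted embedding for the new material, and feed it to an experimental prediction network. The fingerprint features, embedding sizes, and predictor below are invented placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
known_embeddings = torch.randn(100, 16)      # embeddings of known materials
known_features = torch.randn(100, 8)         # structural fingerprints
predictor = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

def predict_properties(new_features, k=5):
    """Find the k known materials most similar to the new one, average
    their embeddings into a predicted embedding for the new material,
    and run the experimental prediction network on it."""
    sims = torch.cosine_similarity(known_features,
                                   new_features.unsqueeze(0), dim=-1)
    nearest = sims.topk(k).indices           # most similar known materials
    predicted_embedding = known_embeddings[nearest].mean(dim=0)
    return predictor(predicted_embedding)    # predicted property value(s)

prop = predict_properties(torch.randn(8))
```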
  • Patent number: 11132609
    Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
    Type: Grant
    Filed: November 19, 2019
    Date of Patent: September 28, 2021
    Assignee: DeepMind Technologies Limited
    Inventors: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
  • Publication number: 20200293862
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor-critic reinforcement learning technique. (A minimal illustrative code sketch follows this entry.)
    Type: Application
    Filed: May 28, 2020
    Publication date: September 17, 2020
    Inventors: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst
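The sketch below illustrates the update described above, in the spirit of the published off-policy actor-critic-with-replay family (e.g., ACER): sample a stored trajectory and adjust the policy parameters with truncated importance weights that correct for the stale behavior policy. All networks, shapes, and hyperparameters are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

policy_net, value_net = nn.Linear(4, 2), nn.Linear(4, 1)   # toy networks
optimizer = torch.optim.Adam(list(policy_net.parameters()) +
                             list(value_net.parameters()), lr=1e-3)
replay_memory = []   # each entry: (states, actions, rewards, behavior_logits)

def train_step(gamma=0.99, clip=10.0):
    """Sample a stored trajectory and adjust the policy parameters with an
    off-policy actor-critic update; truncated importance weights correct
    for the stale behavior policy that generated the trajectory."""
    states, actions, rewards, behavior_logits = random.choice(replay_memory)
    log_pi = F.log_softmax(policy_net(states), dim=-1)
    logp_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        behavior_logp = F.log_softmax(behavior_logits, dim=-1) \
                         .gather(1, actions.unsqueeze(1)).squeeze(1)
        rho = (logp_a - behavior_logp).exp().clamp(max=clip)  # truncated IS weights
        returns, g = [], 0.0
        for r in reversed(rewards.tolist()):                  # discounted returns
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
    values = value_net(states).squeeze(-1)
    policy_loss = -(rho * logp_a * (returns - values.detach())).mean()
    critic_loss = F.mse_loss(values, returns)
    optimizer.zero_grad()
    (policy_loss + critic_loss).backward()
    optimizer.step()

# Store one fake 5-step trajectory, then run a single update.
T = 5
replay_memory.append((torch.randn(T, 4), torch.randint(0, 2, (T,)),
                      torch.randn(T), torch.randn(T, 2)))
train_step()
```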
  • Patent number: 10706352
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor-critic reinforcement learning technique.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: July 7, 2020
    Assignee: DeepMind Technologies Limited
    Inventors: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst
  • Publication number: 20200090048
    Abstract: A method is proposed for training a multitask computer system, such as a multitask neural network system. The system comprises a set of trainable workers and a shared module. The trainable workers and shared module are trained on a plurality of different tasks, such that each worker learns to perform a corresponding one of the tasks according to a respective task policy, and the shared module (a shared policy network) learns a multitask policy which represents common behavior for the tasks. The coordinated training is performed by optimizing an objective function comprising, for each task: a reward term indicative of an expected reward earned by a worker in performing the corresponding task according to the task policy; and at least one entropy term which regularizes the distribution of the task policy towards the distribution of the multitask policy.
    Type: Application
    Filed: November 19, 2019
    Publication date: March 19, 2020
    Inventors: Razvan Pascanu, Raia Thais Hadsell, Victor Constant Bapst, Wojciech Czarnecki, James Kirkpatrick, Yee Whye Teh, Nicolas Manfred Otto Heess
  • Publication number: 20190354885
    Abstract: A neural network system is proposed, including an input network for extracting, from state data, respective entity data for each of a plurality of entities which are present, or at least potentially present, in the environment. The entity data describes the entity. The neural network contains a relational network for parsing this data, which includes one or more attention blocks which may be stacked to perform successive operations on the entity data. The attention blocks each include a respective transform network for each of the entities. The transform network for each entity is able to transform data which the transform network receives for the entity into modified entity data for the entity, based on data for a plurality of the other entities. An output network is arranged to receive data output by the relational network, and use the received data to select a respective action.
    Type: Application
    Filed: May 20, 2019
    Publication date: November 21, 2019
    Inventors: Yujia Li, Victor Constant Bapst, Vinicius Zambaldi, David Nunes Raposo, Adam Anthony Santoro
  • Publication number: 20190258918
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor-critic reinforcement learning technique.
    Type: Application
    Filed: May 3, 2019
    Publication date: August 22, 2019
    Inventors: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst