Patents by Inventor Marc Gendron

Marc Gendron has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reinforcement learning using pseudo-counts

Patent number: 11727264

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation; determining a pseudo-count for the first observation; determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation; generating a combined reward from the actual reward and the exploration reward bonus; and adjusting current values of the parameters of the neural network using the combined reward.
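
The steps in this abstract (pseudo-count, exploration bonus, combined reward) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the toy `EmpiricalDensity` model, the rule that derives the pseudo-count from the model's probability before and after one update, the bonus form `beta / sqrt(count + 0.01)`, and all names and constants.

```python
import math
from collections import Counter


class EmpiricalDensity:
    """Hypothetical toy density model: probability is the empirical frequency."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, obs):
        return self.counts[obs] / self.total if self.total else 0.0

    def update(self, obs):
        self.counts[obs] += 1
        self.total += 1


def pseudo_count(model, obs):
    """Derive a pseudo-count from the model's probability before and after one update."""
    rho = model.prob(obs)
    model.update(obs)
    rho_next = model.prob(obs)
    gain = rho_next - rho
    if gain <= 0.0:
        # The estimate did not move, so the observation is treated as fully familiar.
        return math.inf
    return rho * (1.0 - rho_next) / gain


def combined_reward(actual_reward, count, beta=0.05):
    """Add an exploration bonus that shrinks as the pseudo-count grows (assumed form)."""
    return actual_reward + beta / math.sqrt(count + 0.01)


model = EmpiricalDensity()
counts = [pseudo_count(model, obs) for obs in ["a", "b", "a"]]
rewards = [combined_reward(0.0, n) for n in counts]
```

In this sketch a first-time observation gets a pseudo-count of zero and therefore the largest bonus, while the repeated observation `"a"` gets a smaller one. In a training loop, the combined reward would replace the actual reward when the network parameters are adjusted.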

Type: Grant

Filed: May 18, 2017

Date of Patent: August 15, 2023

Assignee: DeepMind Technologies Limited

Inventors: Marc Gendron-Bellemare, Remi Munos, Srinivasan Sriram
CONTRASTIVE BEHAVIORAL SIMILARITY EMBEDDINGS FOR GENERALIZATION IN REINFORCEMENT LEARNING

Publication number: 20230102544

Abstract: Approaches are described for training an action selection neural network system for use in controlling an agent interacting with an environment to perform a task, using a contrastive loss function based on a policy similarity metric. In one aspect, a method includes: obtaining a first observation of a first training environment; obtaining a plurality of second observations of a second training environment; for each second observation, determining a respective policy similarity metric between the second observation and the first observation; processing the first observation and the second observations using the representation neural network to generate a first representation of the first training observation and a respective second representation of each second training observation; and training the representation neural network on a contrastive loss function computed using the policy similarity metrics and the first and second representations.
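
As a rough illustration of a contrastive loss guided by a policy similarity metric, the sketch below makes several simplifying assumptions that are not in the abstract: representations are plain lists of floats rather than neural network outputs, the metric is supplied as precomputed `distances` (smaller means more behaviorally similar), the closest second observation is used as the single positive, and the loss is a standard InfoNCE-style softmax over cosine similarities. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, candidates, distances, temperature=0.5):
    """InfoNCE-style loss; the candidate with the smallest policy distance is the positive."""
    positive = min(range(len(candidates)), key=lambda i: distances[i])
    logits = [cosine(anchor, c) / temperature for c in candidates]
    log_norm = math.log(sum(math.exp(x) for x in logits))
    return log_norm - logits[positive]


anchor = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
# The embedding agrees with the metric: the nearest candidate is also the most similar.
aligned = contrastive_loss(anchor, candidates, distances=[0.1, 0.9])
# The embedding disagrees with the metric, so the loss is larger.
misaligned = contrastive_loss(anchor, candidates, distances=[0.9, 0.1])
```

The loss is low when observations that the metric marks as behaviorally similar are also close in representation space, which is the property that training on such a loss is meant to encourage.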

Type: Application

Filed: September 28, 2021

Publication date: March 30, 2023

Inventors: Rishabh Agarwal, Marlos Cholodovskis Machado, Pablo Samuel Castro Rivadeneira, Marc Gendron-Bellemare
Training action selection neural networks using leave-one-out-updates

Patent number: 11604997

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network. The policy neural network is used to select actions to be performed by an agent that interacts with an environment by receiving an observation characterizing a state of the environment and performing an action from a set of actions in response to the received observation. A trajectory is obtained from a replay memory, and a final update to current values of the policy network parameters is determined for each training observation in the trajectory. The final updates to the current values of the policy network parameters are determined from selected action updates and leave-one-out updates.

Type: Grant

Filed: June 11, 2018

Date of Patent: March 14, 2023

Assignee: DeepMind Technologies Limited

Inventors: Marc Gendron-Bellemare, Mohammad Gheshlaghi Azar, Audrunas Gruslys, Remi Munos
Evaluating reinforcement learning policies

Patent number: 11429898

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating reinforcement learning policies. One of the methods includes receiving a plurality of training histories for a reinforcement learning agent; determining a total reward for each training observation in the training histories; partitioning the training observations into a plurality of partitions; determining, for each partition and from the partitioned training observations, a probability that the reinforcement learning agent will receive the total reward for the partition if the reinforcement learning agent performs the action for the partition in response to receiving the current observation; determining, from the probabilities and for each total reward, a respective estimated value of performing each action in response to receiving the current observation; and selecting an action from the pre-determined set of actions from the estimated values in accordance with an action selection policy.

Type: Grant

Filed: October 14, 2019

Date of Patent: August 30, 2022

Assignee: DeepMind Technologies Limited

Inventors: Joel William Veness, Marc Gendron-Bellemare
Railway vehicle coach

Patent number: 11161528

Abstract: Disclosed is a railway vehicle coach, including a chassis and a bogie arranged below the chassis in a vertical direction. The coach includes a thermally insulating screen, arranged between the chassis and the bogie, above the bogie in the vertical direction.

Type: Grant

Filed: December 14, 2018

Date of Patent: November 2, 2021

Assignee: ALSTOM TRANSPORT TECHNOLOGIES

Inventors: Marc Gendron, Laurent Laloyaux, Alexandre Sharawi, Pascal Flament, Nicolas Delannoy
TRAINING MACHINE LEARNING MODELS USING TASK SELECTION POLICIES TO INCREASE LEARNING PROGRESS

Publication number: 20210150355

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
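
The loop described in this abstract (select a task, train on a batch, measure learning progress, update the task selection policy) can be sketched as follows. The specifics are assumptions for illustration only: the softmax policy over per-task preferences, the use of the drop in loss as the learning progress measure, and the toy `decay` factors that stand in for real training are all hypothetical.

```python
import math
import random


class TaskSelectionPolicy:
    """Softmax policy over tasks, driven by a running estimate of learning progress."""

    def __init__(self, num_tasks, step_size=0.5):
        self.preferences = [0.0] * num_tasks
        self.step_size = step_size

    def probabilities(self):
        peak = max(self.preferences)
        weights = [math.exp(p - peak) for p in self.preferences]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self, rng):
        tasks = range(len(self.preferences))
        return rng.choices(tasks, weights=self.probabilities())[0]

    def update(self, task, progress):
        # Move the selected task's preference toward the observed progress.
        self.preferences[task] += self.step_size * (progress - self.preferences[task])


rng = random.Random(0)
policy = TaskSelectionPolicy(num_tasks=2)
losses = [1.0, 1.0]
decay = [0.9, 1.0]  # toy stand-in: task 0 still improves, task 1 has plateaued

for _ in range(50):
    task = policy.select(rng)
    before = losses[task]
    losses[task] *= decay[task]        # stand-in for training on one batch
    progress = before - losses[task]   # learning progress as the drop in loss
    policy.update(task, progress)
```

In this toy setup the plateaued task never reports progress, so its preference stays at zero and the policy never favors it over the task that is still improving.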

Type: Application

Filed: January 27, 2021

Publication date: May 20, 2021

Inventors: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
Systems and Methods for Navigating Aerial Vehicles Using Deep Reinforcement Learning

Publication number: 20210124352

Abstract: The technology relates to navigating aerial vehicles using deep reinforcement learning techniques to generate flight policies. An operational system for controlling flight of an aerial vehicle may include a computing system configured to process an input vector representing a state of the aerial vehicle and output an action, an operation-ready policies server configured to store a trained neural network encoding a learned flight policy, and a controller configured to control the aerial vehicle. The input vector may be processed using the trained neural network encoding the learned flight policy.

Type: Application

Filed: October 29, 2019

Publication date: April 29, 2021

Applicant: LOON LLC

Inventors: Salvatore J. Candido, Jun Gong, Marc Gendron-Bellemare
Systems and Methods for Navigating Aerial Vehicles Using Deep Reinforcement Learning

Publication number: 20210123741

Abstract: The technology relates to navigating aerial vehicles using deep reinforcement learning techniques to generate flight policies. A computing system may include a simulator configured to produce simulations of a flight of the aerial vehicle in a region of an atmosphere, a replay buffer configured to store frames of the simulations, and a learning module having a deep reinforcement learning architecture configured to, by a reinforcement learning algorithm, process an input of a set of frames, and output a neural network encoding a learned flight policy. A meta-learning system may include stacks of learning systems, a coordinator configured to provide an instruction to the learning systems that includes a parameter and a start time, and an evaluation server configured to evaluate resulting rewards from learned flight policies generated by the learning systems.

Type: Application

Filed: October 29, 2019

Publication date: April 29, 2021

Applicant: LOON LLC

Inventors: Salvatore J. Candido, Jun Gong, Marc Gendron-Bellemare, Marlos Cholodovskis Machado
TRAINING ACTION SELECTION NEURAL NETWORKS

Publication number: 20210110271

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network. The policy neural network is used to select actions to be performed by an agent that interacts with an environment by receiving an observation characterizing a state of the environment and performing an action from a set of actions in response to the received observation. A trajectory is obtained from a replay memory, and a final update to current values of the policy network parameters is determined for each training observation in the trajectory. The final updates to the current values of the policy network parameters are determined from selected action updates and leave-one-out updates.

Type: Application

Filed: June 11, 2018

Publication date: April 15, 2021

Inventors: Marc Gendron-Bellemare, Mohammad Gheshlaghi Azar, Audrunas Gruslys, Remi Munos
DISTRIBUTIONAL REINFORCEMENT LEARNING

Publication number: 20210064970

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
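
The selection rule in this abstract can be sketched with a small example. It assumes, purely for illustration, that each action's distribution is categorical over a fixed set of possible returns (`atoms`), that the measure of central tendency is the mean, and that the agent picks the action with the highest mean. The names and numbers are hypothetical.

```python
def expected_return(atoms, probs):
    """Mean of a categorical distribution over the possible returns."""
    return sum(z * p for z, p in zip(atoms, probs))


def select_action(atoms, distributions):
    """Pick the action whose return distribution has the highest mean."""
    means = {a: expected_return(atoms, p) for a, p in distributions.items()}
    return max(means, key=means.get), means


atoms = [-1.0, 0.0, 1.0]  # possible returns shared by all actions
distributions = {
    "left": [0.1, 0.8, 0.1],   # mostly a return of 0
    "right": [0.4, 0.0, 0.6],  # riskier, but higher on average
}
action, means = select_action(atoms, distributions)
```

Here `"right"` is selected because its mean return is higher, even though its distribution is more spread out. A different measure of central tendency could lead to a different choice.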

Type: Application

Filed: November 16, 2020

Publication date: March 4, 2021

Inventors: Marc Gendron-Bellemare, William Clinton Dabney
Training machine learning models using task selection policies to increase learning progress

Patent number: 10936949

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.

Type: Grant

Filed: July 10, 2019

Date of Patent: March 2, 2021

Assignee: DeepMind Technologies Limited

Inventors: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
Distributional reinforcement learning

Patent number: 10860920

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.

Type: Grant

Filed: July 10, 2019

Date of Patent: December 8, 2020

Assignee: DeepMind Technologies Limited

Inventors: Marc Gendron-Bellemare, William Clinton Dabney
REINFORCEMENT LEARNING USING PSEUDO-COUNTS

Publication number: 20200327405

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation; determining a pseudo-count for the first observation; determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation; generating a combined reward from the actual reward and the exploration reward bonus; and adjusting current values of the parameters of the neural network using the combined reward.

Type: Application

Filed: May 18, 2017

Publication date: October 15, 2020

Inventors: Marc Gendron-Bellemare, Remi Munos, Srinivasan Sriram
RAILWAY COACH VEHICLE

Publication number: 20200164901

Abstract: A railway vehicle coach, including a roof, a box mounted on the roof, at least one primary box support, inserted between the box and the roof, each primary support being arranged in a peripheral part of the box and being suitable for bearing the weight of the box, the coach including at least one secondary support inserted between the box and the roof, each secondary support being arranged in a central part of the box and being able to bear the weight of the box in case of collapse of the roof in the location of a primary support.

Type: Application

Filed: November 21, 2019

Publication date: May 28, 2020

Inventors: Marc GENDRON, Laurent LALOYAUX, Alexandre SHARAWI, Pascal FLAMENT, Nicolas DELANNOY
TRAINING MACHINE LEARNING MODELS

Publication number: 20190332938

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.

Type: Application

Filed: July 10, 2019

Publication date: October 31, 2019

Inventors: Marc Gendron-Bellemare, Jacob Lee Menick, Alexander Benjamin Graves, Koray Kavukcuoglu, Remi Munos
DISTRIBUTIONAL REINFORCEMENT LEARNING

Publication number: 20190332923

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.

Type: Application

Filed: July 10, 2019

Publication date: October 31, 2019

Inventors: Marc Gendron-Bellemare, William Clinton Dabney
Evaluating reinforcement learning policies

Patent number: 10445653

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating reinforcement learning policies. One of the methods includes receiving a plurality of training histories for a reinforcement learning agent; determining a total reward for each training observation in the training histories; partitioning the training observations into a plurality of partitions; determining, for each partition and from the partitioned training observations, a probability that the reinforcement learning agent will receive the total reward for the partition if the reinforcement learning agent performs the action for the partition in response to receiving the current observation; determining, from the probabilities and for each total reward, a respective estimated value of performing each action in response to receiving the current observation; and selecting an action from the pre-determined set of actions from the estimated values in accordance with an action selection policy.

Type: Grant

Filed: August 7, 2015

Date of Patent: October 15, 2019

Assignee: DeepMind Technologies Limited

Inventors: Joel William Veness, Marc Gendron-Bellemare
Railway vehicle coach

Publication number: 20190185028

Abstract: Disclosed is a railway vehicle coach, including a chassis and a bogie arranged below the chassis in a vertical direction. The coach includes a thermally insulating screen, arranged between the chassis and the bogie, above the bogie in the vertical direction.

Type: Application

Filed: December 14, 2018

Publication date: June 20, 2019

Inventors: Marc GENDRON, Laurent LALOYAUX, Alexandre SHARAWI, Pascal FLAMENT, Nicolas DELANNOY
Processing value-ascertainable items

Patent number: 8751294

Abstract: Techniques are provided for allowing a merchant to process third party closed-loop instruments (such as gift cards) as if the closed-loop instruments were open-loop instruments. A customer provides card data of a third party gift card to a merchant, e.g., online or in a merchant store, for the purchase of one or more items provided by the merchant. The merchant sends the gift card data to an intermediary that deducts at least a portion of the balance of the gift card. The intermediary sends an offer for the gift card to the customer. If the customer accepts the offer, then the merchant applies the offer towards the total purchase price of the one or more items.

Type: Grant

Filed: May 9, 2012

Date of Patent: June 10, 2014

Assignee: e2interactive, Inc.

Inventors: Ashmit Bhattacharya, Bruce Bower, Gary Briggs, Marc Gendron, Steve Grove, Tina Henson, Parker Thomas
PROCESSING VALUE-ASCERTAINABLE ITEMS

Publication number: 20120221425

Abstract: Techniques are provided for allowing a merchant to process third party closed-loop instruments (such as gift cards) as if the closed-loop instruments were open-loop instruments. A customer provides card data of a third party gift card to a merchant, e.g., online or in a merchant store, for the purchase of one or more items provided by the merchant. The merchant sends the gift card data to an intermediary that deducts at least a portion of the balance of the gift card. The intermediary sends an offer for the gift card to the customer. If the customer accepts the offer, then the merchant applies the offer towards the total purchase price of the one or more items.

Type: Application

Filed: May 9, 2012

Publication date: August 30, 2012

Inventors: Ashmit Bhattacharya, Bruce Bower, Gary Briggs, Marc Gendron, Steve Grove, Tina Henson, Parker Thomas
