Patents by Inventor Matthieu Florent Geist

Matthieu Florent Geist has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

STATE-DEPENDENT ACTION SPACE QUANTIZATION

Publication number: 20230093451

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using a discretization neural network that generates a state-dependent discretization of an original action space and a policy neural network that is used to select an action from the state-dependent quantization rather than from the original action space.

Type: Application

Filed: September 19, 2022

Publication date: March 23, 2023

Inventors: Robert Dadashi-Tazehozi, Olivier Claude Pietquin, Léonard Hussenot Desenonges, Matthieu Florent Geist, Anton Raichuk, Damien Vincent, Sertan Girgin
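
The abstract describes a two-stage selection: a discretization neural network proposes a small, state-dependent set of candidate actions, and a policy neural network chooses among those candidates instead of the original continuous action space. The Python sketch below shows only that data flow. The linear-plus-tanh "networks", the greedy argmax choice, and the names `discretize` and `select_action` are hypothetical stand-ins. They are not the architecture or training procedure claimed in the application.

```python
import math
import random

# Hypothetical stand-ins: plain linear maps play the role of the discretization
# network and the policy network described in the abstract.

def linear(weights, x):
    """Return one pre-activation per weight row (a dense layer without bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def discretize(state, disc_weights, k, action_dim):
    """Map a state to k candidate continuous actions in [-1, 1]^action_dim.

    disc_weights has k * action_dim rows; tanh keeps each coordinate bounded.
    """
    flat = [math.tanh(v) for v in linear(disc_weights, state)]
    return [flat[i * action_dim:(i + 1) * action_dim] for i in range(k)]

def select_action(state, disc_weights, policy_weights, k, action_dim):
    """Choose greedily among the k state-dependent candidates."""
    candidates = discretize(state, disc_weights, k, action_dim)
    logits = linear(policy_weights, state)  # one score per candidate
    index = max(range(k), key=lambda i: logits[i])
    return index, candidates[index], candidates

if __name__ == "__main__":
    rng = random.Random(0)
    k, action_dim, state_dim = 3, 2, 4
    disc_w = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(k * action_dim)]
    pol_w = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(k)]
    idx, action, _ = select_action([0.5, -0.2, 0.1, 0.9], disc_w, pol_w, k, action_dim)
    print(idx, action)
```

Because the candidate set is recomputed from each state, the policy only ever scores k options, while those options can still land anywhere in the continuous action space.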

TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN EXPERT EXPLORATION BEHAVIORS FROM DEMONSTRATORS

Publication number: 20210397959

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes obtaining a transition generated as a result of the reinforcement learning agent interacting with the environment, processing a bonus input using a bonus estimation neural network to generate an exploration bonus estimate that encourages the agent to explore the environment in accordance with an expert exploration strategy that would be adopted by an expert agent; generating a modified reward from the reward included in the transition and the exploration bonus estimate; and determining an update to current parameter values of the neural network to optimize a reinforcement learning objective function that maximizes returns to be received by the agent with respect to the modified reward.

Type: Application

Filed: June 22, 2021

Publication date: December 23, 2021

Inventors: Olivier Claude Pietquin, Léonard Hussenot Desenonges, Robert Dadashi-Tazehozi, Matthieu Florent Geist
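
The reward-shaping step in this abstract can be sketched as follows. This is a minimal illustration that rests on several assumptions, none of which the abstract specifies. The bonus estimation network is replaced by a single linear unit. The bonus input is taken to be the concatenated current and next observations. The modified reward is taken to be `reward + bonus_scale * bonus`. One-step Q-learning is used only as an example reinforcement learning objective. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    observation: List[float]
    action: int
    reward: float
    next_observation: List[float]

def bonus_estimate(bonus_input: List[float], bonus_weights: List[float]) -> float:
    """Hypothetical bonus estimation network: a single linear unit.

    In the application this is a trained neural network; here it is a stand-in.
    """
    return sum(w * x for w, x in zip(bonus_weights, bonus_input))

def modified_reward(t: Transition, bonus_weights: List[float],
                    bonus_scale: float = 1.0) -> float:
    # Assumption: the bonus input is the observation followed by the next
    # observation, and reward and bonus are combined additively.
    bonus_input = t.observation + t.next_observation
    return t.reward + bonus_scale * bonus_estimate(bonus_input, bonus_weights)

def q_learning_target(r_mod: float, next_q_values: List[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD target built from the modified reward (example objective)."""
    return r_mod if done else r_mod + gamma * max(next_q_values)
```

The agent's update then proceeds exactly as it would for the environment reward alone; only the reward fed into the target changes.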

TRAINING REINFORCEMENT LEARNING AGENTS USING AUGMENTED TEMPORAL DIFFERENCE LEARNING

Publication number: 20210390409

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes training the neural network on one or more transitions selected from a replay memory, including: generating, using the neural network, an action selection output for the current observation; determining, based on the action selection output and the current action performed by the agent in response to the current observation, a state-action target for the current observation; determining a gradient of a temporal difference (TD) loss function with respect to parameters of the neural network, wherein the TD loss function comprises a first term that depends on the state-action target for the current observation; and adjusting current parameter values of the neural network based on the gradient.

Type: Application

Filed: June 14, 2021

Publication date: December 16, 2021

Inventors: Matthieu Florent Geist, Nino Vieillard, Olivier Claude Pietquin
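
The abstract leaves the exact form of the state-action target open. Munchausen reinforcement learning (Vieillard, Pietquin and Geist, 2020) is a published method by the same inventors that augments the reward with a scaled log-probability of the action actually taken, computed from a softmax policy over the network's own outputs. The sketch below uses that Munchausen-style target purely as an illustrative assumption. It is not confirmed to be the method claimed in this application. The defaults for `tau`, `alpha` and `clip_min` are illustrative, and computing the gradient is left to whatever autodiff framework trains the network.

```python
import math
from typing import List

def log_softmax(q_values: List[float], tau: float) -> List[float]:
    """log pi(a|s) for the softmax policy pi = softmax(q / tau), computed stably."""
    m = max(q_values)
    log_z = m / tau + math.log(sum(math.exp((q - m) / tau) for q in q_values))
    return [q / tau - log_z for q in q_values]

def augmented_td_target(q_current, action, reward, q_next, done,
                        gamma=0.99, tau=0.03, alpha=0.9, clip_min=-1.0):
    """Munchausen-style state-action target (illustrative assumption)."""
    # Augmentation: scaled log-policy of the action actually taken, derived
    # from the action selection output for the current observation and
    # clipped from below so near-zero probabilities cannot dominate.
    scaled_log_pi_a = tau * log_softmax(q_current, tau)[action]
    augmented_reward = reward + alpha * max(scaled_log_pi_a, clip_min)
    if done:
        return augmented_reward
    # Soft (entropy-regularized) value of the next observation.
    log_pi_next = log_softmax(q_next, tau)
    soft_value = sum(math.exp(lp) * (q - tau * lp)
                     for q, lp in zip(q_next, log_pi_next))
    return augmented_reward + gamma * soft_value

def td_loss(q_current: List[float], action: int, target: float) -> float:
    """Squared TD error; the target is treated as a constant when differentiating."""
    return 0.5 * (q_current[action] - target) ** 2
```

With `alpha=0` the augmentation term vanishes and the target reduces to an ordinary soft Q-learning target, which isolates what the extra term contributes.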
