Patents by Inventor Michal Valko

Michal Valko has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

REINFORCEMENT LEARNING USING HINDSIGHT TO MODEL UNPREDICTABLE ASPECTS OF THE FUTURE

Publication number: 20250068919

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. Implementations of the method use hindsight to model unpredictable aspects of the future. They use this information to disentangle inherently unpredictable (aleatoric) variation from epistemic uncertainty, which arises from lack of knowledge of the environment. The epistemic uncertainty, which relates to aspects of the environment that are predictable in principle, then serves as a source of intrinsic reward to drive curiosity, i.e. exploration of the environment by the agent.

Type: Application

Filed: August 25, 2023

Publication date: February 27, 2025

Inventors: Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Remi Munos, Michal Valko
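
The distinction drawn in the abstract above, between aleatoric variation and epistemic uncertainty, can be illustrated with a toy calculation. This is a minimal sketch and not the claimed method: the one-dimensional dynamics, the function names, and above all the assumption that the hindsight noise term is supplied by an oracle are all hypothetical. The abstract does not say how the hindsight model is learned.

```python
# Toy illustration only. All names and the dynamics are hypothetical.

def naive_curiosity(next_obs, predicted_mean):
    """Plain prediction-error reward: mixes noise with model error."""
    return (next_obs - predicted_mean) ** 2

def hindsight_curiosity(next_obs, predicted_mean, hindsight_noise):
    """Prediction error after accounting for the unpredictable part.

    `hindsight_noise` stands in for whatever a hindsight model infers
    after seeing the outcome. Here an oracle supplies it (an assumption).
    """
    return (next_obs - (predicted_mean + hindsight_noise)) ** 2

# Assumed toy environment: next_obs = 2 * state + noise.
state, noise = 1.0, 0.5
next_obs = 2.0 * state + noise

accurate_mean = 2.0 * state    # model that already knows the dynamics
inaccurate_mean = 1.5 * state  # model that has not yet learned them

# An accurate model still earns naive reward from pure noise...
naive_accurate = naive_curiosity(next_obs, accurate_mean)
# ...but no reward once the noise is explained in hindsight.
hind_accurate = hindsight_curiosity(next_obs, accurate_mean, noise)
# An inaccurate model keeps a reward that reflects what is left to learn.
hind_inaccurate = hindsight_curiosity(next_obs, inaccurate_mean, noise)
```

In this toy setting the naive reward for the accurate model is driven entirely by noise, while the hindsight-corrected reward is nonzero only for the model that still has something to learn.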

REINFORCEMENT LEARNING BY DIRECTLY LEARNING AN ADVANTAGE FUNCTION

Publication number: 20240256882

Abstract: A system and method, implemented by one or more computers, of controlling an agent to take actions in an environment to perform a task is provided. The method comprises maintaining a value function neural network and an advantage function neural network that is an estimate of a state-action advantage function representing a relative advantage of performing one possible action relative to the other possible actions. The method further comprises using the advantage function neural network to control the agent to take actions in the environment to perform the task. The method also comprises training the value function neural network and the advantage function neural network in a way that takes into account a behavior policy defined by a distribution of actions taken by the agent in training data.

Type: Application

Filed: January 26, 2024

Publication date: August 1, 2024

Inventors: Yunhao Tang, Remi Munos, Mark Daniel Rowland, Michal Valko
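
The idea in the abstract above, learning a value estimate and an advantage estimate side by side and acting on the advantage, can be sketched in tabular form. This is a minimal sketch under stated assumptions, not the claimed method: the abstract gives no update rule, so the rule below, the one-state task, the reward values, and all names are hypothetical. Lookup tables stand in for the two neural networks.

```python
# Toy tabular sketch. The update rule is an assumption, not taken from the source.

REWARDS = [1.0, 3.0]  # hypothetical one-state task: reward of each of two actions

def train(steps=2000, lr=0.05):
    value = 0.0              # stands in for the value function network
    advantage = [0.0, 0.0]   # stands in for the advantage function network
    for t in range(steps):
        # Behavior policy: uniform over both actions, alternated
        # deterministically so the run is reproducible.
        action = t % 2
        reward = REWARDS[action]
        # Both errors are computed from the same pre-update value.
        value_error = reward - value
        advantage_error = reward - value - advantage[action]
        # The value tracks the average return under the behavior policy...
        value += lr * value_error
        # ...and the advantage tracks what the action adds on top of it.
        advantage[action] += lr * advantage_error
    return value, advantage

def greedy_action(advantage):
    """Control the agent directly from the advantage estimates."""
    return max(range(len(advantage)), key=lambda a: advantage[a])

value, advantage = train()
best = greedy_action(advantage)
```

In this toy run the value estimate settles close to 2, the average reward under the uniform behavior policy, and the advantage estimates end up roughly centered around zero under that policy, so the greedy choice picks the higher-reward action.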

SELF-SUPERVISED REPRESENTATION LEARNING USING BOOTSTRAPPED LATENT REPRESENTATIONS

Publication number: 20210383225

Abstract: A computer-implemented method of training a neural network. The method comprises processing a first transformed view of a training data item, e.g. an image, with a target neural network to generate a target output, processing a second transformed view of the training data item, e.g. image, with an online neural network to generate a prediction of the target output, updating parameters of the online neural network to minimize an error between the prediction of the target output and the target output, and updating parameters of the target neural network based on the parameters of the online neural network. The method can effectively train an encoder neural network without using labelled training data items, and without using a contrastive loss, i.e. without needing “negative examples” which comprise transformed views of different data items.

Type: Application

Filed: June 4, 2021

Publication date: December 9, 2021

Inventors: Jean-Bastien François Laurent Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Remi Munos, Michal Valko
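
The training loop described in the abstract above has two steps: the online network is updated to predict the target network's output on a different view, and the target network is then updated from the online parameters. The sketch below shows both steps with tiny linear maps in place of neural networks. It is a minimal illustration under stated assumptions: the exponential moving average used for the target update, the decay and learning-rate values, the example vectors, and all function names are hypothetical, since the abstract only says the target is updated "based on" the online parameters.

```python
# Toy sketch with linear maps instead of neural networks. Names are hypothetical.

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def squared_error(pred, target_out):
    return sum((p - t) ** 2 for p, t in zip(pred, target_out))

def online_update(online, target, view_for_target, view_for_online, lr=0.05):
    """One gradient step on the online weights. The target output is a constant."""
    target_out = linear(target, view_for_target)
    pred = linear(online, view_for_online)
    # d/dW[i][j] of sum_i (pred_i - t_i)^2 is 2 * (pred_i - t_i) * x_j.
    new_online = [
        [w - lr * 2.0 * (p - t) * xj for w, xj in zip(row, view_for_online)]
        for row, p, t in zip(online, pred, target_out)
    ]
    return new_online, squared_error(pred, target_out)

def target_update(target, online, decay=0.9):
    """Assumed rule: move the target weights slowly toward the online weights."""
    return [
        [decay * wt + (1.0 - decay) * wo for wt, wo in zip(row_t, row_o)]
        for row_t, row_o in zip(target, online)
    ]

# Two "transformed views" of the same data item (made-up numbers).
view_a = [1.0, 0.0, 0.5]
view_b = [0.9, 0.1, 0.5]
online = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]
target = [[0.5, 0.1, 0.0], [-0.3, 0.2, 0.1]]

online_1, loss_before = online_update(online, target, view_a, view_b)
_, loss_after = online_update(online_1, target, view_a, view_b)
target_1 = target_update(target, online_1)
```

No negative examples appear anywhere in the loss: the only training signal is the gap between the online prediction and the target output for two views of the same item. A toy this small says nothing about how a real system avoids trivial solutions.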
