Patents by Inventor Georg Ostrovski

Georg Ostrovski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Publication number: 20240135182

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Type: Application

Filed: December 15, 2023

Publication date: April 25, 2024

Inventors: Georg Ostrovski, William Clinton Dabney
Distributional reinforcement learning using quantile function neural networks

Patent number: 11887000

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Type: Grant

Filed: February 15, 2023

Date of Patent: January 30, 2024

Assignee: DeepMind Technologies Limited

Inventors: Georg Ostrovski, William Clinton Dabney
DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Publication number: 20230196108

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Type: Application

Filed: February 15, 2023

Publication date: June 22, 2023

Inventors: Georg Ostrovski, William Clinton Dabney
Distributional reinforcement learning using quantile function neural networks

Patent number: 11610118

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Type: Grant

Filed: February 11, 2019

Date of Patent: March 21, 2023

Assignee: DeepMind Technologies Limited

Inventors: Georg Ostrovski, William Clinton Dabney
MODULATING AGENT BEHAVIOR TO OPTIMIZE LEARNING PROGRESS

Publication number: 20210089908

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.

Type: Application

Filed: September 25, 2020

Publication date: March 25, 2021

Inventors: Tom Schaul, Diana Luiza Borsa, Fengning Ding, David Szepesvari, Georg Ostrovski, Simon Osindero, William Clinton Dabney
DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Publication number: 20200364557

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.

Type: Application

Filed: February 11, 2019

Publication date: November 19, 2020

Inventors: Georg Ostrovski, William Clinton Dabney

DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Distributional reinforcement learning using quantile function neural networks

DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS

Distributional reinforcement learning using quantile function neural networks

MODULATING AGENT BEHAVIOR TO OPTIMIZE LEARNING PROGRESS

DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS