Patents Assigned to Osaro, Inc.

REINFORCEMENT LEARNING USING A PARTITIONED INPUT STATE SPACE

Publication number: 20170213150

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using a partitioned reinforcement learning input state space (RL input state space). One of the methods includes maintaining data defining a plurality of partitions of a space of reinforcement learning (RL) input states, each partition corresponding to a respective supervised learning model; obtaining a current state representation that represents a current state of the environment; for the current state representation and for each action in the set of actions, identifying a respective partition and processing the action and the current state representation using the supervised learning model that corresponds to the respective partition to generate a respective current value function estimate; and selecting an action to be performed by the computer-implemented agent in response to the current state representation using the respective current value function estimates.
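The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `select_action`, `partition_of`, `models`, `toy_partition`, and `toy_models` are hypothetical, and the greedy (arg-max) selection rule is an assumption, since the abstract only says the value function estimates are used to select the action.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Sequence[float]
Action = int
# A value model maps (state, action) to a value function estimate.
ValueModel = Callable[[State, Action], float]


def select_action(
    state: State,
    actions: List[Action],
    partition_of: Callable[[State, Action], Hashable],
    models: Dict[Hashable, ValueModel],
) -> Tuple[Action, Dict[Action, float]]:
    """Return the selected action and the per-action value estimates."""
    estimates: Dict[Action, float] = {}
    for action in actions:
        # Identify the partition that contains this (state, action) input.
        partition = partition_of(state, action)
        # Evaluate the supervised learning model assigned to that partition.
        estimates[action] = models[partition](state, action)
    # Assumption: pick the action with the highest estimate (greedy).
    best = max(actions, key=lambda a: estimates[a])
    return best, estimates


# Hypothetical two-partition setup: split on the sign of the first feature.
def toy_partition(state: State, action: Action) -> str:
    return "left" if state[0] < 0 else "right"


toy_models: Dict[Hashable, ValueModel] = {
    "left": lambda s, a: -1.0 * a,   # favors small action indices
    "right": lambda s, a: 1.0 * a,   # favors large action indices
}

best_action, values = select_action([0.5, 2.0], [0, 1, 2], toy_partition, toy_models)
```

Keeping the partition lookup separate from the models means each model only has to fit its own region of the input space, which is the point of the partitioning the abstract describes.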

Type: Application

Filed: July 8, 2016

Publication date: July 27, 2017

Applicant: Osaro, Inc.

Inventors: Itamar Arel, Michael Kahane, Khashayar Rohanimanesh

Reinforcement learning using confidence scores

Patent number: 9536191

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using confidence scores. One of the methods includes receiving a current observation; for each of multiple actions: determining a respective value function estimate that is an estimate of a return resulting from the agent performing the action in response to the current observation, determining a respective confidence score that is a measure of confidence that the respective value function estimate for the action is an accurate estimate of the return that will result from the agent performing the action in response to the current observation, adjusting the respective value function estimate for the action using the respective confidence score for the action to determine a respective adjusted value function estimate; and selecting an action to be performed by the agent in response to the current observation using the respective adjusted value function estimates.
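The per-action loop in this abstract can be sketched as follows. This is a hedged illustration only: the abstract does not say how the confidence score adjusts the estimate, so the rule used here (shrinking the estimate toward a `baseline` in proportion to a confidence in [0, 1]) is an assumption, as are the names `select_with_confidence`, `value_fn`, and `confidence_fn` and the toy numbers.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Action = int


def select_with_confidence(
    observation: Hashable,
    actions: List[Action],
    value_fn: Callable[[Hashable, Action], float],
    confidence_fn: Callable[[Hashable, Action], float],
    baseline: float = 0.0,
) -> Tuple[Action, Dict[Action, float]]:
    """Return the selected action and the adjusted value estimates."""
    adjusted: Dict[Action, float] = {}
    for action in actions:
        # Estimate of the return from performing this action.
        estimate = value_fn(observation, action)
        # Confidence that the estimate is accurate (assumed to lie in [0, 1]).
        confidence = confidence_fn(observation, action)
        # Assumed adjustment: low confidence pulls the estimate to the baseline.
        adjusted[action] = baseline + confidence * (estimate - baseline)
    # Assumption: select the action with the highest adjusted estimate.
    best = max(actions, key=lambda a: adjusted[a])
    return best, adjusted


# Hypothetical numbers: action 1 has the larger raw estimate but low confidence.
raw_values = {0: 1.0, 1: 5.0}
confidences = {0: 0.9, 1: 0.1}

chosen, adjusted_values = select_with_confidence(
    "obs",
    [0, 1],
    lambda obs, a: raw_values[a],
    lambda obs, a: confidences[a],
)
```

With these numbers the agent picks action 0, even though action 1 has the higher raw estimate, because the adjustment discounts the estimate the agent trusts less.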

Type: Grant

Filed: November 25, 2015

Date of Patent: January 3, 2017

Assignee: Osaro, Inc.

Inventors: Itamar Arel, Michael Kahane, Khashayar Rohanimanesh
