Patents by Inventor Abbas Abdolmaleki

Abbas Abdolmaleki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PLANNING USING A JUMPY TRAJECTORY DECODER NEURAL NETWORK

Publication number: 20240220795

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.

Type: Application

Filed: December 29, 2023

Publication date: July 4, 2024

Inventors: Jingwei Zhang, Arunkumar Byravan, Jost Tobias Springenberg, Martin Riedmiller, Nicolas Manfred Otto Heess, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao

MULTI-OBJECTIVE REINFORCEMENT LEARNING USING WEIGHTED POLICY PROJECTION

Publication number: 20240185084

Abstract: Computer implemented systems and methods for training an action selection policy neural network to select actions to be performed by an agent to control the agent to perform a task. The techniques are able to optimize multiple objectives one of which may be to stay close to a behavioral policy of a teacher. The behavioral policy of the teacher may be defined by a predetermined dataset of behaviors and the systems and methods may then learn offline. The described techniques provide a mechanism for explicitly defining a trade-off between the multiple objectives.

Type: Application

Filed: May 27, 2022

Publication date: June 6, 2024

Inventors: Abbas Abdolmaleki, Sandy Han Huang, Martin Riedmiller

CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION

Publication number: 20230368037

Abstract: A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. Thus the system optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. Thus the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.

Type: Application

Filed: October 1, 2021

Publication date: November 16, 2023

Inventors: Sandy Han Huang, Abbas Abdolmaleki

TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS

Publication number: 20230082326

Abstract: There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories.

Type: Application

Filed: February 8, 2021

Publication date: March 16, 2023

Inventors: Abbas Abdolmaleki, Sandy Han Huang

ROBUST REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL WITH MODEL MISSPECIFICATION

Publication number: 20220343157

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.

Type: Application

Filed: June 17, 2020

Publication date: October 27, 2022

Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller

HIERARCHICAL POLICIES FOR MULTITASK TRANSFER

Publication number: 20220237488

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.

Type: Application

Filed: May 22, 2020

Publication date: July 28, 2022

Inventors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Nicolas Manfred Otto Heess, Martin Riedmiller
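
The action-selection steps in the abstract above can be illustrated with a small sketch. It assumes a discrete action space and assumes that the combined distribution is a mixture of the low-level distributions weighted by the high-level probabilities; the abstract does not say how the combination is computed, so this is one plausible reading rather than the claimed method. The function names `combine_distributions` and `select_action` are hypothetical.

```python
import random


def combine_distributions(high_level_probs, low_level_dists):
    """Assumed combination rule: p(a) = sum_k w_k * p_k(a).

    high_level_probs: one probability per low-level controller.
    low_level_dists: one distribution over actions per low-level controller.
    """
    num_actions = len(low_level_dists[0])
    combined = [0.0] * num_actions
    for weight, dist in zip(high_level_probs, low_level_dists):
        for action, prob in enumerate(dist):
            combined[action] += weight * prob
    return combined


def select_action(combined, rng):
    """Sample an action index from the combined distribution."""
    actions = range(len(combined))
    return rng.choices(actions, weights=combined, k=1)[0]


# Illustrative numbers only: two low-level controllers, three actions.
high_level = [0.25, 0.75]
low_level = [
    [1.0, 0.0, 0.0],  # controller 0 always picks action 0
    [0.0, 0.5, 0.5],  # controller 1 splits between actions 1 and 2
]
combined = combine_distributions(high_level, low_level)
action = select_action(combined, random.Random(0))
```

In this toy setup the high-level controller would be conditioned on both the observation and the task identifier, while each low-level controller sees only the observation; here their outputs are simply given as fixed lists.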

Robot control policy determination through constrained optimization for smooth continuous control

Patent number: 10786900

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for a vehicle or other robot through the performance of a reinforcement learning simulation of the robot.

Type: Grant

Filed: September 27, 2019

Date of Patent: September 29, 2020

Assignee: DeepMind Technologies Limited

Inventors: Steven Bohez, Abbas Abdolmaleki
