Patents by Inventor Abbas Abdolmaleki

Abbas Abdolmaleki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240220795
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.
    Type: Application
    Filed: December 29, 2023
    Publication date: July 4, 2024
    Inventors: Jingwei Zhang, Arunkumar Byravan, Jost Tobias Springenberg, Martin Riedmiller, Nicolas Manfred Otto Heess, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao
  • Publication number: 20240185084
    Abstract: Computer-implemented systems and methods for training an action selection policy neural network to select actions to be performed by an agent to control the agent to perform a task. The techniques are able to optimize multiple objectives, one of which may be to stay close to a behavioral policy of a teacher. The behavioral policy of the teacher may be defined by a predetermined dataset of behaviors, and the systems and methods may then learn offline. The described techniques provide a mechanism for explicitly defining a trade-off between the multiple objectives.
    Type: Application
    Filed: May 27, 2022
    Publication date: June 6, 2024
    Inventors: Abbas Abdolmaleki, Sandy Han Huang, Martin Riedmiller
  • Publication number: 20230368037
    Abstract: A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. Thus the system optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. In this way, the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
    Type: Application
    Filed: October 1, 2021
    Publication date: November 16, 2023
    Inventors: Sandy Han Huang, Abbas Abdolmaleki
  • Publication number: 20230082326
    Abstract: There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories.
    Type: Application
    Filed: February 8, 2021
    Publication date: March 16, 2023
    Inventors: Abbas Abdolmaleki, Sandy Han Huang
  • Publication number: 20220343157
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
    Type: Application
    Filed: June 17, 2020
    Publication date: October 27, 2022
    Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
  • Publication number: 20220237488
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.
    Type: Application
    Filed: May 22, 2020
    Publication date: July 28, 2022
    Inventors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Nicolas Manfred Otto Heess, Martin Riedmiller
  • Patent number: 10786900
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for a vehicle or other robot through the performance of a reinforcement learning simulation of the robot.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: September 29, 2020
    Assignee: DeepMind Technologies Limited
    Inventors: Steven Bohez, Abbas Abdolmaleki
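
The abstract for publication 20220237488 above describes a high-level controller that assigns a probability to each of several low-level controllers, after which a combined probability distribution is used to select an action. The abstract does not say how the distributions are combined. The sketch below assumes a discrete action space and a simple weighted mixture, which is one plausible reading and not necessarily the claimed method. The function names `combine_distributions` and `select_action` and all numeric values are hypothetical, chosen only for illustration.

```python
import random


def combine_distributions(high_level_probs, low_level_dists):
    """Mix low-level action distributions using high-level weights.

    Assumption: the combined distribution is a weighted mixture,
    combined[a] = sum_k high_level_probs[k] * low_level_dists[k][a].
    """
    num_actions = len(low_level_dists[0])
    combined = [0.0] * num_actions
    for weight, dist in zip(high_level_probs, low_level_dists):
        for action, prob in enumerate(dist):
            combined[action] += weight * prob
    return combined


def select_action(combined, rng=random):
    """Sample one action index from the combined distribution."""
    return rng.choices(range(len(combined)), weights=combined, k=1)[0]


# Hypothetical example: two low-level controllers, three discrete actions.
high_level_probs = [0.7, 0.3]
low_level_dists = [
    [0.9, 0.1, 0.0],  # low-level controller 0
    [0.2, 0.2, 0.6],  # low-level controller 1
]

combined = combine_distributions(high_level_probs, low_level_dists)
action = select_action(combined, random.Random(0))
print(combined, action)
```

Because each input is a valid probability distribution and the high-level weights sum to one, the mixture also sums to one, so it can be sampled from directly.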