Patents by Inventor Daniel J. Mankowitz

Daniel J. Mankowitz has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12154029
    Abstract: A method of training an action selection neural network for controlling an agent interacting with an environment to perform different tasks is described. The method includes obtaining a first trajectory of transitions generated while the agent was performing an episode of a first task of multiple tasks, and training the action selection neural network on the first trajectory to adjust the control policies for the multiple tasks. The training includes, for each transition in the first trajectory: generating respective policy outputs for the initial observation in the transition for each task in a subset of tasks that includes the first task and one other task; generating respective target policy outputs for each task using the reward in the transition; and determining an update to the current parameter values based on, for each task, a gradient of a loss between the policy output and the target policy output for the task. A minimal code sketch of this multi-task update appears after this listing.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: November 26, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Tom Schaul, Matteo Hessel, Hado Philip van Hasselt, Daniel J. Mankowitz
  • Publication number: 20240267532
    Abstract: Systems and methods for training rate control neural networks through reinforcement learning. During training, the reward value for a training example is computed from both the current performance of the rate control neural network in encoding that example's video and its historical performance in encoding the same video. A sketch of this history-based reward appears after this listing.
    Type: Application
    Filed: May 30, 2022
    Publication date: August 8, 2024
    Inventors: Anton Zhernov, Chenjie Gu, Daniel J. Mankowitz, Julian Schrittwieser, Amol Balkishan Mandhane, Mary Elizabeth Rauh, Miaosen Wang, Thomas Keisuke Hubert
  • Publication number: 20220343157
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes: sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to the current values of the parameters of a Q-value neural network by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch. A sketch of such a robust TD error appears after this listing.
    Type: Application
    Filed: June 17, 2020
    Publication date: October 27, 2022
    Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
  • Publication number: 20190244099
    Abstract: A method of training an action selection neural network for controlling an agent interacting with an environment to perform different tasks is described. The method includes obtaining a first trajectory of transitions generated while the agent was performing an episode of a first task of multiple tasks, and training the action selection neural network on the first trajectory to adjust the control policies for the multiple tasks. The training includes, for each transition in the first trajectory: generating respective policy outputs for the initial observation in the transition for each task in a subset of tasks that includes the first task and one other task; generating respective target policy outputs for each task using the reward in the transition; and determining an update to the current parameter values based on, for each task, a gradient of a loss between the policy output and the target policy output for the task. The first sketch after this listing illustrates this update as well.
    Type: Application
    Filed: February 5, 2019
    Publication date: August 8, 2019
    Inventors: Tom Schaul, Matteo Hessel, Hado Philip van Hasselt, Daniel J. Mankowitz
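
The two multi-task entries above (patent 12154029 and publication 20190244099) describe the same training scheme. Below is a minimal Python sketch of the per-transition update, assuming a discrete action space and a task-conditioned policy network. MultiTaskPolicy, the reward-based target construction, and all hyperparameters are illustrative assumptions, not the patented method itself.

    # A sketch only: scores each transition under the behavior task plus one
    # other task, builds a hypothetical reward-based target per task, and
    # applies one gradient step on the summed per-task losses.
    import random
    import torch
    import torch.nn.functional as F

    class MultiTaskPolicy(torch.nn.Module):
        """Maps an observation plus a one-hot task ID to action logits."""

        def __init__(self, obs_dim, num_tasks, num_actions, hidden=64):
            super().__init__()
            self.num_tasks = num_tasks
            self.net = torch.nn.Sequential(
                torch.nn.Linear(obs_dim + num_tasks, hidden),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden, num_actions),
            )

        def forward(self, obs, task_id):
            task = F.one_hot(torch.tensor(task_id), self.num_tasks).float()
            return self.net(torch.cat([obs, task]))

    def train_on_trajectory(policy, optimizer, trajectory, first_task):
        """One update over a trajectory collected while doing first_task."""
        # Subset of tasks to score: the behavior task plus one other task.
        other = random.choice(
            [t for t in range(policy.num_tasks) if t != first_task])
        loss = torch.tensor(0.0)
        for obs, action, reward in trajectory:
            for task in (first_task, other):
                logits = policy(obs, task)
                with torch.no_grad():
                    # Hypothetical target: nudge probability mass toward the
                    # taken action in proportion to the reward. The patent's
                    # actual target construction is not reproduced here.
                    target = F.softmax(logits, dim=-1)
                    target[action] = target[action] + max(reward, 0.0)
                    target = target / target.sum()
                loss = loss + F.cross_entropy(
                    logits.unsqueeze(0), target.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return float(loss)

A trajectory here is assumed to be a list of (observation tensor, action index, reward float) transitions.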
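
For publication 20240267532, the distinctive piece is the reward: each training example's reward contrasts the network's current encoding result on a video with its historical results on that same video. The sketch below captures that contrast under assumed, simplified metrics; RateControlRewarder and its quality/bitrate score are hypothetical.

    # A sketch only: keeps a per-video record of the best score achieved so
    # far and rewards the agent for improving on it.
    from collections import defaultdict

    class RateControlRewarder:
        def __init__(self):
            # Best historical score per video ID (assumed identifier).
            self.best = defaultdict(lambda: float("-inf"))

        def reward(self, video_id, quality, bitrate, target_bitrate):
            # Hypothetical score: encode quality, penalized for exceeding
            # the target bitrate.
            score = quality - max(0.0, bitrate - target_bitrate)
            prev = self.best[video_id]
            self.best[video_id] = max(prev, score)
            # The first encode of a video has no history to compare against.
            return score - prev if prev != float("-inf") else 0.0

Comparing against history rather than an absolute threshold keeps the reward informative even when videos differ widely in how easy they are to encode.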
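
For publication 20220343157, the core quantity is a robust entropy-regularized TD error. The sketch below, in the same assumed PyTorch setting, uses a soft (log-sum-exp) state value and takes the worst case over sampled state perturbations; the Gaussian epsilon-ball perturbation model and all constants are assumptions.

    # A sketch only: a soft-Q TD error made robust by minimizing the
    # next-state value over a set of perturbed next states.
    import torch

    def robust_soft_td_error(q_net, obs, action, reward, next_obs,
                             gamma=0.99, alpha=0.1, eps=0.05, n_perturb=8):
        q_sa = q_net(obs)[action]
        with torch.no_grad():
            # Entropy-regularized (soft) value of a state:
            # V(s) = alpha * logsumexp(Q(s, .) / alpha).
            values = torch.stack([
                alpha * torch.logsumexp(
                    q_net(next_obs + eps * torch.randn_like(next_obs)) / alpha,
                    dim=-1)
                for _ in range(n_perturb)
            ])
            # Robustness: assume an adversary picks the worst perturbation.
            v_next = values.min()
        return reward + gamma * v_next - q_sa

Squaring this error and averaging it over the sampled mini-batch would give a loss for the Q-network update; per the abstract, the policy update then reuses the same tuples through the Q-value network.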