Patents by Inventor Thomas Rothoerl
Thomas Rothoerl has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886997Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.Type: GrantFiled: October 7, 2022Date of Patent: January 30, 2024Assignee: DeepMind Technologies LimitedInventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 11868882Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.Type: GrantFiled: June 28, 2018Date of Patent: January 9, 2024Assignee: DeepMind Technologies LimitedInventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20230023189Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.Type: ApplicationFiled: October 7, 2022Publication date: January 26, 2023Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 11534911Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.Type: GrantFiled: March 25, 2020Date of Patent: December 27, 2022Assignee: DeepMind Technologies LimitedInventors: Razvan Pascanu, Raia Thais Hadsell, Mel Vecerik, Thomas Rothoerl, Andrei-Alexandru Rusu, Nicolas Manfred Otto Heess
-
Publication number: 20220355472Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.Type: ApplicationFiled: July 25, 2022Publication date: November 10, 2022Inventors: Razvan Pascanu, Raia Thais Hadsell, Mel Vecerik, Thomas Rothoerl, Andrei-Alexandru Rusu, Nicolas Manfred Otto Heess
-
Patent number: 11468321Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.Type: GrantFiled: June 28, 2018Date of Patent: October 11, 2022Assignee: DeepMind Technologies LimitedInventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20200223063Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.Type: ApplicationFiled: March 25, 2020Publication date: July 16, 2020Inventors: Razvan Pascanu, Raia Thais Hadsell, Mel Vecerik, Thomas Rothoerl, Andrei-Alexandru Rusu, Nicolas Manfred Otto Heess
-
Publication number: 20200151562Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.Type: ApplicationFiled: June 28, 2018Publication date: May 14, 2020Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothörl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 10632618Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.Type: GrantFiled: April 10, 2019Date of Patent: April 28, 2020Assignee: DeepMind Technologies LimitedInventors: Razvan Pascanu, Raia Thais Hadsell, Mel Vecerik, Thomas Rothoerl, Andrei-Alexandru Rusu, Nicolas Manfred Otto Heess
-
Publication number: 20190232489Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.Type: ApplicationFiled: April 10, 2019Publication date: August 1, 2019Inventors: Razvan Pascanu, Raia Thais Hadsell, Mel Vecerik, Thomas Rothoerl, Andrei-Alexandru Rusu, Nicolas Manfred Otto Heess