Patents by Inventor Shixiang Gu
Shixiang Gu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12240113
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Grant
Filed: December 1, 2023
Date of Patent: March 4, 2025
Assignee: GOOGLE LLC
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
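The abstract above describes a collect-and-train loop in which several robots run episodes in parallel, each fetching the latest policy parameters before an episode and contributing experience that a trainer uses to update those parameters. The following is a minimal sketch of that loop, not the patented method: the toy linear "policy", the illustrative gradient, and all names (ParameterServer, ReplayBuffer, run_robot, run_trainer) are hypothetical.

```python
# Hypothetical sketch: parallel experience collection with a shared
# parameter server and replay buffer (not the claimed implementation).
import random
import threading

class ParameterServer:
    """Holds the current policy parameters; robots fetch them per episode."""
    def __init__(self, dim):
        self.lock = threading.Lock()
        self.params = [0.0] * dim

    def get(self):
        with self.lock:
            return list(self.params)

    def update(self, grads, lr=0.01):
        with self.lock:
            self.params = [p - lr * g for p, g in zip(self.params, grads)]

class ReplayBuffer:
    """Shared store of (state, action, reward) experience from all robots."""
    def __init__(self):
        self.lock = threading.Lock()
        self.data = []

    def add(self, transition):
        with self.lock:
            self.data.append(transition)

    def sample(self, n):
        with self.lock:
            return random.sample(self.data, min(n, len(self.data)))

def run_robot(robot_id, server, buffer, episodes=5, steps=20):
    """One robot: before each episode, fetch the newest parameters, then
    act with exploration noise and log experience into the shared buffer."""
    for _ in range(episodes):
        params = server.get()                                     # latest policy parameters
        state = random.random()
        for _ in range(steps):
            action = params[0] * state + random.gauss(0.0, 0.1)   # noisy toy policy
            reward = -abs(action - 0.5)                           # toy task: output near 0.5
            buffer.add((state, action, reward))
            state = random.random()

def run_trainer(server, buffer, updates=50, batch_size=32):
    """Trainer: repeatedly sample a batch and nudge the parameters."""
    for _ in range(updates):
        batch = buffer.sample(batch_size)
        if not batch:
            continue
        # Illustrative "gradient" only; a real system would backpropagate
        # through the policy network here.
        grad = sum(-r * s for s, a, r in batch) / len(batch)
        server.update([grad])

server, buffer = ParameterServer(dim=1), ReplayBuffer()
robots = [threading.Thread(target=run_robot, args=(i, server, buffer)) for i in range(4)]
trainer = threading.Thread(target=run_trainer, args=(server, buffer))
for t in robots + [trainer]:
    t.start()
for t in robots + [trainer]:
    t.join()
print("final params:", server.get())
```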
-
Publication number: 20240308068
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
Type: Application
Filed: May 24, 2024
Publication date: September 19, 2024
Inventors: Honglak Lee, Shixiang Gu, Sergey Levine
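The off-policy correction described above re-labels a stored higher-level action (a goal handed to the lower-level policy) so that old experience remains consistent with the current lower-level policy. A common way to realize this is to pick, from a small candidate set, the goal under which the current lower-level policy best reproduces the stored lower-level actions. The sketch below illustrates that idea only; the candidate set, the toy lower-level policy, and all names are assumptions, not the claimed procedure.

```python
# Hypothetical sketch: re-label a stored higher-level goal with the candidate
# goal that best explains the stored lower-level actions under the current
# lower-level policy (illustrative only).
import numpy as np

def lower_level_policy(state, goal, params):
    """Toy current lower-level policy: move toward the goal (hypothetical)."""
    return params * (goal - state)

def relabel_goal(states, actions, old_goal, params, n_candidates=8, rng=None):
    """Return the candidate goal under which the current lower-level policy
    best reproduces the stored lower-level actions (here, minimum squared
    action error as a stand-in for maximum log-likelihood)."""
    rng = rng or np.random.default_rng(0)
    # Candidates: the original goal, the achieved state change, and noisy variants.
    achieved = states[-1] - states[0]
    candidates = [old_goal, achieved]
    candidates += [achieved + rng.normal(scale=0.5, size=achieved.shape)
                   for _ in range(n_candidates)]

    def error(goal):
        pred = [lower_level_policy(s, goal, params) for s in states[:-1]]
        return sum(float(np.sum((p - a) ** 2)) for p, a in zip(pred, actions))

    return min(candidates, key=error)

# Usage: re-label one stored higher-level transition (made-up numbers).
states = [np.array([0.0]), np.array([0.4]), np.array([0.9])]   # lower-level states
actions = [np.array([0.4]), np.array([0.5])]                   # stored lower-level actions
old_goal = np.array([2.0])                                     # goal from the old higher-level policy
new_goal = relabel_goal(states, actions, old_goal, params=1.0)
print("re-labeled goal:", new_goal)
```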
-
Patent number: 11992944
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
Type: Grant
Filed: May 17, 2019
Date of Patent: May 28, 2024
Assignee: GOOGLE LLC
Inventors: Honglak Lee, Shixiang Gu, Sergey Levine
-
Publication number: 20240131695
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Application
Filed: December 1, 2023
Publication date: April 25, 2024
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Patent number: 11897133
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Grant
Filed: August 1, 2022
Date of Patent: February 13, 2024
Assignee: GOOGLE LLC
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Patent number: 11845183
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Grant
Filed: August 1, 2022
Date of Patent: December 19, 2023
Assignee: GOOGLE LLC
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Publication number: 20220388159
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Application
Filed: August 1, 2022
Publication date: December 8, 2022
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Publication number: 20220284266
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
Type: Application
Filed: March 25, 2022
Publication date: September 8, 2022
Inventors: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine
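The abstract above decomposes the Q value for a continuous action into a value estimate V(s) from one subnetwork, an "ideal point" mu(s) from another, and an advantage term for the particular action, giving Q(s, a) = V(s) + A(s, a). The sketch below illustrates that decomposition under stated assumptions: the tiny linear "subnetworks" and the quadratic advantage form (zero at the ideal point, negative elsewhere) are illustrative, not the claimed architecture.

```python
# Hypothetical sketch: Q(s, a) = V(s) + A(s, a) with a quadratic advantage
# around an "ideal point" in the continuous action space (illustrative only).
import numpy as np

def value_subnetwork(obs, w_v):
    """Toy value estimate V(s) for the current observation."""
    return float(obs @ w_v)

def policy_subnetwork(obs, w_mu):
    """Toy ideal point mu(s) in the continuous action space."""
    return obs @ w_mu

def advantage(action, ideal, scale=1.0):
    """Quadratic advantage: zero at the ideal point, negative elsewhere."""
    diff = action - ideal
    return -0.5 * scale * float(diff @ diff)

def q_value(obs, action, w_v, w_mu):
    """Q(s, a) = V(s) + A(s, a), an estimate of the expected return from
    taking `action` in the state characterized by `obs`."""
    v = value_subnetwork(obs, w_v)
    mu = policy_subnetwork(obs, w_mu)
    return v + advantage(action, mu)

# Usage with made-up weights, an observation, and a candidate action.
obs = np.array([0.2, -0.1, 0.5])
w_v = np.array([1.0, 0.5, -0.3])
w_mu = np.array([[0.4], [0.1], [0.2]])   # maps the observation to a 1-D action space
action = np.array([0.3])
print("Q(s, a) =", q_value(obs, action, w_v, w_mu))
```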
-
Patent number: 11400587
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Grant
Filed: September 14, 2017
Date of Patent: August 2, 2022
Assignee: GOOGLE LLC
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Patent number: 11288568
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
Type: Grant
Filed: February 9, 2017
Date of Patent: March 29, 2022
Assignee: Google LLC
Inventors: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine
-
Publication number: 20210187733
Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
Type: Application
Filed: May 17, 2019
Publication date: June 24, 2021
Inventors: Honglak Lee, Shixiang Gu, Sergey Levine
-
Publication number: 20190232488
Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
Type: Application
Filed: September 14, 2017
Publication date: August 1, 2019
Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap
-
Publication number: 20170228662
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
Type: Application
Filed: February 9, 2017
Publication date: August 10, 2017
Inventors: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine