Patents by Inventor Todd Andrew Hester
Todd Andrew Hester has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886997
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: October 7, 2022
Date of Patent: January 30, 2024
Assignee: DeepMind Technologies Limited
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
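The central mechanism in this abstract (which recurs in several related filings below) is a replay buffer that mixes the agent's own transitions with fixed demonstration transitions for off-policy training. A minimal sketch follows; the class name, the uniform-sampling choice, and the eviction policy are illustrative assumptions, not details taken from the patent:

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Replay buffer holding both agent transitions and permanent
    demonstration transitions: (state, action, reward, next_state)
    tuples from both sources are sampled for off-policy training."""

    def __init__(self, capacity, demo_transitions):
        # Demonstration tuples are kept forever; agent tuples may be
        # evicted once the buffer reaches capacity.
        self.demos = list(demo_transitions)
        self.agent = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.agent.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw uniformly from the union of demonstration and agent data,
        # so minibatches mix both kinds of experience.
        pool = self.demos + list(self.agent)
        return random.sample(pool, min(batch_size, len(pool)))
```

In the system described, minibatches drawn this way would feed the usual actor-critic updates (critic toward a TD target, actor along the critic's gradient); only the buffer composition is sketched here.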
-
Patent number: 11868882
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: June 28, 2018
Date of Patent: January 9, 2024
Assignee: DeepMind Technologies Limited
Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 11836599
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Grant
Filed: May 26, 2021
Date of Patent: December 5, 2023
Assignee: DeepMind Technologies Limited
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
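The selection step this abstract describes, which also recurs in the related filings below, is: score each candidate slate of settings with every model in an ensemble, then pick the slate whose scores are best. A minimal sketch, in which the model call signature `model(state, slate)` and the convention that a lower score means less power used are illustrative assumptions:

```python
def select_settings(state, candidate_slates, ensemble):
    """For each candidate slate of data center settings, score it with
    every model in the ensemble and return the slate with the best mean
    predicted efficiency score (here, lower = less power used)."""
    def mean_score(slate):
        # Average the ensemble's predictions for this (state, slate) pair.
        scores = [model(state, slate) for model in ensemble]
        return sum(scores) / len(scores)
    return min(candidate_slates, key=mean_score)
```

Averaging is one simple way to combine an ensemble; the patent family does not commit to this particular aggregation in the abstract.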
-
Patent number: 11604941
Abstract: A method of training an action selection neural network to perform a demonstrated task using a supervised learning technique. The action selection neural network is configured to receive demonstration data comprising actions to perform the task and rewards received for performing the actions. The action selection neural network has auxiliary prediction task neural networks on one or more of its intermediate outputs. The action selection neural network is trained using multiple combined losses, concurrently with the auxiliary prediction task neural networks.
Type: Grant
Filed: October 29, 2018
Date of Patent: March 14, 2023
Assignee: DeepMind Technologies Limited
Inventor: Todd Andrew Hester
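The structure this abstract describes, auxiliary prediction heads attached to an intermediate output and trained concurrently via a combined loss, can be sketched as follows. The class name, the callables (`trunk`, `policy_head`, the heads), and the weighted-sum combination are hypothetical illustrations, not taken from the patent:

```python
class ActionSelectionWithAuxHeads:
    """Sketch of an action-selection network with auxiliary prediction
    heads on an intermediate representation. `trunk`, `policy_head`,
    and each auxiliary head stand in for small neural networks."""

    def __init__(self, trunk, policy_head, aux_heads, aux_weights):
        self.trunk = trunk
        self.policy_head = policy_head
        self.aux_heads = aux_heads
        self.aux_weights = aux_weights

    def loss(self, observation, policy_loss_fn, aux_loss_fns):
        features = self.trunk(observation)  # intermediate output
        policy_loss = policy_loss_fn(self.policy_head(features))
        # Each auxiliary task gets its own head and loss on the shared
        # intermediate features, trained concurrently with the policy.
        aux_losses = [fn(head(features))
                      for head, fn in zip(self.aux_heads, aux_loss_fns)]
        return policy_loss + sum(w * l for w, l
                                 in zip(self.aux_weights, aux_losses))
```

Minimizing the returned combined loss updates the trunk with gradients from both the main action-selection objective and the auxiliary tasks.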
-
Publication number: 20230023189
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Application
Filed: October 7, 2022
Publication date: January 26, 2023
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20220343157
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
Type: Application
Filed: June 17, 2020
Publication date: October 27, 2022
Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
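The key quantity in this abstract is a TD error that is both entropy-regularized and robust to perturbations of the next state. One plausible reading can be sketched as below: a soft (log-sum-exp) backup supplies the entropy regularization, and taking the worst case over a set of perturbed next states supplies the robustness. The `perturb` generator, the discrete action set, and the exact functional form are assumptions for illustration; the application itself does not fix them in the abstract:

```python
import math

def robust_td_error(q, state, action, reward, next_state,
                    perturb, gamma=0.99, alpha=0.1, actions=(0, 1)):
    """Sketch of a robust, entropy-regularized TD error.

    q(s, a)     -- hypothetical Q-function
    perturb(s)  -- hypothetical generator of perturbed next states
    alpha       -- entropy-regularization temperature
    """
    def soft_value(s):
        # Entropy-regularized state value:
        # alpha * log sum_a exp(Q(s, a) / alpha)
        return alpha * math.log(sum(math.exp(q(s, a) / alpha)
                                    for a in actions))
    # Robustness: bootstrap from the worst-case perturbed next state.
    worst = min(soft_value(s) for s in perturb(next_state))
    return reward + gamma * worst - q(state, action)
```

Minimizing the square of this error over sampled mini-batches would give the Q-network update the abstract refers to.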
-
Patent number: 11468321
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: June 28, 2018
Date of Patent: October 11, 2022
Assignee: DeepMind Technologies Limited
Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20210287072
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: May 26, 2021
Publication date: September 16, 2021
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20200272889
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: April 30, 2020
Publication date: August 27, 2020
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20200151562
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Application
Filed: June 28, 2018
Publication date: May 14, 2020
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothörl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 10643121
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Grant
Filed: January 19, 2017
Date of Patent: May 5, 2020
Assignee: DeepMind Technologies Limited
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20180204116
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: January 19, 2017
Publication date: July 19, 2018
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Patent number: 9869484
Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
Type: Grant
Filed: January 14, 2015
Date of Patent: January 16, 2018
Assignee: Google Inc.
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
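The loop this abstract describes, simulate trajectories, pick the action with the highest upper confidence bound at each step, predict the next temperature with a thermal model, and back up trajectory values, can be sketched as below. For brevity this sketch keeps per-(step, action) statistics rather than a full UCT tree, and the thermal model, cost function, action set, and exploration constant are all hypothetical placeholders:

```python
import math

def uct_plan(initial_temp, thermal_model, cost_fn,
             actions=("off", "heat"), horizon=3, n_simulations=300, c=1.0):
    """Minimal UCT-style planner: simulate many control trajectories,
    choosing at each step the action with the highest upper confidence
    bound on value, stepping a thermal_model(temp, action), scoring each
    trajectory with cost_fn(temps) (lower cost = higher value), and
    backing values up into the visited statistics. Returns the best
    first action to apply to the environmental control system."""
    visits = {}   # (step, action) -> visit count
    values = {}   # (step, action) -> mean trajectory value

    def select(step, total_sims):
        best, best_ucb = None, -float("inf")
        for a in actions:
            n = visits.get((step, a), 0)
            if n == 0:
                return a  # try unvisited actions first
            # Upper confidence bound: mean value + exploration bonus.
            u = values[(step, a)] + c * math.sqrt(math.log(total_sims) / n)
            if u > best_ucb:
                best, best_ucb = a, u
        return best

    for sim in range(1, n_simulations + 1):
        temp, chosen, temps = initial_temp, [], [initial_temp]
        for step in range(horizon):
            a = select(step, sim)
            chosen.append(a)
            temp = thermal_model(temp, a)  # predict next temperature
            temps.append(temp)
        value = -cost_fn(temps)
        # Back up the trajectory value into every action it selected.
        for step, a in enumerate(chosen):
            n = visits.get((step, a), 0)
            values[(step, a)] = (values.get((step, a), 0.0) * n + value) / (n + 1)
            visits[(step, a)] = n + 1

    return max(actions, key=lambda a: values.get((0, a), -float("inf")))
```

With a toy thermal model where heating raises the temperature one degree per step and a cost that penalizes distance from a 70-degree setpoint, starting at 68 degrees the planner settles on heating first.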
-
Patent number: 9772116
Abstract: In an embodiment, an electronic device may include storage containing processor-executable instructions, a preference function that maps weights indicating likely user preferences for the range of values of a device setting in relation to a range of values of a variable, a current automated device control schedule configured to control the device setting of the electronic device in relation to the variable, and a processor. The instructions may cause the processor to determine the current automated device control schedule based on the preference function by detecting user behavior that indicates satisfaction or dissatisfaction with values of the device setting in relation to the variable, updating the preference function based on the detected user behavior, and determining the current automated device control schedule by comparing a number of candidate device control schedules against the weights of the preference function and selecting the candidate with the highest score.
Type: Grant
Filed: November 4, 2014
Date of Patent: September 26, 2017
Assignee: Google Inc.
Inventors: Todd Andrew Hester, Allen Joseph Minich, George Alban Heitz, III
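The two steps this abstract describes, updating a preference function from observed user behavior and scoring candidate schedules against its weights, can be sketched as follows. The representation of the preference function as a dict of (variable value, setting) weights, the (hour, setpoint) example, and the additive update are illustrative assumptions:

```python
def update_preferences(preference_weights, observed, satisfied, lr=0.1):
    """Nudge the weight for an observed (variable, setting) pair up on
    signals of satisfaction and down on signals of dissatisfaction."""
    delta = lr if satisfied else -lr
    preference_weights[observed] = preference_weights.get(observed, 0.0) + delta
    return preference_weights

def select_schedule(candidates, preference_weights):
    """Score each candidate control schedule against the preference
    weights and keep the candidate with the highest total score.
    A schedule is a list of (variable_value, setting) pairs, e.g.
    (hour_of_day, setpoint_temperature)."""
    def score(schedule):
        return sum(preference_weights.get(step, 0.0) for step in schedule)
    return max(candidates, key=score)
```

Run periodically, this loop keeps the automated schedule tracking whatever settings the user's behavior shows they actually prefer at each value of the variable.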
-
Publication number: 20160201933
Abstract: In an embodiment, an electronic device may include a power source configured to provide operational power to the electronic device and a processor coupled to the power source. The processor may be configured to generate temperature predictions using a model of a structure and possible control scenarios, determine a value of the temperature predictions and the respective possible control scenarios using a cost function, the cost function comprising weighted factors related to an error between a setpoint temperature and the temperature predictions, a length of runtime for an environmental control system (e.g., an HVAC system), and a length of environmental control system cycles. The processor may also be configured to select the control scenario with the highest value to apply to control the environmental control system. The control scenarios may be generated using upper confidence bound for trees (UCT).
Type: Application
Filed: January 14, 2015
Publication date: July 14, 2016
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
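The cost function this abstract enumerates, weighted terms for setpoint error, total runtime, and cycle length, can be sketched directly. The weights and the exact penalty forms (absolute error, a per-cycle penalty discouraging short cycling) are hypothetical choices for illustration:

```python
def control_cost(temps, runtime_steps, n_cycles, setpoint,
                 w_error=1.0, w_runtime=0.1, w_cycles=0.5):
    """Sketch of a weighted HVAC control cost: penalize (a) error between
    predicted temperatures and the setpoint, (b) total system runtime,
    and (c) the number of on/off cycles (more cycles of a given total
    runtime means shorter, harder-on-equipment cycles)."""
    error_term = sum(abs(t - setpoint) for t in temps)
    return (w_error * error_term
            + w_runtime * runtime_steps
            + w_cycles * n_cycles)
```

A planner such as the UCT procedure in the related filings would evaluate each candidate control scenario's predicted temperatures with a function like this and keep the scenario with the lowest cost (highest value).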
-
Publication number: 20160201934
Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
Type: Application
Filed: January 14, 2015
Publication date: July 14, 2016
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal