Patents by Inventor Todd Andrew Hester

Todd Andrew Hester has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11886997
    Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
    Type: Grant
    Filed: October 7, 2022
    Date of Patent: January 30, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
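The system described in the entry above trains an actor and a critic off-policy from a replay buffer that holds both the agent's own transitions and demonstration transitions. The Python sketch below illustrates only that buffer arrangement, assuming a fixed demonstration fraction per minibatch; the class name, method names, and constants are illustrative, not taken from the patent.

```python
import random
from collections import deque

# Minimal sketch of a replay buffer that mixes the agent's own experience with
# demonstration transitions, per the entry above. The class name, method names,
# and fixed demonstration fraction are illustrative assumptions.

class MixedReplayBuffer:
    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.agent_transitions = deque(maxlen=capacity)  # overwritten as the buffer fills
        self.demo_transitions = []                       # demonstration tuples kept for the whole run
        self.demo_fraction = demo_fraction

    def add_demonstration(self, state, action, reward, next_state):
        self.demo_transitions.append((state, action, reward, next_state))

    def add_agent_transition(self, state, action, reward, next_state):
        self.agent_transitions.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a minibatch containing both demonstration and agent tuples."""
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demo_transitions))
        n_agent = min(batch_size - n_demo, len(self.agent_transitions))
        batch = random.sample(self.demo_transitions, n_demo)
        batch += random.sample(list(self.agent_transitions), n_agent)
        random.shuffle(batch)
        return batch
```

An actor network (policy) and a critic network (Q function) would then be updated off-policy from minibatches drawn this way, so every gradient step sees a mix of demonstration tuples and tuples gathered by the running system.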
  • Patent number: 11868882
    Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: January 9, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
  • Patent number: 11836599
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
    Type: Grant
    Filed: May 26, 2021
    Date of Patent: December 5, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
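The method in the entry above scores each candidate data center setting slate with an ensemble of machine learning models and then picks new settings from those scores. Below is a minimal sketch of that selection step, assuming each ensemble member is a callable mapping a state and a slate to an efficiency score and that a higher score is better; all names are illustrative.

```python
# Minimal sketch of choosing data center settings by scoring candidate setting
# slates with an ensemble of models, per the entry above. `models` is assumed
# to be a list of callables mapping (state, slate) -> efficiency score, with
# higher scores better; all names here are illustrative.

def select_settings(state, candidate_slates, models):
    """Return the setting slate with the best mean predicted efficiency."""
    def ensemble_score(slate):
        scores = [model(state, slate) for model in models]
        return sum(scores) / len(scores)  # average the ensemble's predictions
    return max(candidate_slates, key=ensemble_score)
```

If the efficiency metric were one where lower is better (such as PUE), the same structure would apply with `min` in place of `max`.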
  • Patent number: 11604941
    Abstract: A method of training an action selection neural network to perform a demonstrated task using a supervised learning technique. The action selection neural network is configured to receive demonstration data comprising actions to perform the task and rewards received for performing the actions. The action selection neural network has auxiliary prediction task neural networks on one or more of its intermediate outputs. The action selection policy neural network is trained using multiple combined losses, concurrently with the auxiliary prediction task neural networks.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: March 14, 2023
    Assignee: DeepMind Technologies Limited
    Inventor: Todd Andrew Hester
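The patent above describes supervised training of an action selection network from demonstration data, with auxiliary prediction networks attached to intermediate outputs and trained concurrently through a combined loss. Below is a minimal PyTorch-style sketch of one such arrangement, assuming a single shared trunk, an imitation head over discrete actions, and reward prediction as the auxiliary task; these are illustrative choices, not details from the patent.

```python
import torch
import torch.nn as nn

# Minimal sketch of an action selection network with an auxiliary prediction
# head on an intermediate output, trained with a combined loss, per the patent
# above. The layer sizes, the reward-prediction auxiliary task, and the loss
# weighting are illustrative assumptions.

class PolicyWithAuxHead(nn.Module):
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.action_head = nn.Linear(64, num_actions)  # imitates demonstrated actions
        self.reward_head = nn.Linear(64, 1)            # auxiliary prediction task

    def forward(self, obs):
        hidden = self.trunk(obs)                       # shared intermediate output
        return self.action_head(hidden), self.reward_head(hidden)

def combined_loss(model, obs, demo_actions, demo_rewards, aux_weight=0.5):
    """Supervised imitation loss plus a weighted auxiliary prediction loss."""
    action_logits, reward_pred = model(obs)
    imitation = nn.functional.cross_entropy(action_logits, demo_actions)
    auxiliary = nn.functional.mse_loss(reward_pred.squeeze(-1), demo_rewards)
    return imitation + aux_weight * auxiliary
```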
  • Publication number: 20230023189
    Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
    Type: Application
    Filed: October 7, 2022
    Publication date: January 26, 2023
    Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
  • Publication number: 20220343157
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
    Type: Application
    Filed: June 17, 2020
    Publication date: October 27, 2022
    Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
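The abstract above centers on a robust, entropy-regularized TD error that accounts for possible perturbations of the observed states. The sketch below is one simplified, discrete-action interpretation of that idea: the bootstrap target is entropy-regularized (soft) and takes the worst case over a supplied set of perturbed next states. The function and argument names, and the exact form of the regularization, are assumptions for illustration; the patent's formulation may differ.

```python
import torch

# Simplified, discrete-action sketch of a robust entropy-regularized TD error
# in the spirit of the abstract above. `q_net` and `policy` are assumed
# callables returning per-action Q-values and action probabilities; this is an
# interpretation for illustration, not the patent's exact formulation.

def robust_soft_td_error(q_sa, reward, perturbed_next_states, q_net, policy,
                         alpha=0.1, gamma=0.99):
    """TD error against a worst-case, entropy-regularized bootstrap target."""
    soft_values = []
    for next_state in perturbed_next_states:      # candidate perturbations of the observed next state
        probs = policy(next_state)                # action distribution at the perturbed state
        q_values = q_net(next_state)              # per-action Q-values at the perturbed state
        # Entropy-regularized (soft) state value under the current policy.
        value = (probs * (q_values - alpha * torch.log(probs + 1e-8))).sum(-1)
        soft_values.append(value)
    worst_case_value = torch.stack(soft_values).min(dim=0).values
    target = reward + gamma * worst_case_value
    return target.detach() - q_sa                 # minimize the square of this to update the Q network
```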
  • Patent number: 11468321
    Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: October 11, 2022
    Assignee: DeepMind Technologies Limited
    Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
  • Publication number: 20210287072
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
    Type: Application
    Filed: May 26, 2021
    Publication date: September 16, 2021
    Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
  • Publication number: 20200272889
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
    Type: Application
    Filed: April 30, 2020
    Publication date: August 27, 2020
    Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
  • Publication number: 20200151562
    Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
    Type: Application
    Filed: June 28, 2018
    Publication date: May 14, 2020
    Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothörl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
  • Patent number: 10643121
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
    Type: Grant
    Filed: January 19, 2017
    Date of Patent: May 5, 2020
    Assignee: DeepMind Technologies Limited
    Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
  • Publication number: 20180204116
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
    Type: Application
    Filed: January 19, 2017
    Publication date: July 19, 2018
    Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
  • Patent number: 9869484
    Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
    Type: Grant
    Filed: January 14, 2015
    Date of Patent: January 16, 2018
    Assignee: Google Inc.
    Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
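The patent above plans HVAC control actions by repeatedly simulating candidate control trajectories with upper confidence bound for trees (UCT), using a thermal model to roll the temperature forward and a cost function to value each trajectory. The sketch below shows the general shape of such a planner; the toy thermal model, cost terms, action set, and constants are all illustrative stand-ins, not details from the patent.

```python
import math

# Minimal sketch of UCT-style planning over HVAC control actions, per the
# patent above: simulate many candidate control trajectories, pick the action
# with the highest upper confidence bound at each time step, roll the
# temperature forward with a thermal model, and back up each trajectory's
# value from a cost function.

ACTIONS = ["off", "heat"]

def thermal_model(temp, action):
    """Toy one-step temperature prediction (illustrative only)."""
    return temp + (0.5 if action == "heat" else -0.3)

def trajectory_value(temps, actions, setpoint=21.0):
    """Higher is better: penalize setpoint error and runtime (toy cost function)."""
    error = sum(abs(t - setpoint) for t in temps)
    runtime = sum(1 for a in actions if a == "heat")
    return -(error + 0.1 * runtime)

def uct_plan(start_temp, horizon=12, simulations=500, c=1.4):
    counts = [{a: 0 for a in ACTIONS} for _ in range(horizon)]
    values = [{a: 0.0 for a in ACTIONS} for _ in range(horizon)]
    for sim in range(1, simulations + 1):
        temp, temps, actions = start_temp, [], []
        for step in range(horizon):
            def ucb(action):
                n = counts[step][action]
                if n == 0:
                    return float("inf")       # try untested actions first
                mean = values[step][action] / n
                return mean + c * math.sqrt(math.log(sim) / n)
            action = max(ACTIONS, key=ucb)    # highest upper bound on possible performance
            temp = thermal_model(temp, action)
            temps.append(temp)
            actions.append(action)
        value = trajectory_value(temps, actions)
        for step, action in enumerate(actions):      # back up the trajectory's value
            counts[step][action] += 1
            values[step][action] += value
    # Apply the first action with the best average simulated value.
    return max(ACTIONS, key=lambda a: values[0][a] / max(counts[0][a], 1))
```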
  • Patent number: 9772116
    Abstract: In an embodiment, an electronic device may include storage containing processor-executable instructions, a preference function that maps weights indicating likely user preferences for the range of values of a device setting in relation to a range of values of a variable, and a current automated device control schedule configured to control the device setting of the electronic device in relation to the variable, and a processor. The instructions may cause the processor to determine the current automated device control schedule based on the preference function by detecting user behavior that indicates satisfaction or dissatisfaction with values of the device setting in relation to the variable, updating the preference function based on the detected user behavior, and determining the current automated device control schedule by comparing a number of candidate device control schedules against the weights of the preference function and selecting the candidate with the highest score.
    Type: Grant
    Filed: November 4, 2014
    Date of Patent: September 26, 2017
    Assignee: Google Inc.
    Inventors: Todd Andrew Hester, Allen Joseph Minich, George Alban Heitz, III
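The patent above scores candidate automated control schedules against a learned preference function and applies the highest-scoring candidate, updating the preference weights from observed user behavior. Below is a minimal sketch, assuming the preference function is represented as a dictionary of weights keyed by (variable value, setting value) pairs, a representational choice made here for illustration.

```python
# Minimal sketch of preference-function-based schedule selection, per the
# entry above. The preference function is represented as a dict of weights
# keyed by (variable value, setting value); all names are illustrative.

def schedule_score(schedule, preference_weights):
    """Total preference weight of a schedule given as (variable, setting) pairs."""
    return sum(preference_weights.get((variable, setting), 0.0)
               for variable, setting in schedule)

def select_schedule(candidate_schedules, preference_weights):
    """Pick the candidate schedule the user is predicted to prefer most."""
    return max(candidate_schedules,
               key=lambda schedule: schedule_score(schedule, preference_weights))

def update_preferences(preference_weights, variable, setting, satisfied, step=0.1):
    """Nudge a weight up when observed behavior indicates satisfaction, down otherwise."""
    key = (variable, setting)
    preference_weights[key] = preference_weights.get(key, 0.0) + (step if satisfied else -step)
```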
  • Publication number: 20160201933
    Abstract: In an embodiment, an electronic device may include a power source configured to provide operational power to the electronic device and a processor coupled to the power source. The processor may be configured to generate temperature predictions using a model of a structure and possible control scenarios, determine a value of the temperature predictions and the respective possible control scenarios using a cost function, the cost function comprising weighted factors related to an error between a setpoint temperature and the temperature predictions, a length of runtime for an environmental control system (e.g., an HVAC system), and a length of environmental control system cycles. The processor may also be configured to select the control scenario with the highest value to apply to control the environmental control system. The control scenarios may be generated using upper confidence bound for trees (UCT).
    Type: Application
    Filed: January 14, 2015
    Publication date: July 14, 2016
    Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
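The publication above values each control scenario with a cost function whose weighted terms cover setpoint error, HVAC runtime, and cycle length. Below is a minimal sketch of such a cost function; the weights and exact penalty terms are illustrative assumptions. The control scenario with the lowest cost (highest value) would then be applied.

```python
# Minimal sketch of the weighted cost function described in the entry above:
# it trades off setpoint error, total system runtime, and the number of on/off
# cycles. The weights and the exact penalty terms are illustrative assumptions.

def scenario_cost(predicted_temps, hvac_on, setpoint,
                  w_error=1.0, w_runtime=0.1, w_cycles=0.5):
    """Lower cost means a better (higher-value) control scenario."""
    error = sum(abs(t - setpoint) for t in predicted_temps)
    runtime = sum(1 for on in hvac_on if on)                       # steps the system runs
    cycles = sum(1 for prev, cur in zip(hvac_on, hvac_on[1:]) if prev != cur)
    return w_error * error + w_runtime * runtime + w_cycles * cycles
```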
  • Publication number: 20160201934
    Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
    Type: Application
    Filed: January 14, 2015
    Publication date: July 14, 2016
    Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal