Patents by Inventor Todd Andrew Hester
Todd Andrew Hester has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886997
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: October 7, 2022
Date of Patent: January 30, 2024
Assignee: DeepMind Technologies Limited
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
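The central mechanism in this abstract (which recurs in several related filings below) is a replay buffer that mixes the agent's own transitions with fixed demonstration transitions for off-policy training. A minimal sketch follows; the class name, the uniform-sampling choice, and the eviction policy are illustrative assumptions, not details taken from the patent:

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Replay buffer holding both agent transitions and permanent
    demonstration transitions: (state, action, reward, next_state)
    tuples from both sources are sampled for off-policy training."""

    def __init__(self, capacity, demo_transitions):
        # Demonstration tuples are kept forever; agent tuples may be
        # evicted once the buffer reaches capacity.
        self.demos = list(demo_transitions)
        self.agent = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.agent.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw uniformly from the union of demonstration and agent data,
        # so minibatches mix both kinds of experience.
        pool = self.demos + list(self.agent)
        return random.sample(pool, min(batch_size, len(pool)))
```

In the system described, minibatches drawn this way would feed the usual actor-critic updates (critic toward a TD target, actor along the critic's gradient); only the buffer composition is sketched here.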
-
Patent number: 11868882
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: June 28, 2018
Date of Patent: January 9, 2024
Assignee: DeepMind Technologies Limited
Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 11836599
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Grant
Filed: May 26, 2021
Date of Patent: December 5, 2023
Assignee: DeepMind Technologies Limited
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
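The selection step this abstract describes, which also recurs in the related filings below, is: score each candidate slate of settings with every model in an ensemble, then pick the slate whose scores are best. A minimal sketch, in which the model call signature `model(state, slate)` and the convention that a lower score means less power used are illustrative assumptions:

```python
def select_settings(state, candidate_slates, ensemble):
    """For each candidate slate of data center settings, score it with
    every model in the ensemble and return the slate with the best mean
    predicted efficiency score (here, lower = less power used)."""
    def mean_score(slate):
        # Average the ensemble's predictions for this (state, slate) pair.
        scores = [model(state, slate) for model in ensemble]
        return sum(scores) / len(scores)
    return min(candidate_slates, key=mean_score)
```

Averaging is one simple way to combine an ensemble; the patent family does not commit to this particular aggregation in the abstract.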
-
Patent number: 11604941
Abstract: A method of training an action selection neural network to perform a demonstrated task using a supervised learning technique. The action selection neural network is configured to receive demonstration data comprising actions to perform the task and rewards received for performing the actions. The action selection neural network has auxiliary prediction task neural networks on one or more of its intermediate outputs. The action selection neural network is trained using multiple combined losses, concurrently with the auxiliary prediction task neural networks.
Type: Grant
Filed: October 29, 2018
Date of Patent: March 14, 2023
Assignee: DeepMind Technologies Limited
Inventor: Todd Andrew Hester
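The structure this abstract describes, auxiliary prediction heads attached to an intermediate output and trained concurrently via a combined loss, can be sketched as follows. The class name, the callables (`trunk`, `policy_head`, the heads), and the weighted-sum combination are hypothetical illustrations, not taken from the patent:

```python
class ActionSelectionWithAuxHeads:
    """Sketch of an action-selection network with auxiliary prediction
    heads on an intermediate representation. `trunk`, `policy_head`,
    and each auxiliary head stand in for small neural networks."""

    def __init__(self, trunk, policy_head, aux_heads, aux_weights):
        self.trunk = trunk
        self.policy_head = policy_head
        self.aux_heads = aux_heads
        self.aux_weights = aux_weights

    def loss(self, observation, policy_loss_fn, aux_loss_fns):
        features = self.trunk(observation)  # intermediate output
        policy_loss = policy_loss_fn(self.policy_head(features))
        # Each auxiliary task gets its own head and loss on the shared
        # intermediate features, trained concurrently with the policy.
        aux_losses = [fn(head(features))
                      for head, fn in zip(self.aux_heads, aux_loss_fns)]
        return policy_loss + sum(w * l for w, l
                                 in zip(self.aux_weights, aux_losses))
```

Minimizing the returned combined loss updates the trunk with gradients from both the main action-selection objective and the auxiliary tasks.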
-
Publication number: 20230023189
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Application
Filed: October 7, 2022
Publication date: January 26, 2023
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20220343157
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
Type: Application
Filed: June 17, 2020
Publication date: October 27, 2022
Inventors: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller
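The key quantity in this abstract is a TD error that is both entropy-regularized and robust to perturbations of the next state. One plausible reading can be sketched as below: a soft (log-sum-exp) backup supplies the entropy regularization, and taking the worst case over a set of perturbed next states supplies the robustness. The `perturb` generator, the discrete action set, and the exact functional form are assumptions for illustration; the application itself does not fix them in the abstract:

```python
import math

def robust_td_error(q, state, action, reward, next_state,
                    perturb, gamma=0.99, alpha=0.1, actions=(0, 1)):
    """Sketch of a robust, entropy-regularized TD error.

    q(s, a)     -- hypothetical Q-function
    perturb(s)  -- hypothetical generator of perturbed next states
    alpha       -- entropy-regularization temperature
    """
    def soft_value(s):
        # Entropy-regularized state value:
        # alpha * log sum_a exp(Q(s, a) / alpha)
        return alpha * math.log(sum(math.exp(q(s, a) / alpha)
                                    for a in actions))
    # Robustness: bootstrap from the worst-case perturbed next state.
    worst = min(soft_value(s) for s in perturb(next_state))
    return reward + gamma * worst - q(state, action)
```

Minimizing the square of this error over sampled mini-batches would give the Q-network update the abstract refers to.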
-
Patent number: 11468321
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Grant
Filed: June 28, 2018
Date of Patent: October 11, 2022
Assignee: DeepMind Technologies Limited
Inventors: Olivier Claude Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothoerl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Publication number: 20210287072
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: May 26, 2021
Publication date: September 16, 2021
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20200272889
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: April 30, 2020
Publication date: August 27, 2020
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20200151562
Abstract: An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
Type: Application
Filed: June 28, 2018
Publication date: May 14, 2020
Inventors: Olivier Pietquin, Martin Riedmiller, Wang Fumin, Bilal Piot, Mel Vecerik, Todd Andrew Hester, Thomas Rothörl, Thomas Lampe, Nicolas Manfred Otto Heess, Jonathan Karl Scholz
-
Patent number: 10643121
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Grant
Filed: January 19, 2017
Date of Patent: May 5, 2020
Assignee: DeepMind Technologies Limited
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Publication number: 20180204116
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improving operational efficiency within a data center by modeling data center performance and predicting power usage efficiency. An example method receives a state input characterizing a current state of a data center. For each data center setting slate, the state input and the data center setting slate are processed through an ensemble of machine learning models. Each machine learning model is configured to receive and process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted. The method selects, based on the efficiency scores for the data center setting slates, new values for the data center settings.
Type: Application
Filed: January 19, 2017
Publication date: July 19, 2018
Inventors: Richard Andrew Evans, Jim Gao, Michael C. Ryan, Gabriel Dulac-Arnold, Jonathan Karl Scholz, Todd Andrew Hester
-
Patent number: 9869484
Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
Type: Grant
Filed: January 14, 2015
Date of Patent: January 16, 2018
Assignee: Google Inc.
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
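The loop this abstract describes, simulate trajectories, pick the action with the highest upper confidence bound at each step, predict the next temperature with a thermal model, and back up trajectory values, can be sketched as below. For brevity this sketch keeps per-(step, action) statistics rather than a full UCT tree, and the thermal model, cost function, action set, and exploration constant are all hypothetical placeholders:

```python
import math

def uct_plan(initial_temp, thermal_model, cost_fn,
             actions=("off", "heat"), horizon=3, n_simulations=300, c=1.0):
    """Minimal UCT-style planner: simulate many control trajectories,
    choosing at each step the action with the highest upper confidence
    bound on value, stepping a thermal_model(temp, action), scoring each
    trajectory with cost_fn(temps) (lower cost = higher value), and
    backing values up into the visited statistics. Returns the best
    first action to apply to the environmental control system."""
    visits = {}   # (step, action) -> visit count
    values = {}   # (step, action) -> mean trajectory value

    def select(step, total_sims):
        best, best_ucb = None, -float("inf")
        for a in actions:
            n = visits.get((step, a), 0)
            if n == 0:
                return a  # try unvisited actions first
            # Upper confidence bound: mean value + exploration bonus.
            u = values[(step, a)] + c * math.sqrt(math.log(total_sims) / n)
            if u > best_ucb:
                best, best_ucb = a, u
        return best

    for sim in range(1, n_simulations + 1):
        temp, chosen, temps = initial_temp, [], [initial_temp]
        for step in range(horizon):
            a = select(step, sim)
            chosen.append(a)
            temp = thermal_model(temp, a)  # predict next temperature
            temps.append(temp)
        value = -cost_fn(temps)
        # Back up the trajectory value into every action it selected.
        for step, a in enumerate(chosen):
            n = visits.get((step, a), 0)
            values[(step, a)] = (values.get((step, a), 0.0) * n + value) / (n + 1)
            visits[(step, a)] = n + 1

    return max(actions, key=lambda a: values.get((0, a), -float("inf")))
```

With a toy thermal model where heating raises the temperature one degree per step and a cost that penalizes distance from a 70-degree setpoint, starting at 68 degrees the planner settles on heating first.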
-
Patent number: 9772116
Abstract: In an embodiment, an electronic device may include storage containing processor-executable instructions, a preference function that maps weights indicating likely user preferences for the range of values of a device setting in relation to a range of values of a variable, a current automated device control schedule configured to control the device setting of the electronic device in relation to the variable, and a processor. The instructions may cause the processor to determine the current automated device control schedule based on the preference function by detecting user behavior that indicates satisfaction or dissatisfaction with values of the device setting in relation to the variable, updating the preference function based on the detected user behavior, and determining the current automated device control schedule by comparing a number of candidate device control schedules against the weights of the preference function and selecting the candidate with the highest score.
Type: Grant
Filed: November 4, 2014
Date of Patent: September 26, 2017
Assignee: Google Inc.
Inventors: Todd Andrew Hester, Allen Joseph Minich, George Alban Heitz, III
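The two steps this abstract describes, updating a preference function from observed user behavior and scoring candidate schedules against its weights, can be sketched as follows. The representation of the preference function as a dict of (variable value, setting) weights, the (hour, setpoint) example, and the additive update are illustrative assumptions:

```python
def update_preferences(preference_weights, observed, satisfied, lr=0.1):
    """Nudge the weight for an observed (variable, setting) pair up on
    signals of satisfaction and down on signals of dissatisfaction."""
    delta = lr if satisfied else -lr
    preference_weights[observed] = preference_weights.get(observed, 0.0) + delta
    return preference_weights

def select_schedule(candidates, preference_weights):
    """Score each candidate control schedule against the preference
    weights and keep the candidate with the highest total score.
    A schedule is a list of (variable_value, setting) pairs, e.g.
    (hour_of_day, setpoint_temperature)."""
    def score(schedule):
        return sum(preference_weights.get(step, 0.0) for step in schedule)
    return max(candidates, key=score)
```

Run periodically, this loop keeps the automated schedule tracking whatever settings the user's behavior shows they actually prefer at each value of the variable.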
-
Publication number: 20160201933
Abstract: In an embodiment, an electronic device may include a power source configured to provide operational power to the electronic device and a processor coupled to the power source. The processor may be configured to generate temperature predictions using a model of a structure and possible control scenarios, determine a value of the temperature predictions and the respective possible control scenarios using a cost function, the cost function comprising weighted factors related to an error between a setpoint temperature and the temperature predictions, a length of runtime for an environmental control system (e.g., an HVAC system), and a length of environmental control system cycles. The processor may also be configured to select the control scenario with the highest value to apply to control the environmental control system. The control scenarios may be generated using upper confidence bound for trees (UCT).
Type: Application
Filed: January 14, 2015
Publication date: July 14, 2016
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal
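The cost function this abstract enumerates, weighted terms for setpoint error, total runtime, and cycle length, can be sketched directly. The weights and the exact penalty forms (absolute error, a per-cycle penalty discouraging short cycling) are hypothetical choices for illustration:

```python
def control_cost(temps, runtime_steps, n_cycles, setpoint,
                 w_error=1.0, w_runtime=0.1, w_cycles=0.5):
    """Sketch of a weighted HVAC control cost: penalize (a) error between
    predicted temperatures and the setpoint, (b) total system runtime,
    and (c) the number of on/off cycles (more cycles of a given total
    runtime means shorter, harder-on-equipment cycles)."""
    error_term = sum(abs(t - setpoint) for t in temps)
    return (w_error * error_term
            + w_runtime * runtime_steps
            + w_cycles * n_cycles)
```

A planner such as the UCT procedure in the related filings would evaluate each candidate control scenario's predicted temperatures with a function like this and keep the scenario with the lowest cost (highest value).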
-
Publication number: 20160201934
Abstract: In an embodiment, an electronic device may include a processor that may iteratively simulate candidate control trajectories using upper confidence bound for trees (UCT) to control an environmental control system (e.g., an HVAC system). Each candidate control trajectory may be simulated by selecting a control action at each of a plurality of time steps over a period of time that has the highest upper bound on possible performance using values from previous simulations and predicting a temperature for a next time step of the plurality of time steps that results from applying the selected control action using a thermal model. The processor may determine a value of each candidate control trajectory using a cost function, update the value of each control action selected in each candidate control trajectory, and select a candidate control trajectory with the highest value using UCT to apply to control the environmental control system.
Type: Application
Filed: January 14, 2015
Publication date: July 14, 2016
Inventors: Todd Andrew Hester, Evan Jarman Fisher, Piyush Khandelwal