Patents by Inventor Craig Edgar Boutilier

Craig Edgar Boutilier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230117499
    Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 20, 2023
    Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier
  • Patent number: 11475355
    Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.
    Type: Grant
    Filed: February 28, 2019
    Date of Patent: October 18, 2022
    Assignee: GOOGLE LLC
    Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier
  • Publication number: 20220044110
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes obtaining a plurality of transitions that are each generated as a result of an agent interacting with an environment, and training a Q neural network having a mixed-integer programming (MIP) formulation on the transitions. The Q neural network is configured to process an observation and initial action constraints in accordance with the Q network parameters to generate a MIP problem based on a Q value objective and the initial action constraints. The initial action constraints specify a set of possible actions that can be performed by the agent to interact with the environment.
    Type: Application
    Filed: August 6, 2020
    Publication date: February 10, 2022
    Inventors: Mungyung Ryu, Yinlam Chow, Ross Michael Anderson, Christian Tjandraatmadja, Craig Edgar Boutilier
  • Publication number: 20210383218
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for an agent interacting with an environment. One of the methods includes updating the control policy with Q learning using policy-consistent backups. Determining a policy-consistent backup for the control policy at the current observation-current action pair comprises: for each of a plurality of actions in a set of possible actions that can be performed by the agent, identifying Q values assigned by the control policy to next observation-action pairs and justified by at least one of the information sets; pruning, from the identified Q values, any Q values that are justified only by information sets that are not policy-class consistent; and determining, from the reward and only the identified Q values that were not pruned, the policy-consistent backup.
    Type: Application
    Filed: October 29, 2019
    Publication date: December 9, 2021
    Inventors: Tian Lu, Dale Eric Schuurmans, Craig Edgar Boutilier
  • Publication number: 20210081753
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
    Type: Application
    Filed: May 20, 2019
    Publication date: March 18, 2021
    Applicant: Google LLC
    Inventors: Tze Way Eugene Ie, Vihan Jain, Jing Wang, Ritesh Agarwal, Craig Edgar Boutilier
  • Publication number: 20200250575
    Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.
    Type: Application
    Filed: February 28, 2019
    Publication date: August 6, 2020
    Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier
  • Publication number: 20200125990
    Abstract: The present disclosure provides systems and methods for intervention optimization. A computing system can obtain an entity history of each of a plurality of entities of a computer application. For each of the plurality of entities, the computing system can determine, via a machine-learned intervention selection model, a respective probability that each of a plurality of available interventions will improve an objective value that is determined based at least in part on a measure of continued use of the computer application by the entity. The computing system can provide interventions of the plurality of available interventions to entities of the plurality of entities based at least in part on the respective probabilities determined via the machine-learned intervention selection model. Thus, a computing system can employ a machine-learned intervention selection model to select, on an entity-by-entity basis, interventions that are predicted to prevent the entity from churning out of the computer application.
    Type: Application
    Filed: January 30, 2019
    Publication date: April 23, 2020
    Inventors: John Burge, Benjamin Frenkel, Craig Edgar Boutilier, Victor Lum, Yi-Lun Ruan, Jumana Al Hashal, Hamid Mousavi, Subir Jhanb, Viren Baraiya, Aditya Gautam
  • Patent number: 9727653
    Abstract: Methods and systems for learning models of the preferences of members drawn from some population or group, utilizing arbitrary paired preferences of those members, in any commonly used ranking model are disclosed. These methods and systems utilize techniques for learning Mallows models, and mixtures thereof, from pairwise preference data.
    Type: Grant
    Filed: March 8, 2012
    Date of Patent: August 8, 2017
    Assignee: Google Inc.
    Inventors: Tian Lu, Craig Edgar Boutilier
  • Publication number: 20140181102
    Abstract: The present invention is a method and system for learning models of the preferences of members drawn from some population or group, utilizing arbitrary paired preferences of those members, in any commonly used ranking model. In particular the present invention involves techniques for learning Mallows models, and mixtures thereof, from pairwise preference data.
    Type: Application
    Filed: March 8, 2012
    Publication date: June 26, 2014
    Inventors: Tian Lu, Craig Edgar Boutilier
  • Publication number: 20140164171
    Abstract: A computer implemented method and system of providing an optimal matching of customer preference sets to vendible items using an objective function that measures utility of one or more customers associated with the customer preference sets.
    Type: Application
    Filed: September 11, 2012
    Publication date: June 12, 2014
    Inventors: Tian Lu, Craig Edgar Boutilier
  • Publication number: 20140081717
    Abstract: The present invention is a system, method and computer program for generating an optimal decision based on general, incomplete decision-making input, such as incomplete preferences. Input may be provided from a variety of entities (including human and computer entities). The present invention may be operable to utilize such input to make a set of decisions and an optimal decision may be efficiently generated, even if the input represents incomplete voter preferences. The present invention may also undertake a decision-making process that involves a facility to compute minimax regret and to elicit preferences from a voter. Preferences may be elicited by one or more queries posed to a voter about their pairwise preferences in such a way so as to maximally reduce minimax regret. The type of queries and order of queries posed may be determined in accordance with the most efficient decision-making process to arrive efficiently at the optimal decision.
    Type: Application
    Filed: March 5, 2012
    Publication date: March 20, 2014
    Inventors: Tian Lu, Craig Edgar Boutilier
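
Illustrative sketch (not taken from any of the filings): the abstract for publication 20210081753 describes combining, for each candidate action, a Q value with a myopic selection likelihood to obtain a selection score, and then selecting the candidates with the highest scores. The abstract does not say how the two quantities are combined, so the product used below is an assumption, as are the function name select_slate and its parameters.

```python
from typing import List, Sequence


def select_slate(
    q_values: Sequence[float],
    myopic_scores: Sequence[float],
    slate_size: int,
) -> List[int]:
    """Return indices of the candidates with the highest selection scores.

    q_values[i]      -- hypothetical Q-network output for candidate i
    myopic_scores[i] -- hypothetical myopic-network output (selection likelihood)
    The product is an assumed way of combining the two; the abstract only
    says they are combined.
    """
    if len(q_values) != len(myopic_scores):
        raise ValueError("q_values and myopic_scores must have the same length")
    if slate_size < 0:
        raise ValueError("slate_size must be non-negative")

    # Selection score per candidate: assumed to be likelihood times Q value.
    scores = [q * m for q, m in zip(q_values, myopic_scores)]

    # Rank candidate indices by score, highest first, and keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:slate_size]


if __name__ == "__main__":
    q = [1.0, 5.0, 2.0, 0.5]
    p = [0.9, 0.1, 0.6, 0.2]
    print(select_slate(q, p, slate_size=2))
```

With these made-up inputs, the candidate with the largest Q value (index 1) is not chosen because its assumed selection likelihood is low, which is the kind of trade-off a combined score is meant to capture.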