Patents by Inventor Craig Edgar Boutilier

Craig Edgar Boutilier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and Methods for Simulating a Complex Reinforcement Learning Environment

Publication number: 20230117499

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.
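
The loop this abstract describes (profile in, allocation out, resource selection, simulated response, profile update) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `EntityProfile`, `Resource`, `agent_allocate`, `entity_respond`, the 0.5 acceptance threshold, and the `lr` step size are all hypothetical, and the "agent" is a fixed stand-in rule rather than a trained reinforcement learning model.

```python
from dataclasses import dataclass


@dataclass
class EntityProfile:
    # Hypothetical profile: topic name -> interest level in [0, 1].
    preference: dict


@dataclass
class Resource:
    # Hypothetical resource profile with a simple acceptance counter.
    name: str
    topic: str
    times_accepted: int = 0


def agent_allocate(profile):
    """Stand-in for the agent model: returns an allocation over topics.

    Assumption: the allocation is just the normalized preference vector.
    """
    total = sum(profile.preference.values())
    if total == 0:
        return {topic: 0.0 for topic in profile.preference}
    return {topic: w / total for topic, w in profile.preference.items()}


def select_resources(allocation, resources, k=1):
    """Pick the k resources whose topics carry the most allocation weight."""
    ranked = sorted(resources, key=lambda r: allocation.get(r.topic, 0.0), reverse=True)
    return ranked[:k]


def entity_respond(profile, resource):
    """Stand-in entity model: accept when interest in the topic is at least 0.5."""
    return profile.preference.get(resource.topic, 0.0) >= 0.5


def simulate_step(profile, resources, k=1, lr=0.1):
    """Run one allocation/response round and update both profiles in place."""
    allocation = agent_allocate(profile)
    responses = []
    for resource in select_resources(allocation, resources, k):
        accepted = entity_respond(profile, resource)
        responses.append((resource.name, accepted))
        current = profile.preference.get(resource.topic, 0.0)
        if accepted:
            resource.times_accepted += 1
            profile.preference[resource.topic] = min(1.0, current + lr)
        else:
            profile.preference[resource.topic] = max(0.0, current - lr)
    return responses
```

In a fuller simulator the stand-in rule in `agent_allocate` would be replaced by a learned policy, and the response returned by `simulate_step` would serve as its training signal.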

Type: Application

Filed: October 17, 2022

Publication date: April 20, 2023

Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

Systems and methods for simulating a complex reinforcement learning environment

Patent number: 11475355

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.

Type: Grant

Filed: February 28, 2019

Date of Patent: October 18, 2022

Assignee: GOOGLE LLC

Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

CONTROLLING AGENTS USING REINFORCEMENT LEARNING WITH MIXED-INTEGER PROGRAMMING

Publication number: 20220044110

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes obtaining a plurality of transitions that are each generated as a result of an agent interacting with an environment, and training a Q neural network having a mixed-integer programming (MIP) formulation on the transitions. The Q neural network is configured to process an observation and initial action constraints in accordance with the Q network parameters to generate a MIP problem based on a Q value objective and the initial action constraints. The initial action constraints specify a set of possible actions that can be performed by the agent to interact with the environment.

Type: Application

Filed: August 6, 2020

Publication date: February 10, 2022

Inventors: Mungyung Ryu, Yinlam Chow, Ross Michael Anderson, Christian Tjandraatmadja, Craig Edgar Boutilier

DETERMINING CONTROL POLICIES BY MINIMIZING THE IMPACT OF DELUSION

Publication number: 20210383218

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for an agent interacting with an environment. One of the methods includes updating the control policy using policy-consistent backups using Q learning. To determine a policy-consistent backup for the control policy at the current observation-current action pair, the system: for each of a plurality of actions in a set of possible actions that can be performed by the agent, identifies Q values assigned by the control policy to next observation-action pairs and justified by at least one of the information sets; prunes, from the identified Q values, any Q values that are justified only by information sets that are not policy-class consistent; and determines, from the reward and only the identified Q values that were not pruned, the policy-consistent backup.

Type: Application

Filed: October 29, 2019

Publication date: December 9, 2021

Inventors: Tian Lu, Dale Eric Schuurmans, Craig Edgar Boutilier

REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES

Publication number: 20210081753

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
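
The per-candidate scoring in this abstract can be sketched as follows. The abstract says only that the myopic output and the Q value are "combined"; multiplying them is an assumption made here for illustration, and `select_top_candidates`, `q_fn`, `myopic_fn`, and `k` are hypothetical names. In practice both functions would be neural networks; plain callables stand in for them.

```python
def select_top_candidates(observation, candidates, q_fn, myopic_fn, k):
    """Return the k candidate actions with the highest selection scores.

    q_fn(observation, action)      -> estimated return if the action is selected
    myopic_fn(observation, action) -> likelihood the action is selected if presented
    Assumption: the selection score is the product of the two outputs.
    """
    scored = []
    for action in candidates:
        q_value = q_fn(observation, action)
        likelihood = myopic_fn(observation, action)
        scored.append((likelihood * q_value, action))
    # Sort by score only, highest first, and keep the top k actions.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [action for _, action in scored[:k]]


# Toy usage with lookup tables standing in for the two networks.
q_table = {"a": 1.0, "b": 5.0, "c": 2.0}
p_table = {"a": 0.9, "b": 0.1, "c": 0.5}
top_two = select_top_candidates(
    observation=None,
    candidates=["a", "b", "c"],
    q_fn=lambda obs, act: q_table[act],
    myopic_fn=lambda obs, act: p_table[act],
    k=2,
)
```

With these toy values, action "b" has the largest Q value but is unlikely to be selected when presented, so it is ranked below "c" and "a".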

Type: Application

Filed: May 20, 2019

Publication date: March 18, 2021

Applicant: Google LLC

Inventors: Tze Way Eugene Ie, Vihan Jain, Jing Wang, Ritesh Agarwal, Craig Edgar Boutilier

Systems and Methods for Simulating a Complex Reinforcement Learning Environment

Publication number: 20200250575

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.

Type: Application

Filed: February 28, 2019

Publication date: August 6, 2020

Inventors: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

Systems and Methods for Intervention Optimization

Publication number: 20200125990

Abstract: The present disclosure provides systems and methods for intervention optimization. A computing system obtains an entity history of each of a plurality of entities of a computer application. For each of the plurality of entities, the computing system can determine a respective probability that each of a plurality of available interventions will improve an objective value that is determined based at least in part on a measure of continued use of a computer application by the entity. The computing system can provide interventions of the plurality of available interventions to entities of the plurality of entities based at least in part on the respective probabilities determined via the machine-learned intervention selection model. Thus, a computing system can employ a machine-learned intervention selection model to select, on an entity-by-entity basis, interventions that are predicted to prevent the entity from churning out of the computer application.

Type: Application

Filed: January 30, 2019

Publication date: April 23, 2020

Inventors: John Burge, Benjamin Frenkel, Craig Edgar Boutilier, Victor Lum, Yi-Lun Ruan, Jumana Al Hashal, Hamid Mousavi, Subir Jhanb, Viren Baraiya, Aditya Gautam

System and method for identifying and ranking user preferences

Patent number: 9727653

Abstract: Methods and systems for learning models of the preferences of members drawn from some population or group, utilizing arbitrary paired preferences of those members, in any commonly used ranking model are disclosed. These methods and systems utilize techniques for learning Mallows models, and mixtures thereof, from pairwise preference data.

Type: Grant

Filed: March 8, 2012

Date of Patent: August 8, 2017

Assignee: Google Inc.

Inventors: Tian Lu, Craig Edgar Boutilier

METHOD AND SYSTEM FOR ROBUST SOCIAL CHOICES AND VOTE ELICITATION

Publication number: 20140181102

Abstract: The present invention is a method and system for learning models of the preferences of members drawn from some population or group, utilizing arbitrary paired preferences of those members, in any commonly used ranking model. In particular the present invention involves techniques for learning Mallows models, and mixtures thereof, from pairwise preference data.

Type: Application

Filed: March 8, 2012

Publication date: June 26, 2014

Inventors: Tian Lu, Craig Edgar Boutilier

SYSTEM AND METHOD FOR AUTOMATIC SEGMENTATION AND MATCHING OF CUSTOMERS TO VENDIBLE ITEMS

Publication number: 20140164171

Abstract: A computer implemented method and system of providing an optimal matching of customer preference sets to vendible items using an objective function that measures utility of one or more customers associated with the customer preference sets.

Type: Application

Filed: September 11, 2012

Publication date: June 12, 2014

Inventors: Tian Lu, Craig Edgar Boutilier

METHOD AND SYSTEM FOR ROBUST SOCIAL CHOICES AND VOTE ELICITATION

Publication number: 20140081717

Abstract: The present invention is a system, method and computer program for generating an optimal decision based on general, incomplete decision-making input, such as incomplete preferences. Input may be provided from a variety of entities (including human and computer entities). The present invention may be operable to utilize such input to make a set of decisions and an optimal decision may be efficiently generated, even if the input represents incomplete voter preferences. The present invention may also undertake a decision-making process that involves a facility to compute minimax regret and to elicit preferences from a voter. Preferences may be elicited by one or more queries posed to a voter about their pairwise preferences in such a way so as to maximally reduce minimax regret. The type of queries and order of queries posed may be determined in accordance with the most efficient decision-making process to arrive efficiently at the optimal decision.
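
The minimax-regret computation mentioned in this abstract can be illustrated with a small brute-force sketch. Several assumptions are made here that the abstract does not state: votes are scored with the Borda rule, each voter's incomplete preferences are given as a list of known (better, worse) pairs, and consistent rankings are found by enumerating all permutations, which is only feasible for a handful of candidates. The function names are hypothetical, and the query-selection (elicitation) step is not shown.

```python
from itertools import permutations


def completions(candidates, known_pairs):
    """Yield every full ranking consistent with the known (better, worse) pairs."""
    for ranking in permutations(candidates):
        position = {c: i for i, c in enumerate(ranking)}
        if all(position[a] < position[b] for a, b in known_pairs):
            yield ranking


def borda(ranking, candidate):
    """Borda score: number of candidates ranked below the given one."""
    return len(ranking) - 1 - ranking.index(candidate)


def pairwise_max_regret(candidates, voters, chosen, rival):
    """Worst-case score advantage of rival over chosen across all completions.

    Borda scores add up across voters, so the worst case can be taken
    independently for each voter and then summed.
    """
    return sum(
        max(borda(r, rival) - borda(r, chosen) for r in completions(candidates, pairs))
        for pairs in voters
    )


def minimax_regret(candidates, voters):
    """Return (winner, max_regret), where winner minimizes worst-case regret."""
    max_regret = {
        a: max(pairwise_max_regret(candidates, voters, a, b) for b in candidates)
        for a in candidates
    }
    winner = min(candidates, key=max_regret.get)
    return winner, max_regret


# Toy usage: three voters with complete, partial, and empty preferences.
voters = [[("x", "y"), ("y", "z")], [("x", "y")], []]
winner, regret = minimax_regret(["x", "y", "z"], voters)
```

An elicitation procedure of the kind the abstract describes would then ask a voter about the pairwise comparison expected to reduce `regret[winner]` the most, and repeat until the regret is acceptably small.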

Type: Application

Filed: March 5, 2012

Publication date: March 20, 2014

Inventors: Tian Lu, Craig Edgar Boutilier