Patents by Inventor Kurt Hartwig Graepel

Kurt Hartwig Graepel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

JOINTLY UPDATING AGENT CONTROL POLICIES USING ESTIMATED BEST RESPONSES TO CURRENT CONTROL POLICIES

Publication number: 20240046112

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating control policies for controlling agents in an environment. One of the methods includes, at each of a plurality of iterations: obtaining a current joint control policy for a plurality of agents, the current joint control policy specifying a respective current control policy for each agent; and updating the current joint control policy, comprising, for each agent: generating a respective reward estimate for each of a plurality of alternate control policies that is an estimate of a reward received by the agent if the agent is controlled using the alternate control policy while the other agents are controlled using the respective current control policies; computing a best response for the agent from the respective reward estimates; and updating the respective current control policy for the agent using the best response for the agent.

Type: Application

Filed: February 7, 2022

Publication date: February 8, 2024

Inventors: Luke Christopher Marris, Paul Fernand Michel Muller, Marc Lanctot, Thore Kurt Hartwig Graepel
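
The per-iteration update described in the abstract above can be illustrated on a toy two-agent matrix game. This is a minimal sketch under stated assumptions, not the method claimed in the application: the payoff tables stand in for the reward estimates, each control policy is a single action index, the best response is a plain argmax, and the update simply replaces the agent's policy with that best response. All names (`estimate_reward`, `update_joint_policy`, `PAYOFFS`) are hypothetical.

```python
from typing import List, Sequence

# Hypothetical payoff tables: PAYOFFS[agent][action_of_agent_0][action_of_agent_1].
# Assumption: a prisoner's-dilemma-style game where action 1 dominates for both agents.
PAYOFFS = [
    [[3.0, 0.0], [5.0, 1.0]],  # rewards for agent 0
    [[3.0, 5.0], [0.0, 1.0]],  # rewards for agent 1
]


def estimate_reward(payoff: Sequence[Sequence[float]], agent: int,
                    alternate: int, joint: List[int]) -> float:
    """Reward for `agent` if it switches to `alternate` while the other
    agent keeps its current policy (here read directly from a table)."""
    trial = list(joint)
    trial[agent] = alternate
    return payoff[trial[0]][trial[1]]


def update_joint_policy(payoffs, joint: List[int], num_actions: int) -> List[int]:
    """One iteration: every agent best-responds to the *current* joint policy."""
    new_joint = list(joint)
    for agent in range(len(joint)):
        estimates = [estimate_reward(payoffs[agent], agent, a, joint)
                     for a in range(num_actions)]
        # Best response: the alternate policy with the highest estimate.
        new_joint[agent] = max(range(num_actions), key=estimates.__getitem__)
    return new_joint


joint_policy = [0, 0]
for _ in range(3):
    joint_policy = update_joint_policy(PAYOFFS, joint_policy, num_actions=2)
```

Replacing each policy outright is the simplest possible update and can cycle in games without dominant actions; a softer update (for example, mixing the best response into the current policy) would be a natural variation.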
SELECTING POINTS IN CONTINUOUS SPACES USING NEURAL NETWORKS

Publication number: 20220374683

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting an optimal feature point in a continuous domain for a group of agents. A computer-implemented system obtains, for each of a plurality of agents, respective training data that comprises a respective utility score for each of a plurality of discrete points in the continuous domain. The system trains, for each of the plurality of agents and on the respective training data for the agents, a respective neural network that is configured to receive an input comprising a point in the continuous domain and to generate as output a predicted utility score for the agent at the point.

Type: Application

Filed: February 9, 2022

Publication date: November 24, 2022

Inventors: Thomas Edward Eccles, Ian Michael Gemp, János Kramár, Marta Garnelo Abellanas, Dan Rosenbaum, Yoram Bachrach, Thore Kurt Hartwig Graepel
TRAINING A POLICY NEURAL NETWORK FOR CONTROLLING AN AGENT USING BEST RESPONSE POLICY ITERATION

Publication number: 20220261635

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a policy neural network by repeatedly updating the policy neural network at each of a plurality of training iterations. One of the methods includes generating training data for the training iteration by controlling the agent in accordance with an improved policy that selects actions in response to input state representations. A best response computation is performed using (i) a candidate policy generated from respective policy neural networks as of one or more preceding iterations and (ii) a candidate value neural network. The candidate value neural network is configured to generate a value output that is an estimate of a value of the environment being in the state characterized by a state representation to complete a particular task. The policy neural network is updated by training the policy neural network on the training data.

Type: Application

Filed: January 7, 2022

Publication date: August 18, 2022

Inventors: Thomas William Anthony, Thomas Edward Eccles, Andrea Tacchetti, János Kramár, Ian Michael Gemp, Thomas Chalmers Hudson, Nicolas Pierre Mickaël Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Thore Kurt Hartwig Graepel, Yoram Bachrach
Neural network architecture for efficient resource allocation

Patent number: 11250475

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After receiving the valuation data, the methods can include assigning each resource in the plurality of resources to a respective entity of the plurality of entities based on the valuations and generating, for each particular entity, a respective input representation that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule for the particular entity and a payment based on the rule output for the entities.

Type: Grant

Filed: July 1, 2020

Date of Patent: February 15, 2022

Assignee: DeepMind Technologies Limited

Inventors: Andrea Tacchetti, Daniel Joseph Strouse, Marta Garnelo Abellanas, Thore Kurt Hartwig Graepel, Yoram Bachrach
NEURAL NETWORK ARCHITECTURE FOR EFFICIENT RESOURCE ALLOCATION

Publication number: 20220005079

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently allocating resources among participants. Methods can include receiving valuation data specifying, for each of a plurality of entities, a respective valuation for each of a plurality of resource subsets, each resource subset comprising a different combination of one or more resources of a plurality of resources. After receiving the valuation data, the methods can include assigning each resource in the plurality of resources to a respective entity of the plurality of entities based on the valuations and generating, for each particular entity, a respective input representation that is derived from the valuations of every entity in the plurality of entities other than the particular entity. The input representation for each particular entity is processed using a neural network to generate a rule for the particular entity and a payment based on the rule output for the entities.

Type: Application

Filed: July 1, 2020

Publication date: January 6, 2022

Inventors: Andrea Tacchetti, Daniel Joseph Strouse, Marta Garnelo Abellanas, Thore Kurt Hartwig Graepel, Yoram Bachrach
Selecting actions to be performed by a reinforcement learning agent using tree search

Patent number: 10867242

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.

Type: Grant

Filed: September 29, 2016

Date of Patent: December 15, 2020

Assignee: DeepMind Technologies Limited

Inventors: Thore Kurt Hartwig Graepel, Shih-Chieh Huang, David Silver, Arthur Clement Guez, Laurent Sifre, Ilya Sutskever, Christopher Maddison
Relational database management

Patent number: 10685062

Abstract: New methods of relational database management are described, for example, to enable completion and checking of data in relational databases, including completion of missing foreign key values, to facilitate understanding of data in relational databases, to highlight data that it would be useful to add to a relational database and for other applications. In various embodiments, the schema of a relational database is used to automatically create a probabilistic graphical model that has a structure related to the schema. For example, nodes representing individual rows are linked to rows of other tables according to the database schema. In examples, data in the relational database is used to carry out inference using inference algorithms derived from the probabilistic graphical model. In various examples, inference results, comprising probability distributions each for an individual table cell, are used to fill missing data, highlight errors, and for other purposes.

Type: Grant

Filed: December 31, 2012

Date of Patent: June 16, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Sameer Singh, Thore Kurt Hartwig Graepel, Lucas Julien Bordeaux, Andrew Donald Gordon
TRAINING A POLICY NEURAL NETWORK AND A VALUE NEURAL NETWORK

Publication number: 20180032863

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.

Type: Application

Filed: September 29, 2016

Publication date: February 1, 2018

Inventors: Thore Kurt Hartwig Graepel, Shih-Chieh Huang, David Silver, Arthur Clement Guez, Laurent Sifre, Ilya Sutskever, Christopher Maddison
SELECTING ACTIONS TO BE PERFORMED BY A REINFORCEMENT LEARNING AGENT USING TREE SEARCH

Publication number: 20180032864

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.

Type: Application

Filed: September 29, 2016

Publication date: February 1, 2018

Inventors: Thore Kurt Hartwig Graepel, Shih-Chieh Huang, David Silver, Arthur Clement Guez, Laurent Sifre, Ilya Sutskever, Christopher Maddison
Database access

Patent number: 9418086

Abstract: Database access is described, for example, where data in a database is accessed by an inference engine. In various examples, the inference engine executes inference algorithms to access data from the database and carry out inference using the data. In examples the inference algorithms are compiled from a schema of the database which is annotated with expressions of probability distributions over data in the database. In various examples the schema of the database is modified by adding one or more latent columns or latent tables to the schema for storing data to be inferred by the inference engine. In examples the expressions are compositional so, for example, an expression annotating a column of a database table may be used as part of an expression annotating another column of the database.

Type: Grant

Filed: August 20, 2013

Date of Patent: August 16, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Andrew Donald Gordon, Thore Kurt Hartwig Graepel, Nicolas Philippe Marie Rolland, Eric Johannes Borgstrom, Claudio Vittorio Russo
DATABASE ACCESS

Publication number: 20150058337

Abstract: Database access is described, for example, where data in a database is accessed by an inference engine. In various examples, the inference engine executes inference algorithms to access data from the database and carry out inference using the data. In examples the inference algorithms are compiled from a schema of the database which is annotated with expressions of probability distributions over data in the database. In various examples the schema of the database is modified by adding one or more latent columns or latent tables to the schema for storing data to be inferred by the inference engine. In examples the expressions are compositional so, for example, an expression annotating a column of a database table may be used as part of an expression annotating another column of the database.

Type: Application

Filed: August 20, 2013

Publication date: February 26, 2015

Applicant: Microsoft Corporation

Inventors: Andrew Donald Gordon, Thore Kurt Hartwig Graepel, Nicolas Philippe Marie Rolland, Eric Johannes Borgstrom, Claudio Vittorio Russo
Parallelization of online learning algorithms

Patent number: 8904149

Abstract: Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.

Type: Grant

Filed: June 24, 2010

Date of Patent: December 2, 2014

Assignee: Microsoft Corporation

Inventors: Taha Bekir Eren, Oleg Isakov, Weizhu Chen, Jeffrey Scott Dunn, Thomas Ivan Borchert, Joaquin Quinonero Candela, Thore Kurt Hartwig Graepel, Ralf Herbrich
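
The dynamic batch strategy in the abstract above (keep reading and training until the model has drifted past a threshold, then merge only the drifted differences) might look roughly like the following for a single worker. This is a sketch under stated assumptions: the model state is a plain dictionary of weights, the online update is a toy rule, and retrieving missing beliefs from partner processors is omitted. The names `train_on_batch`, `run_until_threshold`, and `merge` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

State = Dict[str, float]
Batch = List[Tuple[str, float]]


def train_on_batch(state: State, batch: Batch, lr: float = 0.5) -> None:
    """Toy online update (an assumption): move each weight toward its target."""
    for feature, target in batch:
        weight = state.get(feature, 0.0)
        state[feature] = weight + lr * (target - weight)


def run_until_threshold(original: State, batches: Iterable[Batch],
                        threshold: float) -> State:
    """Train batch by batch until some weight differs from the original
    state by more than `threshold`; return only the differences that do."""
    local = dict(original)
    deltas: State = {}
    for batch in batches:
        train_on_batch(local, batch)
        deltas = {k: v - original.get(k, 0.0) for k, v in local.items()}
        if any(abs(d) > threshold for d in deltas.values()):
            break  # batch size is decided by drift, not by a fixed count
    return {k: d for k, d in deltas.items() if abs(d) > threshold}


def merge(original: State, worker_deltas: Iterable[State]) -> State:
    """Combine the over-threshold differences from all workers with the original state."""
    merged = dict(original)
    for deltas in worker_deltas:
        for key, delta in deltas.items():
            merged[key] = merged.get(key, 0.0) + delta
    return merged
```

Summing the deltas is only one possible merge rule; the abstract says differences are merged "according to attributes" without fixing the arithmetic, so the addition here is an illustrative choice.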
RELATIONAL DATABASE MANAGEMENT

Publication number: 20140188928

Abstract: New methods of relational database management are described, for example, to enable completion and checking of data in relational databases, including completion of missing foreign key values, to facilitate understanding of data in relational databases, to highlight data that it would be useful to add to a relational database and for other applications. In various embodiments, the schema of a relational database is used to automatically create a probabilistic graphical model that has a structure related to the schema. For example, nodes representing individual rows are linked to rows of other tables according to the database schema. In examples, data in the relational database is used to carry out inference using inference algorithms derived from the probabilistic graphical model. In various examples, inference results, comprising probability distributions each for an individual table cell, are used to fill missing data, highlight errors, and for other purposes.

Type: Application

Filed: December 31, 2012

Publication date: July 3, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Sameer Singh, Thore Kurt Hartwig Graepel, Lucas Julien Bordeaux, Andrew Donald Gordon
FEATURE VECTOR CONSTRUCTION

Publication number: 20120158791

Abstract: Feature vector construction techniques are described. In one or more implementations, an input is received at a computing device that describes a graph query that specifies one of a plurality of entities to be used to query a knowledge base graph that represents the plurality of entities. A feature vector is constructed, by the computing device, having a number of indicator variables, each of which indicates observance of a sub-graph feature represented by a respective indicator variable in the knowledge base graph.

Type: Application

Filed: December 21, 2010

Publication date: June 21, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Gjergji Kasneci, David Hector Stern, Thore Kurt Hartwig Graepel, Ralf Herbrich
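
As a rough illustration of the indicator-variable feature vector described in the abstract above, the sketch below represents the knowledge base graph as (subject, relation, object) triples. It makes a strong simplifying assumption: a "sub-graph feature" is reduced to "the queried entity has an outgoing edge with this relation". The application may cover far richer sub-graph patterns; `build_feature_vector` and the sample data are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_feature_vector(graph: Iterable[Triple], entity: str,
                         features: Sequence[str]) -> List[int]:
    """Return one 0/1 indicator per feature: 1 if the feature is observed
    for `entity` in the graph, 0 otherwise."""
    observed = {relation for subject, relation, _ in graph if subject == entity}
    return [1 if relation in observed else 0 for relation in features]


# Hypothetical knowledge base graph and feature list.
GRAPH = [
    ("Ada", "born_in", "London"),
    ("Ada", "field", "Mathematics"),
    ("Alan", "born_in", "London"),
]
FEATURES = ["born_in", "field", "award"]

vector = build_feature_vector(GRAPH, "Ada", FEATURES)
```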
Parallelization of Online Learning Algorithms

Publication number: 20110320767

Abstract: Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.

Type: Application

Filed: June 24, 2010

Publication date: December 29, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Taha Bekir Eren, Oleg Isakov, Weizhu Chen, Jeffrey Scott Dunn, Thomas Ivan Borchert, Joaquin Quinonero Candela, Thore Kurt Hartwig Graepel, Ralf Herbrich
PRESENTING CONTENT ITEMS USING TOPICAL RELEVANCE AND TRENDING POPULARITY

Publication number: 20110218946

Abstract: A user may request a presentation of a content item set, such as a social network comprising a set of status messages or an image database comprising a set of images. However, the volume and diversity of content items of the content item set may reduce the interest of the user in the presented content items. The potential interest of the user in the presented content items may be improved by selecting content items that are associated with one or more topics of potential interest to the user, and having a positive trending popularity among users of the content item set. Moreover, the interaction of the user with a presented content item may be monitored and used to determine the interest of the user in the topics associated with the presented content item and the popularity of the content item.

Type: Application

Filed: March 3, 2010

Publication date: September 8, 2011

Applicant: Microsoft Corporation

Inventors: David Stern, Ralf Herbrich, Milad Shokouhi, Thore Kurt Hartwig Graepel
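
The selection rule in the abstract above (topics of interest to the user combined with positive trending popularity) can be sketched as a simple filter and score. Everything here is an assumption for illustration: popularity is measured by view counts over two periods, the trend is their difference, relevance is a sum of per-topic interest weights, and the final score is their product. `ContentItem` and `select_items` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ContentItem:
    title: str
    topics: FrozenSet[str]
    views_previous: int  # popularity in the earlier period (assumed metric)
    views_current: int   # popularity in the latest period (assumed metric)


def select_items(items: Iterable[ContentItem],
                 user_interest: Dict[str, float],
                 limit: int = 10) -> List[ContentItem]:
    """Keep items that match the user's topics and are trending upward,
    ordered by relevance times trend."""
    scored = []
    for item in items:
        relevance = sum(user_interest.get(topic, 0.0) for topic in item.topics)
        trend = item.views_current - item.views_previous
        if relevance > 0 and trend > 0:
            scored.append((relevance * trend, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]
```

The monitoring step the abstract mentions would feed back into `user_interest`, for example by raising the weight of a topic each time the user opens an item tagged with it.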
Reward-driven adaptive agents for video games

Patent number: 7837543

Abstract: Adaptive agents are driven by rewards they receive based on the outcome of their behavior during actual game play. Accordingly, the adaptive agents are able to learn from experience within the gaming environment. Reward-driven adaptive agents can be trained at either or both of game-time or development time. Computer-controlled agents receive rewards (either positive or negative) at individual action intervals based on the effectiveness of the agents' actions (e.g., compliance with defined goals). The adaptive computer-controlled agent is motivated to perform actions that maximize its positive rewards and minimize its negative rewards.

Type: Grant

Filed: April 30, 2004

Date of Patent: November 23, 2010

Assignee: Microsoft Corporation

Inventors: Kurt Hartwig Graepel, Ralf Herbrich, Julian Gold
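
The reward-driven behavior described in the abstract above can be illustrated with a very small learning agent. This is a sketch under assumptions, not the patented design: the agent keeps one running value estimate per action, updates it with an incremental average after each reward, and chooses actions epsilon-greedily. The class name, parameters, and action labels are hypothetical.

```python
import random
from typing import Dict, Sequence


class RewardDrivenAgent:
    """Keeps a value estimate per action and prefers actions that earned
    higher rewards in the past."""

    def __init__(self, actions: Sequence[str], learning_rate: float = 0.5,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.values: Dict[str, float] = {action: 0.0 for action in actions}
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self) -> str:
        """Explore with probability epsilon, otherwise pick the best-valued action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def receive_reward(self, action: str, reward: float) -> None:
        """Move the action's value toward the reward (positive or negative)."""
        current = self.values[action]
        self.values[action] = current + self.learning_rate * (reward - current)


# Usage: a negative reward for one action and a positive reward for another.
agent = RewardDrivenAgent(["attack", "retreat"], epsilon=0.0)
agent.receive_reward("attack", -1.0)
agent.receive_reward("retreat", 1.0)
```

With `epsilon=0.0` the agent never explores, which makes the example deterministic; a game-time agent would normally keep some exploration so it can adapt when the player's behavior changes.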