Patents by Inventor Kailiang Hu

Kailiang Hu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Determining action selection policies of an execution device

Patent number: 10789810

Abstract: Disclosed herein are methods, systems, and apparatus for generating an action selection policy (ASP) of an execution device. One method includes, in a current iteration, computing a first reward for a current state based on respective first rewards for actions in the current state and an ASP of the current state in the current iteration; computing an accumulative respective regret value of each action in the current state based on a difference between the respective first reward for the action and the first reward for the current state; computing an ASP of the current state in the next iteration; computing a second reward for the current state based on the respective first rewards for the actions and the ASP of the current state in the next iteration; and determining an ASP of the previous state in the next iteration based on the second reward for the current state.

Type: Grant

Filed: December 12, 2019

Date of Patent: September 29, 2020

Assignee: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song
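
The first two computations named in the abstract above (a state reward derived from per-action rewards and the current policy, then a per-action regret taken as the difference between the action reward and the state reward) can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the state reward is the policy-weighted average of the action rewards, and the function names `state_reward` and `accumulate_regrets` are hypothetical.

```python
def state_reward(action_rewards, policy):
    """Expected reward of a state: policy-weighted sum of action rewards.

    Assumption: the abstract only says the state reward is "based on"
    the action rewards and the policy; a weighted sum is one common choice.
    """
    return sum(p * r for p, r in zip(policy, action_rewards))


def accumulate_regrets(cumulative, action_rewards, policy):
    """Add this iteration's regret (action reward minus state reward)
    to the running regret total of each action."""
    v = state_reward(action_rewards, policy)
    return [c + (r - v) for c, r in zip(cumulative, action_rewards)]


# Two actions with rewards 1.0 and 3.0 under a uniform policy.
rewards = [1.0, 3.0]
policy = [0.5, 0.5]
regrets = accumulate_regrets([0.0, 0.0], rewards, policy)
```

Under these assumptions, an action that does better than the state average accumulates positive regret, and one that does worse accumulates negative regret.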

Sampling schemes for strategy searching in strategic interaction between parties

Patent number: 10769544

Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between parties. One of the methods includes: identifying N1 possible actions of a first party in a first state of the first party; sampling a possible action out of the N1 possible actions in the first state of the first party with a first sampling probability; identifying N2 possible actions of the first party in a second state of the first party, wherein the first state of the first party is closer to a beginning state of the IIG than the second state of the first party; sampling a possible action out of the N2 possible actions in the second state of the first party with a second sampling probability, wherein the first sampling probability is less than the second sampling probability.

Type: Grant

Filed: June 21, 2019

Date of Patent: September 8, 2020

Assignee: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song
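
The abstract above describes sampling actions with a smaller probability in states nearer the start of the game than in states further along. The sketch below shows one way such a depth-dependent scheme could look. It is an assumption-laden illustration only: the abstract does not say how the probabilities are chosen, and the helper `sample_actions` and the choice of sampling `k` of `N` actions uniformly without replacement are hypothetical.

```python
import random


def sample_actions(actions, k, rng):
    """Sample up to k distinct actions uniformly at random.

    Returns the sampled actions and the probability with which any
    single action is included, which is k / N for uniform sampling
    without replacement (an assumed scheme, not taken from the patent).
    """
    k = min(k, len(actions))
    sampled = rng.sample(actions, k)
    return sampled, k / len(actions)


rng = random.Random(0)

# Earlier state: N1 = 10 possible actions, 2 sampled.
early_sample, early_prob = sample_actions(list(range(10)), 2, rng)

# Later state: N2 = 3 possible actions, 2 sampled.
late_sample, late_prob = sample_actions(list(range(3)), 2, rng)
```

With these hypothetical numbers the earlier state has the lower per-action sampling probability, which matches the ordering stated in the abstract.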

SAMPLING SCHEMES FOR STRATEGY SEARCHING IN STRATEGIC INTERACTION BETWEEN PARTIES

Publication number: 20200234164

Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between parties. One of the methods includes: identifying N1 possible actions of a first party in a first state of the first party; sampling a possible action out of the N1 possible actions in the first state of the first party with a first sampling probability; identifying N2 possible actions of the first party in a second state of the first party, wherein the first state of the first party is closer to a beginning state of the IIG than the second state of the first party; sampling a possible action out of the N2 possible actions in the second state of the first party with a second sampling probability, wherein the first sampling probability is less than the second sampling probability.

Type: Application

Filed: June 21, 2019

Publication date: July 23, 2020

Applicant: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song

Determining action selection policies of an execution device

Patent number: 10719358

Abstract: Disclosed herein are methods, systems, and apparatus of an execution device for generating an action selection policy for completing a task in an environment that includes the execution device and one or more other devices. One method includes: in a current iteration, identifying an iterative action selection policy of an action in a state of the execution device in a previous iteration; computing a regret value in the previous iteration based on the iterative action selection policy in the previous iteration; computing an incremental action selection policy in the current iteration based on the regret value in the previous iteration but not any regret value in any iteration prior to the previous iteration; computing an iterative action selection policy in the current iteration based on the iterative action selection policy in the previous iteration and the incremental action selection policy in the current iteration.

Type: Grant

Filed: December 12, 2019

Date of Patent: July 21, 2020

Assignee: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song

Determining action selection policies of an execution device

Patent number: 10675537

Abstract: Disclosed herein are methods, systems, and apparatus for generating an action selection policy for a software-implemented application that performs actions in an environment that includes an execution device supported by the application and one or more other devices. One method includes, for each action among possible actions in a state of the execution device in a current iteration, obtaining a regret value of the action in the state of the execution device in a previous iteration; and computing a parameterized regret value of the action in the state of the execution device in the previous iteration; determining a respective normalized regret value for each of the possible actions in the previous iteration; determining, from the normalized regret values, an action selection policy of the action in the state of the execution device; and controlling operations of the execution device according to the action selection policy.

Type: Grant

Filed: December 12, 2019

Date of Patent: June 9, 2020

Assignee: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song
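
The step in the abstract above that turns normalized regret values into an action selection policy resembles the well-known regret matching rule. The sketch below shows plain regret matching only, as an assumed stand-in: the "parameterized regret value" mentioned in the abstract is not specified in this listing and is not modeled here, and the function name `regret_matching` is hypothetical.

```python
def regret_matching(regrets):
    """Map per-action regret values to a probability distribution.

    Negative regrets are clipped to zero and the rest are normalized.
    If no action has positive regret, fall back to a uniform policy.
    """
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


# Actions 0 and 2 share the positive regret; action 1 gets probability 0.
policy = regret_matching([2.0, -1.0, 2.0])
```

Clipping before normalizing means actions that have performed worse than average are never selected until their regret becomes positive again.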

Strategy searching in strategic interaction between parties

Patent number: 10679125

Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between two or more parties. One of the methods includes: storing multiple regret samples in a first data store, wherein the multiple regret samples are obtained in two or more iterations of a CFR algorithm in strategy searching in strategic interaction between two or more parties; storing multiple strategy samples in a second data store; updating parameters of a first neural network for predicting a regret value of a possible action in a state of a party based on the multiple regret samples in the first data store; and updating parameters of a second neural network for predicting a strategy value of a possible action in a state of the party based on the multiple strategy samples in the second data store.

Type: Grant

Filed: June 21, 2019

Date of Patent: June 9, 2020

Assignee: Alibaba Group Holding Limited

Inventors: Hui Li, Kailiang Hu, Le Song
