Patents by Inventor Kailiang Hu

Kailiang Hu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10789810
    Abstract: Disclosed herein are methods, systems, and apparatus for generating an action selection policy (ASP) of an execution device. One method includes, in a current iteration, computing a first reward for a current state based on respective first rewards for actions in the current state and an ASP of the current state in the current iteration; computing an accumulative respective regret value of each action in the current state based on a difference between the respective first reward for the action and the first reward for the current state; computing an ASP of the current state in the next iteration; computing a second reward for the current state based on the respective first rewards for the actions and the ASP of the current state in the next iteration; and determining an ASP of the previous state in the next iteration based on the second reward for the current state.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: September 29, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
  • Patent number: 10769544
    Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between parties. One of the methods includes: identifying N1 possible actions of a first party in a first state of the first party; sampling a possible action out of the N1 possible actions in the first state of the first party with a first sampling probability; identifying N2 possible actions of the first party in a second state of the first party, wherein the first state of the first party is closer to a beginning state of the imperfect information game (IIG) than the second state of the first party; sampling a possible action out of the N2 possible actions in the second state of the first party with a second sampling probability, wherein the first sampling probability is less than the second sampling probability.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: September 8, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
  • Publication number: 20200234164
    Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between parties. One of the methods includes: identifying N1 possible actions of a first party in a first state of the first party; sampling a possible action out of the N1 possible actions in the first state of the first party with a first sampling probability; identifying N2 possible actions of the first party in a second state of the first party, wherein the first state of the first party is closer to a beginning state of the imperfect information game (IIG) than the second state of the first party; sampling a possible action out of the N2 possible actions in the second state of the first party with a second sampling probability, wherein the first sampling probability is less than the second sampling probability.
    Type: Application
    Filed: June 21, 2019
    Publication date: July 23, 2020
    Applicant: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
  • Patent number: 10719358
    Abstract: Disclosed herein are methods, systems, and apparatus of an execution device for generating an action selection policy for completing a task in an environment that includes the execution device and one or more other devices. One method includes: in a current iteration, identifying an iterative action selection policy of an action in a state of the execution device in a previous iteration; computing a regret value in the previous iteration based on the iterative action selection policy in the previous iteration; computing an incremental action selection policy in the current iteration based on the regret value in the previous iteration but not any regret value in any iteration prior to the previous iteration; computing an iterative action selection policy in the current iteration based on the iterative action selection policy in the previous iteration and the incremental action selection policy in the current iteration.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: July 21, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
  • Patent number: 10675537
    Abstract: Disclosed herein are methods, systems, and apparatus for generating an action selection policy for a software-implemented application that performs actions in an environment that includes an execution device supported by the application and one or more other devices. One method includes, for each action among possible actions in a state of the execution device in a current iteration, obtaining a regret value of the action in the state of the execution device in a previous iteration; and computing a parameterized regret value of the action in the state of the execution device in the previous iteration; determining a respective normalized regret value for each of the possible actions in the previous iteration; determining, from the normalized regret values, an action selection policy of the action in the state of the execution device; and controlling operations of the execution device according to the action selection policy.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: June 9, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
  • Patent number: 10679125
    Abstract: Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing counterfactual regret minimization (CFR) for strategy searching in strategic interaction between two or more parties. One of the methods includes: storing multiple regret samples in a first data store, wherein the multiple regret samples are obtained in two or more iterations of a CFR algorithm in strategy searching in strategic interaction between two or more parties; storing multiple strategy samples in a second data store; updating parameters of a first neural network for predicting a regret value of a possible action in a state of a party based on the multiple regret samples in the first data store; and updating parameters of a second neural network for predicting a strategy value of a possible action in a state of the party based on the multiple strategy samples in the second data store.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: June 9, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Hui Li, Kailiang Hu, Le Song
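
Several of the abstracts above describe updating an action selection policy from regret values. As a rough illustration only, the following Python sketch shows textbook regret matching, the standard building block of CFR; it is not the claimed method of any patent listed here. The function names `regret_matching` and `update_regrets` and the example rewards are hypothetical, chosen for this sketch.

```python
from typing import List, Tuple


def regret_matching(cumulative_regrets: List[float]) -> List[float]:
    """Turn cumulative regrets into a policy (textbook regret matching).

    Each action gets probability proportional to its positive regret.
    If no action has positive regret, fall back to a uniform policy.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n


def update_regrets(
    cumulative_regrets: List[float],
    action_rewards: List[float],
    policy: List[float],
) -> Tuple[List[float], float]:
    """Accumulate regrets for one state in one iteration.

    The state reward is the policy-weighted average of the action rewards.
    The regret of an action is its reward minus the state reward.
    Returns the new cumulative regrets and the state reward.
    """
    state_reward = sum(p * r for p, r in zip(policy, action_rewards))
    new_regrets = [
        c + (r - state_reward)
        for c, r in zip(cumulative_regrets, action_rewards)
    ]
    return new_regrets, state_reward


# Hypothetical example: three actions with fixed rewards (assumed values).
rewards = [1.0, 0.0, -1.0]
regrets = [0.0, 0.0, 0.0]

policy = regret_matching(regrets)  # uniform, since no regret has accumulated
regrets, first_reward = update_regrets(regrets, rewards, policy)
next_policy = regret_matching(regrets)  # shifts weight toward the best action
second_reward = sum(p * r for p, r in zip(next_policy, rewards))
```

In this toy run the policy for the next iteration concentrates on the action with positive regret, so the state reward recomputed under that policy is higher than the one computed under the initial uniform policy.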