Patents by Inventor Tu-Hoa Pham

Tu-Hoa Pham has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11823039
    Abstract: According to an aspect of the present invention, a computer-implemented method is provided for reinforcement learning. The method includes reading, by a processor device, an action manifold which is described as a n-polytope, at least one physical action limit, and at least one safety constraint. The method further includes updating, by the processor device, the action manifold based on the at least one physical action limit and the at least one safety constraint. The method also includes performing, by the processor device, the reinforcement learning by selecting a constrained action from among a set of constrained actions in the action manifold.
    Type: Grant
    Filed: August 24, 2018
    Date of Patent: November 21, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Giovanni De Magistris, Tu-Hoa Pham, Asim Munawar, Ryuki Tachibana
  • Patent number: 11734575
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for Hierarchical Reinforcement Learning (HRL) with a target task. The method includes obtaining, by a processor device, a sequence of tasks based on hierarchical relations between the tasks, the tasks constituting the target task. The method further includes learning, by a processor device, a sequence of constraints corresponding to the sequence of tasks by repeating, for each of the tasks in the sequence, reinforcement learning and supervised learning with a set of good samples and a set of bad samples and by applying an obtained constraint for a current task to a next task.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: August 22, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Don Joven Ravoy Agravante, Giovanni De Magistris, Tu-Hoa Pham, Ryuki Tachibana
  • Patent number: 11537872
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for obtaining a plurality of bad demonstrations. The method includes reading, by a processor device, a protagonist environment. The method further includes training, by the processor device, a plurality of antagonist agents to fail a task by reinforcement learning using the protagonist environment. The method also includes collecting, by the processor device, the plurality of bad demonstrations by playing the trained antagonist agents on the protagonist environment.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: December 27, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tu-Hoa Pham, Giovanni De Magistris, Don Joven Ravoy Agravante, Ryuki Tachibana
  • Patent number: 11501157
    Abstract: A method is provided for reinforcement learning. The method includes obtaining, by a processor device, a first set and a second set of state-action tuples. Each of the state-action tuples in the first set represents a respective good demonstration. Each of the state-action tuples in the second set represents a respective bad demonstration. The method further includes training, by the processor device using supervised learning with the first set and the second set, a neural network which takes as input a state to provide an output. The output is parameterized to obtain each of a plurality of real-valued constraint functions used for evaluation of each of a plurality of action constraints. The method also includes training, by the processor device, a policy using reinforcement learning by restricting actions predicted by the policy according to each of the plurality of action constraints with each of the plurality of real-valued constraint functions.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tu-Hoa Pham, Don Joven Ravoy Agravante, Giovanni De Magistris, Ryuki Tachibana
  • Patent number: 11468310
    Abstract: A computer-implemented method, computer program product, and system are provided for deep reinforcement learning to control a subject device. The method includes training, by a processor, a neural network to receive state information of a target of the subject device as an input and provide action information for the target as an output. The method further includes inputting, by the processor, current state information of the target into the neural network to obtain current action information for the target. The method also includes correcting, by the processor, the current action information minimally to obtain corrected action information that meets a set of constraints. The method additionally includes performing an action by the subject device based on the corrected action information for the target to obtain a reward from the target.
    Type: Grant
    Filed: March 7, 2018
    Date of Patent: October 11, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tu-Hoa Pham, Giovanni De Magistris, Ryuki Tachibana
  • Publication number: 20200065666
    Abstract: According to an aspect of the present invention, a computer-implemented method is provided for reinforcement learning. The method includes reading, by a processor device, an action manifold which is described as a n-polytope, at least one physical action limit, and at least one safety constraint. The method further includes updating, by the processor device, the action manifold based on the at least one physical action limit and the at least one safety constraint. The method also includes performing, by the processor device, the reinforcement learning by selecting a constrained action from among a set of constrained actions in the action manifold.
    Type: Application
    Filed: August 24, 2018
    Publication date: February 27, 2020
    Inventors: Giovanni De Magistris, Tu-Hoa Pham, Asim Munawar, Ryuki Tachibana
  • Publication number: 20200034704
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for Hierarchical Reinforcement Learning (HRL) with a target task. The method includes obtaining, by a processor device, a sequence of tasks based on hierarchical relations between the tasks, the tasks constituting the target task. The method further includes learning, by a processor device, a sequence of constraints corresponding to the sequence of tasks by repeating, for each of the tasks in the sequence, reinforcement learning and supervised learning with a set of good samples and a set of bad samples and by applying an obtained constraint for a current task to a next task.
    Type: Application
    Filed: July 30, 2018
    Publication date: January 30, 2020
    Inventors: Don Joven Ravoy Agravante, Giovanni De Magistris, Tu-Hoa Pham, Ryuki Tachibana
  • Publication number: 20200034706
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for obtaining a plurality of bad demonstrations. The method includes reading, by a processor device, a protagonist environment. The method further includes training, by the processor device, a plurality of antagonist agents to fail a task by reinforcement learning using the protagonist environment. The method also includes collecting, by the processor device, the plurality of bad demonstrations by playing the trained antagonist agents on the protagonist environment.
    Type: Application
    Filed: July 30, 2018
    Publication date: January 30, 2020
    Inventors: Tu-Hoa Pham, Giovanni De Magistris, Don Joven Ravoy Agravante, Ryuki Tachibana
  • Publication number: 20200034705
    Abstract: A method is provided for reinforcement learning. The method includes obtaining, by a processor device, a first set and a second set of state-action tuples. Each of the state-action tuples in the first set represents a respective good demonstration. Each of the state-action tuples in the second set represents a respective bad demonstration. The method further includes training, by the processor device using supervised learning with the first set and the second set, a neural network which takes as input a state to provide an output. The output is parameterized to obtain each of a plurality of real-valued constraint functions used for evaluation of each of a plurality of action constraints. The method also includes training, by the processor device, a policy using reinforcement learning by restricting actions predicted by the policy according to each of the plurality of action constraints with each of the plurality of real-valued constraint functions.
    Type: Application
    Filed: July 30, 2018
    Publication date: January 30, 2020
    Inventors: Tu-Hoa Pham, Don Joven Ravoy Agravante, Giovanni De Magistris, Ryuki Tachibana
  • Publication number: 20190279081
    Abstract: A computer-implemented method, computer program product, and system are provided for deep reinforcement learning to control a subject device. The method includes training, by a processor, a neural network to receive state information of a target of the subject device as an input and provide action information for the target as an output. The method further includes inputting, by the processor, current state information of the target into the neural network to obtain current action information for the target. The method also includes correcting, by the processor, the current action information minimally to obtain corrected action information that meets a set of constraints. The method additionally includes performing an action by the subject device based on the corrected action information for the target to obtain a reward from the target.
    Type: Application
    Filed: March 7, 2018
    Publication date: September 12, 2019
    Inventors: Tu-Hoa Pham, Giovanni De Magistris, Ryuki Tachibana
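
The abstract for patent 11468310 (publication 20190279081) describes correcting a proposed action "minimally" so that it meets a set of constraints. The listing does not say how that correction is computed, so the following is only an illustrative sketch under stated assumptions, not the patented method. It assumes "minimal" means smallest Euclidean change and that the constraints are either per-component bounds or a single linear inequality. The function names `correct_action_box` and `correct_action_halfspace` are hypothetical.

```python
from typing import List, Sequence


def correct_action_box(
    action: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Return the point closest to `action` with lower[i] <= x[i] <= upper[i].

    Assumption: the constraints are independent per-component bounds, in which
    case the Euclidean projection reduces to clipping each component.
    """
    if not (len(action) == len(lower) == len(upper)):
        raise ValueError("action, lower and upper must have the same length")
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


def correct_action_halfspace(
    action: Sequence[float],
    normal: Sequence[float],
    offset: float,
) -> List[float]:
    """Return the point closest to `action` satisfying normal . x <= offset.

    Assumption: a single linear inequality constraint. A feasible action is
    returned unchanged; an infeasible one is moved along `normal` just far
    enough to land on the constraint boundary.
    """
    if len(action) != len(normal):
        raise ValueError("action and normal must have the same length")
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0:
        raise ValueError("normal must be a non-zero vector")
    violation = sum(n * a for n, a in zip(normal, action)) - offset
    if violation <= 0:
        return list(action)  # already satisfies the constraint
    scale = violation / norm_sq
    return [a - scale * n for a, n in zip(action, normal)]


if __name__ == "__main__":
    # Hypothetical raw policy output that exceeds a [-1, 1] limit on one axis.
    print(correct_action_box([1.5, -0.2], [-1.0, -1.0], [1.0, 1.0]))
    # Hypothetical action violating x0 + x1 <= 2.
    print(correct_action_halfspace([2.0, 2.0], [1.0, 1.0], 2.0))
```

In both cases an action that already satisfies the constraint passes through unchanged, which is one plausible reading of a "minimal" correction. Constraint sets with several coupled inequalities would generally need a quadratic-programming solver rather than these closed-form projections.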