Patents by Inventor Tu-Hoa Pham

Tu-Hoa Pham has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Safe and fast exploration for reinforcement learning using constrained action manifolds

Patent number: 11823039

Abstract: According to an aspect of the present invention, a computer-implemented method is provided for reinforcement learning. The method includes reading, by a processor device, an action manifold which is described as an n-polytope, at least one physical action limit, and at least one safety constraint. The method further includes updating, by the processor device, the action manifold based on the at least one physical action limit and the at least one safety constraint. The method also includes performing, by the processor device, the reinforcement learning by selecting a constrained action from among a set of constrained actions in the action manifold.

Type: Grant

Filed: August 24, 2018

Date of Patent: November 21, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Giovanni De Magistris, Tu-Hoa Pham, Asim Munawar, Ryuki Tachibana
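
The abstract does not say how a constrained action is picked from the manifold. As a minimal sketch, assume the n-polytope is stored in half-space form (A a <= b), physical action limits are added as box constraints, safety constraints are added as extra half-spaces, and the constrained action is chosen greedily under a Q-function from randomly sampled feasible candidates. `ActionPolytope` and `select_constrained_action` are hypothetical names, and the sampling-based selection is an illustrative choice rather than the patented procedure.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Action = List[float]


@dataclass
class ActionPolytope:
    """Action set {a : A a <= b} kept in half-space (H-) representation."""
    dim: int
    A: List[List[float]] = field(default_factory=list)
    b: List[float] = field(default_factory=list)

    def add_halfspace(self, row: Sequence[float], bound: float) -> None:
        """Add one linear constraint row . a <= bound (e.g. a safety rule)."""
        self.A.append([float(v) for v in row])
        self.b.append(float(bound))

    def add_box_limits(self, low: Sequence[float], high: Sequence[float]) -> None:
        """Encode per-dimension physical limits low_i <= a_i <= high_i."""
        for i in range(self.dim):
            upper = [0.0] * self.dim
            upper[i] = 1.0
            self.add_halfspace(upper, high[i])    # a_i <= high_i
            lower = [0.0] * self.dim
            lower[i] = -1.0
            self.add_halfspace(lower, -low[i])    # -a_i <= -low_i

    def contains(self, a: Sequence[float], tol: float = 1e-9) -> bool:
        return all(
            sum(r * x for r, x in zip(row, a)) <= bi + tol
            for row, bi in zip(self.A, self.b)
        )


def select_constrained_action(
    q_value: Callable[[Sequence[float]], float],
    polytope: ActionPolytope,
    low: Sequence[float],
    high: Sequence[float],
    n_candidates: int = 256,
    rng: Optional[random.Random] = None,
) -> Action:
    """Greedy choice among sampled actions that lie inside the polytope."""
    rng = rng or random.Random()
    candidates = []
    for _ in range(n_candidates):
        a = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
        if polytope.contains(a):          # reject actions outside the manifold
            candidates.append(a)
    if not candidates:
        raise RuntimeError("no feasible action was sampled")
    return max(candidates, key=q_value)


# Usage: 2-D action, physical limits [-1, 1]^2, safety rule a0 + a1 <= 0.5.
manifold = ActionPolytope(dim=2)
manifold.add_box_limits([-1.0, -1.0], [1.0, 1.0])
manifold.add_halfspace([1.0, 1.0], 0.5)
chosen = select_constrained_action(
    lambda a: a[0] + a[1], manifold, [-1.0, -1.0], [1.0, 1.0], rng=random.Random(0)
)
```

Rejection sampling is simple but wastes samples when the safe region is a small fraction of the box; a sampler that stays inside the polytope (for example hit-and-run) would be the natural refinement.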

Sequential learning of constraints for hierarchical reinforcement learning

Patent number: 11734575

Abstract: A computer-implemented method, computer program product, and computer processing system are provided for Hierarchical Reinforcement Learning (HRL) with a target task. The method includes obtaining, by a processor device, a sequence of tasks based on hierarchical relations between the tasks, the tasks constituting the target task. The method further includes learning, by a processor device, a sequence of constraints corresponding to the sequence of tasks by repeating, for each of the tasks in the sequence, reinforcement learning and supervised learning with a set of good samples and a set of bad samples and by applying an obtained constraint for a current task to a next task.

Type: Grant

Filed: July 30, 2018

Date of Patent: August 22, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Don Joven Ravoy Agravante, Giovanni De Magistris, Tu-Hoa Pham, Ryuki Tachibana
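
The loop in this abstract (learn a constraint for one task, then impose it while learning the next) can be sketched in a few lines. This is a toy 1-D illustration under stated assumptions: the reinforcement learning step is replaced by random exploration inside the current constraints, samples are labeled good or bad by the sign of a hypothetical per-task reward, and the supervised learner is a nearest-centroid classifier. None of these choices come from the patent, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Constraint = Callable[[float], bool]
Reward = Callable[[float], float]


def sample_feasible(constraints: Sequence[Constraint], low: float, high: float,
                    rng: random.Random, max_tries: int = 10_000) -> float:
    """Rejection-sample a 1-D action that satisfies every constraint learned so far."""
    for _ in range(max_tries):
        a = rng.uniform(low, high)
        if all(c(a) for c in constraints):
            return a
    raise RuntimeError("constraints leave no feasible action")


def fit_centroid_constraint(good: List[float], bad: List[float]) -> Constraint:
    """Supervised step: nearest-centroid classifier separating good from bad actions."""
    mu_good = sum(good) / len(good)
    mu_bad = sum(bad) / len(bad)
    return lambda a: abs(a - mu_bad) >= abs(a - mu_good)


def learn_constraint_sequence(tasks: Sequence[Reward], low: float, high: float,
                              n_samples: int = 200, seed: int = 0) -> List[Constraint]:
    rng = random.Random(seed)
    constraints: List[Constraint] = []
    for reward in tasks:                       # tasks ordered by the hierarchy
        # Stand-in for the RL step: explore only inside the current constraints.
        samples = [sample_feasible(constraints, low, high, rng) for _ in range(n_samples)]
        good = [a for a in samples if reward(a) > 0.0]
        bad = [a for a in samples if reward(a) <= 0.0]
        if good and bad:                       # both classes are needed to fit a classifier
            constraints.append(fit_centroid_constraint(good, bad))
    return constraints                         # earlier constraints carry over to later tasks


tasks = [
    lambda a: 1.0 if a > 0.0 else -1.0,   # sub-task 1: keep the action positive
    lambda a: 1.0 if a < 0.5 else -1.0,   # sub-task 2: keep it below 0.5
]
learned = learn_constraint_sequence(tasks, low=-1.0, high=1.0)
feasible = [all(c(a) for c in learned) for a in (-0.5, 0.25, 0.9)]
print(feasible)  # → [False, True, False]
```

Because the second task only ever explores actions that already pass the first constraint, the learned set narrows from "positive" to roughly "between 0 and 0.5", which is the carry-over effect the abstract describes.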

Imitation learning by action shaping with antagonist reinforcement learning

Patent number: 11537872

Abstract: A computer-implemented method, computer program product, and computer processing system are provided for obtaining a plurality of bad demonstrations. The method includes reading, by a processor device, a protagonist environment. The method further includes training, by the processor device, a plurality of antagonist agents to fail a task by reinforcement learning using the protagonist environment. The method also includes collecting, by the processor device, the plurality of bad demonstrations by playing the trained antagonist agents on the protagonist environment.

Type: Grant

Filed: July 30, 2018

Date of Patent: December 27, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tu-Hoa Pham, Giovanni De Magistris, Don Joven Ravoy Agravante, Ryuki Tachibana
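
The core idea, training agents to fail and replaying them to harvest bad demonstrations, can be illustrated with tabular Q-learning on a hypothetical five-cell corridor where cell 4 is the protagonist's goal and cell 0 is a failure. The key assumption is that "failing the task" is implemented by negating the protagonist's reward; the environment, hyperparameters, and function names are all illustrative.

```python
import random
from typing import Dict, Tuple

N_CELLS = 5                 # corridor cells 0..4 (hypothetical environment)
START, FAIL, GOAL = 2, 0, 4
ACTIONS = (-1, +1)          # move left / move right
QTable = Dict[Tuple[int, int], float]


def protagonist_reward(cell: int) -> float:
    """Reward of the original (protagonist) task: reach GOAL, avoid FAIL."""
    if cell == GOAL:
        return 1.0
    if cell == FAIL:
        return -1.0
    return 0.0


def is_terminal(cell: int) -> bool:
    return cell in (FAIL, GOAL)


def greedy(q: QTable, cell: int, rng: random.Random) -> int:
    best = max(q[(cell, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(cell, a)] == best])  # random tie-break


def train_antagonist(seed: int, episodes: int = 500, alpha: float = 0.5,
                     gamma: float = 0.9, epsilon: float = 0.2) -> QTable:
    """Tabular Q-learning on the protagonist environment with the reward negated."""
    rng = random.Random(seed)
    q: QTable = {(c, a): 0.0 for c in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        cell = START
        while not is_terminal(cell):
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, cell, rng)
            nxt = cell + a
            r = -protagonist_reward(nxt)           # antagonist is rewarded for failure
            target = r if is_terminal(nxt) else r + gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(cell, a)] += alpha * (target - q[(cell, a)])
            cell = nxt
    return q


def collect_bad_demonstrations(n_agents: int = 3, max_steps: int = 20):
    """Play each trained antagonist greedily and record its (state, action) trajectory."""
    demos = []
    for seed in range(n_agents):                   # one antagonist per seed
        q = train_antagonist(seed)
        rng = random.Random(seed)
        cell, trajectory = START, []
        for _ in range(max_steps):
            if is_terminal(cell):
                break
            a = greedy(q, cell, rng)
            trajectory.append((cell, a))
            cell += a
        demos.append((trajectory, cell))
    return demos


bad_demos = collect_bad_demonstrations()
```

In this tiny corridor every antagonist learns the same way to fail; in a richer environment, differently seeded or differently rewarded antagonists would be expected to fail in more varied ways, which is what makes a plurality of them useful.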

Action shaping from demonstration for fast reinforcement learning

Patent number: 11501157

Abstract: A method is provided for reinforcement learning. The method includes obtaining, by a processor device, a first set and a second set of state-action tuples. Each of the state-action tuples in the first set represents a respective good demonstration. Each of the state-action tuples in the second set represents a respective bad demonstration. The method further includes training, by the processor device using supervised learning with the first set and the second set, a neural network which takes as input a state to provide an output. The output is parameterized to obtain each of a plurality of real-valued constraint functions used for evaluation of each of a plurality of action constraints. The method also includes training, by the processor device, a policy using reinforcement learning by restricting actions predicted by the policy according to each of the plurality of action constraints with each of the plurality of real-valued constraint functions.

Type: Grant

Filed: July 30, 2018

Date of Patent: November 15, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tu-Hoa Pham, Don Joven Ravoy Agravante, Giovanni De Magistris, Ryuki Tachibana
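
A minimal sketch of how good and bad demonstrations can parameterize state-dependent constraint functions, with a linear least-squares model standing in for the neural network (an assumption made to keep the example dependency-free). For a 1-D action, the model outputs a center m(s) and a radius r, which define two real-valued constraint functions g1(a) = a - (m(s) - r) and g2(a) = (m(s) + r) - a; an action is allowed when both are non-negative. Placing the radius halfway between the largest good residual and the smallest bad residual, and restricting the policy by clipping, are illustrative choices. `ConstraintModel` and its methods are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[float, float]                 # (state, action) with 1-D state and action
ConstraintFn = Callable[[float], float]    # action allowed when value >= 0


def fit_linear(demos: Sequence[Demo]) -> Tuple[float, float]:
    """Least-squares fit action ~ w * state + b (stand-in for the neural network)."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov / var if var > 0.0 else 0.0
    return w, mean_a - w * mean_s


class ConstraintModel:
    def __init__(self, good: Sequence[Demo], bad: Sequence[Demo]) -> None:
        self.w, self.b = fit_linear(good)                  # center learned from good demos
        max_good = max(abs(a - self.center(s)) for s, a in good)
        min_bad = min(abs(a - self.center(s)) for s, a in bad)
        self.radius = 0.5 * (max_good + min_bad)           # halfway between the classes

    def center(self, s: float) -> float:
        return self.w * s + self.b

    def bounds(self, s: float) -> Tuple[float, float]:
        c = self.center(s)
        return c - self.radius, c + self.radius

    def constraint_functions(self, s: float) -> List[ConstraintFn]:
        lo, hi = self.bounds(s)
        return [lambda a: a - lo, lambda a: hi - a]

    def restrict(self, s: float, proposed: float) -> float:
        """Clip a policy's proposed action into the allowed interval."""
        lo, hi = self.bounds(s)
        return min(max(proposed, lo), hi)


good_examples = [(float(s), 2.0 * s) for s in range(5)]        # a = 2s
bad_examples = [(float(s), 2.0 * s + 3.0) for s in range(3)]   # a = 2s + 3
model = ConstraintModel(good_examples, bad_examples)
clipped = model.restrict(1.0, 10.0)   # -> 3.5
kept = model.restrict(1.0, 2.2)       # -> 2.2
g_values = [g(clipped) for g in model.constraint_functions(1.0)]
```

If good and bad demonstrations overlap (the largest good residual exceeds the smallest bad residual), this radius rule can no longer separate them, and a learned classifier such as the network described in the abstract would be needed.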

Constraining actions for reinforcement learning under safety requirements

Patent number: 11468310

Abstract: A computer-implemented method, computer program product, and system are provided for deep reinforcement learning to control a subject device. The method includes training, by a processor, a neural network to receive state information of a target of the subject device as an input and provide action information for the target as an output. The method further includes inputting, by the processor, current state information of the target into the neural network to obtain current action information for the target. The method also includes correcting, by the processor, the current action information minimally to obtain corrected action information that meets a set of constraints. The method additionally includes performing an action by the subject device based on the corrected action information for the target to obtain a reward from the target.

Type: Grant

Filed: March 7, 2018

Date of Patent: October 11, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tu-Hoa Pham, Giovanni De Magistris, Ryuki Tachibana
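
Correcting an action "minimally" so that it meets a set of constraints can be read as a Euclidean projection onto the feasible set. Assuming the constraints are linear half-spaces a . x <= b, the sketch below computes that projection with Dykstra's alternating projection algorithm. The abstract does not specify the method (a quadratic-programming solver would be a common alternative), the policy output is a hard-coded placeholder rather than a trained network, and `minimal_correction` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

HalfSpace = Tuple[Sequence[float], float]  # (a, b) encodes a . x <= b; a must be non-zero


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def project_halfspace(x: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Closed-form Euclidean projection onto {x : a . x <= b}."""
    violation = dot(a, x) - b
    if violation <= 0.0:
        return list(x)
    step = violation / dot(a, a)
    return [xi - step * ai for xi, ai in zip(x, a)]


def minimal_correction(action: Sequence[float], constraints: Sequence[HalfSpace],
                       sweeps: int = 500) -> List[float]:
    """Dykstra's algorithm: converges to the nearest point satisfying all constraints."""
    x = list(action)
    increments = [[0.0] * len(x) for _ in constraints]
    for _ in range(sweeps):
        for i, (a, b) in enumerate(constraints):
            y = [xi + pi for xi, pi in zip(x, increments[i])]
            x = project_halfspace(y, a, b)
            increments[i] = [yi - xi for yi, xi in zip(y, x)]  # correction term kept per set
    return x


# Placeholder policy outputs and safety limits for a 2-D action.
limits = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]                  # a0 <= 1, a1 <= 1
corrected = minimal_correction([2.0, 2.0], limits)                 # -> [1.0, 1.0]
diagonal = minimal_correction([1.0, 1.0], [([1.0, 1.0], 1.0)])    # -> [0.5, 0.5]
unchanged = minimal_correction([0.2, -0.3], limits)                # already feasible
```

Plain alternating projection (without Dykstra's correction terms) still reaches a feasible point but not, in general, the closest one, which is why the per-constraint increments are kept.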

SAFE AND FAST EXPLORATION FOR REINFORCEMENT LEARNING USING CONSTRAINED ACTION MANIFOLDS

Publication number: 20200065666

Abstract: According to an aspect of the present invention, a computer-implemented method is provided for reinforcement learning. The method includes reading, by a processor device, an action manifold which is described as an n-polytope, at least one physical action limit, and at least one safety constraint. The method further includes updating, by the processor device, the action manifold based on the at least one physical action limit and the at least one safety constraint. The method also includes performing, by the processor device, the reinforcement learning by selecting a constrained action from among a set of constrained actions in the action manifold.

Type: Application

Filed: August 24, 2018

Publication date: February 27, 2020

Inventors: Giovanni De Magistris, Tu-Hoa Pham, Asim Munawar, Ryuki Tachibana

SEQUENTIAL LEARNING OF CONSTRAINTS FOR HIERARCHICAL REINFORCEMENT LEARNING

Publication number: 20200034704

Abstract: A computer-implemented method, computer program product, and computer processing system are provided for Hierarchical Reinforcement Learning (HRL) with a target task. The method includes obtaining, by a processor device, a sequence of tasks based on hierarchical relations between the tasks, the tasks constituting the target task. The method further includes learning, by a processor device, a sequence of constraints corresponding to the sequence of tasks by repeating, for each of the tasks in the sequence, reinforcement learning and supervised learning with a set of good samples and a set of bad samples and by applying an obtained constraint for a current task to a next task.

Type: Application

Filed: July 30, 2018

Publication date: January 30, 2020

Inventors: Don Joven Ravoy Agravante, Giovanni De Magistris, Tu-Hoa Pham, Ryuki Tachibana

IMITATION LEARNING BY ACTION SHAPING WITH ANTAGONIST REINFORCEMENT LEARNING

Publication number: 20200034706

Abstract: A computer-implemented method, computer program product, and computer processing system are provided for obtaining a plurality of bad demonstrations. The method includes reading, by a processor device, a protagonist environment. The method further includes training, by the processor device, a plurality of antagonist agents to fail a task by reinforcement learning using the protagonist environment. The method also includes collecting, by the processor device, the plurality of bad demonstrations by playing the trained antagonist agents on the protagonist environment.

Type: Application

Filed: July 30, 2018

Publication date: January 30, 2020

Inventors: Tu-Hoa Pham, Giovanni De Magistris, Don Joven Ravoy Agravante, Ryuki Tachibana

ACTION SHAPING FROM DEMONSTRATION FOR FAST REINFORCEMENT LEARNING

Publication number: 20200034705

Abstract: A method is provided for reinforcement learning. The method includes obtaining, by a processor device, a first set and a second set of state-action tuples. Each of the state-action tuples in the first set represents a respective good demonstration. Each of the state-action tuples in the second set represents a respective bad demonstration. The method further includes training, by the processor device using supervised learning with the first set and the second set, a neural network which takes as input a state to provide an output. The output is parameterized to obtain each of a plurality of real-valued constraint functions used for evaluation of each of a plurality of action constraints. The method also includes training, by the processor device, a policy using reinforcement learning by restricting actions predicted by the policy according to each of the plurality of action constraints with each of the plurality of real-valued constraint functions.

Type: Application

Filed: July 30, 2018

Publication date: January 30, 2020

Inventors: Tu-Hoa Pham, Don Joven Ravoy Agravante, Giovanni De Magistris, Ryuki Tachibana

CONSTRAINING ACTIONS FOR REINFORCEMENT LEARNING UNDER SAFETY REQUIREMENTS

Publication number: 20190279081

Abstract: A computer-implemented method, computer program product, and system are provided for deep reinforcement learning to control a subject device. The method includes training, by a processor, a neural network to receive state information of a target of the subject device as an input and provide action information for the target as an output. The method further includes inputting, by the processor, current state information of the target into the neural network to obtain current action information for the target. The method also includes correcting, by the processor, the current action information minimally to obtain corrected action information that meets a set of constraints. The method additionally includes performing an action by the subject device based on the corrected action information for the target to obtain a reward from the target.

Type: Application

Filed: March 7, 2018

Publication date: September 12, 2019

Inventors: Tu-Hoa Pham, Giovanni De Magistris, Ryuki Tachibana