Patents by Inventor Akifumi Wachi

Akifumi Wachi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230267365
    Abstract: Computer hardware and/or software for generating training curricula for a plurality of reinforcement learning control agents, the hardware and/or software performing the following operations: (i) obtaining system data describing at least one operating parameter of a system based, at least in part, on at least one of a plurality of reinforcement learning control agents failing to satisfy a control criterion for the system; (ii) generating a set of training curricula based, at least in part, on at least one operating parameter of the system and at least one training policy for the plurality of reinforcement learning control agents; and (iii) communicating the set of training curricula to the plurality of reinforcement learning control agents.
    Type: Application
    Filed: February 23, 2022
    Publication date: August 24, 2023
    Inventors: Lan Ngoc HOANG, Alexander ZADOROJNIY, Akifumi WACHI
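The curriculum-generation flow in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the control criterion, the `load` operating parameter, and the `num_stages` policy field are all made-up names.

```python
# Hypothetical sketch: when any agent fails the control criterion, system
# data is gathered and one curriculum per agent is generated from the
# operating parameters and that agent's training policy.

def satisfies_criterion(agent_score, threshold):
    """Control criterion: the agent's score must meet the threshold."""
    return agent_score >= threshold

def generate_curricula(operating_params, policies):
    """Build one training curriculum per agent (list of difficulty stages)."""
    curricula = {}
    for agent_id, policy in policies.items():
        # Ramp difficulty up toward the system's current load in fixed steps.
        target = operating_params["load"]
        steps = policy.get("num_stages", 4)
        curricula[agent_id] = [target * (i + 1) / steps for i in range(steps)]
    return curricula

# Agent "a2" fails the criterion, so curricula are generated; in the claimed
# method they would then be communicated to all agents.
scores = {"a1": 0.9, "a2": 0.4}
if any(not satisfies_criterion(s, 0.5) for s in scores.values()):
    plan = generate_curricula({"load": 1.0}, {"a1": {}, "a2": {"num_stages": 2}})
```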
  • Patent number: 11715016
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, a dimensionality of each of inputs to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables as a first input for which the target algorithm succeeds and (ii) an unsuccessful predictor with a second subset of the latent variables as a second input for which the target algorithm fails. Both the successful and the unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input by using a likelihood of the successful predictor and the unsuccessful predictor.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: August 1, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Akifumi Wachi
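A rough intuition for the two-predictor sampling scheme above: fit one density to latent variables where the target algorithm succeeded and one where it failed, then keep the candidate most likely under the failure model relative to the success model. The Gaussian fits below stand in for the trained predictors; none of this is the patented implementation.

```python
# Illustrative sketch of likelihood-ratio sampling of an adversarial latent.
import numpy as np

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
z_success = rng.normal(0.0, 1.0, size=500)  # latents where the algorithm succeeded
z_failure = rng.normal(3.0, 1.0, size=500)  # latents where the algorithm failed

# "Train" the two predictors: here, simple Gaussian fits to each subset.
mu_s, sd_s = z_success.mean(), z_success.std()
mu_f, sd_f = z_failure.mean(), z_failure.std()

# Sample candidates, score each by failure-vs-success log-likelihood ratio,
# and keep the candidate most likely to make the target algorithm fail.
candidates = rng.normal(1.5, 2.0, size=200)
ratios = gaussian_logpdf(candidates, mu_f, sd_f) - gaussian_logpdf(candidates, mu_s, sd_s)
adversarial_latent = candidates[np.argmax(ratios)]
```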
  • Publication number: 20230185881
    Abstract: A computer-implemented method is provided for offline reinforcement learning with a dataset. The method includes training a neural network which inputs a state-action pair and outputs a respective Q function for each of a reward and one or more safety constraints, respectively. The neural network has a linear output layer and remaining non-linear layers being represented by a feature mapping function. The training includes obtaining the feature mapping function by constructing Q-functions based on the dataset according to an offline reinforcement algorithm. The training further includes tuning, using the feature mapping function, a weight between the reward and the one or more safety constraints, wherein during the obtaining and the tuning steps, an estimate of a Q-function is provided by subtracting an uncertainty from an expected value of the Q-function. The uncertainty is a function to map the state-action pair to an error size.
    Type: Application
    Filed: December 15, 2021
    Publication date: June 15, 2023
    Inventors: Akifumi Wachi, Takayuki Osogami
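The pessimistic estimate described above, Q(s, a) minus an uncertainty term, can be sketched in a few lines. The feature mapping `phi`, the count-based uncertainty, and the weights are illustrative placeholders, not the patented training procedure.

```python
# Minimal sketch: the linear output layer gives Q(s, a) = w . phi(s, a), and
# the usable estimate subtracts an uncertainty that maps (s, a) to an error size.
import numpy as np

def phi(state, action):
    """Feature mapping (stand-in for the non-linear layers)."""
    return np.array([state, action, state * action, 1.0])

def uncertainty(state, action, counts):
    """Error size shrinks as the dataset covers (s, a) more densely."""
    n = counts.get((state, action), 0)
    return 1.0 / np.sqrt(n + 1)

w_reward = np.array([0.5, 1.0, -0.2, 0.1])  # linear layer for the reward Q

def pessimistic_q(state, action, counts):
    expected = float(w_reward @ phi(state, action))
    return expected - uncertainty(state, action, counts)

counts = {(1.0, 0.0): 99}                    # (s, a) seen 99 times in the dataset
q_seen = pessimistic_q(1.0, 0.0, counts)     # small penalty for a well-covered pair
q_unseen = pessimistic_q(1.0, 1.0, counts)   # full penalty for an unseen pair
```

A safety-constraint Q-function would get its own weight vector on the same features, which is what makes tuning the reward/safety trade-off cheap: only the linear layer changes.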
  • Publication number: 20230143937
    Abstract: Methods and systems for training a model and automated motion include learning Markov decision processes using reinforcement learning in respective training environments. Logic rules are extracted from the Markov decision processes. A reward logical neural network (LNN) and a safety LNN are trained using the logic rules extracted from the Markov decision processes. The reward LNN and the safety LNN each take a state-action pair as an input and output a corresponding score for the state-action pair.
    Type: Application
    Filed: November 10, 2021
    Publication date: May 11, 2023
    Inventors: Akifumi Wachi, Songtao Lu
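The two scorers above can be caricatured as follows. A real LNN evaluates weighted real-valued logic; this toy just hard-codes two extracted rules to show the interface (state-action pair in, score out). All rules and field names are invented for illustration.

```python
# Illustrative stand-ins for the reward LNN and safety LNN.
def reward_score(state, action):
    # Rule extracted from training: moving toward the goal earns reward.
    return 1.0 if (action == "forward" and not state["at_goal"]) else 0.0

def safety_score(state, action):
    # Rule extracted from training: never move forward next to an obstacle.
    return 0.0 if (action == "forward" and state["obstacle_ahead"]) else 1.0

state = {"at_goal": False, "obstacle_ahead": True}
scores = {a: (reward_score(state, a), safety_score(state, a))
          for a in ["forward", "wait"]}
```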
  • Patent number: 11443130
    Abstract: Making failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity of a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience of the variety of first experiences in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the plurality of first experiences in the first storage to generate an adversarial agent having diverse experiences.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: September 13, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Akifumi Wachi
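The diversity-keeping update described above admits a compact sketch: a new failure experience is stored only if its similarity to every stored experience is below a threshold. The similarity measure below (cosine over feature vectors) and the threshold are illustrative choices, not the patented ones.

```python
# Sketch of the first storage that keeps only dissimilar failure experiences.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def maybe_store(storage, experience, threshold=0.95):
    """Add `experience` unless it is too similar to a stored one."""
    if all(cosine_similarity(experience, e) < threshold for e in storage):
        storage.append(experience)
        return True
    return False

storage = [[1.0, 0.0], [0.0, 1.0]]               # first experiences of failures
added_dup = maybe_store(storage, [0.99, 0.01])   # near-duplicate: rejected
added_new = maybe_store(storage, [0.7, 0.7])     # novel failure: stored
```

Training the adversarial agent only on experiences that survive this filter is what keeps its repertoire of induced failures diverse.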
  • Patent number: 11409958
    Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 9, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
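The coordinate scheme above separates two signals: similar words share nearby angles, and a word's radius encodes its depth in a hierarchy (general terms near the origin, specific terms farther out). The toy vocabulary, clusters, and depth values below are made up for illustration.

```python
# Toy sketch of angular-coordinate-from-similarity, radial-coordinate-from-hierarchy.
import math

clusters = {"animal": 0, "dog": 0, "poodle": 0, "vehicle": 1, "car": 1}
depth = {"animal": 0, "dog": 1, "poodle": 2, "vehicle": 0, "car": 1}

def embed(word):
    theta = clusters[word] * math.pi / 2   # similarity cluster -> angle
    r = 1.0 - 1.0 / (depth[word] + 2)      # hierarchy depth -> radius
    return (r * math.cos(theta), r * math.sin(theta))

vec_animal = embed("animal")   # general term: close to the origin
vec_poodle = embed("poodle")   # specific term: farther out, same direction
```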
  • Publication number: 20220180166
    Abstract: Next state prediction technology that performs the following computer based operations: receiving state information that includes information indicative of a current state of an environment; processing the state information to predict a future state of the environment, with the processing being performed by a hybrid computer system that includes both of the following: (i) neural network software module(s) that include machine learning functionality, and (ii) symbolic rule based software modules; and using the prediction of the next state of the environment as an input with respect to taking a further action (for example, activating a hardware device or effecting a communication to a human or another device).
    Type: Application
    Filed: December 3, 2020
    Publication date: June 9, 2022
    Inventors: Akifumi Wachi, Ryosuke Kohita, Daiki Kimura
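The hybrid arrangement above can be sketched as a learned module proposing the next state and a symbolic rule module correcting it before the prediction drives a further action. Both modules below are toy stand-ins with invented names.

```python
# Sketch: neural prediction constrained by a symbolic rule, then used for action.
def neural_module(state):
    # Stand-in for a trained network: linear extrapolation of position.
    return {"position": state["position"] + state["velocity"]}

def symbolic_module(state, prediction):
    # Symbolic rule: position can never leave the track [0, 10].
    prediction["position"] = min(max(prediction["position"], 0.0), 10.0)
    return prediction

def predict_next(state):
    return symbolic_module(state, neural_module(state))

next_state = predict_next({"position": 9.5, "velocity": 2.0})
# The constrained prediction then feeds a further action, e.g. braking.
brake = next_state["position"] >= 10.0
```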
  • Publication number: 20220164647
    Abstract: A method for action pruning in Reinforcement Learning receives a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, a logical inference based on the current state. The method outputs upper and lower bounds on each action from a set of possible actions of an agent in the environment, responsive to an evaluation of the logical inference. The method calculates, for each pair of a possible action of the agent in the environment and the current state, a probability by using the upper and lower bounds. Each of the calculated probabilities indicates a respective priority ratio for each action. The method obtains a policy in Reinforcement Learning for the current state by using the calculated probabilities. The method prunes one or more actions from the set of actions as being in violation of the policy such that the one or more actions are ignored.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
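The bounds-to-policy-to-pruning pipeline above can be sketched as: turn each action's (lower, upper) truth bounds into a raw priority, normalize into probabilities, and drop actions below a cutoff. The midpoint rule and cutoff are illustrative stand-ins for the LNN-derived quantities.

```python
# Sketch of action pruning from per-action truth bounds.
def action_policy(bounds, cutoff=0.1):
    # Midpoint of the (lower, upper) truth bounds as the raw priority.
    raw = {a: (lo + hi) / 2 for a, (lo, hi) in bounds.items()}
    total = sum(raw.values())
    probs = {a: v / total for a, v in raw.items()}     # policy for this state
    kept = {a: p for a, p in probs.items() if p >= cutoff}  # prune the rest
    return probs, kept

bounds = {"left": (0.8, 1.0), "right": (0.6, 0.8), "jump": (0.0, 0.1)}
probs, kept = action_policy(bounds)   # "jump" falls below the cutoff
```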
  • Publication number: 20220164668
    Abstract: A method for safe reinforcement learning receives an action and a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, an action safetyness logical inference based on the current state of the environment and a current action candidate from an agent. The method outputs upper and lower bounds on the action, responsive to an evaluation of the action safetyness logical inference. The method calculates a contradiction value for the action by using the upper and lower bounds. The contradiction value indicates a level of contradiction for each of a plurality of logic rules implemented by the LNN structure. The method evaluates the action with respect to safetyness based on the contradiction value. The method selectively performs the action responsive to an evaluation of the action indicating that the action is safe to perform based on the contradiction value exceeding a safetyness threshold.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
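The contradiction check above rests on an LNN property: each rule carries (lower, upper) truth bounds, and a lower bound exceeding the upper bound signals a contradiction. The sketch below takes the worst violation across rules and performs the action only when it stays under a threshold; the numbers and the aggregation rule are illustrative.

```python
# Sketch of a contradiction-value safety gate over per-rule truth bounds.
def contradiction_value(rule_bounds):
    """Largest amount by which any rule's lower bound exceeds its upper bound."""
    return max(max(0.0, lo - hi) for lo, hi in rule_bounds)

def is_safe(rule_bounds, threshold=0.2):
    return contradiction_value(rule_bounds) < threshold

# Rule bounds after evaluating action safetyness for two candidate actions:
ok_action = [(0.1, 0.9), (0.3, 0.7)]    # consistent bounds: contradiction 0.0
bad_action = [(0.9, 0.4), (0.2, 0.8)]   # first rule contradicted by 0.5
```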
  • Publication number: 20220129637
    Abstract: A computer identifies, within a task description, words that correspond to semantic element labels for the task. The computer receives, from a task source operatively connected with the computer, a textual description of a task. The computer receives semantic element labels, element identification rules, and at least one reference sentence showing natural language semantic element label use. The computer parses the description into candidate words for each semantic element label and generates Rule Match Values (RMVs) for the parsed words based on the element identification rules. The computer collects words having RMVs above a threshold into sets of associated candidate words and generates, using a neural network trained on the reference sentence, Match Likelihood Values (MLVs) indicating whether each candidate word represents the semantic element label with which it is associated. The computer selects, to represent the semantic element, the associated candidate word having the highest MLV.
    Type: Application
    Filed: October 23, 2020
    Publication date: April 28, 2022
    Inventors: Ryosuke Kohita, Akifumi Wachi, Daiki Kimura
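The two-stage selection above (rule-based RMVs filter candidates, a learned MLV picks the winner) can be sketched with toy rules and a hard-coded scorer. The identification rules, the example sentence, and the MLV table are all invented stand-ins for the trained components.

```python
# Sketch: RMV filtering followed by MLV-based selection of one word.
def rmv(word, rules):
    """Fraction of identification rules the word satisfies."""
    return sum(rule(word) for rule in rules) / len(rules)

rules = [str.istitle, lambda w: len(w) > 3]      # toy identification rules
description = "Move the Pallet to Dock seven"
candidates = [w for w in description.split() if rmv(w, rules) >= 1.0]

# Stand-in for the neural MLV scorer trained on reference sentences:
mlv = {"Pallet": 0.9, "Dock": 0.7}
best = max(candidates, key=lambda w: mlv.get(w, 0.0))
```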
  • Patent number: 11294945
    Abstract: A computer-implemented method is presented for performing Q-learning with language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model, assigning each of the words to an action and operation status, determining, for each of the words whose operation status represents “unoperated,” a status by calculating a local encoding and a global encoding, and concatenating the local encoding and the global encoding, the local encoding calculated based on a vector, an action, and an operation status of the word, and the global encoding calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status.
    Type: Grant
    Filed: May 19, 2020
    Date of Patent: April 5, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20220100956
    Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20210365485
    Abstract: A computer-implemented method is presented for performing Q-learning with language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model, assigning each of the words to an action and operation status, determining, for each of the words whose operation status represents “unoperated,” a status by calculating a local encoding and a global encoding, and concatenating the local encoding and the global encoding, the local encoding calculated based on a vector, an action, and an operation status of the word, and the global encoding calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status.
    Type: Application
    Filed: May 19, 2020
    Publication date: November 25, 2021
    Inventors: Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20210064915
    Abstract: Making failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity of a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience of the variety of first experiences in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the plurality of first experiences in the first storage to generate an adversarial agent having diverse experiences.
    Type: Application
    Filed: August 30, 2019
    Publication date: March 4, 2021
    Inventor: Akifumi Wachi
  • Publication number: 20200293901
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, a dimensionality of each of inputs to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables as a first input for which the target algorithm succeeds and (ii) an unsuccessful predictor with a second subset of the latent variables as a second input for which the target algorithm fails. Both the successful and the unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input by using a likelihood of the successful predictor and the unsuccessful predictor.
    Type: Application
    Filed: March 15, 2019
    Publication date: September 17, 2020
    Inventor: Akifumi Wachi