Patents by Inventor Akifumi Wachi

Akifumi Wachi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230267365
    Abstract: Computer hardware and/or software for generating training curricula for a plurality of reinforcement learning control agents, the hardware and/or software performing the following operations: (i) obtaining system data describing at least one operating parameter of a system based, at least in part, on at least one of a plurality of reinforcement learning control agents failing to satisfy a control criterion for the system; (ii) generating a set of training curricula based, at least in part, on at least one operating parameter of the system and at least one training policy for the plurality of reinforcement learning control agents; and (iii) communicating the set of training curricula to the plurality of reinforcement learning control agents.
    Type: Application
    Filed: February 23, 2022
    Publication date: August 24, 2023
    Inventors: Lan Ngoc HOANG, Alexander ZADOROJNIY, Akifumi WACHI
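The curriculum-generation flow in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the control criterion, the `load` operating parameter, and the `num_stages` policy field are all made-up names.

```python
# Hypothetical sketch: when any agent fails the control criterion, system
# data is gathered and one curriculum per agent is generated from the
# operating parameters and that agent's training policy.

def satisfies_criterion(agent_score, threshold):
    """Control criterion: the agent's score must meet the threshold."""
    return agent_score >= threshold

def generate_curricula(operating_params, policies):
    """Build one training curriculum per agent (list of difficulty stages)."""
    curricula = {}
    for agent_id, policy in policies.items():
        # Ramp difficulty up toward the system's current load in fixed steps.
        target = operating_params["load"]
        steps = policy.get("num_stages", 4)
        curricula[agent_id] = [target * (i + 1) / steps for i in range(steps)]
    return curricula

# Agent "a2" fails the criterion, so curricula are generated; in the claimed
# method they would then be communicated to all agents.
scores = {"a1": 0.9, "a2": 0.4}
if any(not satisfies_criterion(s, 0.5) for s in scores.values()):
    plan = generate_curricula({"load": 1.0}, {"a1": {}, "a2": {"num_stages": 2}})
```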
  • Patent number: 11715016
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, a dimensionality of each of inputs to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables as a first input for which the target algorithm succeeds and (ii) an unsuccessful predictor with a second subset of the latent variables as a second input for which the target algorithm fails. Both the successful and the unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input by using a likelihood of the successful predictor and the unsuccessful predictor.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: August 1, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Akifumi Wachi
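A rough intuition for the two-predictor sampling scheme above: fit one density to latent variables where the target algorithm succeeded and one where it failed, then keep the candidate most likely under the failure model relative to the success model. The Gaussian fits below stand in for the trained predictors; none of this is the patented implementation.

```python
# Illustrative sketch of likelihood-ratio sampling of an adversarial latent.
import numpy as np

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
z_success = rng.normal(0.0, 1.0, size=500)  # latents where the algorithm succeeded
z_failure = rng.normal(3.0, 1.0, size=500)  # latents where the algorithm failed

# "Train" the two predictors: here, simple Gaussian fits to each subset.
mu_s, sd_s = z_success.mean(), z_success.std()
mu_f, sd_f = z_failure.mean(), z_failure.std()

# Sample candidates, score each by failure-vs-success log-likelihood ratio,
# and keep the candidate most likely to make the target algorithm fail.
candidates = rng.normal(1.5, 2.0, size=200)
ratios = gaussian_logpdf(candidates, mu_f, sd_f) - gaussian_logpdf(candidates, mu_s, sd_s)
adversarial_latent = candidates[np.argmax(ratios)]
```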
  • Publication number: 20230185881
    Abstract: A computer-implemented method is provided for offline reinforcement learning with a dataset. The method includes training a neural network which inputs a state-action pair and outputs a respective Q function for each of a reward and one or more safety constraints, respectively. The neural network has a linear output layer and remaining non-linear layers being represented by a feature mapping function. The training includes obtaining the feature mapping function by constructing Q-functions based on the dataset according to an offline reinforcement algorithm. The training further includes tuning, using the feature mapping function, a weight between the reward and the one or more safety constraints, wherein during the obtaining and the tuning steps, an estimate of a Q-function is provided by subtracting an uncertainty from an expected value of the Q-function. The uncertainty is a function to map the state-action pair to an error size.
    Type: Application
    Filed: December 15, 2021
    Publication date: June 15, 2023
    Inventors: Akifumi Wachi, Takayuki Osogami
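The pessimistic estimate described above, Q(s, a) minus an uncertainty term, can be sketched in a few lines. The feature mapping `phi`, the count-based uncertainty, and the weights are illustrative placeholders, not the patented training procedure.

```python
# Minimal sketch: the linear output layer gives Q(s, a) = w . phi(s, a), and
# the usable estimate subtracts an uncertainty that maps (s, a) to an error size.
import numpy as np

def phi(state, action):
    """Feature mapping (stand-in for the non-linear layers)."""
    return np.array([state, action, state * action, 1.0])

def uncertainty(state, action, counts):
    """Error size shrinks as the dataset covers (s, a) more densely."""
    n = counts.get((state, action), 0)
    return 1.0 / np.sqrt(n + 1)

w_reward = np.array([0.5, 1.0, -0.2, 0.1])  # linear layer for the reward Q

def pessimistic_q(state, action, counts):
    expected = float(w_reward @ phi(state, action))
    return expected - uncertainty(state, action, counts)

counts = {(1.0, 0.0): 99}                    # (s, a) seen 99 times in the dataset
q_seen = pessimistic_q(1.0, 0.0, counts)     # small penalty for a well-covered pair
q_unseen = pessimistic_q(1.0, 1.0, counts)   # full penalty for an unseen pair
```

A safety-constraint Q-function would get its own weight vector on the same features, which is what makes tuning the reward/safety trade-off cheap: only the linear layer changes.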
  • Publication number: 20230143937
    Abstract: Methods and systems for training a model and automated motion include learning Markov decision processes using reinforcement learning in respective training environments. Logic rules are extracted from the Markov decision processes. A reward logical neural network (LNN) and a safety LNN are trained using the logic rules extracted from the Markov decision processes. The reward LNN and the safety LNN each take a state-action pair as an input and output a corresponding score for the state-action pair.
    Type: Application
    Filed: November 10, 2021
    Publication date: May 11, 2023
    Inventors: Akifumi Wachi, Songtao Lu
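The two scorers above can be caricatured as follows. A real LNN evaluates weighted real-valued logic; this toy just hard-codes two extracted rules to show the interface (state-action pair in, score out). All rules and field names are invented for illustration.

```python
# Illustrative stand-ins for the reward LNN and safety LNN.
def reward_score(state, action):
    # Rule extracted from training: moving toward the goal earns reward.
    return 1.0 if (action == "forward" and not state["at_goal"]) else 0.0

def safety_score(state, action):
    # Rule extracted from training: never move forward next to an obstacle.
    return 0.0 if (action == "forward" and state["obstacle_ahead"]) else 1.0

state = {"at_goal": False, "obstacle_ahead": True}
scores = {a: (reward_score(state, a), safety_score(state, a))
          for a in ["forward", "wait"]}
```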
  • Patent number: 11443130
    Abstract: Making failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity of a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience of the variety of first experiences in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the plurality of first experiences in the first storage to generate an adversarial agent having diverse experiences.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: September 13, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Akifumi Wachi
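The diversity-keeping update described above admits a compact sketch: a new failure experience is stored only if its similarity to every stored experience is below a threshold. The similarity measure below (cosine over feature vectors) and the threshold are illustrative choices, not the patented ones.

```python
# Sketch of the first storage that keeps only dissimilar failure experiences.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def maybe_store(storage, experience, threshold=0.95):
    """Add `experience` unless it is too similar to a stored one."""
    if all(cosine_similarity(experience, e) < threshold for e in storage):
        storage.append(experience)
        return True
    return False

storage = [[1.0, 0.0], [0.0, 1.0]]               # first experiences of failures
added_dup = maybe_store(storage, [0.99, 0.01])   # near-duplicate: rejected
added_new = maybe_store(storage, [0.7, 0.7])     # novel failure: stored
```

Training the adversarial agent only on experiences that survive this filter is what keeps its repertoire of induced failures diverse.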
  • Patent number: 11409958
    Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 9, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
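The coordinate scheme above separates two signals: similar words share nearby angles, and a word's radius encodes its depth in a hierarchy (general terms near the origin, specific terms farther out). The toy vocabulary, clusters, and depth values below are made up for illustration.

```python
# Toy sketch of angular-coordinate-from-similarity, radial-coordinate-from-hierarchy.
import math

clusters = {"animal": 0, "dog": 0, "poodle": 0, "vehicle": 1, "car": 1}
depth = {"animal": 0, "dog": 1, "poodle": 2, "vehicle": 0, "car": 1}

def embed(word):
    theta = clusters[word] * math.pi / 2   # similarity cluster -> angle
    r = 1.0 - 1.0 / (depth[word] + 2)      # hierarchy depth -> radius
    return (r * math.cos(theta), r * math.sin(theta))

vec_animal = embed("animal")   # general term: close to the origin
vec_poodle = embed("poodle")   # specific term: farther out, same direction
```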
  • Publication number: 20220180166
    Abstract: Next state prediction technology that performs the following computer based operations: receiving state information that includes information indicative of a current state of an environment; processing the state information to predict a future state of the environment, with the processing being performed by a hybrid computer system that includes both of the following: (i) neural network software module(s) that include machine learning functionality, and (ii) symbolic rule based software modules; and using the prediction of the next state of the environment as an input with respect to taking a further action (for example, activating a hardware device or effecting a communication to a human or another device).
    Type: Application
    Filed: December 3, 2020
    Publication date: June 9, 2022
    Inventors: Akifumi Wachi, Ryosuke Kohita, Daiki Kimura
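The hybrid arrangement above can be sketched as a learned module proposing the next state and a symbolic rule module correcting it before the prediction drives a further action. Both modules below are toy stand-ins with invented names.

```python
# Sketch: neural prediction constrained by a symbolic rule, then used for action.
def neural_module(state):
    # Stand-in for a trained network: linear extrapolation of position.
    return {"position": state["position"] + state["velocity"]}

def symbolic_module(state, prediction):
    # Symbolic rule: position can never leave the track [0, 10].
    prediction["position"] = min(max(prediction["position"], 0.0), 10.0)
    return prediction

def predict_next(state):
    return symbolic_module(state, neural_module(state))

next_state = predict_next({"position": 9.5, "velocity": 2.0})
# The constrained prediction then feeds a further action, e.g. braking.
brake = next_state["position"] >= 10.0
```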
  • Publication number: 20220164647
    Abstract: A method for action pruning in Reinforcement Learning receives a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, a logical inference based on the current state. The method outputs upper and lower bounds on each action from a set of possible actions of an agent in the environment, responsive to an evaluation of the logical inference. The method calculates, for each pair of a possible action of the agent in the environment and the current state, a probability by using the upper and lower bounds. Each of the calculated probabilities indicates a respective priority ratio for each action. The method obtains a policy in Reinforcement Learning for the current state by using the calculated probabilities. The method prunes one or more actions from the set of actions as being in violation of the policy such that the one or more actions are ignored.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
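The bounds-to-policy-to-pruning pipeline above can be sketched as: turn each action's (lower, upper) truth bounds into a raw priority, normalize into probabilities, and drop actions below a cutoff. The midpoint rule and cutoff are illustrative stand-ins for the LNN-derived quantities.

```python
# Sketch of action pruning from per-action truth bounds.
def action_policy(bounds, cutoff=0.1):
    # Midpoint of the (lower, upper) truth bounds as the raw priority.
    raw = {a: (lo + hi) / 2 for a, (lo, hi) in bounds.items()}
    total = sum(raw.values())
    probs = {a: v / total for a, v in raw.items()}     # policy for this state
    kept = {a: p for a, p in probs.items() if p >= cutoff}  # prune the rest
    return probs, kept

bounds = {"left": (0.8, 1.0), "right": (0.6, 0.8), "jump": (0.0, 0.1)}
probs, kept = action_policy(bounds)   # "jump" falls below the cutoff
```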
  • Publication number: 20220164668
    Abstract: A method for safe reinforcement learning receives an action and a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, an action safetyness logical inference based on the current state of the environment and a current action candidate from an agent. The method outputs upper and lower bounds on the action, responsive to an evaluation of the action safetyness logical inference. The method calculates a contradiction value for the action by using the upper and lower bounds. The contradiction value indicates a level of contradiction for each of a plurality of logic rules implemented by the LNN structure. The method evaluates the action with respect to safetyness based on the contradiction value. The method selectively performs the action responsive to an evaluation of the action indicating that the action is safe to perform based on the contradiction value exceeding a safetyness threshold.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
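The contradiction check above rests on an LNN property: each rule carries (lower, upper) truth bounds, and a lower bound exceeding the upper bound signals a contradiction. The sketch below takes the worst violation across rules and performs the action only when it stays under a threshold; the numbers and the aggregation rule are illustrative.

```python
# Sketch of a contradiction-value safety gate over per-rule truth bounds.
def contradiction_value(rule_bounds):
    """Largest amount by which any rule's lower bound exceeds its upper bound."""
    return max(max(0.0, lo - hi) for lo, hi in rule_bounds)

def is_safe(rule_bounds, threshold=0.2):
    return contradiction_value(rule_bounds) < threshold

# Rule bounds after evaluating action safetyness for two candidate actions:
ok_action = [(0.1, 0.9), (0.3, 0.7)]    # consistent bounds: contradiction 0.0
bad_action = [(0.9, 0.4), (0.2, 0.8)]   # first rule contradicted by 0.5
```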
  • Publication number: 20220129637
    Abstract: A computer identifies, within a task description, words that correspond to semantic element labels for the task. The computer receives, from a task source operatively connected with the computer, a textual description of a task. The computer receives semantic element labels, element identification rules, and at least one reference sentence showing natural language semantic element label use. The computer parses the description into candidate words for each semantic element label and generates Rule Match Values (RMVs) for the parsed words based on the element identification rules. The computer collects words having RMVs above a threshold into sets of associated candidate words and generates, using a neural network trained on the reference sentence, Match Likelihood Values (MLVs) indicating whether each candidate word represents the semantic element label with which it is associated. The computer selects, to represent the semantic element, the associated candidate word having the highest MLV.
    Type: Application
    Filed: October 23, 2020
    Publication date: April 28, 2022
    Inventors: Ryosuke Kohita, Akifumi Wachi, Daiki Kimura
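The two-stage selection above (rule-based RMVs filter candidates, a learned MLV picks the winner) can be sketched with toy rules and a hard-coded scorer. The identification rules, the example sentence, and the MLV table are all invented stand-ins for the trained components.

```python
# Sketch: RMV filtering followed by MLV-based selection of one word.
def rmv(word, rules):
    """Fraction of identification rules the word satisfies."""
    return sum(rule(word) for rule in rules) / len(rules)

rules = [str.istitle, lambda w: len(w) > 3]      # toy identification rules
description = "Move the Pallet to Dock seven"
candidates = [w for w in description.split() if rmv(w, rules) >= 1.0]

# Stand-in for the neural MLV scorer trained on reference sentences:
mlv = {"Pallet": 0.9, "Dock": 0.7}
best = max(candidates, key=lambda w: mlv.get(w, 0.0))
```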
  • Patent number: 11294945
    Abstract: A computer-implemented method is presented for performing Q-learning with language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model, assigning each of the words to an action and operation status, determining, for each of the words whose operation status represents “unoperated,” a status by calculating a local encoding and a global encoding, and concatenating the local encoding and the global encoding, the local encoding calculated based on a vector, an action, and an operation status of the word, and the global encoding calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status.
    Type: Grant
    Filed: May 19, 2020
    Date of Patent: April 5, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20220100956
    Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20210365485
    Abstract: A computer-implemented method is presented for performing Q-learning with language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model, assigning each of the words to an action and operation status, determining, for each of the words whose operation status represents “unoperated,” a status by calculating a local encoding and a global encoding, and concatenating the local encoding and the global encoding, the local encoding calculated based on a vector, an action, and an operation status of the word, and the global encoding calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status.
    Type: Application
    Filed: May 19, 2020
    Publication date: November 25, 2021
    Inventors: Ryosuke Kohita, Akifumi Wachi
  • Publication number: 20210064915
    Abstract: Making failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity of a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience of the variety of first experiences in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the plurality of first experiences in the first storage to generate an adversarial agent having diverse experiences.
    Type: Application
    Filed: August 30, 2019
    Publication date: March 4, 2021
    Inventor: Akifumi Wachi
  • Publication number: 20200293901
    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, a dimensionality of each of inputs to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables as a first input for which the target algorithm succeeds and (ii) an unsuccessful predictor with a second subset of the latent variables as a second input for which the target algorithm fails. Both the successful and the unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input by using a likelihood of the successful predictor and the unsuccessful predictor.
    Type: Application
    Filed: March 15, 2019
    Publication date: September 17, 2020
    Inventor: Akifumi Wachi