-
Publication number: 20220100956
Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
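The coordinate assignment above can be pictured with a minimal sketch, assuming a toy vocabulary with a precomputed similarity matrix and known hierarchy depths; the greedy angle ordering and linear depth-to-radius scaling are illustrative choices only, since the patent does not fix these details.

import numpy as np

# Toy vocabulary with known hierarchy depth (0 = most general).
words = ["animal", "mammal", "bird", "dog", "cat", "sparrow"]
depth = {"animal": 0, "mammal": 1, "bird": 1, "dog": 2, "cat": 2, "sparrow": 2}

# Assumed symmetric pairwise similarity matrix in [0, 1]; in practice this
# could come from cosine similarity of pretrained embeddings.
rng = np.random.default_rng(0)
S = rng.uniform(0.1, 0.9, size=(len(words), len(words)))
S = (S + S.T) / 2
np.fill_diagonal(S, 1.0)

def polar_embedding(words, S, depth, max_depth=2):
    # Angular coordinate: greedily chain nearest neighbors so that similar
    # words receive nearby angles (one plausible realization).
    order = [0]
    remaining = set(range(1, len(words)))
    while remaining:
        nxt = max(remaining, key=lambda j: S[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    theta = np.empty(len(words))
    for rank, idx in enumerate(order):
        theta[idx] = 2 * np.pi * rank / len(words)
    # Radial coordinate: deeper (more specific) words lie farther out.
    r = np.array([depth[w] / max_depth for w in words])
    return {w: (r[i], theta[i]) for i, w in enumerate(words)}

for w, (r, t) in polar_embedding(words, S, depth).items():
    print(f"{w:8s} r={r:.2f} theta={t:.2f}")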
-
Patent number: 11409958
Abstract: Methods and systems for performing a language processing task include setting an angular coordinate for a vector representation of each of a set of words, based on similarity of the words to one another. A radial coordinate is set for the vector representation of each word, according to hierarchical relationships between the words. A language processing task is performed based on hierarchical word relationships using the vector representations of the words.
Type: Grant
Filed: September 25, 2020
Date of Patent: August 9, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
-
Publication number: 20230143937
Abstract: Methods and systems for training a model and for automated motion include learning Markov decision processes using reinforcement learning in respective training environments. Logic rules are extracted from the Markov decision processes. A reward logical neural network (LNN) and a safety LNN are trained using the logic rules extracted from the Markov decision processes. The reward LNN and the safety LNN each take a state-action pair as an input and output a corresponding score for the state-action pair.
Type: Application
Filed: November 10, 2021
Publication date: May 11, 2023
Inventors: Akifumi Wachi, Songtao Lu
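The scoring step can be pictured with a highly simplified stand-in for the two trained LNNs, using Lukasiewicz-style fuzzy connectives over hand-written placeholder rules; the rule contents, predicates, and connectives are assumptions, since an actual LNN learns weighted logical formulas.

def soft_and(a, b):
    # Lukasiewicz t-norm, a common differentiable AND in logic networks.
    return max(0.0, a + b - 1.0)

def soft_or(a, b):
    return min(1.0, a + b)

# Placeholder rules extracted from the learned MDPs (illustrative only):
#   reward: near_goal(s) AND move_forward(a)  -> high reward score
#   safety: NOT near_cliff(s) OR brake(a)     -> high safety score
def reward_score(state, action):
    return soft_and(state["near_goal"], action["move_forward"])

def safety_score(state, action):
    return soft_or(1.0 - state["near_cliff"], action["brake"])

state = {"near_goal": 0.8, "near_cliff": 0.3}
action = {"move_forward": 0.9, "brake": 0.0}
print("reward score:", reward_score(state, action))  # 0.7
print("safety score:", safety_score(state, action))  # 0.7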
-
Publication number: 20230267365
Abstract: Computer hardware and/or software for generating training curricula for a plurality of reinforcement learning control agents, the hardware and/or software performing the following operations: (i) obtaining system data describing at least one operating parameter of a system based, at least in part, on at least one of a plurality of reinforcement learning control agents failing to satisfy a control criterion for the system; (ii) generating a set of training curricula based, at least in part, on at least one operating parameter of the system and at least one training policy for the plurality of reinforcement learning control agents; and (iii) communicating the set of training curricula to the plurality of reinforcement learning control agents.
Type: Application
Filed: February 23, 2022
Publication date: August 24, 2023
Inventors: Lan Ngoc Hoang, Alexander Zadorojniy, Akifumi Wachi
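The abstract leaves the curriculum-generation scheme open; the sketch below assumes a single scalar operating parameter and builds a linear ramp from a value the failing agent can handle toward the value it failed on. The parameter name, interpolation, and agent identifiers are purely illustrative.

from dataclasses import dataclass
from typing import List

@dataclass
class Curriculum:
    parameter: str
    stages: List[float]  # progressively harder values of the operating parameter

def generate_curriculum(easy_value, failed_value, parameter="load", n_stages=4):
    # Linear ramp from a manageable setting toward the failing one.
    step = (failed_value - easy_value) / n_stages
    stages = [round(easy_value + step * (i + 1), 3) for i in range(n_stages)]
    return Curriculum(parameter=parameter, stages=stages)

# An agent satisfied the control criterion at load=0.2 but failed at load=1.0.
curriculum = generate_curriculum(easy_value=0.2, failed_value=1.0)
print(curriculum)  # stages: 0.4, 0.6, 0.8, 1.0

# "Communicating" the curricula could be as simple as handing each agent
# its stage schedule to train on in order.
for agent_id in ("agent-0", "agent-1"):
    print(f"sending {curriculum.parameter} stages {curriculum.stages} to {agent_id}")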
-
Publication number: 20220180166
Abstract: Next-state prediction technology that performs the following computer-based operations: receiving state information that includes information indicative of a current state of an environment; processing the state information to predict a future state of the environment, with the processing being performed by a hybrid computer system that includes both of the following: (i) neural network software module(s) that include machine learning functionality, and (ii) symbolic rule-based software module(s); and using the prediction of the next state of the environment as an input with respect to taking a further action (for example, activating a hardware device or effecting a communication to a human or another device).
Type: Application
Filed: December 3, 2020
Publication date: June 9, 2022
Inventors: Akifumi Wachi, Ryosuke Kohita, Daiki Kimura
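One way to picture the hybrid pipeline: a learned module proposes the next state and symbolic rules correct it before it is used to decide on an action. Everything below (the linear stand-in model, the clamping rule, the trigger condition) is assumed for illustration.

import numpy as np

def neural_predictor(state):
    # Stand-in for a trained neural module: a fixed random linear map.
    W = np.random.default_rng(42).normal(size=(3, 3))
    return W @ state

def apply_symbolic_rules(predicted):
    # Illustrative hand-written rule: the first state variable is
    # physically non-negative, so the prediction is clamped at zero.
    predicted = predicted.copy()
    predicted[0] = max(predicted[0], 0.0)
    return predicted

def predict_next_state(state):
    return apply_symbolic_rules(neural_predictor(state))

state = np.array([1.0, 0.5, -0.2])
next_state = predict_next_state(state)
print("predicted next state:", np.round(next_state, 2))

# The prediction feeds a further action, e.g. activating a hardware device.
if next_state[0] > 0.8:
    print("activating cooling hardware")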
-
Publication number: 20210064915
Abstract: Generating failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity between a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the first experiences in the first storage, to generate an adversarial agent having diverse experiences.
Type: Application
Filed: August 30, 2019
Publication date: March 4, 2021
Inventor: Akifumi Wachi
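The novelty test on stored failures can be sketched directly from the abstract: keep a new failure experience only if its similarity to every stored one is below a threshold. The cosine similarity and the vector encoding of an experience are assumed details.

import numpy as np

class FailureStore:
    # First storage: keeps only sufficiently novel failure experiences.
    def __init__(self, threshold=0.9):
        self.experiences = []      # each experience encoded as a vector
        self.threshold = threshold

    @staticmethod
    def similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def maybe_add(self, exp):
        # Add only if dissimilar to everything already stored.
        if all(self.similarity(exp, e) < self.threshold for e in self.experiences):
            self.experiences.append(exp)
            return True
        return False

store = FailureStore(threshold=0.9)
rng = np.random.default_rng(1)
for _ in range(20):
    failure = rng.normal(size=4)   # stand-in for a simulated failure trajectory
    store.maybe_add(failure)
print(f"{len(store.experiences)} diverse failures kept out of 20")
# The adversarial agent is then trained on samples from store.experiences.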
-
Publication number: 20200293901
Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, the dimensionality of each input to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables, as a first input for which the target algorithm succeeds, and (ii) an unsuccessful predictor with a second subset of the latent variables, as a second input for which the target algorithm fails. Both the successful and unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input, by using a likelihood of the successful predictor and the unsuccessful predictor.
Type: Application
Filed: March 15, 2019
Publication date: September 17, 2020
Inventor: Akifumi Wachi
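The sampling step can be pictured as a search over the latent space using the two predictors' likelihoods. The predictors below are fixed logistic stand-ins and the scoring rule is one plausible reading of "likely to make the target algorithm fail"; in the full method the latents come from the trained encoder, and the decoder maps the chosen latent back to an input.

import numpy as np

rng = np.random.default_rng(7)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-ins for the trained predictors over 8-dimensional latents.
w_success = rng.normal(size=8)
w_fail = -w_success                # illustrative: they lean opposite ways

def p_success(z):
    return sigmoid(z @ w_success)  # likelihood under the successful predictor

def p_fail(z):
    return sigmoid(z @ w_fail)     # likelihood under the unsuccessful predictor

# Sample candidate latents and keep the one most likely to cause failure.
candidates = rng.normal(size=(1000, 8))
scores = np.array([p_fail(z) * (1.0 - p_success(z)) for z in candidates])
adversarial_z = candidates[np.argmax(scores)]
print("selected adversarial latent:", np.round(adversarial_z, 2))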
-
Publication number: 20210365485
Abstract: A computer-implemented method is presented for performing Q-learning with a language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model; assigning each of the words an action and an operation status; determining, for each word whose operation status is “unoperated,” a status by calculating a local encoding and a global encoding and concatenating the two, where the local encoding is calculated from the vector, action, and operation status of the word, and the global encoding is calculated from the local encodings of all the words in a self-attention fashion; and determining, via an editorial agent, a Q-value for each of the words for each of three actions, based on the status.
Type: Application
Filed: May 19, 2020
Publication date: November 25, 2021
Inventors: Ryosuke Kohita, Akifumi Wachi
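The encoding pipeline maps onto a few lines of array code. The dimensions, the plain dot-product attention, and the linear Q-head are assumptions; only the structure (local encoding, self-attentive global encoding, concatenation, per-action Q-values) follows the text.

import numpy as np

rng = np.random.default_rng(3)
n_words, d = 5, 8
vectors = rng.normal(size=(n_words, d))              # word embeddings (stand-in)
actions = np.zeros((n_words, 3)); actions[:, 0] = 1  # one-hot action per word
status = np.zeros((n_words, 1))                      # 0 = "unoperated"

# Local encoding: each word's vector, action, and operation status.
local = np.concatenate([vectors, actions, status], axis=1)

# Global encoding: dot-product self-attention over the local encodings.
scores = local @ local.T / np.sqrt(local.shape[1])
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
global_enc = attn @ local

# Status = concatenation of local and global; a linear head (the stand-in
# "editorial agent") scores the three actions for each word.
state = np.concatenate([local, global_enc], axis=1)
W_q = rng.normal(size=(state.shape[1], 3))
q_values = state @ W_q
print("greedy action per word:", q_values.argmax(axis=1))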
-
Publication number: 20220164647
Abstract: A method for action pruning in reinforcement learning receives a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, a logical inference based on the current state. Responsive to that evaluation, the method outputs upper and lower bounds on each action from a set of possible actions of an agent in the environment. The method calculates, for each pair of a possible action and the current state, a probability by using the upper and lower bounds; each calculated probability indicates a priority ratio for its action. The method obtains a reinforcement learning policy for the current state by using the calculated probabilities, and prunes one or more actions from the set as being in violation of the policy, such that those actions are ignored.
Type: Application
Filed: November 24, 2020
Publication date: May 26, 2022
Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
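Turning the LNN bounds into a pruned policy might look like the sketch below; the midpoint-of-interval priority and the cutoff are assumptions, since the abstract only states that probabilities are computed from the bounds.

# Illustrative LNN-style (lower, upper) truth bounds per candidate action.
bounds = {
    "left":    (0.0, 0.2),
    "right":   (0.6, 0.9),
    "forward": (0.7, 1.0),
    "stop":    (0.3, 0.5),
}

# One plausible mapping from bounds to a priority: the interval midpoint.
raw = {a: (lo + hi) / 2 for a, (lo, hi) in bounds.items()}
total = sum(raw.values())
policy = {a: v / total for a, v in raw.items()}   # normalized probabilities

# Prune actions whose probability falls below a cutoff; they are ignored.
CUTOFF = 0.15
allowed = [a for a, p in policy.items() if p >= CUTOFF]
print("policy:", {a: round(p, 2) for a, p in policy.items()})
print("actions kept after pruning:", allowed)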
-
Patent number: 11715016
Abstract: A computer-implemented method, computer program product, and computer processing system are provided for generating an adversarial input. The method includes reducing, by a Conditional Variational Encoder, the dimensionality of each input to a target algorithm to obtain a set of latent variables. The method further includes separately training, by a processor, (i) a successful predictor with a first subset of the latent variables, as a first input for which the target algorithm succeeds, and (ii) an unsuccessful predictor with a second subset of the latent variables, as a second input for which the target algorithm fails. Both the successful and unsuccessful predictors predict outputs of the target algorithm. The method also includes sampling, by the processor, an input that is likely to make the target algorithm fail as the adversarial input, by using a likelihood of the successful predictor and the unsuccessful predictor.
Type: Grant
Filed: March 15, 2019
Date of Patent: August 1, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Akifumi Wachi
-
Patent number: 11443130
Abstract: Generating failure scenarios using adversarial reinforcement learning is performed by storing, in a first storage, a variety of first experiences of failures of a player agent due to an adversarial agent, and performing a simulation of an environment including the player agent and the adversarial agent. It also includes calculating a similarity between a second experience of a failure of the player agent in the simulation and each of the variety of first experiences in the first storage, and updating the first storage by adding the second experience as a new first experience in response to the similarity being less than a threshold. Additionally, the use of adversarial reinforcement learning can include training the adversarial agent by using at least one of the first experiences in the first storage, to generate an adversarial agent having diverse experiences.
Type: Grant
Filed: August 30, 2019
Date of Patent: September 13, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Akifumi Wachi
-
Publication number: 20220164668
Abstract: A method for safe reinforcement learning receives an action and a current state of an environment. The method evaluates, using a Logical Neural Network (LNN) structure, an action-safety logical inference based on the current state of the environment and a current action candidate from an agent. Responsive to that evaluation, the method outputs upper and lower bounds on the action. The method calculates a contradiction value for the action by using the upper and lower bounds; the contradiction value indicates the level of contradiction for each of a plurality of logic rules implemented by the LNN structure. The method then evaluates the safety of the action based on the contradiction value, and selectively performs the action when that evaluation indicates the action is safe to perform, based on the contradiction value exceeding a safety threshold.
Type: Application
Filed: November 24, 2020
Publication date: May 26, 2022
Inventors: Daiki Kimura, Akifumi Wachi, Subhajit Chaudhury, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori
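The contradiction check can be sketched under the common LNN convention that a lower bound rising above the upper bound signals a violated rule; the per-rule bounds, the aggregation into a single safety value, and the threshold are all assumed.

def contradiction(lower, upper):
    # A lower bound above the upper bound signals a contradiction; the
    # size of the gap measures the violation (an assumed formulation).
    return max(0.0, lower - upper)

# (lower, upper) bounds per logic rule for one candidate action.
rule_bounds = [(0.9, 0.95), (0.4, 0.6), (0.8, 0.1)]   # third rule is violated

contradictions = [contradiction(lo, hi) for lo, hi in rule_bounds]
safety_value = 1.0 - max(contradictions)   # one plausible aggregate
SAFETY_THRESHOLD = 0.5

if safety_value > SAFETY_THRESHOLD:
    print("action judged safe: performing it")
else:
    print("action judged unsafe: skipping it")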
-
Patent number: 11294945
Abstract: A computer-implemented method is presented for performing Q-learning with a language model for unsupervised text summarization. The method includes mapping each word of a sentence into a vector by using word embedding via a deep learning natural language processing model; assigning each of the words an action and an operation status; determining, for each word whose operation status is “unoperated,” a status by calculating a local encoding and a global encoding and concatenating the two, where the local encoding is calculated from the vector, action, and operation status of the word, and the global encoding is calculated from the local encodings of all the words in a self-attention fashion; and determining, via an editorial agent, a Q-value for each of the words for each of three actions, based on the status.
Type: Grant
Filed: May 19, 2020
Date of Patent: April 5, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Ryosuke Kohita, Akifumi Wachi
-
Publication number: 20230185881
Abstract: A computer-implemented method is provided for offline reinforcement learning with a dataset. The method includes training a neural network that inputs a state-action pair and outputs a respective Q-function for each of a reward and one or more safety constraints. The neural network has a linear output layer, with the remaining non-linear layers represented by a feature mapping function. The training includes obtaining the feature mapping function by constructing Q-functions from the dataset according to an offline reinforcement learning algorithm. The training further includes tuning, using the feature mapping function, a weight between the reward and the one or more safety constraints. During both the obtaining and the tuning steps, an estimate of a Q-function is provided by subtracting an uncertainty from the expected value of the Q-function, where the uncertainty is a function that maps the state-action pair to an error size.
Type: Application
Filed: December 15, 2021
Publication date: June 15, 2023
Inventors: Akifumi Wachi, Takayuki Osogami
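The pessimistic estimate described above (expected value minus an uncertainty that maps the state-action pair to an error size) is easy to sketch. The feature map, the elliptical uncertainty form, and the weight value are stand-ins; only "linear heads on a shared feature mapping" and "subtract the uncertainty" follow the abstract.

import numpy as np

rng = np.random.default_rng(11)
d = 6                                 # dimension of the feature map phi(s, a)

def phi(state, action):
    # Stand-in for the learned non-linear layers (feature mapping function).
    return np.tanh(rng.normal(size=d))

w_reward = rng.normal(size=d)         # linear head for the reward Q-function
w_safety = rng.normal(size=d)         # linear head for the safety Q-function
cov_inv = np.eye(d) / 10.0            # stand-in for an inverse data covariance

def uncertainty(f):
    # Elliptical bonus sqrt(phi^T Sigma^-1 phi), a commonly assumed form.
    return float(np.sqrt(f @ cov_inv @ f))

def pessimistic_q(f, w):
    # Estimate = expected value minus uncertainty, per the abstract.
    return float(f @ w) - uncertainty(f)

f = phi("s0", "a0")
lam = 0.5                             # tunable weight between reward and safety
q = pessimistic_q(f, w_reward) + lam * pessimistic_q(f, w_safety)
print("weighted pessimistic Q:", round(q, 3))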
-
Publication number: 20220129637
Abstract: A computer identifies, within a task description, words that correspond to semantic element labels for the task. The computer receives, from a task source operatively connected with the computer, a textual description of a task. The computer also receives semantic element labels, element identification rules, and at least one reference sentence showing natural-language use of the semantic element labels. The computer parses the description into candidate words for each semantic element label and generates Rule Match Values (RMVs) for the parsed words based on the element identification rules. The computer collects words having RMVs above a threshold into sets of associated candidate words and generates, using a neural network trained on the reference sentence, Match Likelihood Values (MLVs) indicating whether each candidate word represents the semantic element label with which it is associated. The computer selects, to represent the semantic element, the associated candidate word having the highest MLV.
Type: Application
Filed: October 23, 2020
Publication date: April 28, 2022
Inventors: Ryosuke Kohita, Akifumi Wachi, Daiki Kimura
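The RMV/MLV pipeline can be sketched end to end on a toy task description. The word-level regex rules, the fraction-of-rules RMV scoring, the threshold, and the positional stand-in for the trained network's MLV are all assumptions; only the two-stage filter-then-select structure follows the abstract.

import re

# Illustrative element identification rules: word-level regexes per label.
rules = {
    "DEADLINE": [r"by", r"before", r"\d{1,2}(am|pm)"],
    "LOCATION": [r"in", r"at", r"room"],
}

def rule_match_value(word, patterns):
    # RMV: fraction of a label's rules that the word matches (assumed).
    return sum(bool(re.fullmatch(p, word)) for p in patterns) / len(patterns)

description = "submit the report in room 204 before 5pm"
words = description.split()

THRESHOLD = 0.3
candidates = {label: [w for w in words if rule_match_value(w, pats) >= THRESHOLD]
              for label, pats in rules.items()}

# A network trained on the reference sentences would produce the MLVs; a
# positional stand-in is used here only to show the final selection step.
def mlv(word):
    return 1.0 / (1 + words.index(word))

for label, cands in candidates.items():
    if cands:
        print(label, "->", max(cands, key=mlv))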