Patents by Inventor Nobuyasu Itoh
Nobuyasu Itoh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11610581
Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
Type: Grant
Filed: February 5, 2021
Date of Patent: March 21, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
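The EM-based weight estimation step above can be sketched as follows. This is the standard fixed-point EM update for linear mixture weights on held-out data — an illustrative sketch only; the two-level hyper-weight step and the application-specific metrics of the patent are not reproduced here:

```python
def em_interpolation_weights(model_probs, n_iters=50):
    """Estimate linear-interpolation weights for several language models.

    model_probs[m][t] is the probability model m assigns to held-out
    token t. Uses the standard EM fixed-point update for mixture weights.
    """
    n_models = len(model_probs)
    n_tokens = len(model_probs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each token
        new = [0.0] * n_models
        for t in range(n_tokens):
            mix = sum(w * p[t] for w, p in zip(weights, model_probs))
            for m in range(n_models):
                new[m] += weights[m] * model_probs[m][t] / mix
        # M-step: responsibilities sum to 1 per token, so divide by n_tokens
        weights = [x / n_tokens for x in new]
    return weights
```

The final model then scores a token as the weighted sum of the component models' probabilities under these weights.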
-
Patent number: 11557288
Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of same phrases with different time stamps and clustering the pairs of the same phrase by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
Type: Grant
Filed: April 10, 2020
Date of Patent: January 17, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
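The pairing-and-clustering step can be sketched as below: identical phrases are paired, and the gaps between their time stamps are grouped. A large, tight cluster of near-equal gaps suggests a periodically repeated segment (e.g. a hold-music announcement) that is a candidate for removal. The greedy single-linkage grouping here is an illustrative assumption, not the patent's exact clustering method:

```python
from collections import defaultdict

def repeated_phrase_intervals(occurrences, tol=1.0):
    """Group pairs of identical phrases by the gap between their time stamps.

    occurrences: list of (phrase, start_seconds) from a recognition result.
    Returns {phrase: [cluster, ...]} where each cluster is a list of gaps
    that agree within `tol` seconds.
    """
    by_phrase = defaultdict(list)
    for phrase, start in occurrences:
        by_phrase[phrase].append(start)
    clusters = {}
    for phrase, starts in by_phrase.items():
        # all pairwise time-stamp differences for this phrase, ascending
        gaps = sorted(b - a for i, a in enumerate(starts)
                      for b in starts[i + 1:])
        groups = []
        for g in gaps:
            if groups and g - groups[-1][-1] <= tol:
                groups[-1].append(g)   # extend the current cluster
            else:
                groups.append([g])     # start a new cluster
        clusters[phrase] = groups
    return clusters
```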
-
Publication number: 20220254335
Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
Type: Application
Filed: February 5, 2021
Publication date: August 11, 2022
Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
-
Patent number: 11276394
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Grant
Filed: January 30, 2020
Date of Patent: March 15, 2022
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata
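The entropy comparison can be sketched as follows, using each token's unigram entropy contribution in the bootstrap model. The specific decision rule shown (delete the target when the split tokens' summed contribution does not exceed the target's) is an assumption for illustration; the patent only states that the decision is based on at least the two entropies:

```python
import math

def token_entropy(prob):
    """Contribution of a token with unigram probability `prob` to the
    corpus entropy: -p * log2(p)."""
    return -prob * math.log2(prob)

def should_delete(target_prob, split_probs):
    """Illustrative decision rule (an assumption, not the patented test):
    drop the target token when representing it by its split tokens does
    not increase the summed entropy contribution."""
    split_entropy = sum(token_entropy(p) for p in split_probs)
    return split_entropy <= token_entropy(target_prob)
```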
-
Patent number: 11276391
Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
Type: Grant
Filed: February 6, 2020
Date of Patent: March 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20210319787
Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of same phrases with different time stamps and clustering the pairs of the same phrase by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
Type: Application
Filed: April 10, 2020
Publication date: October 14, 2021
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20210248996
Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Patent number: 11056102
Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the text samples based on a degree of similarity.
Type: Grant
Filed: September 23, 2019
Date of Patent: July 6, 2021
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
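The combination step can be sketched as below: the n-best hypotheses are joined, in rank order, into one text input for the classifier. The `<sep>` separator token and the padding of short n-best lists with the top hypothesis are illustrative assumptions, not details fixed by the patent:

```python
def combine_nbest(hypotheses, max_n=5, sep=" <sep> "):
    """Join up to max_n ASR hypotheses, in rank order, into a single
    text input for a downstream classifier."""
    padded = hypotheses[:max_n]
    # pad with the top hypothesis so every input has max_n parts (assumption)
    while len(padded) < max_n:
        padded.append(hypotheses[0])
    return sep.join(padded)
```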
-
Patent number: 11011161
Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
Type: Grant
Filed: December 10, 2018
Date of Patent: May 18, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10909316
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Grant
Filed: October 31, 2019
Date of Patent: February 2, 2021
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
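The splitting step can be sketched as below. For brevity this matches on the surface string only; the patent additionally requires the matching substring to have the same reading (pronunciation) as the source word, which needs a surface-to-reading alignment that is omitted here:

```python
def split_by_source(dictionary, src):
    """Split every dictionary word that contains the source word `src`
    and return the remaining substrings, which would then be registered
    as new dictionary words."""
    remainders = set()
    for word in dictionary:
        if word != src and src in word:
            # e.g. splitting "foobar" by "foo" leaves "bar"
            remainders.update(part for part in word.split(src) if part)
    return remainders
```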
-
Patent number: 10832657
Abstract: A computer-implemented method, computer program product, and apparatus are provided. The method includes generating a plurality of sequences of small unit tokens from a first language model that is trained with a small unit corpus including the small unit tokens, the small unit corpus having been derived by tokenization with a small unit. The method further includes tokenizing the plurality of sequences of small unit tokens by a large unit that is larger than the small unit, to create a derived large unit corpus including derived large unit tokens.
Type: Grant
Filed: March 1, 2018
Date of Patent: November 10, 2020
Assignee: International Business Machines Corporation
Inventors: Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata
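The re-tokenization step can be sketched as below: a generated sequence of small-unit tokens (e.g. characters) is re-tokenized into larger units drawn from a large-unit vocabulary. Greedy longest-match segmentation is an illustrative choice here, not the tokenizer prescribed by the patent:

```python
def retokenize(small_tokens, large_vocab):
    """Greedy longest-match re-tokenization of a small-unit token
    sequence into larger units from `large_vocab`. Unmatched single
    small units are passed through unchanged."""
    out, i = [], 0
    while i < len(small_tokens):
        # try the longest candidate first, falling back to one small unit
        for j in range(len(small_tokens), i, -1):
            cand = "".join(small_tokens[i:j])
            if cand in large_vocab or j == i + 1:
                out.append(cand)
                i = j
                break
    return out
```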
-
Publication number: 20200184960
Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
Type: Application
Filed: December 10, 2018
Publication date: June 11, 2020
Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Publication number: 20200168213
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Application
Filed: January 30, 2020
Publication date: May 28, 2020
Inventors: Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10650803
Abstract: A method, a computer program product, and a computer system for mapping between a speech signal and a transcript of the speech signal. The computer system segments the speech signal to obtain one or more segmented speech signals and the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal. The computer system generates estimated phone sequences and reference phone sequences, calculates costs of correspondences between the estimated phone sequences and the reference phone sequences, determines a series of the estimated phone sequences with a smallest cost, selects a partial series of the estimated phone sequences from the series of the estimated phone sequences, and generates mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences.
Type: Grant
Filed: October 10, 2017
Date of Patent: May 12, 2020
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Nobuyasu Itoh
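The correspondence cost between an estimated and a reference phone sequence can be sketched with Levenshtein (edit) distance, a common choice for comparing phone sequences; the patent does not fix a particular cost metric, so treat this as an assumption:

```python
def edit_distance(ref, est):
    """Levenshtein cost between a reference and an estimated phone
    sequence (insertions, deletions, substitutions all cost 1)."""
    d = [[0] * (len(est) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(est) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(est) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != est[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1]

def best_match(reference, candidates):
    """Pick the estimated phone sequence with the smallest cost."""
    return min(candidates, key=lambda c: edit_distance(reference, c))
```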
-
Patent number: 10607604
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Grant
Filed: October 27, 2017
Date of Patent: March 31, 2020
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata
-
Publication number: 20200065378
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Application
Filed: October 31, 2019
Publication date: February 27, 2020
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10572586
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Grant
Filed: February 27, 2018
Date of Patent: February 25, 2020
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10540963
Abstract: A computer-implemented method for generating an input for a classifier. The method includes obtaining n-best hypotheses which are an output of an automatic speech recognition (ASR) system for an utterance, combining the n-best hypotheses horizontally in a predetermined order with a separator between each pair of hypotheses, and outputting the combined n-best hypotheses as a single text input to a classifier.
Type: Grant
Filed: February 2, 2017
Date of Patent: January 21, 2020
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
-
Publication number: 20200020324
Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the text samples based on a degree of similarity.
Type: Application
Filed: September 23, 2019
Publication date: January 16, 2020
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
-
Patent number: 10418029
Abstract: A method of selecting training text for a language model, a method of training the language model using the training text, and a computer and computer program for executing the methods. The present invention provides a method for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as a template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain.
Type: Grant
Filed: November 30, 2017
Date of Patent: September 17, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura
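The two template-generation techniques and the coverage test can be sketched as below. The `<*>` wildcard symbol is an illustrative stand-in for the patent's "special symbol", and only single-word replacement is shown (the method also allows replacing multi-word strings):

```python
import re

WILDCARD = "<*>"  # special symbol standing for any word or word string

def make_templates(sentence):
    """Generate templates from a first-domain word string:
    technique (ii) keeps the string itself; technique (i) replaces
    each word in turn with the wildcard."""
    words = sentence.split()
    templates = [sentence]
    for i in range(len(words)):
        templates.append(" ".join(words[:i] + [WILDCARD] + words[i + 1:]))
    return templates

def covered(text, template):
    """True if `text` from the second-domain corpus matches `template`,
    with the wildcard matching any non-empty word string."""
    pattern = re.escape(template).replace(re.escape(WILDCARD), r".+")
    return re.fullmatch(pattern, text) is not None
```

Second-domain sentences for which `covered` is true under any template are selected as training text.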