Patents by Inventor Nobuyasu Itoh

Nobuyasu Itoh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11610581
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
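As a rough illustration of the abstract above, here is a minimal sketch of two-level interpolation: EM estimates per-model weights within each set, and a hyper interpolation weight then mixes the sets. All names are invented and the model form is a simplification; the patent does not fix an implementation.

```python
# Hypothetical sketch: EM weight estimation for interpolated language
# models, plus a hyper weight that mixes two classified sets of models.

def em_weights(model_probs, n_iters=50):
    """Estimate interpolation weights for component LMs via EM.

    model_probs: list of per-model probability lists, where
    model_probs[m][t] = P_m(token_t | history) on held-out text.
    """
    n_models = len(model_probs)
    n_tokens = len(model_probs[0])
    w = [1.0 / n_models] * n_models
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each token
        counts = [0.0] * n_models
        for t in range(n_tokens):
            mix = sum(w[m] * model_probs[m][t] for m in range(n_models))
            for m in range(n_models):
                counts[m] += w[m] * model_probs[m][t] / mix
        # M-step: renormalize responsibilities into new weights
        w = [c / n_tokens for c in counts]
    return w

def interpolate(sets_of_probs, hyper_w):
    """Combine two sets of LMs: EM inside each set, hyper weight across sets."""
    set_probs = []
    for probs in sets_of_probs:
        w = em_weights(probs)
        n_tokens = len(probs[0])
        set_probs.append([sum(w[m] * probs[m][t] for m in range(len(probs)))
                          for t in range(n_tokens)])
    return [hyper_w * a + (1.0 - hyper_w) * b
            for a, b in zip(set_probs[0], set_probs[1])]

# Toy example: model A fits the held-out tokens better than model B,
# so EM should give A the larger weight.
probs_a = [0.5, 0.6, 0.4]
probs_b = [0.1, 0.1, 0.2]
weights = em_weights([probs_a, probs_b])
```

In the full method, the first metric (e.g. held-out likelihood) drives `em_weights`, while the hyper weight would be tuned against a second, application-specific metric.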
  • Patent number: 11557288
    Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of the same phrase with different time stamps and clustering the pairs by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
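A minimal sketch of the pipeline described above, under stated assumptions: frequent n-grams become candidate phrases, pairs of identical phrases are grouped by the gap between their time stamps, and a tight cluster of near-equal gaps suggests repeated material (e.g. hold music) to remove. The tolerance and data layout are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def candidate_phrases(tokens, n=2, min_count=2):
    """Return n-grams that occur at least min_count times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g for g, c in grams.items() if c >= min_count}

def gap_clusters(occurrences, tol=0.5):
    """Cluster time-stamp differences of same-phrase pairs.

    occurrences: {phrase: [start_time, ...]}; returns {rounded_gap: count}.
    """
    clusters = defaultdict(int)
    for times in occurrences.values():
        for a, b in combinations(sorted(times), 2):
            clusters[round((b - a) / tol) * tol] += 1
    return dict(clusters)

# Toy transcript where the phrase "press one" recurs every ~10 seconds.
tokens = "press one for sales press one for support press one".split()
phrases = candidate_phrases(tokens, n=2)
occ = {("press", "one"): [0.0, 10.0, 20.0]}
clusters = gap_clusters(occ)
```

A dominant gap value (here 10 s) would mark the period of the repeated segment; the audio spans it covers are the candidates for removal.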
  • Publication number: 20220254335
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Application
    Filed: February 5, 2021
    Publication date: August 11, 2022
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11276394
    Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: March 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata
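The entropy test in the abstract above can be sketched with a unigram "bootstrap" model (an assumption; the patent does not fix the model form): a target token is a candidate for deletion when emitting it costs more bits than emitting its split tokens.

```python
import math
from collections import Counter

def unigram_entropy(counts, token, total):
    """Surprisal (in bits) of a token under a unigram model."""
    return -math.log2(counts[token] / total)

def should_delete(counts, target, splits):
    """Delete target if it is more surprising than its split tokens combined."""
    total = sum(counts.values())
    target_bits = unigram_entropy(counts, target, total)
    split_bits = sum(unigram_entropy(counts, s, total) for s in splits)
    return target_bits > split_bits

# Toy corpus: the token "ab" is rare, while its pieces "a" and "b" are
# common, so keeping "ab" in the vocabulary is not worthwhile.
counts = Counter({"a": 40, "b": 40, "ab": 1, "c": 19})
delete_ab = should_delete(counts, "ab", ["a", "b"])
```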
  • Patent number: 11276391
    Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: March 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
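To make the "operation tokens indicating rewriting" concrete, here is a hypothetical encoding (`<del>` and `<ins>` are invented names; the patent only says the vocabulary contains such tokens). A trained generator would emit sequences like this; the sketch merely decodes one into the rewritten text.

```python
def apply_operations(tokens):
    """Decode an operation-token sequence into the rewritten text."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "<del>":
            i += 2          # skip the operation token and the deleted word
        elif tokens[i] == "<ins>":
            out.append(tokens[i + 1])   # emit the inserted word
            i += 2
        else:
            out.append(tokens[i])       # plain words pass through
            i += 1
    return " ".join(out)

# Formal -> casual rewriting sample: delete "hello", insert "hi".
seq = ["<del>", "hello", "<ins>", "hi", "there"]
rewritten = apply_operations(seq)
```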
  • Publication number: 20210319787
    Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of the same phrase with different time stamps and clustering the pairs by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
    Type: Application
    Filed: April 10, 2020
    Publication date: October 14, 2021
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
  • Publication number: 20210248996
    Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
    Type: Application
    Filed: February 6, 2020
    Publication date: August 12, 2021
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
  • Patent number: 11056102
    Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the plurality of text samples based on a degree of similarity.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: July 6, 2021
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
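The core data-structure step above is simple to sketch: join the n-best ASR hypotheses in rank order with a separator token. The separator string `<sep>` is an assumption; the patent only requires some separator between each pair of hypotheses.

```python
SEP = " <sep> "

def combine_nbest(hypotheses, n_max=3):
    """Combine up to n_max ranked hypotheses into a single text input."""
    return SEP.join(hypotheses[:n_max])

# Toy 3-best output for one utterance, best hypothesis first.
nbest = ["call my mom", "call my mall", "hall my mom"]
single_text = combine_nbest(nbest)
```

The classifier then sees all competing hypotheses at once, so it can recover the intent even when the top hypothesis is misrecognized.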
  • Patent number: 11011161
    Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: May 18, 2021
    Assignee: International Business Machines Corporation
    Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
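The adaptation step above, adding a class node to both the input and output layers, can be sketched with pure-Python matrices (real models use tensors; shapes here are illustrative assumptions). The new class token, taken from the adaptation text, gets zero-initialized weights so the pretrained parameters are untouched.

```python
def add_class_node(embed, out_w, out_b):
    """Grow the vocabulary by one class token with zero-init weights."""
    hidden = len(embed[0])
    return (embed + [[0.0] * hidden],      # new input-layer row
            out_w + [[0.0] * hidden],      # new output-layer row
            out_b + [0.0])                 # new output bias entry

vocab, hidden = 5, 8
embed = [[0.1] * hidden for _ in range(vocab)]   # input embedding matrix
out_w = [[0.2] * hidden for _ in range(vocab)]   # output projection matrix
out_b = [0.0] * vocab                            # output bias vector
embed2, out_w2, out_b2 = add_class_node(embed, out_w, out_b)
```

After growing the layers, the RNNLM would be fine-tuned on the second text set so the class node learns where class members may appear, enabling template generation.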
  • Patent number: 10909316
    Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the other word by the source word to obtain one or more remaining substrings of the other word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: February 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
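A minimal sketch of the dictionary-splitting step described above: for a source word, find longer dictionary entries that contain it (matching on the surface string only; readings are omitted here for brevity), split them, and collect the remainders as new words.

```python
def split_by_source(dictionary, source):
    """Return new words obtained by removing `source` from longer entries."""
    new_words = set()
    for word in dictionary:
        if word != source and source in word:
            for part in word.split(source):
                if part and part not in dictionary:
                    new_words.add(part)
    return new_words

# Toy dictionary: "database" and "databank" both contain "data".
dictionary = {"data", "database", "databank"}
new_words = split_by_source(dictionary, "data")
```

In the full method each remainder would be registered back into the dictionary with its reading, improving segmentation consistency.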
  • Patent number: 10832657
    Abstract: A computer-implemented method, computer program product, and apparatus are provided. The method includes generating a plurality of sequences of small unit tokens from a first language model that is trained with a small unit corpus including the small unit tokens, the small unit corpus having been derived by tokenization with a small unit. The method further includes tokenizing the plurality of sequences of small unit tokens by a large unit that is larger than the small unit, to create a derived large unit corpus including derived large unit tokens.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata
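The derivation above can be sketched as: sample small-unit (e.g. character) sequences from a small-unit language model, then re-tokenize them with a large-unit (e.g. word) segmenter. The greedy longest-match segmenter below is an assumption for illustration, not the patented tokenizer.

```python
def longest_match_tokenize(chars, lexicon):
    """Greedy longest-match segmentation of a character string."""
    tokens, i = [], 0
    while i < len(chars):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(chars), i, -1):
            if chars[i:j] in lexicon or j == i + 1:
                tokens.append(chars[i:j])
                i = j
                break
    return tokens

# Pretend these strings were generated by the small-unit (character) LM.
generated = ["thecat", "catthe"]
lexicon = {"the", "cat"}
derived_corpus = [longest_match_tokenize(s, lexicon) for s in generated]
```

The derived large-unit corpus can then train a word-level model even when only small-unit training data was originally available.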
  • Publication number: 20200184960
    Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
    Type: Application
    Filed: December 10, 2018
    Publication date: June 11, 2020
    Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
  • Publication number: 20200168213
    Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
    Type: Application
    Filed: January 30, 2020
    Publication date: May 28, 2020
    Inventors: Nobuyasu Itoh, Gakuto Kurata
  • Patent number: 10650803
    Abstract: A method, a computer program product, and a computer system for mapping between a speech signal and a transcript of the speech signal. The computer system segments the speech signal to obtain one or more segmented speech signals and the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal. The computer system generates estimated phone sequences and reference phone sequences, calculates costs of correspondences between the estimated phone sequences and the reference phone sequences, determines a series of the estimated phone sequences with a smallest cost, selects a partial series of the estimated phone sequences from the series of the estimated phone sequences, and generates mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences.
    Type: Grant
    Filed: October 10, 2017
    Date of Patent: May 12, 2020
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Nobuyasu Itoh
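The cost calculation above can be sketched as an edit distance between an estimated phone sequence (decoded from the audio) and a reference phone sequence (derived from the transcript); the full method searches for the series of segment pairings minimizing total cost. This is a simplification with invented names, not the patented cost function.

```python
def edit_cost(est, ref):
    """Levenshtein distance between two phone sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(ref) + 1)]
         for i in range(len(est) + 1)]
    for i in range(1, len(est) + 1):
        for j in range(1, len(ref) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                          # deletion
                          d[i][j - 1] + 1,                          # insertion
                          d[i - 1][j - 1] + (est[i - 1] != ref[j - 1]))  # substitution
    return d[-1][-1]

def best_reference(est, candidates):
    """Pick the reference segment with the smallest alignment cost."""
    return min(candidates, key=lambda ref: edit_cost(est, ref))

est = ["k", "a", "t"]
best = best_reference(est, [["k", "a", "t"], ["d", "o", "g"]])
```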
  • Patent number: 10607604
    Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
    Type: Grant
    Filed: October 27, 2017
    Date of Patent: March 31, 2020
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata
  • Publication number: 20200065378
    Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the other word by the source word to obtain one or more remaining substrings of the other word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
    Type: Application
    Filed: October 31, 2019
    Publication date: February 27, 2020
    Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
  • Patent number: 10572586
    Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the other word by the source word to obtain one or more remaining substrings of the other word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
    Type: Grant
    Filed: February 27, 2018
    Date of Patent: February 25, 2020
    Assignee: International Business Machines Corporation
    Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
  • Patent number: 10540963
    Abstract: A computer-implemented method for generating an input for a classifier is disclosed. The method includes obtaining n-best hypotheses, which are an output of automatic speech recognition (ASR) for an utterance, combining the n-best hypotheses horizontally in a predetermined order with a separator between each pair of hypotheses, and outputting the combined n-best hypotheses as a single text input to a classifier.
    Type: Grant
    Filed: February 2, 2017
    Date of Patent: January 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
  • Publication number: 20200020324
    Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the plurality of text samples based on a degree of similarity.
    Type: Application
    Filed: September 23, 2019
    Publication date: January 16, 2020
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
  • Patent number: 10418029
    Abstract: A method of selecting training text for a language model, a method of training the language model using the training text, and a computer and computer program for executing the methods are disclosed. The present invention provides for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as the template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: September 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura
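Generation technique (i) above can be sketched directly: replace a word in a first-domain word string with a wildcard symbol, then keep the second-domain sentences that the template covers. The wildcard symbol `<*>` is an invented placeholder for the patent's "special symbol".

```python
def make_template(words, drop_index):
    """Replace one word with a wildcard to form a template."""
    return [w if i != drop_index else "<*>" for i, w in enumerate(words)]

def covered(template, sentence):
    """True if the sentence matches the template word-for-word."""
    words = sentence.split()
    return (len(words) == len(template) and
            all(t == "<*>" or t == w for t, w in zip(template, words)))

# First-domain string "book a flight" yields the template ["book", "a", "<*>"].
template = make_template("book a flight".split(), 2)
second_domain = ["book a hotel", "cancel my order", "book a table"]
selected = [s for s in second_domain if covered(template, s)]
```

The selected second-domain sentences then serve as additional training text for the first-domain language model.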