Patents by Inventor Nobuyasu Itoh
Nobuyasu Itoh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11610581
Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
Type: Grant
Filed: February 5, 2021
Date of Patent: March 21, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
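The EM-based weight estimation step above can be sketched as follows. This is the standard fixed-point EM update for linear mixture weights on held-out data — an illustrative sketch only; the two-level hyper-weight step and the application-specific metrics of the patent are not reproduced here:

```python
def em_interpolation_weights(model_probs, n_iters=50):
    """Estimate linear-interpolation weights for several language models.

    model_probs[m][t] is the probability model m assigns to held-out
    token t. Uses the standard EM fixed-point update for mixture weights.
    """
    n_models = len(model_probs)
    n_tokens = len(model_probs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each token
        new = [0.0] * n_models
        for t in range(n_tokens):
            mix = sum(w * p[t] for w, p in zip(weights, model_probs))
            for m in range(n_models):
                new[m] += weights[m] * model_probs[m][t] / mix
        # M-step: responsibilities sum to 1 per token, so divide by n_tokens
        weights = [x / n_tokens for x in new]
    return weights
```

The final model then scores a token as the weighted sum of the component models' probabilities under these weights.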
-
Patent number: 11557288
Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of same phrases with different time stamps and clustering the pairs of the same phrase by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
Type: Grant
Filed: April 10, 2020
Date of Patent: January 17, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
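The pairing-and-clustering step can be sketched as below: identical phrases are paired, and the gaps between their time stamps are grouped. A large, tight cluster of near-equal gaps suggests a periodically repeated segment (e.g. a hold-music announcement) that is a candidate for removal. The greedy single-linkage grouping here is an illustrative assumption, not the patent's exact clustering method:

```python
from collections import defaultdict

def repeated_phrase_intervals(occurrences, tol=1.0):
    """Group pairs of identical phrases by the gap between their time stamps.

    occurrences: list of (phrase, start_seconds) from a recognition result.
    Returns {phrase: [cluster, ...]} where each cluster is a list of gaps
    that agree within `tol` seconds.
    """
    by_phrase = defaultdict(list)
    for phrase, start in occurrences:
        by_phrase[phrase].append(start)
    clusters = {}
    for phrase, starts in by_phrase.items():
        # all pairwise time-stamp differences for this phrase, ascending
        gaps = sorted(b - a for i, a in enumerate(starts)
                      for b in starts[i + 1:])
        groups = []
        for g in gaps:
            if groups and g - groups[-1][-1] <= tol:
                groups[-1].append(g)   # extend the current cluster
            else:
                groups.append([g])     # start a new cluster
        clusters[phrase] = groups
    return clusters
```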
-
Publication number: 20220254335
Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
Type: Application
Filed: February 5, 2021
Publication date: August 11, 2022
Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
-
Patent number: 11276394
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Grant
Filed: January 30, 2020
Date of Patent: March 15, 2022
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata
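The entropy comparison can be sketched as follows, using each token's unigram entropy contribution in the bootstrap model. The specific decision rule shown (delete the target when the split tokens' summed contribution does not exceed the target's) is an assumption for illustration; the patent only states that the decision is based on at least the two entropies:

```python
import math

def token_entropy(prob):
    """Contribution of a token with unigram probability `prob` to the
    corpus entropy: -p * log2(p)."""
    return -prob * math.log2(prob)

def should_delete(target_prob, split_probs):
    """Illustrative decision rule (an assumption, not the patented test):
    drop the target token when representing it by its split tokens does
    not increase the summed entropy contribution."""
    split_entropy = sum(token_entropy(p) for p in split_probs)
    return split_entropy <= token_entropy(target_prob)
```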
-
Patent number: 11276391
Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
Type: Grant
Filed: February 6, 2020
Date of Patent: March 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20210319787
Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of same phrases with different time stamps and clustering the pairs of the same phrase by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
Type: Application
Filed: April 10, 2020
Publication date: October 14, 2021
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20210248996
Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
-
Patent number: 11056102
Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the text samples based on a degree of similarity.
Type: Grant
Filed: September 23, 2019
Date of Patent: July 6, 2021
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
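The combination step can be sketched as below: the n-best hypotheses are joined, in rank order, into one text input for the classifier. The `<sep>` separator token and the padding of short n-best lists with the top hypothesis are illustrative assumptions, not details fixed by the patent:

```python
def combine_nbest(hypotheses, max_n=5, sep=" <sep> "):
    """Join up to max_n ASR hypotheses, in rank order, into a single
    text input for a downstream classifier."""
    padded = hypotheses[:max_n]
    # pad with the top hypothesis so every input has max_n parts (assumption)
    while len(padded) < max_n:
        padded.append(hypotheses[0])
    return sep.join(padded)
```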
-
Patent number: 11011161
Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
Type: Grant
Filed: December 10, 2018
Date of Patent: May 18, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10909316
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Grant
Filed: October 31, 2019
Date of Patent: February 2, 2021
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
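The splitting step can be sketched as below. For brevity this matches on the surface string only; the patent additionally requires the matching substring to have the same reading (pronunciation) as the source word, which needs a surface-to-reading alignment that is omitted here:

```python
def split_by_source(dictionary, src):
    """Split every dictionary word that contains the source word `src`
    and return the remaining substrings, which would then be registered
    as new dictionary words."""
    remainders = set()
    for word in dictionary:
        if word != src and src in word:
            # e.g. splitting "foobar" by "foo" leaves "bar"
            remainders.update(part for part in word.split(src) if part)
    return remainders
```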
-
Patent number: 10832657
Abstract: A computer-implemented method, computer program product, and apparatus are provided. The method includes generating a plurality of sequences of small unit tokens from a first language model that is trained with a small unit corpus including the small unit tokens, the small unit corpus having been derived by tokenization with a small unit. The method further includes tokenizing the plurality of sequences of small unit tokens by a large unit that is larger than the small unit, to create a derived large unit corpus including derived large unit tokens.
Type: Grant
Filed: March 1, 2018
Date of Patent: November 10, 2020
Assignee: International Business Machines Corporation
Inventors: Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata
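The re-tokenization step can be sketched as below: a generated sequence of small-unit tokens (e.g. characters) is re-tokenized into larger units drawn from a large-unit vocabulary. Greedy longest-match segmentation is an illustrative choice here, not the tokenizer prescribed by the patent:

```python
def retokenize(small_tokens, large_vocab):
    """Greedy longest-match re-tokenization of a small-unit token
    sequence into larger units from `large_vocab`. Unmatched single
    small units are passed through unchanged."""
    out, i = [], 0
    while i < len(small_tokens):
        # try the longest candidate first, falling back to one small unit
        for j in range(len(small_tokens), i, -1):
            cand = "".join(small_tokens[i:j])
            if cand in large_vocab or j == i + 1:
                out.append(cand)
                i = j
                break
    return out
```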
-
Publication number: 20200184960
Abstract: A computer-implemented method is provided for generating a plurality of templates. The method includes obtaining, by a processor device, a Recurrent Neural Network Language Model (RNNLM) trained using a first set of text data. The method further includes adapting, by the processor device, the RNNLM using a second set of text data by adding a new node corresponding to a class in both an input layer and an output layer of the RNNLM, the class being obtained from the second set of text data. The method also includes generating, by the processor device, the plurality of templates using the adapted RNNLM.
Type: Application
Filed: December 10, 2018
Publication date: June 11, 2020
Inventors: Masayuki Suzuki, Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Publication number: 20200168213
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Application
Filed: January 30, 2020
Publication date: May 28, 2020
Inventors: Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10650803
Abstract: A method, a computer program product, and a computer system for mapping between a speech signal and a transcript of the speech signal. The computer system segments the speech signal to obtain one or more segmented speech signals and the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal. The computer system generates estimated phone sequences and reference phone sequences, calculates costs of correspondences between the estimated phone sequences and the reference phone sequences, determines a series of the estimated phone sequences with a smallest cost, selects a partial series of the estimated phone sequences from the series of the estimated phone sequences, and generates mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences.
Type: Grant
Filed: October 10, 2017
Date of Patent: May 12, 2020
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Nobuyasu Itoh
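The correspondence cost between an estimated and a reference phone sequence can be sketched with Levenshtein (edit) distance, a common choice for comparing phone sequences; the patent does not fix a particular cost metric, so treat this as an assumption:

```python
def edit_distance(ref, est):
    """Levenshtein cost between a reference and an estimated phone
    sequence (insertions, deletions, substitutions all cost 1)."""
    d = [[0] * (len(est) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(est) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(est) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != est[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1]

def best_match(reference, candidates):
    """Pick the estimated phone sequence with the smallest cost."""
    return min(candidates, key=lambda c: edit_distance(reference, c))
```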
-
Patent number: 10607604
Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
Type: Grant
Filed: October 27, 2017
Date of Patent: March 31, 2020
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata
-
Publication number: 20200065378
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Application
Filed: October 31, 2019
Publication date: February 27, 2020
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10572586
Abstract: A computer-implemented method, computer program product, and system are provided for separating a word in a dictionary. The method includes reading a word from the dictionary as a source word. The method also includes searching the dictionary for another word having a substring with a same surface string and a same reading as the source word. The method additionally includes splitting the another word by the source word to obtain one or more remaining substrings of the another word. The method further includes registering each of the one or more remaining substrings as a new word in the dictionary.
Type: Grant
Filed: February 27, 2018
Date of Patent: February 25, 2020
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Nobuyasu Itoh, Gakuto Kurata
-
Patent number: 10540963
Abstract: A computer-implemented method for generating an input for a classifier. The method includes obtaining n-best hypotheses which are an output of an automatic speech recognition (ASR) system for an utterance, combining the n-best hypotheses horizontally in a predetermined order with a separator between each pair of hypotheses, and outputting the combined n-best hypotheses as a single text input to a classifier.
Type: Grant
Filed: February 2, 2017
Date of Patent: January 21, 2020
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
-
Publication number: 20200020324
Abstract: A computer-implemented method includes generating a single text data structure for a classifier of a speech recognition system, and sending the single text data structure to the classifier. Generating the single text data structure includes obtaining n-best hypotheses as an output of an automatic speech recognition (ASR) task for an utterance received by the speech recognition system, and combining the n-best hypotheses in a predetermined order with a separator between each pair of hypotheses to generate the single text data structure. The classifier is trained based on a single training text data structure by obtaining training source data, including selecting a first text sample and at least one similar text sample belonging to the same class as the first text sample based on a maximum number of hypotheses, and arranging the text samples based on a degree of similarity.
Type: Application
Filed: September 23, 2019
Publication date: January 16, 2020
Inventors: Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana
-
Patent number: 10418029
Abstract: A method of selecting training text for a language model, a method of training the language model using the training text, and a computer and computer program for executing the methods. The present invention provides a method for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as a template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain.
Type: Grant
Filed: November 30, 2017
Date of Patent: September 17, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura
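The two template-generation techniques and the coverage test can be sketched as below. The `<*>` wildcard symbol is an illustrative stand-in for the patent's "special symbol", and only single-word replacement is shown (the method also allows replacing multi-word strings):

```python
import re

WILDCARD = "<*>"  # special symbol standing for any word or word string

def make_templates(sentence):
    """Generate templates from a first-domain word string:
    technique (ii) keeps the string itself; technique (i) replaces
    each word in turn with the wildcard."""
    words = sentence.split()
    templates = [sentence]
    for i in range(len(words)):
        templates.append(" ".join(words[:i] + [WILDCARD] + words[i + 1:]))
    return templates

def covered(text, template):
    """True if `text` from the second-domain corpus matches `template`,
    with the wildcard matching any non-empty word string."""
    pattern = re.escape(template).replace(re.escape(WILDCARD), r".+")
    return re.fullmatch(pattern, text) is not None
```

Second-domain sentences for which `covered` is true under any template are selected as training text.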