Patents by Inventor Gakuto Kurata

Gakuto Kurata has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11625595
    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
    Type: Grant
    Filed: August 29, 2018
    Date of Patent: April 11, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Kartik Audhkhasi
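
One way to picture the selection-and-training step of patent 11625595 is the following minimal sketch, assuming frame-level softmax posteriors from both models and a small search window around each student frame; the window size and the cosine-similarity criterion are illustrative choices, not details fixed by the patent.

```python
import torch
import torch.nn.functional as F

def select_and_distill(teacher_seq, student_seq, window=2):
    """teacher_seq, student_seq: (T, num_labels) softmax posteriors.

    Assumes strictly positive posteriors (e.g., softmax outputs).
    """
    total, T = 0.0, student_seq.size(0)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        # Select the teacher output most similar to this student output.
        sims = F.cosine_similarity(student_seq[t:t + 1], teacher_seq[lo:hi])
        best = teacher_seq[lo + sims.argmax().item()]
        # Training to minimize this KL term increases the similarity.
        total = total + F.kl_div(student_seq[t].log(), best, reduction="sum")
    return total / T
```
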
  • Publication number: 20230104244
    Abstract: A computer-implemented method is provided for training a Recurrent Neural Network Transducer (RNN-T). The method includes training, by inputting a set of audio data, a first RNN-T which includes a common encoder, a forward prediction network, and a first joint network combining outputs of both the common encoder and the forward prediction network. The forward prediction network predicts label sequences forward. The method further includes training, by inputting the set of audio data, a second RNN-T which includes the common encoder, a backward prediction network, and a second joint network combining outputs of both the common encoder and the backward prediction network. The backward prediction network predicts label sequences backward. The trained first RNN-T is used for inference.
    Type: Application
    Filed: September 17, 2021
    Publication date: April 6, 2023
    Inventor: Gakuto Kurata
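
A structural sketch of the twin RNN-T in publication 20230104244 may make the layout concrete. The layer sizes, the use of LSTMs, and the concatenation-based joint networks below are illustrative assumptions; the RNN-T losses and the inference pass (which uses only the forward branch) are omitted.

```python
import torch
import torch.nn as nn

class TwinRNNT(nn.Module):
    """One common encoder feeding two joint networks: one paired with a
    forward prediction network, one with a backward prediction network."""

    def __init__(self, feat_dim=80, vocab=1000, hidden=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.pred_fwd = nn.LSTM(hidden, hidden, batch_first=True)  # labels forward
        self.pred_bwd = nn.LSTM(hidden, hidden, batch_first=True)  # labels backward
        self.joint_fwd = nn.Linear(2 * hidden, vocab)
        self.joint_bwd = nn.Linear(2 * hidden, vocab)

    @staticmethod
    def _joint(enc, pred, lin):
        e, p = torch.broadcast_tensors(enc.unsqueeze(2), pred.unsqueeze(1))
        return lin(torch.cat([e, p], dim=-1))      # (B, T, U, vocab)

    def forward(self, audio, labels):
        enc, _ = self.encoder(audio)               # (B, T, hidden)
        pf, _ = self.pred_fwd(self.embed(labels))  # (B, U, hidden)
        pb, _ = self.pred_bwd(self.embed(labels.flip(1)))
        return self._joint(enc, pf, self.joint_fwd), self._joint(enc, pb, self.joint_bwd)
```
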
  • Patent number: 11610108
    Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network from among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training the student neural network with at least the input data and the soft label output from the selected teacher neural network.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: March 21, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
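
As a rough illustration of patent 11610108, one training step under this scheme might look as follows; the uniformly random teacher selection and the temperature value are assumptions, since the patent covers other selection criteria as well.

```python
import torch
import torch.nn.functional as F

def train_step(student, teachers, x, optimizer, temperature=2.0):
    # Select a teacher network (here: uniformly at random per batch).
    teacher = teachers[torch.randint(len(teachers), (1,)).item()]
    with torch.no_grad():
        soft = F.softmax(teacher(x) / temperature, dim=-1)  # soft label output
    # Train the student against the selected teacher's soft labels.
    log_p = F.log_softmax(student(x) / temperature, dim=-1)
    loss = -(soft * log_p).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
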
  • Patent number: 11610581
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: March 21, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
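
The two-level interpolation of patent 11610581 can be sketched as below. EM re-estimation of the per-model weights on held-out word probabilities is the standard recipe; how the hyper weights are estimated against the application-specific metric is abstracted away here, and the set assignment is taken as given.

```python
def em_weights(probs, iters=20):
    """probs[i][j]: probability language model i assigns to held-out word j."""
    k, n = len(probs), len(probs[0])
    w = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for j in range(n):
            denom = sum(w[i] * probs[i][j] for i in range(k))
            for i in range(k):
                counts[i] += w[i] * probs[i][j] / denom  # posterior of model i
        w = [c / n for c in counts]
    return w

def final_probability(p, w, set_of, hyper):
    """Combine per-model probabilities p using EM weights w and per-set
    hyper weights; set_of[i] names the set that language model i belongs to."""
    return sum(hyper[set_of[i]] * w[i] * p[i] for i in range(len(p)))
```
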
  • Publication number: 20230069628
    Abstract: A computer-implemented method for fusing an end-to-end speech recognition model with an external language model (ExternalLM) is provided. The method includes obtaining an output of the end-to-end speech recognition model. The output is a probability distribution. The method further includes transforming, by a hardware processor, the probability distribution into a transformed probability distribution to relax the sharpness of the probability distribution. The method also includes fusing the transformed probability distribution and a probability distribution of the ExternalLM for decoding speech.
    Type: Application
    Filed: August 24, 2021
    Publication date: March 2, 2023
    Inventors: Tohru Nagano, Masayuki Suzuki, Gakuto Kurata
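
One plausible reading of publication 20230069628's "relaxing the sharpness" is a temperature applied to the end-to-end model's posterior before log-linear (shallow) fusion with the external LM; both the temperature value and the log-linear combination below are assumptions rather than details fixed by the publication.

```python
import torch.nn.functional as F

def fused_score(e2e_logits, lm_log_probs, temperature=2.0, lm_weight=0.3):
    # Dividing the logits by a temperature > 1 flattens (relaxes) the
    # end-to-end posterior before it is fused with the external LM.
    relaxed = F.log_softmax(e2e_logits / temperature, dim=-1)
    return relaxed + lm_weight * lm_log_probs  # scores used for decoding
```
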
  • Patent number: 11574181
    Abstract: Fusion of neural networks is performed by obtaining a first neural network and a second neural network. The first and the second neural networks are the result of a parent neural network subjected to different training. A similarity score is calculated of a first component of the first neural network and a corresponding second component of the second neural network. An interpolation weight is determined for the first and the second components by using the similarity score. A neural network parameter of the first component is updated based on the interpolation weight and a corresponding neural network parameter of the second component to obtain a fused neural network.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: February 7, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
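
A per-tensor version of the fusion in patent 11574181 might look as follows; mapping cosine similarity onto the interpolation weight is an illustrative choice, as the abstract leaves the exact mapping open. The two models are assumed to share an architecture, since both descend from the same parent network.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse(model_a, model_b):
    """Blend model_b into model_a, component by component."""
    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        # Similarity score of corresponding components.
        sim = F.cosine_similarity(pa.flatten(), pb.flatten(), dim=0).item()
        w = 0.5 * (1.0 + sim)  # map similarity in [-1, 1] to a weight in [0, 1]
        # Update the first component using the weight and the second component.
        pa.copy_((1.0 - w) * pa + w * pb)
    return model_a
```
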
  • Patent number: 11557288
    Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of the same phrase with different time stamps and clustering the pairs by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using results of the clustering.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: January 17, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
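
A rough sketch of the pipeline in patent 11557288, with an n-gram counter for candidate extraction and simple gap-based grouping standing in for the clustering step (the thresholds are assumptions):

```python
from collections import Counter
from itertools import combinations

def candidate_phrases(words, n=3, min_count=2):
    """Extract n-grams that occur at least min_count times."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [g for g, c in grams.items() if c >= min_count]

def cluster_time_deltas(timestamps, tol=0.5):
    """Group pairwise start-time differences of one phrase; a large, tight
    cluster suggests periodically repeated audio (e.g., hold messages)."""
    deltas = sorted(abs(a - b) for a, b in combinations(timestamps, 2))
    if not deltas:
        return []
    clusters, current = [], [deltas[0]]
    for d in deltas[1:]:
        if d - current[-1] <= tol:
            current.append(d)
        else:
            clusters.append(current)
            current = [d]
    clusters.append(current)
    return clusters
```
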
  • Publication number: 20220415315
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in a prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the bonus score is added to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Masayuki Suzuki, Gakuto Kurata
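
A toy prefix tree with per-transition bonus scores illustrates the idea in publication 20220415315; the flat bonus value and the character-level edges are assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.bonus = 0.0    # added to a hypothesis score on this transition

def add_word(root, word, bonus=0.5):
    """Place a bonus on every transition along the new word's path."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
        node.bonus = max(node.bonus, bonus)

def hypothesis_score(root, text, initial_score):
    """Add the bonus of each matched transition to the initial score."""
    node, score = root, initial_score
    for ch in text:
        node = node.children.get(ch)
        if node is None:
            break
        score += node.bonus
    return score
```
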
  • Patent number: 11443169
    Abstract: A computer-implemented method for adapting a model for recognition processing to a target-domain is disclosed. The method includes preparing a first distribution in relation to a part of the model, in which the first distribution is derived from data of a training-domain for the model. The method also includes obtaining a second distribution in relation to the part of the model by using data of the target-domain. The method further includes tuning one or more parameters of the part of the model so that the difference between the first and second distributions becomes small.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: September 13, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
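
Assuming the "part of the model" in patent 11443169 is a hidden layer and the two distributions are Gaussian fits to its activations, the tuning objective could be sketched as a symmetric KL divergence between precomputed training-domain statistics and target-domain statistics:

```python
import torch

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence between diagonal Gaussians p and q."""
    return 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                  - 1.0 + torch.log(var_q / var_p)).sum()

def adaptation_loss(hidden_target, mu_src, var_src):
    """hidden_target: (N, D) activations on target-domain data;
    mu_src, var_src: statistics precomputed on the training domain."""
    mu_t = hidden_target.mean(dim=0)
    var_t = hidden_target.var(dim=0) + 1e-6
    # Minimizing this makes the difference between the distributions small.
    return (gaussian_kl(mu_src, var_src, mu_t, var_t)
            + gaussian_kl(mu_t, var_t, mu_src, var_src))
```
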
  • Publication number: 20220254335
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Application
    Filed: February 5, 2021
    Publication date: August 11, 2022
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11404047
    Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a Mean squared error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both of the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 2, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Kartik Audhkhasi
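
The split of the encoder output in patent 11404047 can be made concrete with a structural sketch; the dimensions are illustrative, and the full objective would combine nn.CTCLoss on the first head with nn.MSELoss on the second.

```python
import torch
import torch.nn as nn

class MultiTaskASR(nn.Module):
    """The first `shared` encoder dimensions feed both the CTC branch and
    the reconstruction branch; the rest feed the CTC branch alone."""

    def __init__(self, feat_dim=40, hidden=320, shared=256, vocab=50):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        self.shared = shared
        self.ctc_head = nn.Linear(2 * hidden, vocab + 1)  # +1 for the CTC blank
        self.recon_head = nn.Linear(shared, feat_dim)     # feature reconstruction

    def forward(self, x):
        enc, _ = self.encoder(x)                          # (B, T, 2 * hidden)
        ctc_logits = self.ctc_head(enc)                   # sees all encoder outputs
        recon = self.recon_head(enc[..., :self.shared])   # sees the shared slice only
        return ctc_logits, recon
```
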
  • Publication number: 20220208179
    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
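
Procedurally, the recipe of publication 20220208179 reads as save, update, update, restore. The sketch below uses placeholder names (step, synthesize, encoder, prediction) that stand in for a real RNN-T training setup, not an actual API.

```python
import copy

def customize(rnnt, step, synthesize, domain1_text, domain2_text):
    """rnnt: module with `encoder` and `prediction` submodules (placeholders);
    step(model, audio, text): one gradient update on the trainable parameters;
    synthesize(text): TTS front end producing audio features."""
    initial_encoder = copy.deepcopy(rnnt.encoder.state_dict())  # initial condition

    # 1) Update the encoder on synthesized first-domain speech.
    for p in rnnt.prediction.parameters():
        p.requires_grad_(False)
    for text in domain1_text:
        step(rnnt, synthesize(text), text)

    # 2) Update the prediction network on synthesized second-domain speech,
    #    fed through the updated encoder.
    for p in rnnt.prediction.parameters():
        p.requires_grad_(True)
    for p in rnnt.encoder.parameters():
        p.requires_grad_(False)
    for text in domain2_text:
        step(rnnt, synthesize(text), text)

    # 3) Restore the updated encoder to its initial condition.
    rnnt.encoder.load_state_dict(initial_encoder)
    return rnnt
```
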
  • Publication number: 20220188622
    Abstract: An approach to identifying alternate soft labels for training a student model may be provided. A teaching model may generate a soft label for a labeled training data. The training data can be an acoustic file for speech or a spoken natural language. A pool of soft labels previously generated by teacher models can be searched at the label level to identify soft labels that are similar to the generated soft label. The similar soft labels can have a similar length or sequence at the word, phoneme, and/or state level. The identified similar soft labels can be used in conjunction with the generated soft label to train a student model.
    Type: Application
    Filed: December 10, 2020
    Publication date: June 16, 2022
    Inventors: Toru Nagano, Takashi Fukuda, Gakuto Kurata
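
Label-level retrieval from the pool in publication 20220188622 might be sketched as follows; restricting to equal-length sequences and scoring with mean cosine similarity are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def find_similar(generated, pool, threshold=0.9):
    """generated: (T, C) soft-label sequence from the teaching model;
    pool: list of (T_i, C) soft-label sequences from earlier teachers."""
    matches = []
    for cand in pool:
        if cand.shape == generated.shape:  # same length at the frame level
            sim = F.cosine_similarity(generated, cand, dim=-1).mean()
            if sim.item() >= threshold:
                matches.append(cand)
    return matches  # used alongside `generated` to train the student
```
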
  • Publication number: 20220172080
    Abstract: A computer-implemented method is provided for learning multimodal feature matching. The method includes training an image encoder to obtain encoded images. The method further includes training a common classifier on the encoded images by using labeled images. The method also includes training a text encoder while keeping the common classifier in a fixed configuration by using learned text embeddings and corresponding labels for the learned text embeddings. The text encoder is further trained to match the predicted text embeddings it encodes to a Gaussian distribution fitted to the encoded images.
    Type: Application
    Filed: December 2, 2020
    Publication date: June 2, 2022
    Inventors: Subhajit Chaudhury, Daiki Kimura, Gakuto Kurata, Ryuki Tachibana
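
The Gaussian-matching term of publication 20220172080 can be sketched with a diagonal fit to the encoded images and a negative log-likelihood penalty on the predicted text embeddings; treating the match to the fitted Gaussian as an NLL objective is an assumption.

```python
import torch

def fit_gaussian(image_embeddings):
    """Fit a diagonal Gaussian to the encoded images, (N, D) -> (mu, var)."""
    mu = image_embeddings.mean(dim=0)
    var = image_embeddings.var(dim=0) + 1e-6
    return mu, var

def match_loss(text_embeddings, mu, var):
    """Negative log-likelihood (up to a constant) of the predicted text
    embeddings under the Gaussian fitted on the encoded images."""
    nll = 0.5 * (((text_embeddings - mu) ** 2) / var + torch.log(var))
    return nll.sum(dim=-1).mean()
```
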
  • Patent number: 11341413
    Abstract: Methods and systems for language processing include initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero. A neural network is trained based on the initialized word embedding matrix to generate a neural network language model. A language processing task is performed using the neural network language model.
    Type: Grant
    Filed: August 29, 2016
    Date of Patent: May 24, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
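
A toy version of the initialization in patent 11341413, where the embedding width equals the number of word classes and membership entries receive a constant non-zero value (the constant, and one column per class, are illustrative assumptions):

```python
import torch

def init_class_embedding(vocab_size, num_classes, word_to_classes, value=1.0):
    """word_to_classes: word index -> list of class indices it belongs to."""
    emb = torch.zeros(vocab_size, num_classes)  # all other entries stay zero
    for word, classes in word_to_classes.items():
        for c in classes:
            emb[word, c] = value  # non-zero where the word is a class member
    return emb

# Example: words 0 and 1 share class 2, so their rows start out similar.
weights = init_class_embedding(10, 8, {0: [2], 1: [2, 5]})
```
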
  • Patent number: 11302309
    Abstract: A technique for aligning spike timing of models is disclosed. A first model having a first architecture trained with a set of training samples is generated. Each training sample includes an input sequence of observations and an output sequence of symbols having a different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model, and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: April 12, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Kartik Audhkhasi
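
The joint objective of patent 11302309 can be sketched as the normal task loss plus a weighted guide term; using a per-frame KL divergence over the two models' label posteriors as the spike-timing dissimilarity is an assumption.

```python
import torch.nn.functional as F

def joint_loss(student_logits, teacher_probs, normal_loss, guide_weight=0.5):
    """student_logits, teacher_probs: (T, C) frame-level label posteriors;
    normal_loss: the second model's usual training loss (e.g., CTC)."""
    log_p = F.log_softmax(student_logits, dim=-1)
    guide = F.kl_div(log_p, teacher_probs, reduction="batchmean")
    return normal_loss + guide_weight * guide  # minimized jointly
```
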
  • Publication number: 20220093083
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Inventors: Gakuto Kurata, George Andrei Saon
  • Patent number: 11276394
    Abstract: Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: March 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nobuyasu Itoh, Gakuto Kurata
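
With unigram probabilities standing in for the bootstrap language model of patent 11276394 (an assumption; the patent is not tied to unigrams), the deletion test might be sketched as follows; the comparison direction and the margin are likewise illustrative.

```python
import math

def entropy_term(p):
    """Contribution of a token with probability p to the model entropy."""
    return -p * math.log(p)

def should_delete(p_target, p_splits, margin=0.0):
    """Delete the target token when its split tokens carry at least as
    much entropy, i.e., splitting loses little information."""
    h_target = entropy_term(p_target)
    h_splits = sum(entropy_term(p) for p in p_splits)
    return h_splits <= h_target + margin
```
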
  • Patent number: 11276391
    Abstract: A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: March 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
  • Patent number: 11263516
    Abstract: Methods and systems for training a neural network include identifying weights in a neural network between a final hidden neuron layer and an output neuron layer that correspond to state matches between a neuron of the final hidden neuron layer and a respective neuron of the output neuron layer. The identified weights are initialized to a predetermined non-zero value, and other weights between the final hidden neuron layer and the output neuron layer are initialized to zero. The neural network is trained based on a training corpus after initialization.
    Type: Grant
    Filed: August 2, 2016
    Date of Patent: March 1, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
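
A toy version of the state-matched initialization in patent 11263516; the non-zero constant and the state assignments are illustrative inputs.

```python
import torch

def init_state_matched(out_states, hidden_states, value=1.0):
    """Weight (i, j) between the final hidden layer and the output layer is
    non-zero iff hidden neuron j and output neuron i share a state."""
    w = torch.zeros(len(out_states), len(hidden_states))
    for i, s_out in enumerate(out_states):
        for j, s_hid in enumerate(hidden_states):
            if s_out == s_hid:
                w[i, j] = value
    return w
```
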