Patents by Inventor Gakuto Kurata

Gakuto Kurata has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11908458
    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the RNN-T having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the RNN-T, wherein a prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
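As a rough illustration of the four-step flow in this abstract, the sketch below uses scalar stand-ins for the encoder and prediction network; the `update` helper, weight values, and step sizes are invented for illustration and are not from the patent:

```python
import copy

# Toy stand-in for the RNN-T's two trainable parts; real encoders and
# prediction networks are recurrent networks, not single scalars.
model = {
    "encoder": {"w": 0.0},
    "prediction": {"w": 0.0},
}

def update(params, delta):
    """Stand-in for a gradient update on synthesized audio/text pairs."""
    for k in params:
        params[k] += delta

# 1) Remember the encoder's initial condition.
initial_encoder = copy.deepcopy(model["encoder"])

# 2) Update the encoder on synthesized first-domain data.
update(model["encoder"], 0.5)

# 3) With the updated encoder in place, update the prediction network
#    on synthesized second-domain data.
update(model["prediction"], 0.3)

# 4) Restore the encoder to its initial condition; only the prediction
#    network keeps its adaptation.
model["encoder"] = copy.deepcopy(initial_encoder)

print(model["encoder"]["w"], model["prediction"]["w"])  # 0.0 0.3
```

The point is the bookkeeping: the encoder's initial condition is saved before adaptation and restored afterward, so only the prediction network retains the second-domain update.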
  • Patent number: 11908454
    Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
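A "textogram" is described here as a spectrogram-like representation generated from text. One plausible toy construction (the symbol inventory, one-hot encoding, and `repeat` duration knob are all assumptions, not details from the abstract) is:

```python
import numpy as np

def textogram(text, vocab, repeat=3):
    """Build a spectrogram-like matrix from text: one row per symbol,
    each character held for `repeat` frames as a crude duration model
    (`repeat` and the one-hot encoding are assumed choices)."""
    idx = {c: i for i, c in enumerate(vocab)}
    frames = []
    for ch in text:
        col = np.zeros(len(vocab))
        col[idx[ch]] = 1.0
        frames.extend([col] * repeat)
    return np.stack(frames, axis=1)  # (symbols, time), like (freq, time)

tg = textogram("abba", "ab", repeat=2)
print(tg.shape)  # (2, 8): 2 symbols, 4 characters x 2 frames each
```

Because the result has the same frame-by-row layout as a spectrogram, a single acoustic model could plausibly consume either representation during training.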
  • Patent number: 11893983
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in the prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the bonus score is added to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: February 6, 2024
    Assignee: International Business Machines Corporation
    Inventors: Masayuki Suzuki, Gakuto Kurata
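A minimal sketch of the bonus-score idea; a flat dictionary of (prefix, next-character) transitions stands in for the prefix tree, and the bonus value, base score, and threshold are made-up numbers:

```python
def add_word(bonus_table, word, bonus):
    """Record a bonus for every transition along the new word's path
    (a flat dict of (prefix, next_char) keys stands in for the tree)."""
    for i in range(len(word)):
        bonus_table[(word[:i], word[i])] = bonus

def score_hypothesis(bonus_table, hypothesis, base_score):
    """Add the transition bonuses collected along the hypothesis path."""
    score = base_score
    for i, ch in enumerate(hypothesis):
        score += bonus_table.get((hypothesis[:i], ch), 0.0)
    return score

bonuses = {}
add_word(bonuses, "ibm", 0.5)  # new word with a 0.5 bonus per transition
s = score_hypothesis(bonuses, "ibm", base_score=-3.0)
print(s)  # -1.5: three transitions each add 0.5 to the initial -3.0

threshold = -2.0
print(s > threshold)  # True: the hypothesis would be emitted as text
```

The bonuses keep a newly added word competitive during decoding even though the underlying model never saw it in training.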
  • Publication number: 20240038221
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate multi-task training of a recurrent neural network transducer (RNN-T) using automatic speech recognition (ASR) information are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can include an RNN-T that can receive ASR information. The computer executable components can include a voice activity detection (VAD) model that trains the RNN-T using the ASR information, where the RNN-T can further comprise an encoder and a joint network. One or more outputs of the encoder can be integrated with the joint network and one or more outputs of the VAD model.
    Type: Application
    Filed: July 28, 2022
    Publication date: February 1, 2024
    Inventors: Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
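One way to picture "integrating encoder outputs with the joint network and VAD outputs" is to let the VAD's speech probability gate the encoder features; the gating and `tanh` combination below are assumed toy choices, not the patent's architecture:

```python
import numpy as np

def joint(enc_out, pred_out, vad_out):
    """Toy joint step: gate encoder features by the VAD model's speech
    probability before combining with the prediction network output."""
    return np.tanh(enc_out * vad_out + pred_out)

enc = np.array([0.8, 0.1])   # encoder outputs for two frames (toy values)
pred = np.array([0.2, 0.2])  # prediction-network contribution per frame
vad = np.array([1.0, 0.0])   # frame 1 is speech, frame 2 is silence
out = joint(enc, pred, vad)
print(out.round(3))  # the silent frame's encoder contribution is zeroed
```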
  • Publication number: 20230410797
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output the same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student, using the trained second end-to-end neural speech recognition model as a teacher, in a knowledge distillation method.
    Type: Application
    Filed: September 1, 2023
    Publication date: December 21, 2023
    Inventors: Gakuto Kurata, George Andrei Saon
  • Patent number: 11783811
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output the same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student, using the trained second end-to-end neural speech recognition model as a teacher, in a knowledge distillation method.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: October 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, George Andrei Saon
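The teacher-student step shared by this patent and publication 20230410797 above can be miniaturized to matching output distributions. In the sketch below a fixed "teacher" distribution stands in for the bidirectional model's probability lattice, and plain gradient descent on cross-entropy (whose gradient with respect to the logits is `student_probs - teacher_probs`) stands in for distillation training; all values are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A fixed soft distribution stands in for the bidirectional teacher's
# output lattice at one position (toy logits).
teacher_probs = softmax(np.array([2.0, 0.5, 0.1]))

# The unidirectional student starts uninformed and is trained to emit
# the same symbols; the softmax cross-entropy gradient w.r.t. logits
# is simply student_probs - teacher_probs.
student_logits = np.zeros(3)
for _ in range(200):
    student_probs = softmax(student_logits)
    student_logits -= 0.5 * (student_probs - teacher_probs)

gap = np.abs(softmax(student_logits) - teacher_probs).max()
print(gap)  # small: the student now matches the teacher's distribution
```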
  • Patent number: 11741355
    Abstract: A student neural network may be trained by a computer-implemented method, including: inputting common input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output from each teacher neural network, and training a student neural network with the input data and the plurality of soft label outputs.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
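A minimal reading of "training a student with the plurality of soft label outputs" is to merge them into one soft target; the simple averaging below is one assumed combination rule, not necessarily the patent's:

```python
import numpy as np

def soft_labels(teacher_outputs):
    """Average the teachers' soft label outputs into one soft target
    (plain averaging is one assumed combination rule)."""
    return np.mean(teacher_outputs, axis=0)

# Three toy teachers scoring the same common input over three classes.
outs = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.8, 0.1, 0.1]])
target = soft_labels(outs)
print(target)  # the student would be trained against this soft target
```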
  • Publication number: 20230237989
    Abstract: A computer-implemented method for training a neural transducer is provided, including: using audio data and transcription data of the audio data as input data, obtaining outputs from a trained language model and a seed neural transducer, respectively; combining the outputs to obtain a supervisory output; and updating parameters of another neural transducer in training so that its output is close to the supervisory output. The neural transducer can be a Recurrent Neural Network Transducer (RNN-T).
    Type: Application
    Filed: January 21, 2022
    Publication date: July 27, 2023
    Inventor: Gakuto Kurata
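One plausible way to combine the language model and seed transducer outputs into a single supervisory output is log-linear interpolation followed by renormalization; the `weight` knob and the specific fusion rule here are assumptions for illustration:

```python
import numpy as np

def supervisory_output(lm_probs, seed_probs, weight=0.5):
    """Combine the two outputs in log space and renormalize; the
    interpolation `weight` is an assumed knob."""
    log_mix = weight * np.log(lm_probs) + (1 - weight) * np.log(seed_probs)
    p = np.exp(log_mix)
    return p / p.sum()

lm = np.array([0.5, 0.3, 0.2])    # trained language model output (toy)
seed = np.array([0.6, 0.3, 0.1])  # seed neural transducer output (toy)
target = supervisory_output(lm, seed)
print(target.round(3))  # a valid distribution between the two inputs
```

The transducer in training would then be updated so that its own output distribution moves toward `target`.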
  • Publication number: 20230196107
    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
    Type: Application
    Filed: February 14, 2023
    Publication date: June 22, 2023
    Inventors: Gakuto Kurata, Kartik Audhkhasi
  • Patent number: 11625595
    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
    Type: Grant
    Filed: August 29, 2018
    Date of Patent: April 11, 2023
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Kartik Audhkhasi
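A sketch of the selection step: given several candidate outputs from the bidirectional teacher, keep the one most similar to the unidirectional student's current output and use it as the training target. Dot-product similarity and keeping a single best match are assumed choices:

```python
import numpy as np

def select_targets(bi_outputs, uni_output, k=1):
    """Keep the k bidirectional outputs most similar to the current
    unidirectional output (dot-product similarity, an assumed choice)."""
    sims = bi_outputs @ uni_output
    return bi_outputs[np.argsort(sims)[-k:]]

# Frame-level posteriors from the bidirectional teacher; it may emit a
# symbol on a different frame than the unidirectional student does.
bi = np.array([[0.9, 0.1],
               [0.2, 0.8],
               [0.5, 0.5]])
uni = np.array([0.8, 0.2])  # the student's output for one frame (toy)
target = select_targets(bi, uni)
print(target)  # the first teacher output matches the student best
```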
  • Publication number: 20230104244
    Abstract: A computer-implemented method is provided for training a Recurrent Neural Network Transducer (RNN-T). The method includes training, by inputting a set of audio data, a first RNN-T which includes a common encoder, a forward prediction network, and a first joint network combining outputs of both the common encoder and the forward prediction network. The forward prediction network predicts label sequences forward. The method further includes training, by inputting the set of audio data, a second RNN-T which includes the common encoder, a backward prediction network, and a second joint network combining outputs of both the common encoder and the backward prediction network. The backward prediction network predicts label sequences backward. The trained first RNN-T is used for inference.
    Type: Application
    Filed: September 17, 2021
    Publication date: April 6, 2023
    Inventor: Gakuto Kurata
  • Patent number: 11610108
    Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output from the selected teacher neural network.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Patent number: 11610581
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
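The layered weighting can be sketched with toy unigram "language models": per-model weights (which the method estimates by EM; fixed numbers stand in here) are multiplied by a per-set hyper weight and renormalized before linear interpolation:

```python
import numpy as np

def interpolate(lms, weights, set_ids, hyper):
    """Scale each per-model weight by its set's hyper weight, renormalize,
    then linearly interpolate the models."""
    w = np.array([weights[i] * hyper[set_ids[i]] for i in range(len(lms))])
    w = w / w.sum()
    return sum(wi * lm for wi, lm in zip(w, np.array(lms, dtype=float)))

# Two general-domain LMs (set 0) and one application-specific LM (set 1),
# each a toy unigram distribution over two words.
lms = [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9]]
weights = [0.4, 0.4, 0.2]   # stands in for EM-estimated weights
hyper = {0: 1.0, 1: 2.0}    # the application-specific set is boosted
final = interpolate(lms, weights, [0, 0, 1], hyper)
print(final)  # the boosted set pulls the mixture toward [0.1, 0.9]
```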
  • Publication number: 20230069628
    Abstract: A computer-implemented method for fusing an end-to-end speech recognition model with an external language model (ExternalLM) is provided. The method includes obtaining an output of the end-to-end speech recognition model. The output is a probability distribution. The method further includes transforming, by a hardware processor, the probability distribution into a transformed probability distribution to relax a sharpness of the probability distribution. The method also includes fusing the transformed probability distribution and a probability distribution of the ExternalLM for decoding speech.
    Type: Application
    Filed: August 24, 2021
    Publication date: March 2, 2023
    Inventors: Tohru Nagano, Masayuki Suzuki, Gakuto Kurata
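A simple way to "relax the sharpness" of a distribution is temperature scaling, and a common shallow-fusion rule is log-linear combination with an LM weight; both specific choices below are assumptions consistent with, but not stated in, the abstract:

```python
import numpy as np

def relax(probs, temperature=2.0):
    """Flatten a sharp distribution: exponentiate by 1/T, renormalize."""
    p = probs ** (1.0 / temperature)
    return p / p.sum()

def fuse(asr_probs, lm_probs, lm_weight=0.3):
    """Log-linear fusion of the relaxed ASR scores with the external LM."""
    score = np.log(relax(asr_probs)) + lm_weight * np.log(lm_probs)
    p = np.exp(score)
    return p / p.sum()

asr = np.array([0.98, 0.01, 0.01])  # over-confident end-to-end output
lm = np.array([0.2, 0.5, 0.3])      # external LM distribution (toy)
print(relax(asr).round(3))  # noticeably flatter than the raw ASR output
print(fuse(asr, lm).round(3))
```

Relaxing the ASR distribution first keeps the over-confident model from drowning out the external language model during decoding.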
  • Patent number: 11574181
    Abstract: Fusion of neural networks is performed by obtaining a first neural network and a second neural network. The first and the second neural networks are the result of a parent neural network subjected to different training. A similarity score is calculated of a first component of the first neural network and a corresponding second component of the second neural network. An interpolation weight is determined for the first and the second components by using the similarity score. A neural network parameter of the first component is updated based on the interpolation weight and a corresponding neural network parameter of the second component to obtain a fused neural network.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: February 7, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
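The component-wise step can be sketched for a single pair of weight vectors; cosine similarity and the particular similarity-to-weight mapping below are assumed for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse_component(w1, w2):
    """Interpolate two corresponding components; the more similar they
    are, the further the first moves toward the second."""
    sim = cosine(w1, w2)          # similarity score of the two components
    alpha = max(0.0, sim) / 2.0   # interpolation weight in [0, 0.5]
    return (1 - alpha) * w1 + alpha * w2

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction: interpolate halfway toward b
c = np.array([0.0, 1.0])  # orthogonal: leave the first component alone
print(fuse_component(a, b))  # midpoint of a and b
print(fuse_component(a, c))  # unchanged a
```

Weighting by similarity avoids averaging components that diverged so much during their separate training that mixing them would hurt both.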
  • Patent number: 11557288
    Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of identical phrases with different time stamps and clustering the pairs by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using the results of the clustering.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
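A toy version of the pipeline: count n-grams to find repeated candidate phrases, then check whether a phrase's occurrence times repeat at a near-constant interval, which would mark it as a scripted announcement to remove. The bigram size, `min_count`, and `tolerance` are assumed parameters:

```python
from collections import Counter

def candidate_phrases(words, n=2, min_count=2):
    """Extract n-grams that repeat often enough to be removal candidates."""
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {g for g, c in Counter(grams).items() if c >= min_count}

def is_periodic(timestamps, tolerance=1.0):
    """True when occurrences repeat at a near-constant interval, which
    suggests a scripted announcement rather than live speech."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance

words = "please hold the line please hold the line thank you".split()
phrases = candidate_phrases(words)
print(("please", "hold") in phrases)  # True

# The same phrase starts near 10 s, 40 s, 70 s: regular ~30 s gaps.
print(is_periodic([10.2, 40.1, 70.3]))  # True
print(is_periodic([10.2, 15.0, 70.3]))  # False
```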
  • Publication number: 20220415315
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in a prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the hypothesis score adds the bonus score to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11443169
    Abstract: A computer-implemented method for adapting a model for recognition processing to a target domain is disclosed. The method includes preparing a first distribution in relation to a part of the model, in which the first distribution is derived from data of a training domain for the model. The method also includes obtaining a second distribution in relation to the part of the model by using data of the target domain. The method further includes tuning one or more parameters of the part of the model so that the difference between the first and the second distributions becomes small.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: September 13, 2022
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
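The "make the difference between the two distributions small" step can be sketched with categorical distributions and KL divergence as the difference measure; KL, the softmax parameterization, and the learning rate are assumed choices, not details from the abstract:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# First distribution: statistics of a part of the model under
# training-domain data (a toy categorical distribution here).
train_dist = np.array([0.6, 0.3, 0.1])

# Second distribution: the same part's statistics under target-domain
# data, parameterized by tunable softmax logits (an assumed setup).
logits = np.log(np.array([0.2, 0.3, 0.5]))

# Tune the parameters so the difference between the two becomes small;
# the gradient of KL(train || softmax(logits)) w.r.t. logits is q - p.
for _ in range(500):
    q = np.exp(logits) / np.exp(logits).sum()
    logits -= 0.5 * (q - train_dist)

q = np.exp(logits) / np.exp(logits).sum()
print(kl(train_dist, q))  # close to zero after tuning
```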
  • Publication number: 20220254335
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Application
    Filed: February 5, 2021
    Publication date: August 11, 2022
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11404047
    Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a Mean Squared Error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 2, 2022
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Kartik Audhkhasi
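The routing described in the last two sentences (one set of encoder outputs feeds both branches, the other set feeds only the primary branch) can be sketched directly; the branch functions are trivial stand-ins for the CTC and MSE networks:

```python
import numpy as np

# Toy encoder outputs for 6 frames with 2 features each. The abstract
# splits them into two sets: the first feeds both branches, the second
# feeds only the primary branch.
enc_out = np.arange(12.0).reshape(6, 2)
first_set, second_set = enc_out[:3], enc_out[3:]

def primary(h):
    """Stand-in for the CTC recognition branch."""
    return h.sum(axis=1)

def sub(h):
    """Stand-in for the MSE feature-reconstruction branch."""
    return h * 1.0

primary_in = np.vstack([first_set, second_set])  # primary sees both sets
sub_in = first_set                               # sub sees only the first
print(primary(primary_in).shape, sub(sub_in).shape)  # (6,) (3, 2)
```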