Patents by Inventor Gakuto Kurata

Gakuto Kurata has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11908458
    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the RNN-T having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the RNN-T, wherein a prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
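As a rough illustration of the four-step flow in this abstract, the sketch below uses scalar stand-ins for the encoder and prediction network; the `update` helper, weight values, and step sizes are invented for illustration and are not from the patent:

```python
import copy

# Toy stand-in for the RNN-T's two trainable parts; real encoders and
# prediction networks are recurrent networks, not single scalars.
model = {
    "encoder": {"w": 0.0},
    "prediction": {"w": 0.0},
}

def update(params, delta):
    """Stand-in for a gradient update on synthesized audio/text pairs."""
    for k in params:
        params[k] += delta

# 1) Remember the encoder's initial condition.
initial_encoder = copy.deepcopy(model["encoder"])

# 2) Update the encoder on synthesized first-domain data.
update(model["encoder"], 0.5)

# 3) With the updated encoder in place, update the prediction network
#    on synthesized second-domain data.
update(model["prediction"], 0.3)

# 4) Restore the encoder to its initial condition; only the prediction
#    network keeps its adaptation.
model["encoder"] = copy.deepcopy(initial_encoder)

print(model["encoder"]["w"], model["prediction"]["w"])  # 0.0 0.3
```

The point is the bookkeeping: the encoder's initial condition is saved before adaptation and restored afterward, so only the prediction network retains the second-domain update.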
  • Patent number: 11908454
    Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
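A "textogram" is described here as a spectrogram-like representation generated from text. One plausible toy construction (the symbol inventory, one-hot encoding, and `repeat` duration knob are all assumptions, not details from the abstract) is:

```python
import numpy as np

def textogram(text, vocab, repeat=3):
    """Build a spectrogram-like matrix from text: one row per symbol,
    each character held for `repeat` frames as a crude duration model
    (`repeat` and the one-hot encoding are assumed choices)."""
    idx = {c: i for i, c in enumerate(vocab)}
    frames = []
    for ch in text:
        col = np.zeros(len(vocab))
        col[idx[ch]] = 1.0
        frames.extend([col] * repeat)
    return np.stack(frames, axis=1)  # (symbols, time), like (freq, time)

tg = textogram("abba", "ab", repeat=2)
print(tg.shape)  # (2, 8): 2 symbols, 4 characters x 2 frames each
```

Because the result has the same frame-by-row layout as a spectrogram, a single acoustic model could plausibly consume either representation during training.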
  • Patent number: 11893983
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in the prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the bonus score is added to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: February 6, 2024
    Assignee: International Business Machines Corporation
    Inventors: Masayuki Suzuki, Gakuto Kurata
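A minimal sketch of the bonus-score idea; a flat dictionary of (prefix, next-character) transitions stands in for the prefix tree, and the bonus value, base score, and threshold are made-up numbers:

```python
def add_word(bonus_table, word, bonus):
    """Record a bonus for every transition along the new word's path
    (a flat dict of (prefix, next_char) keys stands in for the tree)."""
    for i in range(len(word)):
        bonus_table[(word[:i], word[i])] = bonus

def score_hypothesis(bonus_table, hypothesis, base_score):
    """Add the transition bonuses collected along the hypothesis path."""
    score = base_score
    for i, ch in enumerate(hypothesis):
        score += bonus_table.get((hypothesis[:i], ch), 0.0)
    return score

bonuses = {}
add_word(bonuses, "ibm", 0.5)  # new word with a 0.5 bonus per transition
s = score_hypothesis(bonuses, "ibm", base_score=-3.0)
print(s)  # -1.5: three transitions each add 0.5 to the initial -3.0

threshold = -2.0
print(s > threshold)  # True: the hypothesis would be emitted as text
```

The bonuses keep a newly added word competitive during decoding even though the underlying model never saw it in training.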
  • Publication number: 20240038221
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate multi-task training of a recurrent neural network transducer (RNN-T) using automatic speech recognition (ASR) information are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can include an RNN-T that can receive ASR information. The computer executable components can include a voice activity detection (VAD) model that trains the RNN-T using the ASR information, where the RNN-T can further comprise an encoder and a joint network. One or more outputs of the encoder can be integrated with the joint network and one or more outputs of the VAD model.
    Type: Application
    Filed: July 28, 2022
    Publication date: February 1, 2024
    Inventors: Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
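One way to picture "integrating encoder outputs with the joint network and VAD outputs" is to let the VAD's speech probability gate the encoder features; the gating and `tanh` combination below are assumed toy choices, not the patent's architecture:

```python
import numpy as np

def joint(enc_out, pred_out, vad_out):
    """Toy joint step: gate encoder features by the VAD model's speech
    probability before combining with the prediction network output."""
    return np.tanh(enc_out * vad_out + pred_out)

enc = np.array([0.8, 0.1])   # encoder outputs for two frames (toy values)
pred = np.array([0.2, 0.2])  # prediction-network contribution per frame
vad = np.array([1.0, 0.0])   # frame 1 is speech, frame 2 is silence
out = joint(enc, pred, vad)
print(out.round(3))  # the silent frame's encoder contribution is zeroed
```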
  • Publication number: 20230410797
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output the same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student, using the trained second end-to-end neural speech recognition model as a teacher, in a knowledge distillation method.
    Type: Application
    Filed: September 1, 2023
    Publication date: December 21, 2023
    Inventors: Gakuto Kurata, George Andrei Saon
  • Patent number: 11783811
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output the same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student, using the trained second end-to-end neural speech recognition model as a teacher, in a knowledge distillation method.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: October 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, George Andrei Saon
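The teacher-student step shared by this patent and publication 20230410797 above can be miniaturized to matching output distributions. In the sketch below a fixed "teacher" distribution stands in for the bidirectional model's probability lattice, and plain gradient descent on cross-entropy (whose gradient with respect to the logits is `student_probs - teacher_probs`) stands in for distillation training; all values are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A fixed soft distribution stands in for the bidirectional teacher's
# output lattice at one position (toy logits).
teacher_probs = softmax(np.array([2.0, 0.5, 0.1]))

# The unidirectional student starts uninformed and is trained to emit
# the same symbols; the softmax cross-entropy gradient w.r.t. logits
# is simply student_probs - teacher_probs.
student_logits = np.zeros(3)
for _ in range(200):
    student_probs = softmax(student_logits)
    student_logits -= 0.5 * (student_probs - teacher_probs)

gap = np.abs(softmax(student_logits) - teacher_probs).max()
print(gap)  # small: the student now matches the teacher's distribution
```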
  • Patent number: 11741355
    Abstract: A student neural network may be trained by a computer-implemented method, including: inputting common input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output from each teacher neural network, and training a student neural network with the input data and the plurality of soft label outputs.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
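A minimal reading of "training a student with the plurality of soft label outputs" is to merge them into one soft target; the simple averaging below is one assumed combination rule, not necessarily the patent's:

```python
import numpy as np

def soft_labels(teacher_outputs):
    """Average the teachers' soft label outputs into one soft target
    (plain averaging is one assumed combination rule)."""
    return np.mean(teacher_outputs, axis=0)

# Three toy teachers scoring the same common input over three classes.
outs = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.8, 0.1, 0.1]])
target = soft_labels(outs)
print(target)  # the student would be trained against this soft target
```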
  • Publication number: 20230237989
    Abstract: A computer-implemented method for training a neural transducer is provided, including: using audio data and transcription data of the audio data as input data, obtaining outputs from a trained language model and a seed neural transducer, respectively; combining the outputs to obtain a supervisory output; and updating parameters of another neural transducer in training so that its output is close to the supervisory output. The neural transducer can be a Recurrent Neural Network Transducer (RNN-T).
    Type: Application
    Filed: January 21, 2022
    Publication date: July 27, 2023
    Inventor: Gakuto Kurata
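One plausible way to combine the language model and seed transducer outputs into a single supervisory output is log-linear interpolation followed by renormalization; the `weight` knob and the specific fusion rule here are assumptions for illustration:

```python
import numpy as np

def supervisory_output(lm_probs, seed_probs, weight=0.5):
    """Combine the two outputs in log space and renormalize; the
    interpolation `weight` is an assumed knob."""
    log_mix = weight * np.log(lm_probs) + (1 - weight) * np.log(seed_probs)
    p = np.exp(log_mix)
    return p / p.sum()

lm = np.array([0.5, 0.3, 0.2])    # trained language model output (toy)
seed = np.array([0.6, 0.3, 0.1])  # seed neural transducer output (toy)
target = supervisory_output(lm, seed)
print(target.round(3))  # a valid distribution between the two inputs
```

The transducer in training would then be updated so that its own output distribution moves toward `target`.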
  • Publication number: 20230196107
    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
    Type: Application
    Filed: February 14, 2023
    Publication date: June 22, 2023
    Inventors: Gakuto Kurata, Kartik Audhkhasi
  • Patent number: 11625595
    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
    Type: Grant
    Filed: August 29, 2018
    Date of Patent: April 11, 2023
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Kartik Audhkhasi
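A sketch of the selection step: given several candidate outputs from the bidirectional teacher, keep the one most similar to the unidirectional student's current output and use it as the training target. Dot-product similarity and keeping a single best match are assumed choices:

```python
import numpy as np

def select_targets(bi_outputs, uni_output, k=1):
    """Keep the k bidirectional outputs most similar to the current
    unidirectional output (dot-product similarity, an assumed choice)."""
    sims = bi_outputs @ uni_output
    return bi_outputs[np.argsort(sims)[-k:]]

# Frame-level posteriors from the bidirectional teacher; it may emit a
# symbol on a different frame than the unidirectional student does.
bi = np.array([[0.9, 0.1],
               [0.2, 0.8],
               [0.5, 0.5]])
uni = np.array([0.8, 0.2])  # the student's output for one frame (toy)
target = select_targets(bi, uni)
print(target)  # the first teacher output matches the student best
```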
  • Publication number: 20230104244
    Abstract: A computer-implemented method is provided for training a Recurrent Neural Network Transducer (RNN-T). The method includes training, by inputting a set of audio data, a first RNN-T which includes a common encoder, a forward prediction network, and a first joint network combining outputs of both the common encoder and the forward prediction network. The forward prediction network predicts label sequences forward. The method further includes training, by inputting the set of audio data, a second RNN-T which includes the common encoder, a backward prediction network, and a second joint network combining outputs of both the common encoder and the backward prediction network. The backward prediction network predicts label sequences backward. The trained first RNN-T is used for inference.
    Type: Application
    Filed: September 17, 2021
    Publication date: April 6, 2023
    Inventor: Gakuto Kurata
  • Patent number: 11610108
    Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output from the selected teacher neural network.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Patent number: 11610581
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
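The layered weighting can be sketched with toy unigram "language models": per-model weights (which the method estimates by EM; fixed numbers stand in here) are multiplied by a per-set hyper weight and renormalized before linear interpolation:

```python
import numpy as np

def interpolate(lms, weights, set_ids, hyper):
    """Scale each per-model weight by its set's hyper weight, renormalize,
    then linearly interpolate the models."""
    w = np.array([weights[i] * hyper[set_ids[i]] for i in range(len(lms))])
    w = w / w.sum()
    return sum(wi * lm for wi, lm in zip(w, np.array(lms, dtype=float)))

# Two general-domain LMs (set 0) and one application-specific LM (set 1),
# each a toy unigram distribution over two words.
lms = [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9]]
weights = [0.4, 0.4, 0.2]   # stands in for EM-estimated weights
hyper = {0: 1.0, 1: 2.0}    # the application-specific set is boosted
final = interpolate(lms, weights, [0, 0, 1], hyper)
print(final)  # the boosted set pulls the mixture toward [0.1, 0.9]
```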
  • Publication number: 20230069628
    Abstract: A computer-implemented method for fusing an end-to-end speech recognition model with an external language model (ExternalLM) is provided. The method includes obtaining an output of the end-to-end speech recognition model. The output is a probability distribution. The method further includes transforming, by a hardware processor, the probability distribution into a transformed probability distribution to relax a sharpness of the probability distribution. The method also includes fusing the transformed probability distribution and a probability distribution of the ExternalLM for decoding speech.
    Type: Application
    Filed: August 24, 2021
    Publication date: March 2, 2023
    Inventors: Tohru Nagano, Masayuki Suzuki, Gakuto Kurata
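A simple way to "relax the sharpness" of a distribution is temperature scaling, and a common shallow-fusion rule is log-linear combination with an LM weight; both specific choices below are assumptions consistent with, but not stated in, the abstract:

```python
import numpy as np

def relax(probs, temperature=2.0):
    """Flatten a sharp distribution: exponentiate by 1/T, renormalize."""
    p = probs ** (1.0 / temperature)
    return p / p.sum()

def fuse(asr_probs, lm_probs, lm_weight=0.3):
    """Log-linear fusion of the relaxed ASR scores with the external LM."""
    score = np.log(relax(asr_probs)) + lm_weight * np.log(lm_probs)
    p = np.exp(score)
    return p / p.sum()

asr = np.array([0.98, 0.01, 0.01])  # over-confident end-to-end output
lm = np.array([0.2, 0.5, 0.3])      # external LM distribution (toy)
print(relax(asr).round(3))  # noticeably flatter than the raw ASR output
print(fuse(asr, lm).round(3))
```

Relaxing the ASR distribution first keeps the over-confident model from drowning out the external language model during decoding.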
  • Patent number: 11574181
    Abstract: Fusion of neural networks is performed by obtaining a first neural network and a second neural network. The first and the second neural networks are the result of a parent neural network subjected to different training. A similarity score is calculated of a first component of the first neural network and a corresponding second component of the second neural network. An interpolation weight is determined for the first and the second components by using the similarity score. A neural network parameter of the first component is updated based on the interpolation weight and a corresponding neural network parameter of the second component to obtain a fused neural network.
    Type: Grant
    Filed: May 8, 2019
    Date of Patent: February 7, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
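The component-wise step can be sketched for a single pair of weight vectors; cosine similarity and the particular similarity-to-weight mapping below are assumed for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse_component(w1, w2):
    """Interpolate two corresponding components; the more similar they
    are, the further the first moves toward the second."""
    sim = cosine(w1, w2)          # similarity score of the two components
    alpha = max(0.0, sim) / 2.0   # interpolation weight in [0, 0.5]
    return (1 - alpha) * w1 + alpha * w2

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction: interpolate halfway toward b
c = np.array([0.0, 1.0])  # orthogonal: leave the first component alone
print(fuse_component(a, b))  # midpoint of a and b
print(fuse_component(a, c))  # unchanged a
```

Weighting by similarity avoids averaging components that diverged so much during their separate training that mixing them would hurt both.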
  • Patent number: 11557288
    Abstract: A computer-implemented method of detecting a portion of audio data to be removed is provided. The method includes obtaining a recognition result of audio data. The recognition result includes recognized text data and time stamps. The method also includes extracting one or more candidate phrases from the recognition result using n-gram counts. The method further includes, for each candidate phrase, making pairs of identical phrases with different time stamps and clustering the pairs by using differences in time stamps. The method further includes determining a portion of the audio data to be removed using the results of the clustering.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
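A toy version of the pipeline: count n-grams to find repeated candidate phrases, then check whether a phrase's occurrence times repeat at a near-constant interval, which would mark it as a scripted announcement to remove. The bigram size, `min_count`, and `tolerance` are assumed parameters:

```python
from collections import Counter

def candidate_phrases(words, n=2, min_count=2):
    """Extract n-grams that repeat often enough to be removal candidates."""
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {g for g, c in Counter(grams).items() if c >= min_count}

def is_periodic(timestamps, tolerance=1.0):
    """True when occurrences repeat at a near-constant interval, which
    suggests a scripted announcement rather than live speech."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance

words = "please hold the line please hold the line thank you".split()
phrases = candidate_phrases(words)
print(("please", "hold") in phrases)  # True

# The same phrase starts near 10 s, 40 s, 70 s: regular ~30 s gaps.
print(is_periodic([10.2, 40.1, 70.3]))  # True
print(is_periodic([10.2, 15.0, 70.3]))  # False
```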
  • Publication number: 20220415315
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in a prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the hypothesis score adds the bonus score to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11443169
    Abstract: A computer-implemented method for adapting a model for recognition processing to a target domain is disclosed. The method includes preparing a first distribution in relation to a part of the model, in which the first distribution is derived from data of a training domain for the model. The method also includes obtaining a second distribution in relation to the part of the model by using data of the target domain. The method further includes tuning one or more parameters of the part of the model so that the difference between the first and the second distributions becomes small.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: September 13, 2022
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
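The "make the difference between the two distributions small" step can be sketched with categorical distributions and KL divergence as the difference measure; KL, the softmax parameterization, and the learning rate are assumed choices, not details from the abstract:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# First distribution: statistics of a part of the model under
# training-domain data (a toy categorical distribution here).
train_dist = np.array([0.6, 0.3, 0.1])

# Second distribution: the same part's statistics under target-domain
# data, parameterized by tunable softmax logits (an assumed setup).
logits = np.log(np.array([0.2, 0.3, 0.5]))

# Tune the parameters so the difference between the two becomes small;
# the gradient of KL(train || softmax(logits)) w.r.t. logits is q - p.
for _ in range(500):
    q = np.exp(logits) / np.exp(logits).sum()
    logits -= 0.5 * (q - train_dist)

q = np.exp(logits) / np.exp(logits).sum()
print(kl(train_dist, q))  # close to zero after tuning
```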
  • Publication number: 20220254335
    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
    Type: Application
    Filed: February 5, 2021
    Publication date: August 11, 2022
    Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata
  • Patent number: 11404047
    Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a Mean Squared Error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: August 2, 2022
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Kartik Audhkhasi
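The routing described in the last two sentences (one set of encoder outputs feeds both branches, the other set feeds only the primary branch) can be sketched directly; the branch functions are trivial stand-ins for the CTC and MSE networks:

```python
import numpy as np

# Toy encoder outputs for 6 frames with 2 features each. The abstract
# splits them into two sets: the first feeds both branches, the second
# feeds only the primary branch.
enc_out = np.arange(12.0).reshape(6, 2)
first_set, second_set = enc_out[:3], enc_out[3:]

def primary(h):
    """Stand-in for the CTC recognition branch."""
    return h.sum(axis=1)

def sub(h):
    """Stand-in for the MSE feature-reconstruction branch."""
    return h * 1.0

primary_in = np.vstack([first_set, second_set])  # primary sees both sets
sub_in = first_set                               # sub sees only the first
print(primary(primary_in).shape, sub(sub_in).shape)  # (6,) (3, 2)
```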