Patents by Inventor Hagen Soltau

Hagen Soltau has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

JOINT AUTOMATIC SPEECH RECOGNITION AND SPEAKER DIARIZATION

Publication number: 20220199094

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio data using neural networks.

Type: Application

Filed: April 6, 2020

Publication date: June 23, 2022

Inventors: Laurent El Shafey, Hagen Soltau, Izhak Shafran

Dialect-specific acoustic language modeling and speech recognition

Patent number: 11164566

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.

Type: Grant

Filed: May 7, 2018

Date of Patent: November 2, 2021

Assignee: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
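
The abstract above describes labeling portions of the input as general language or dialect and then selecting a model per portion. A minimal sketch of that selection loop follows. The function name `recognize_with_dialect_switching`, the segment representation, and the toy classifier and models are all assumptions made for illustration; the patent does not specify them.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a segment is a list of acoustic feature values, and a
# "recognizer" is any callable that maps a segment to a transcript string.
Segment = List[float]
Recognizer = Callable[[Segment], str]


def recognize_with_dialect_switching(
    segments: List[Segment],
    classify: Callable[[Segment], str],
    models: Dict[str, Recognizer],
    general_label: str = "general",
) -> List[Tuple[str, str]]:
    """Label each segment, then decode it with the matching model."""
    results = []
    for segment in segments:
        label = classify(segment)
        # Fall back to the general model when no dialect model is registered.
        used = label if label in models else general_label
        results.append((used, models[used](segment)))
    return results


# Toy stand-ins, purely illustrative (not real acoustic or language models).
def toy_classifier(segment: Segment) -> str:
    return "dialect_a" if sum(segment) / len(segment) > 0.5 else "general"


toy_models: Dict[str, Recognizer] = {
    "general": lambda seg: "general transcript",
    "dialect_a": lambda seg: "dialect transcript",
}

output = recognize_with_dialect_switching(
    [[0.1, 0.2], [0.9, 0.8]], toy_classifier, toy_models
)
```

Here the first toy segment is routed to the general model and the second to the dialect model; a real system would replace the callables with trained components.
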
DIALECT-SPECIFIC ACOUSTIC LANGUAGE MODELING AND SPEECH RECOGNITION

Publication number: 20190156820

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.

Type: Application

Filed: May 7, 2018

Publication date: May 23, 2019

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Patent number: 10262260

Abstract: Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network.

Type: Grant

Filed: February 16, 2017

Date of Patent: April 16, 2019

Assignee: International Business Machines Corporation

Inventors: George A. Saon, Hagen Soltau
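
The abstract above describes connecting several differing networks through a common output layer to form a joint network. The forward pass of such a joint network can be sketched as below, assuming two small feed-forward sub-networks that differ only in their input feature size. All layer sizes, weights, and names (`joint_forward`, `net_a`, `net_b`, `out_layer`) are hypothetical; training would backpropagate the loss through the shared layer into both sub-networks, which this sketch omits.

```python
import math
from typing import Dict, List

Vector = List[float]


def affine(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def joint_forward(x_a: Vector, x_b: Vector, net_a: Dict, net_b: Dict, out_layer: Dict) -> Vector:
    """Run each sub-network, concatenate hidden outputs, apply the shared output layer."""
    h_a = relu(affine(net_a["W"], net_a["b"], x_a))
    h_b = relu(affine(net_b["W"], net_b["b"], x_b))
    joint_hidden = h_a + h_b  # list concatenation, not element-wise addition
    return softmax(affine(out_layer["W"], out_layer["b"], joint_hidden))


# Toy parameters: net_a takes 3 features, net_b takes 2; both emit 2 hidden units.
net_a = {"W": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "b": [0.0, 0.1]}
net_b = {"W": [[1.0, -1.0], [0.5, 0.5]], "b": [0.0, 0.0]}
out_layer = {
    "W": [[0.2, 0.1, -0.3, 0.4], [-0.1, 0.5, 0.2, 0.0], [0.3, -0.2, 0.1, 0.1]],
    "b": [0.0, 0.0, 0.0],
}

posteriors = joint_forward([1.0, 0.5, -0.5], [0.2, 0.7], net_a, net_b, out_layer)
target = 1
loss = -math.log(posteriors[target])  # cross-entropy for a single labeled frame
```
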
ACOUSTIC-TO-WORD NEURAL NETWORK SPEECH RECOGNIZER

Publication number: 20180174576

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for large vocabulary continuous speech recognition. One method includes receiving audio data representing an utterance of a speaker. Acoustic features of the audio data are provided to a recurrent neural network trained using connectionist temporal classification to estimate likelihoods of occurrence of whole words based on acoustic feature input. Output of the recurrent neural network generated in response to the acoustic features is received. The output indicates a likelihood of occurrence for each of multiple different words in a vocabulary. A transcription for the utterance is generated based on the output of the recurrent neural network. The transcription is provided as output of the automated speech recognition system.

Type: Application

Filed: December 7, 2017

Publication date: June 21, 2018

Inventors: Hagen Soltau, Hasim Sak, Hank Liao
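
The abstract above describes a recurrent network trained with connectionist temporal classification (CTC) that outputs per-word likelihoods. One common way to turn such per-frame outputs into a transcription is greedy (best-path) CTC decoding: take the best label per frame, collapse repeats, and drop blanks. The sketch below shows only that decoding step; the choice of greedy decoding, the blank index, and the toy scores are assumptions, not details taken from the application.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol in the output layer


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocabulary: List[str]) -> List[str]:
    """Pick the best label per frame, collapse repeats, and remove blanks."""
    words = []
    previous = BLANK
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a word only when it is not blank and differs from the previous frame.
        if best != BLANK and best != previous:
            words.append(vocabulary[best])
        previous = best
    return words


vocabulary = ["<blank>", "hello", "world"]
frames = [
    [0.1, 0.8, 0.1],    # "hello"
    [0.1, 0.7, 0.2],    # "hello" again, collapsed into the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # "world"
]
transcript = ctc_greedy_decode(frames, vocabulary)  # ["hello", "world"]
```
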
Dialect-specific acoustic language modeling and speech recognition

Patent number: 9966064

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.

Type: Grant

Filed: July 18, 2012

Date of Patent: May 8, 2018

Assignee: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

Method and system for efficient spoken term detection using confusion networks

Patent number: 9734823

Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.

Type: Grant

Filed: August 27, 2015

Date of Patent: August 15, 2017

Assignee: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau
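
The abstract above describes building a keyword-search index from confusion networks and using it for word queries. A simplified sketch of such an index follows, assuming each confusion network is a list of slots that map candidate words to posteriors and that query words must match consecutive slots. The conversion of phone-level OOV queries to words is omitted, and the names `build_cn_index` and `search` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: utterance id -> list of slots, each slot word -> posterior.
ConfusionNetworks = Dict[str, List[Dict[str, float]]]


def build_cn_index(networks: ConfusionNetworks) -> Dict[str, List[Tuple[str, int, float]]]:
    """Inverted index: word -> list of (utterance id, slot position, posterior)."""
    index = defaultdict(list)
    for utt_id, slots in networks.items():
        for position, slot in enumerate(slots):
            for word, posterior in slot.items():
                index[word].append((utt_id, position, posterior))
    return index


def search(index, query_words: List[str]) -> List[Tuple[str, int, float]]:
    """Return (utterance id, start slot, score) hits for a non-empty word sequence."""
    # partial[(utt, last_position)] = (start_position, product of posteriors so far)
    partial = {(u, p): (p, s) for u, p, s in index.get(query_words[0], [])}
    for word in query_words[1:]:
        extended = {}
        for u, p, s in index.get(word, []):
            prev = partial.get((u, p - 1))
            if prev is not None:  # the word continues a match ending in the previous slot
                extended[(u, p)] = (prev[0], prev[1] * s)
        partial = extended
    hits = [(u, start, score) for (u, _), (start, score) in partial.items()]
    return sorted(hits, key=lambda hit: -hit[2])


networks = {
    "utt1": [{"the": 0.9, "a": 0.1}, {"cat": 0.6, "cap": 0.4}, {"sat": 1.0}],
    "utt2": [{"cat": 0.5, "cut": 0.5}],
}
index = build_cn_index(networks)
hits = search(index, ["cat", "sat"])
```

In this toy data only `utt1` contains "cat" followed by "sat" in adjacent slots, so it is the only hit returned.
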
METHOD AND SYSTEM FOR JOINT TRAINING OF HYBRID NEURAL NETWORKS FOR ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION

Publication number: 20170161608

Abstract: Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network.

Type: Application

Filed: February 16, 2017

Publication date: June 8, 2017

Inventors: George A. Saon, Hagen Soltau

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Patent number: 9665823

Abstract: Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network.

Type: Grant

Filed: June 24, 2014

Date of Patent: May 30, 2017

Assignee: International Business Machines Corporation

Inventors: George A. Saon, Hagen Soltau

Classifier-based system combination for spoken term detection

Patent number: 9477753

Abstract: Systems and methods for processing a query include determining a plurality of sets of match candidates for a query using a processor, each of the plurality of sets of match candidates being independently determined from a plurality of diverse word lattice generation components of different type. The plurality of sets of match candidates is merged by generating a first score for each match candidate to provide a merged set of match candidates. A second score is computed for each match candidate of the merged set based upon features of that match candidate. The first score and the second score are combined to provide a final set of match candidates as matches to the query.

Type: Grant

Filed: March 12, 2013

Date of Patent: October 25, 2016

Assignee: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Jeff Kuo, Lidia Luminita Mangu, Hagen Soltau
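
The abstract above describes merging match candidates from several systems with a first score, computing a second score from candidate features, and combining the two. The sketch below illustrates that flow under stated assumptions: candidates from different systems are merged only when their (utterance, start) keys match exactly, the first score is a sum of system scores, and the second score comes from a hand-set logistic function standing in for a trained classifier. The weights, the `alpha` mixing parameter, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # assumed candidate identity: (utterance id, start frame)


def merge_candidates(system_outputs: Dict[str, List[Tuple[Key, float]]]) -> Dict[Key, dict]:
    """First score: sum of the scores from every system that proposed the candidate."""
    merged: Dict[Key, dict] = {}
    for system_name, candidates in system_outputs.items():
        for key, score in candidates:
            entry = merged.setdefault(key, {"first_score": 0.0, "systems": set()})
            entry["first_score"] += score
            entry["systems"].add(system_name)
    return merged


def second_score(entry: dict, n_systems: int, weights: dict) -> float:
    """Logistic score over two toy features: merged score and system agreement."""
    features = [entry["first_score"], len(entry["systems"]) / n_systems]
    z = weights["bias"] + sum(w * f for w, f in zip(weights["w"], features))
    return 1.0 / (1.0 + math.exp(-z))


def combine(system_outputs, weights, alpha: float = 0.5) -> List[Tuple[Key, float]]:
    """Blend first and second scores and rank candidates by the final score."""
    merged = merge_candidates(system_outputs)
    n_systems = len(system_outputs)
    final = {
        key: alpha * entry["first_score"] + (1.0 - alpha) * second_score(entry, n_systems, weights)
        for key, entry in merged.items()
    }
    return sorted(final.items(), key=lambda item: -item[1])


system_outputs = {
    "system_a": [(("utt1", 10), 0.6), (("utt2", 5), 0.3)],
    "system_b": [(("utt1", 10), 0.5)],
}
weights = {"w": [1.0, 1.0], "bias": -1.0}  # hand-set for illustration, not trained
ranked = combine(system_outputs, weights)
```

The candidate proposed by both toy systems ranks first because it has both a higher merged score and higher agreement.
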
METHOD AND SYSTEM FOR EFFICIENT SPOKEN TERM DETECTION USING CONFUSION NETWORKS

Publication number: 20160005398

Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.

Type: Application

Filed: August 27, 2015

Publication date: January 7, 2016

Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau

Method and system for efficient spoken term detection using confusion networks

Patent number: 9196243

Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.

Type: Grant

Filed: March 31, 2014

Date of Patent: November 24, 2015

Assignee: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau

DIALECT-SPECIFIC ACOUSTIC LANGUAGE MODELING AND SPEECH RECOGNITION

Publication number: 20150287405

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.

Type: Application

Filed: July 18, 2012

Publication date: October 8, 2015

Applicant: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

METHOD AND SYSTEM FOR EFFICIENT SPOKEN TERM DETECTION USING CONFUSION NETWORKS

Publication number: 20150279358

Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.

Type: Application

Filed: March 31, 2014

Publication date: October 1, 2015

Applicant: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau

METHOD AND SYSTEM FOR JOINT TRAINING OF HYBRID NEURAL NETWORKS FOR ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION

Publication number: 20150161522

Abstract: Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network.

Type: Application

Filed: June 24, 2014

Publication date: June 11, 2015

Inventors: George A. Saon, Hagen Soltau

CLASSIFIER-BASED SYSTEM COMBINATION FOR SPOKEN TERM DETECTION

Publication number: 20140278390

Abstract: Systems and methods for processing a query include determining a plurality of sets of match candidates for a query using a processor, each of the plurality of sets of match candidates being independently determined from a plurality of diverse word lattice generation components of different type. The plurality of sets of match candidates is merged by generating a first score for each match candidate to provide a merged set of match candidates. A second score is computed for each match candidate of the merged set based upon features of that match candidate. The first score and the second score are combined to provide a final set of match candidates as matches to the query.

Type: Application

Filed: March 12, 2013

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Jeff Kuo, Lidia Luminita Mangu, Hagen Soltau

Dialect-specific acoustic language modeling and speech recognition

Patent number: 8583432

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.

Type: Grant

Filed: July 25, 2012

Date of Patent: November 12, 2013

Assignee: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau