Patents by Inventor Brian E. D. Kingsbury

Brian E. D. Kingsbury has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220122585
    Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. A new multilingual acoustic model is then trained through the multilingual network using the updated training data.
    Type: Application
    Filed: October 17, 2020
    Publication date: April 21, 2022
    Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
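As an illustrative sketch only (not the patented implementation), the transliterate-filter-augment loop described in the abstract might look like the following, where the model, its confidence scores, and the threshold are all hypothetical stand-ins:

```python
# Minimal sketch of transliteration-based data augmentation. The "model"
# below is a toy stand-in for the trained multilingual acoustic model.

def transliterate(utterances, model):
    """Run untranscribed utterances through the (stub) multilingual model,
    yielding (utterance, hypothesis, confidence) triples."""
    return [(u,) + model(u) for u in utterances]

def filter_by_confidence(pool, threshold):
    """Filtering metric: keep only transliterations the model is confident in."""
    return [(u, hyp) for u, hyp, conf in pool if conf >= threshold]

def augment(original_data, untranscribed, model, threshold=0.8):
    pool = transliterate(untranscribed, model)
    selected = filter_by_confidence(pool, threshold)
    return original_data + selected  # updated training data for retraining

# Toy stand-in model: "transliterates" by uppercasing, with a fake confidence.
toy_model = lambda u: (u.upper(), 0.9 if len(u) > 3 else 0.5)

original = [("hello", "HELLO")]
updated = augment(original, ["bonjour", "hi"], toy_model)
```

In a real system the confidence would come from the network's posteriors, and retraining would repeat over several augmentation rounds.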
  • Publication number: 20220092365
    Abstract: Techniques for classifier generalization in a supervised learning process using input encoding are provided. In one aspect, a method for classification generalization includes: encoding original input features from at least one input sample x_S with a uniquely decodable code using an encoder E(·) to produce encoded input features E(x_S), wherein the at least one input sample x_S comprises uncoded input features; feeding the uncoded input features and the encoded input features E(x_S) to a base model to build an encoded model; and learning a classification function C̃_E(·) using the encoded model, wherein the classification function C̃_E(·) learned using the encoded model is more general than that learned using the uncoded input features alone.
    Type: Application
    Filed: September 23, 2020
    Publication date: March 24, 2022
    Inventors: Hazar Yueksel, Kush Raj Varshney, Brian E. D. Kingsbury
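A hedged sketch of the idea, with a Gray code standing in for the uniquely decodable encoder E(·) (the patent does not specify this particular code):

```python
# Sketch: append a uniquely decodable encoding of the input features to the
# raw (uncoded) features before feeding them to a base model.

def gray_encode(n):
    """Gray code: one example of a uniquely decodable code for E(.)."""
    return n ^ (n >> 1)

def encode_features(x, bits=8):
    """E(x): map each integer feature to its Gray-code bit vector."""
    out = []
    for v in x:
        g = gray_encode(v)
        out.extend((g >> i) & 1 for i in range(bits))
    return out

def build_encoded_input(x):
    # Concatenate uncoded and encoded features, as the abstract describes;
    # the combined vector is what the encoded model is trained on.
    return list(x) + encode_features(x)

combined = build_encoded_input([3])
```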
  • Publication number: 20220084508
    Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings without a need for a transcript of the corresponding speech recording.
    Type: Application
    Filed: September 15, 2020
    Publication date: March 17, 2022
    Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
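As a rough illustration (field names and structures here are hypothetical, not taken from the patent), the target-extraction step might be sketched as:

```python
# Sketch: build (recording, entities, intent) training pairs directly from
# semantic annotations, with no transcript involved.

def extract_targets(annotation):
    """Pull entity labels/values and the intent label from one semantic
    annotation; the dict layout is an assumption for illustration."""
    entities = [(e["label"], e["value"]) for e in annotation.get("entities", [])]
    intent = annotation.get("intent")
    return entities, intent

def build_training_pairs(recordings, annotations):
    # Pair each speech recording with its extracted targets.
    return [(rec, *extract_targets(ann)) for rec, ann in zip(recordings, annotations)]

ann = {"entities": [{"label": "toloc.city", "value": "Boston"}], "intent": "flight"}
pairs = build_training_pairs(["utt1.wav"], [ann])
```

The SLU model would then be trained to predict these labels from the audio alone.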
  • Patent number: 11158303
    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: October 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
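The block-jittering step can be sketched in isolation. This is a minimal illustration of splitting a training sequence into non-overlapping contiguous blocks of jittered size, not the full soft-forgetting procedure (which also involves the twin-regularized second model):

```python
import random

def jittered_blocks(sequence, base_size, jitter, rng):
    """Split a sequence into non-overlapping contiguous blocks whose sizes
    are jittered around base_size, as used when unrolling the second model."""
    blocks, i = [], 0
    while i < len(sequence):
        size = max(1, base_size + rng.randint(-jitter, jitter))
        blocks.append(sequence[i:i + size])
        i += size
    return blocks

frames = list(range(100))
blocks = jittered_blocks(frames, base_size=10, jitter=3, rng=random.Random(0))
```

Re-drawing the block size per block (and per batch) is what prevents the model from overfitting to one fixed unrolling length.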
  • Publication number: 20210065680
    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
  • Publication number: 20200058296
    Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
    Type: Application
    Filed: July 23, 2019
    Publication date: February 20, 2020
    Applicant: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran
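A toy numerical sketch of the core idea: treat the front-end feature extractor as a layer and push the classification error gradient through it. The matrices, sizes, and learning rate are illustrative assumptions, not the patented configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
spectrum = rng.random(16)            # stand-in input power spectrum
target = np.array([1.0, 0.0])        # known target classification (class 0)

W_fe = 0.1 * rng.random((4, 16))     # front-end parameters, treated as a layer
W_cls = 0.1 * rng.random((2, 4))     # classifier layer on top

def forward(x):
    feats = W_fe @ x                 # front-end feature extraction
    logits = W_cls @ feats
    p = np.exp(logits - logits.max())
    return feats, p / p.sum()        # softmax output classification

def loss(p):
    return -np.log(p[0])             # cross-entropy against the target class

_, p = forward(spectrum)
before = loss(p)

# Back-propagate the classification error into the front-end parameters:
err = p - target                     # softmax cross-entropy gradient
grad_fe = np.outer(W_cls.T @ err, spectrum)
W_fe -= 0.1 * grad_fe                # front-end adjusted jointly with the network

_, p_new = forward(spectrum)
after = loss(p_new)
```

One gradient step on the front-end weights reduces the classification loss, which is the mechanism the abstract describes.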
  • Patent number: 10360901
    Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: July 23, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran
  • Patent number: 10056075
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
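A minimal flexible-GMRES sketch illustrates why a flexible Krylov solver is needed when the preconditioner changes between iterations. The Jacobi-style preconditioner perturbed per iteration below is a toy stand-in for the non-fixed quasi-Newton scheme in the abstract:

```python
import numpy as np

def fgmres(A, b, precond, tol=1e-8, maxiter=20):
    """Minimal flexible GMRES: unlike standard GMRES, the preconditioner
    may differ at every iteration, so preconditioned vectors are stored in Z."""
    n = len(b)
    V = np.zeros((n, maxiter + 1))
    Z = np.zeros((n, maxiter))
    H = np.zeros((maxiter + 1, maxiter))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    x = np.zeros(n)
    for j in range(maxiter):
        Z[:, j] = precond(V[:, j], j)      # iteration-dependent preconditioner
        w = A @ Z[:, j]
        for i in range(j + 1):             # Arnoldi orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j + 2); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        x = Z[:, :j + 1] @ y               # solution lives in span(Z), not span(V)
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x

A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.01 * np.ones((4, 4))
b = np.ones(4)
# Jacobi-style preconditioner, deliberately perturbed each iteration:
M = lambda v, j: v / (np.diag(A) + 0.001 * j)
x = fgmres(A, b, M)
```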
  • Patent number: 9824683
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: November 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
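A simplified sketch of the mapping-estimation step, using a linear least-squares map as a stand-in for the mapping function between source-speaker features and the target speaker's acoustic model (the data and map here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in feature sequences for one shared transcript (frames x dims).
src = rng.random((20, 3))
A_true = np.array([[1.2, 0.0, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 1.1]])
tgt = src @ A_true                 # "target speaker" rendering of the transcript

# Estimate the mapping function by least squares (a linear stand-in for the
# mapping estimated against the speaker-dependent target model).
A_est, *_ = np.linalg.lstsq(src, tgt, rcond=None)

def map_to_target(utterance):
    """Map any source-speaker utterance into the target speaker's feature
    space, producing an augmented copy for the training set."""
    return utterance @ A_est

augmented = map_to_target(rng.random((5, 3)))
```

Applying such maps from every speaker to multiple selected target speakers multiplies the effective size of the training set.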
  • Patent number: 9734823
    Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.
    Type: Grant
    Filed: August 27, 2015
    Date of Patent: August 15, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau
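A toy sketch of serving both query types from one confusion-network index. The index layout, lexicon, and posterior threshold below are simplified guesses for illustration, not the patented data structures:

```python
# Toy confusion-network (CN) index: per utterance, a list of slots, each slot
# a set of (word, posterior) arcs.
cn_index = {
    "utt1": [{("the", 0.9)}, {("cat", 0.6), ("cap", 0.3)}],
    "utt2": [{("a", 0.8)}, {("dog", 0.7)}],
}

phone_lexicon = {"K AE T": "cat", "D AO G": "dog"}  # phones -> nearest word

def to_word_query(query):
    """Convert a phone-level OOV query to a word query via the lexicon;
    in-vocabulary word queries pass through unchanged."""
    return phone_lexicon.get(query, query)

def search(query, index, min_posterior=0.5):
    """Search the same CN index for both IV and (converted) OOV queries."""
    word = to_word_query(query)
    hits = []
    for utt, slots in index.items():
        for t, arcs in enumerate(slots):
            for w, p in arcs:
                if w == word and p >= min_posterior:
                    hits.append((utt, t, p))
    return hits
```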
  • Patent number: 9721559
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: April 17, 2015
    Date of Patent: August 1, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Publication number: 20170200446
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: December 22, 2015
    Publication date: July 13, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9704482
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: July 11, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
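As a rough illustration, the index-then-search flow can be sketched with plain postings lists standing in for the per-utterance word-loop WFSTs (the record layout is assumed, not taken from the patent):

```python
# Time-marked word list: (utterance, word, start, end, score) records, as an
# ASR system might emit.
tmwl = [
    ("utt1", "open", 0.0, 0.4, 0.95),
    ("utt1", "the", 0.4, 0.5, 0.90),
    ("utt1", "door", 0.5, 0.9, 0.92),
    ("utt2", "close", 0.0, 0.5, 0.88),
]

def build_index(records):
    """Index the word occurrences (a stand-in for the word-loop WFST built
    per utterance in the patented method)."""
    index = {}
    for utt, word, start, end, score in records:
        index.setdefault(word, []).append((utt, start, end, score))
    return index

def search(index, queries):
    """Return keyword hits, with time marks and scores, for a batch of queries."""
    return {q: index.get(q, []) for q in queries}

idx = build_index(tmwl)
hits = search(idx, ["door", "close", "window"])
```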
  • Patent number: 9697830
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: July 4, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Patent number: 9640186
    Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.
    Type: Grant
    Filed: May 2, 2014
    Date of Patent: May 2, 2017
    Assignee: International Business Machines Corporation
    Inventors: Petr Fousek, Vaibhava Goel, Brian E. D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Vijayaditya Peddinti, Bhuvana Ramabhadran, Tara N. Sainath
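A tiny sketch of the scattering cascade: filter, take the modulus, filter again, then average. Real deep scattering spectra use Morlet wavelet filter banks; the Gaussian bandpass filters and sizes below are simplifying assumptions:

```python
import numpy as np

def scattering_features(x, n_filters=4):
    """Tiny deep scattering sketch: a cascade of (bandpass filter -> modulus)
    stages followed by time averaging, giving first- and second-order
    coefficients."""
    N = len(x)
    freqs = np.fft.rfftfreq(N)
    X = np.fft.rfft(x)
    centers = np.linspace(0.05, 0.45, n_filters)
    first, second = [], []
    for c in centers:
        band = np.exp(-((freqs - c) ** 2) / (2 * 0.03 ** 2))  # Gaussian bandpass
        u1 = np.abs(np.fft.irfft(X * band, N))                # first-order modulus
        first.append(u1.mean())                               # low-pass average
        U1 = np.fft.rfft(u1)
        for c2 in centers:
            if c2 >= c:                                       # standard pruning
                continue
            band2 = np.exp(-((freqs - c2) ** 2) / (2 * 0.03 ** 2))
            u2 = np.abs(np.fft.irfft(U1 * band2, N))          # second-order modulus
            second.append(u2.mean())
    return np.array(first + second)

sig = np.sin(2 * np.pi * 0.1 * np.arange(256))
feats = scattering_features(sig)
```

The resulting feature vector is what would be fed to the speech recognition engine in place of conventional spectral features.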
  • Publication number: 20170092263
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Application
    Filed: December 9, 2016
    Publication date: March 30, 2017
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Patent number: 9601109
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: March 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Publication number: 20170040016
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: April 17, 2015
    Publication date: February 9, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9477753
    Abstract: Systems and methods for processing a query include determining a plurality of sets of match candidates for a query using a processor, each of the plurality of sets of match candidates being independently determined from a plurality of diverse word lattice generation components of different type. The plurality of sets of match candidates is merged by generating a first score for each match candidate to provide a merged set of match candidates. A second score is computed for each match candidate of the merged set based upon features of that match candidate. The first score and the second score are combined to provide a final set of match candidates as matches to the query.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: October 25, 2016
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Hong-Kwang Jeff Kuo, Lidia Luminita Mangu, Hagen Soltau
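A toy sketch of the two-stage scoring described in the abstract: merge candidates from diverse components with a first score, rescore each merged candidate from its features, and combine. The scoring functions and weighting are illustrative assumptions:

```python
def merge_candidates(candidate_sets):
    """First scoring pass: merge per-component candidate lists, summing each
    component's (already normalized) score for a candidate."""
    merged = {}
    for candidates in candidate_sets:
        for cand, score in candidates:
            merged[cand] = merged.get(cand, 0.0) + score
    return merged

def second_score(cand):
    """Stand-in feature-based rescoring (here: just candidate length)."""
    return 1.0 / (1.0 + abs(len(cand) - 5))

def final_matches(candidate_sets, alpha=0.5):
    """Combine the merge score and the feature score into a final ranking."""
    merged = merge_candidates(candidate_sets)
    scored = {c: alpha * s + (1 - alpha) * second_score(c) for c, s in merged.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Two diverse "word lattice generation components" proposing match candidates:
sets = [[("boston", 0.9), ("austin", 0.4)], [("boston", 0.8)]]
ranking = final_matches(sets)
```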
  • Publication number: 20160267906
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Application
    Filed: March 11, 2015
    Publication date: September 15, 2016
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon