Patents by Inventor George A. Saon

George A. Saon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Discriminative learning via hierarchical transformations

Patent number: 9378464

Abstract: A system and an article of manufacture for discriminative learning via hierarchical transformations, which includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation.

Type: Grant

Filed: July 30, 2012

Date of Patent: June 28, 2016

Assignee: International Business Machines Corporation

Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
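
The selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the candidate transformations are evaluated, so the scoring below (log-likelihood of the transformed data under a one-dimensional Gaussian model) is an assumption, as are the names `gaussian_log_likelihood`, `select_transformation`, and the toy candidates. The hierarchical aspect named in the title is not modeled here.

```python
import math

def gaussian_log_likelihood(data, mean, var):
    """Log-likelihood of 1-D data under a Gaussian (assumed stand-in for 'the model')."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in data)

def select_transformation(model, transformations, data):
    """Score each candidate transformation and return the best-scoring name."""
    mean, var = model
    scores = {
        name: gaussian_log_likelihood([f(x) for x in data], mean, var)
        for name, f in transformations.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy setup: the model was fit on a first data set centered at 0.
model = (0.0, 1.0)
candidates = {
    "identity": lambda x: x,
    "shift_down_5": lambda x: x - 5.0,
    "halve": lambda x: 0.5 * x,
}
second_set = [4.8, 5.1, 5.3, 4.9]
best, scores = select_transformation(model, candidates, second_set)
```

Here the second data set sits near 5, so the shift is the candidate that brings it closest to the model.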
Applying speaker adaption techniques to correlated features

Patent number: 9373324

Abstract: Systems and methods for applying feature-space maximum likelihood linear regression (fMLLR) to correlated features are provided. A method for applying fMLLR to correlated features, comprises mapping the correlated features into an uncorrelated feature space, applying fMLLR in the uncorrelated feature space to obtain fMLLR transformed features, and mapping the fMLLR transformed features back to a correlated feature space.

Type: Grant

Filed: June 25, 2014

Date of Patent: June 21, 2016

Assignee: International Business Machines Corporation

Inventors: Tara N. Sainath, George A. Saon
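
The three steps in this abstract (map to an uncorrelated space, apply fMLLR, map back) can be sketched as follows. This is only an illustration under stated assumptions: the decorrelating map is taken to be an orthonormal matrix, so its inverse is its transpose, and the fMLLR transform is taken as a given affine pair `(a, b)` rather than estimated from data. The helper names are hypothetical and do not come from the patent.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def adapt_correlated(x, decorrelate, a, b):
    """Apply an affine fMLLR-style transform in a decorrelated space.

    decorrelate: orthonormal matrix (assumption), so its inverse is its transpose.
    a, b: affine transform y = a z + b, assumed to be already estimated.
    """
    z = matvec(decorrelate, x)                        # 1. to uncorrelated space
    y = [yi + bi for yi, bi in zip(matvec(a, z), b)]  # 2. affine transform
    return matvec(transpose(decorrelate), y)          # 3. back to correlated space

# With no decorrelation (identity), the result is just a x + b.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = adapt_correlated([1.0, 1.0], identity, [[2.0, 0.0], [0.0, 3.0]], [1.0, -1.0])
```

In practice the affine parameters would be estimated per speaker by maximum likelihood, which this sketch leaves out.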
SYSTEMS AND METHODS FOR APPLYING SPEAKER ADAPTION TECHNIQUES TO CORRELATED FEATURES

Publication number: 20150161993

Abstract: Systems and methods for applying feature-space maximum likelihood linear regression (fMLLR) to correlated features are provided. A method for applying fMLLR to correlated features, comprises mapping the correlated features into an uncorrelated feature space, applying fMLLR in the uncorrelated feature space to obtain fMLLR transformed features, and mapping the fMLLR transformed features back to a correlated feature space.

Type: Application

Filed: June 25, 2014

Publication date: June 11, 2015

Inventors: Tara N. Sainath, George A. Saon
METHOD AND SYSTEM FOR JOINT TRAINING OF HYBRID NEURAL NETWORKS FOR ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION

Publication number: 20150161522

Abstract: Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network.

Type: Application

Filed: June 24, 2014

Publication date: June 11, 2015

Inventors: George A. Saon, Hagen Soltau
Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors

Publication number: 20150149165

Abstract: A method includes providing a deep neural network acoustic model, receiving audio data including one or more utterances of a speaker, extracting a plurality of speech recognition features from the one or more utterances of the speaker, creating a speaker identity vector for the speaker based on the extracted speech recognition features, and adapting the deep neural network acoustic model for automatic speech recognition using the extracted speech recognition features and the speaker identity vector.

Type: Application

Filed: September 29, 2014

Publication date: May 28, 2015

Inventor: George A. Saon
Discriminative Learning Via Hierarchical Transformations

Publication number: 20140032571

Abstract: A system and an article of manufacture for discriminative learning via hierarchical transformations, which includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation.

Type: Application

Filed: July 30, 2012

Publication date: January 30, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
Discriminative Learning Via Hierarchical Transformations

Publication number: 20140032570

Abstract: Techniques for discriminative learning via hierarchical transformations. A method includes obtaining a model of a first set of data, two or more data transformations, and a second set of data, evaluating the two or more data transformations to determine which data transformation will most effectively modify the second set of data to match the model, and selecting the data transformation that will most effectively modify the second set of data to match the model based on the evaluation.

Type: Application

Filed: July 30, 2012

Publication date: January 30, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sasha P. Caskey, Dimitri Kanevsky, Brian Kingsbury, Tara N. Sainath, George Saon
Minimum bayes error feature selection in speech recognition

Patent number: 7529666

Abstract: In connection with speech recognition, the design of a linear transformation $\theta \in \mathbb{R}^{p \times n}$, of rank p×n, which projects the features of a classifier $x \in \mathbb{R}^{n}$ onto $y = \theta x \in \mathbb{R}^{p}$ such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the $\theta$-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of $\theta$. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.

Type: Grant

Filed: October 30, 2000

Date of Patent: May 5, 2009

Assignee: International Business Machines Corporation

Inventors: Mukund Padmanabhan, George A. Saon
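
As a rough illustration of the second avenue in this abstract, the sketch below evaluates a union Bhattacharyya bound on the Bayes error after projecting the classes with a candidate transformation. It makes several assumptions that the abstract does not state: the classes are Gaussian, the projection is onto a single dimension (the transformation is one row, `theta`), and the function names are hypothetical. A smaller bound suggests a more discriminative projection.

```python
import math

def project(theta, mean, cov):
    """Project a Gaussian class onto the 1-D direction theta."""
    n = len(theta)
    m = sum(t * x for t, x in zip(theta, mean))
    v = sum(theta[i] * cov[i][j] * theta[j] for i in range(n) for j in range(n))
    return m, v

def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between two 1-D Gaussians."""
    return ((m1 - m2) ** 2 / (4.0 * (v1 + v2))
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))

def union_bound(theta, priors, means, covs):
    """Sum over class pairs of sqrt(p_i p_j) * exp(-B_ij) in the projected space."""
    proj = [project(theta, m, c) for m, c in zip(means, covs)]
    total = 0.0
    for i in range(len(proj)):
        for j in range(i + 1, len(proj)):
            b = bhattacharyya_1d(*proj[i], *proj[j])
            total += math.sqrt(priors[i] * priors[j]) * math.exp(-b)
    return total

# Two classes that differ only along the first axis.
eye = [[1.0, 0.0], [0.0, 1.0]]
priors, means, covs = [0.5, 0.5], [[0.0, 0.0], [2.0, 0.0]], [eye, eye]
along = union_bound([1.0, 0.0], priors, means, covs)   # keeps the separation
across = union_bound([0.0, 1.0], priors, means, covs)  # discards it
```

Projecting along the first axis gives the smaller bound, since that is the only direction in which the two toy classes differ.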
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES

Publication number: 20080312921

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Application

Filed: August 20, 2008

Publication date: December 18, 2008

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Speech recognition utilizing multitude of speech features

Patent number: 7464031

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Grant

Filed: November 28, 2003

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
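
A log-linear posterior of the kind this abstract describes can be sketched in a few lines. The feature names, weights, and the function `log_linear_posterior` are hypothetical. The sketch only shows the general form: a weighted sum of active features per linguistic unit, normalized over units. A feature that is absent at recognition time contributes nothing to the score, which is one simple way to read the statement that not all training features need to appear in testing.

```python
import math

def log_linear_posterior(weights, active_features):
    """Posterior over labels given sparse features.

    weights: dict label -> dict feature_name -> weight (assumed already trained).
    active_features: dict feature_name -> value observed for this input.
    """
    scores = {
        label: sum(w.get(name, 0.0) * value
                   for name, value in active_features.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

# Hypothetical weights; feature "f2" was seen in training but is absent here.
weights = {"yes": {"f1": 2.0, "f2": 1.0}, "no": {"f1": -1.0}}
posterior = log_linear_posterior(weights, {"f1": 1.0})
```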
Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation

Patent number: 7216077

Abstract: Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.

Type: Grant

Filed: September 26, 2000

Date of Patent: May 8, 2007

Assignee: International Business Machines Corporation

Inventors: Mukund Padmanabhan, George A. Saon, Geoffrey G. Zweig
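
The confidence gating described in this abstract can be sketched as below: a frame contributes to a state's statistics only when that state's posterior occupancy probability exceeds a threshold. The data layout, the threshold value, and the choice to weight each contribution by the posterior are assumptions made for illustration. Computing the posteriors from a word lattice and estimating the transform from the statistics are both outside this sketch.

```python
from collections import defaultdict

def accumulate_stats(frames, posteriors, threshold=0.9):
    """Accumulate per-state occupancy counts and feature sums.

    frames: list of feature vectors (lists of floats).
    posteriors: list of dicts, one per frame, mapping state -> posterior.
    Only (frame, state) pairs whose posterior exceeds the threshold are used.
    """
    counts = defaultdict(float)
    sums = {}
    for x, post in zip(frames, posteriors):
        for state, gamma in post.items():
            if gamma <= threshold:
                continue  # low-confidence alignment: skip this frame for this state
            counts[state] += gamma
            acc = sums.setdefault(state, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += gamma * value
    return dict(counts), sums

# Toy input: the first frame is confidently aligned to s1, the second is ambiguous.
frames = [[1.0, 2.0], [3.0, 4.0]]
posteriors = [{"s1": 0.95, "s2": 0.05}, {"s1": 0.5, "s2": 0.5}]
counts, sums = accumulate_stats(frames, posteriors, threshold=0.9)
```

Only the first frame passes the threshold, so the ambiguous second frame leaves the statistics untouched.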
Speech recognition utilizing multitude of speech features

Publication number: 20050119885

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Application

Filed: November 28, 2003

Publication date: June 2, 2005

Inventors: Scott Axelrod, Sreeram Balakrishnan, Stanley Chen, Yuging Gao, Ramesh Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Picheny, George Saon, Geoffrey Zweig