Patents by Inventor Yifan Gong
Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8180637
Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
Type: Grant
Filed: December 3, 2007
Date of Patent: May 15, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
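A rough sketch of the kind of adaptation loop this abstract describes may help; it is not the patented algorithm, and the log-Mel compensation formula, the nearest-mean stand-in for decoding, and all names below are assumptions:

```python
import numpy as np

def compensate_means(clean_means, noise_mean, channel_mean):
    # Hypothetical log-Mel-domain effect of additive noise n and a convolutive
    # channel h on a clean mean x:  y = x + h + log(1 + exp(n - x - h))
    return clean_means + channel_mean + np.log1p(
        np.exp(noise_mean - clean_means - channel_mean))

def adapt_hmm_means(clean_means, frames, n_iters=3):
    """Toy adaptation loop: compensate the Gaussian means, 'decode' by
    nearest-mean assignment, then re-estimate the channel mean from the
    residuals (a stand-in for the full HMM decode/re-estimation pass)."""
    noise_mean = frames[:10].mean(axis=0)          # noise init: leading frames
    channel_mean = np.zeros(clean_means.shape[1])  # channel init: zero vector
    for _ in range(n_iters):
        adapted = compensate_means(clean_means, noise_mean, channel_mean)
        nearest = np.argmin(
            ((frames[:, None, :] - adapted[None, :, :]) ** 2).sum(-1), axis=1)
        channel_mean += (frames - adapted[nearest]).mean(axis=0)
    return compensate_means(clean_means, noise_mean, channel_mean)

# Usage: 4 Gaussian means of dimension 24, 200 noisy log-Mel frames.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 24))
frames = rng.normal(size=(200, 24))
print(adapt_hmm_means(means, frames).shape)        # -> (4, 24)
```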
-
Patent number: 8160878
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
Type: Grant
Filed: September 16, 2008
Date of Patent: April 17, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
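To illustrate the spline idea, a minimal sketch of one VPHMM Gaussian component whose mean and variance follow cubic splines of instantaneous SNR could look like the following; the knot positions and values are invented for illustration, not taken from the patent:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Knot positions along the conditioning parameter (instantaneous SNR, in dB)
# and example values of one Gaussian mean/variance component at those knots.
snr_knots = np.array([-5.0, 0.0, 5.0, 10.0, 20.0])
mean_knots = np.array([1.8, 1.2, 0.9, 0.7, 0.6])
var_knots = np.array([2.5, 1.6, 1.1, 0.9, 0.8])

mean_spline = CubicSpline(snr_knots, mean_knots)
var_spline = CubicSpline(snr_knots, var_knots)

def gaussian_loglik(x, snr):
    """Log-likelihood of a scalar observation x under one VPHMM Gaussian
    component whose mean and variance depend on the frame's SNR."""
    mu, var = float(mean_spline(snr)), max(float(var_spline(snr)), 1e-6)
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

print(gaussian_loglik(1.0, snr=3.0))   # evaluate at an intermediate SNR
```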
-
Patent number: 8145488
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
Type: Grant
Filed: September 16, 2008
Date of Patent: March 27, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
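A minimal sketch of the clustering-and-sharing step, assuming each spline is summarized by its values at a common set of knots and that ordinary k-means stands in for whatever distance measure the patented system uses:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Each row is the parameter vector of one spline (here: its values at a
# common set of SNR knots); 500 splines with 5 knots each, synthetic data.
rng = np.random.default_rng(1)
spline_params = rng.normal(size=(500, 5))

# Cluster the splines; each class keeps one shared set of spline parameters.
shared_sets, class_of_spline = kmeans2(spline_params, 16, minit="points")

# A spline instance now only stores the index of its shared parameter set.
print(shared_sets.shape)        # (16, 5): shared spline parameter sets
print(class_of_spline[:10])     # class index of the first ten splines
```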
-
Patent number: 8086455
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definitions and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
Type: Grant
Filed: January 9, 2008
Date of Patent: December 27, 2011
Assignee: Microsoft Corporation
Inventors: Yifan Gong, Ye Tian
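One way to sketch such a dependence-driven build compiler, assuming Python's standard graphlib for the ordering and treating each processor's declared input/output files as the dependency edges (the processor and file names are hypothetical):

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical declaration: each processor lists its input and output files.
processors = {
    "extract_features": {"inputs": ["wav.list"], "outputs": ["feats.bin"]},
    "train_monophones": {"inputs": ["feats.bin", "dict.txt"], "outputs": ["mono.hmm"]},
    "train_triphones":  {"inputs": ["feats.bin", "mono.hmm"], "outputs": ["tri.hmm"]},
}

# Detect data produced by more than one action (an ill-defined process).
producer = {}
for name, proc in processors.items():
    for out in proc["outputs"]:
        if out in producer:
            raise ValueError(f"{out} is produced by both {producer[out]} and {name}")
        producer[out] = name

# Order the steps purely from the declared input/output relationships.
graph = {name: {producer[i] for i in proc["inputs"] if i in producer}
         for name, proc in processors.items()}
try:
    order = list(TopologicalSorter(graph).static_order())
except CycleError as err:
    raise ValueError(f"cyclic process definition: {err}")
print(order)   # ['extract_features', 'train_monophones', 'train_triphones']
```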
-
Patent number: 7860716
Abstract: A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors in a corpus, yielding improved results through the automatic detection of those errors. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcriptions (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment and obtain a word boundary. A speech recognizer is then utilized to generate a word lattice. Using the word boundary and the word lattice, error detection is computed using a word confidence score and a word duration probability.
Type: Grant
Filed: April 24, 2007
Date of Patent: December 28, 2010
Assignee: Microsoft Corporation
Inventors: Ye Tian, Yifan Gong, Frank K. Soong
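A toy sketch of the error-detection step, assuming a per-word Gaussian duration model and arbitrary thresholds; the real algorithm's scores and thresholds are not specified here:

```python
import math

def flag_transcription_errors(words, conf_threshold=0.5, dur_threshold=-6.0):
    """Flag aligned words whose confidence score or log duration probability
    falls below a threshold (thresholds here are illustrative assumptions).

    words: list of dicts with 'word', 'confidence' (0..1) and 'duration' (s).
    A per-word Gaussian (mean, std) stands in for the duration probability."""
    duration_model = {"hello": (0.35, 0.10), "world": (0.40, 0.12)}
    flagged = []
    for w in words:
        mean, std = duration_model.get(w["word"], (0.30, 0.15))
        log_dur_prob = (-0.5 * math.log(2 * math.pi * std ** 2)
                        - (w["duration"] - mean) ** 2 / (2 * std ** 2))
        if w["confidence"] < conf_threshold or log_dur_prob < dur_threshold:
            flagged.append(w["word"])
    return flagged

print(flag_transcription_errors([
    {"word": "hello", "confidence": 0.92, "duration": 0.33},
    {"word": "world", "confidence": 0.31, "duration": 1.50},   # likely an error
]))   # -> ['world']
```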
-
Publication number: 20100318355
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts and, on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between pairs of words in the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
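A small sketch of the segment-selection step, assuming the original and decoded transcriptions have already been time-aligned word-for-word:

```python
def matching_segments(original, decoded, q=5):
    """Return (start, end) index ranges where the aligned original and
    decoded transcriptions agree on at least q contiguous words (a sketch;
    both inputs are assumed to be aligned word-for-word already)."""
    segments, run_start = [], None
    for i, (a, b) in enumerate(zip(original, decoded)):
        if a == b:
            run_start = i if run_start is None else run_start
        else:
            if run_start is not None and i - run_start >= q:
                segments.append((run_start, i))
            run_start = None
    if run_start is not None and len(original) - run_start >= q:
        segments.append((run_start, len(original)))
    return segments

orig = "the cat sat on the mat and then ran away".split()
deco = "the cat sat on the mat but then ran away".split()
print(matching_segments(orig, deco, q=4))   # -> [(0, 6)]
```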
-
Publication number: 20100228548
Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using the same set of acoustic training data. The system may apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
Type: Application
Filed: March 9, 2009
Publication date: September 9, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Chaojun Liu, Yifan Gong
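As one plausible reading of the mapping idea, a least-squares affine correction fitted between the unsupervised and supervised training-set parameters and then applied to unsupervised test-set parameters might look like this; the affine form and the synthetic data are assumptions:

```python
import numpy as np

# Hypothetical adaptation parameter vectors estimated from the SAME training
# audio: once with reference transcripts (supervised) and once with
# recognized transcripts (unsupervised).
rng = np.random.default_rng(2)
unsup_train = rng.normal(size=(100, 8))
sup_train = 0.9 * unsup_train + 0.3 + 0.05 * rng.normal(size=(100, 8))

# Fit an affine correction  sup ~ W^T [unsup; 1]  by least squares.
X = np.hstack([unsup_train, np.ones((100, 1))])
W, *_ = np.linalg.lstsq(X, sup_train, rcond=None)

def correct(unsup_params):
    """Apply the learned correction to unsupervised test-time parameters."""
    return np.hstack([unsup_params, np.ones((len(unsup_params), 1))]) @ W

unsup_test = rng.normal(size=(5, 8))
print(correct(unsup_test).shape)   # corrected parameters used for adaptation
```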
-
Publication number: 20100153104
Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for each frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power-dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
Type: Application
Filed: December 16, 2008
Publication date: June 17, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
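A compact sketch of such a noise-level-dependent gain, using a log-linear interpolation between two thresholds and first-order smoothing with the previous frame's gain; all threshold and gain values are illustrative, not the publication's:

```python
import math

def noise_dependent_gain(noise_power, low_thresh=1e-4, high_thresh=1e-1,
                         min_gain=0.1, prev_gain=None, smooth=0.7):
    """Gain as a function of the frame's noise power (thresholds and the
    minimum gain are illustrative values, not those of the publication)."""
    if noise_power <= low_thresh:
        gain = 1.0                      # little or no suppression in low noise
    elif noise_power >= high_thresh:
        gain = min_gain                 # strong suppression in high noise
    else:
        # log-linear interpolation of the gain between the two thresholds
        frac = (math.log(noise_power) - math.log(low_thresh)) / \
               (math.log(high_thresh) - math.log(low_thresh))
        gain = 1.0 + frac * (min_gain - 1.0)
    if prev_gain is not None:           # smooth with the prior frame's gain
        gain = smooth * prev_gain + (1.0 - smooth) * gain
    return gain

print(noise_dependent_gain(1e-2))                   # between the thresholds
print(noise_dependent_gain(1e-2, prev_gain=0.95))   # smoothed value
```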
-
Publication number: 20100076757
Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
Type: Application
Filed: September 23, 2008
Publication date: March 25, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
-
Publication number: 20100076758
Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates, based on a phase-sensitive model, of distortions in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
Type: Application
Filed: September 24, 2008
Publication date: March 25, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
-
Publication number: 20100070280
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
Type: Application
Filed: September 16, 2008
Publication date: March 18, 2010
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
-
Publication number: 20100070279
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
Type: Application
Filed: September 16, 2008
Publication date: March 18, 2010
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
-
Publication number: 20090177471
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definitions and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
Type: Application
Filed: January 9, 2008
Publication date: July 9, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Yifan Gong, Ye Tian
-
Publication number: 20090144059
Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
Type: Application
Filed: December 3, 2007
Publication date: June 4, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
-
Patent number: 7516069
Abstract: A method for performing time and frequency Signal-to-Noise Ratio (SNR) dependent weighting in speech recognition is described. For each period t, the method includes estimating the SNR to obtain the time and frequency SNR information; calculating the time and frequency weighting to form the diagonal weighting matrix Gt; performing the back-and-forth weighted time-varying DCT transformation matrix computation MGtM−1 to get Tt; providing the transformation matrix Tt and the original MFCC feature ot, which contains the information about the SNR, to a recognizer including Viterbi decoding; and performing weighted Viterbi recognition bj(ot).
Type: Grant
Filed: April 13, 2004
Date of Patent: April 7, 2009
Assignee: Texas Instruments Incorporated
Inventors: Alexis P. Bernard, Yifan Gong
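A brief sketch of the weighted transform Tt = MGtM−1, assuming an orthonormal DCT-II for M and a simple sigmoid mapping from per-band SNR to the diagonal weights of Gt; the weighting rule itself is an assumption, not the patent's:

```python
import numpy as np
from scipy.fft import dct

n_bands = 24                                            # log-Mel filterbank size
M = dct(np.eye(n_bands), type=2, norm="ortho", axis=0)  # orthonormal DCT matrix
M_inv = M.T                                             # orthonormal: inverse = transpose

def weighted_transform(snr_db):
    """Build T_t = M @ G_t @ M^{-1} from per-band SNR-derived weights."""
    weights = 1.0 / (1.0 + np.exp(-snr_db / 5.0))   # higher SNR -> weight near 1
    G_t = np.diag(weights)
    return M @ G_t @ M_inv

snr_db = np.linspace(-5, 20, n_bands)          # per-band SNR for one frame
T_t = weighted_transform(snr_db)
mfcc_frame = np.random.default_rng(3).normal(size=n_bands)
print((T_t @ mfcc_frame).shape)                # weighted cepstral feature, (24,)
```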
-
Patent number: 7464033
Abstract: For a given sentence grammar, speech recognizers are often required to decode with M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here which needs a network only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
Type: Grant
Filed: February 4, 2005
Date of Patent: December 9, 2008
Assignee: Texas Instruments Incorporated
Inventor: Yifan Gong
-
Patent number: 7451082
Abstract: A method and detector for noise-resistant utterance detection are provided, based on extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) the speech signal to focus on the periodic excitation part of the signal, and spectral reshaping (19) to accentuate the separation between formants.
Type: Grant
Filed: August 27, 2003
Date of Patent: November 11, 2008
Assignee: Texas Instruments Incorporated
Inventors: Yifan Gong, Alexis P. Bernard
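A toy per-frame score combining the three ingredients named in the abstract, with the LPC order, the noise-subtraction step, and the squaring used for spectral reshaping all being assumptions rather than the patented design:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=10):
    """LPC coefficients via the autocorrelation method (for inverse filtering)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))           # A(z) = 1 - sum a_k z^-k

def utterance_score(frame, noise_psd, eps=1e-10):
    """Per-frame score: subtract the noise estimate, inverse-filter to focus
    on the excitation residual, and square the spectrum to sharpen peaks."""
    excitation = lfilter(lpc(frame), [1.0], frame)   # inverse-filtered residual
    psd = np.abs(np.fft.rfft(frame)) ** 2
    clean_psd = np.maximum(psd - noise_psd, eps)     # noise-estimate subtraction
    reshaped = clean_psd ** 2                        # reshaping accentuates formant peaks
    return reshaped.sum() / (excitation.var() + eps)

rng = np.random.default_rng(4)
noise_frames = rng.normal(scale=0.05, size=(50, 256))
noise_psd = (np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2).mean(axis=0)
n = np.arange(256)
speechlike = np.sin(0.3 * n) + 0.5 * np.sin(0.7 * n) + rng.normal(scale=0.05, size=256)
print(utterance_score(speechlike, noise_psd) >
      utterance_score(noise_frames[0], noise_psd))   # -> True
```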
-
Publication number: 20080270133
Abstract: A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors in a corpus, yielding improved results through the automatic detection of those errors. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcriptions (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment and obtain a word boundary. A speech recognizer is then utilized to generate a word lattice. Using the word boundary and the word lattice, error detection is computed using a word confidence score and a word duration probability.
Type: Application
Filed: April 24, 2007
Publication date: October 30, 2008
Applicant: Microsoft Corporation
Inventors: Ye Tian, Yifan Gong, Frank K. Soong
-
Publication number: 20080270118
Abstract: Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode being used for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than multiple times as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to languages of at least the Asian continent, such as Simplified Chinese, Traditional Chinese, and/or other Asian languages such as Japanese.
Type: Application
Filed: April 26, 2007
Publication date: October 30, 2008
Applicant: Microsoft Corporation
Inventors: Shiun-Zu Kuo, Kevin E. Feige, Yifan Gong, Taro Miwa, Arun Chitrapu
-
Publication number: 20070219777
Abstract: The language of origin of a word is determined by analyzing non-uniform letter sequence portions of the word.
Type: Application
Filed: March 20, 2006
Publication date: September 20, 2007
Applicant: Microsoft Corporation
Inventors: Min Chu, Yi Chen, Shiun-Zu Kuo, Xiaodong He, Megan Riley, Kevin Feige, Yifan Gong
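One way such an analysis could be sketched is to score a word's variable-length letter chunks against per-language letter-sequence profiles; the tiny profiles and the scoring rule below are invented for illustration and are not the published method:

```python
from collections import Counter

# Tiny hypothetical letter-sequence statistics per language of origin;
# a real system would learn these counts from large name/word lists.
PROFILES = {
    "English": Counter({"th": 5, "ing": 6, "ck": 3, "son": 4}),
    "German":  Counter({"sch": 7, "ein": 5, "tz": 4, "er": 3}),
    "Italian": Counter({"zz": 5, "ini": 6, "etto": 5, "gli": 4}),
}

def language_of_origin(word, min_len=2, max_len=4):
    """Score each language by the non-uniform-length letter chunks (2 to 4
    letters here) that the word shares with that language's profile."""
    word = word.lower()
    chunks = [word[i:i + n] for n in range(min_len, max_len + 1)
              for i in range(len(word) - n + 1)]
    scores = {lang: sum(profile[c] for c in chunks)
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores

print(language_of_origin("Schneider"))   # -> ('German', {...})
print(language_of_origin("Paparazzi"))   # -> ('Italian', {...})
```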