Patents by Inventor Yifan Gong
Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8180637
Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
Type: Grant
Filed: December 3, 2007
Date of Patent: May 15, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
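A rough sketch of the kind of adaptation loop this abstract describes may help; it is not the patented algorithm, and the log-Mel compensation formula, the nearest-mean stand-in for decoding, and all names below are assumptions:

```python
import numpy as np

def compensate_means(clean_means, noise_mean, channel_mean):
    # Hypothetical log-Mel-domain effect of additive noise n and a convolutive
    # channel h on a clean mean x:  y = x + h + log(1 + exp(n - x - h))
    return clean_means + channel_mean + np.log1p(
        np.exp(noise_mean - clean_means - channel_mean))

def adapt_hmm_means(clean_means, frames, n_iters=3):
    """Toy adaptation loop: compensate the Gaussian means, 'decode' by
    nearest-mean assignment, then re-estimate the channel mean from the
    residuals (a stand-in for the full HMM decode/re-estimation pass)."""
    noise_mean = frames[:10].mean(axis=0)          # noise init: leading frames
    channel_mean = np.zeros(clean_means.shape[1])  # channel init: zero vector
    for _ in range(n_iters):
        adapted = compensate_means(clean_means, noise_mean, channel_mean)
        nearest = np.argmin(
            ((frames[:, None, :] - adapted[None, :, :]) ** 2).sum(-1), axis=1)
        channel_mean += (frames - adapted[nearest]).mean(axis=0)
    return compensate_means(clean_means, noise_mean, channel_mean)

# Usage: 4 Gaussian means of dimension 24, 200 noisy log-Mel frames.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 24))
frames = rng.normal(size=(200, 24))
print(adapt_hmm_means(means, frames).shape)        # -> (4, 24)
```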
-
Patent number: 8160878
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
Type: Grant
Filed: September 16, 2008
Date of Patent: April 17, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
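To illustrate the spline idea, a minimal sketch of one VPHMM Gaussian component whose mean and variance follow cubic splines of instantaneous SNR could look like the following; the knot positions and values are invented for illustration, not taken from the patent:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Knot positions along the conditioning parameter (instantaneous SNR, in dB)
# and example values of one Gaussian mean/variance component at those knots.
snr_knots = np.array([-5.0, 0.0, 5.0, 10.0, 20.0])
mean_knots = np.array([1.8, 1.2, 0.9, 0.7, 0.6])
var_knots = np.array([2.5, 1.6, 1.1, 0.9, 0.8])

mean_spline = CubicSpline(snr_knots, mean_knots)
var_spline = CubicSpline(snr_knots, var_knots)

def gaussian_loglik(x, snr):
    """Log-likelihood of a scalar observation x under one VPHMM Gaussian
    component whose mean and variance depend on the frame's SNR."""
    mu, var = float(mean_spline(snr)), max(float(var_spline(snr)), 1e-6)
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

print(gaussian_loglik(1.0, snr=3.0))   # evaluate at an intermediate SNR
```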
-
Patent number: 8145488
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
Type: Grant
Filed: September 16, 2008
Date of Patent: March 27, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
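A minimal sketch of the clustering-and-sharing step, assuming each spline is summarized by its values at a common set of knots and that ordinary k-means stands in for whatever distance measure the patented system uses:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Each row is the parameter vector of one spline (here: its values at a
# common set of SNR knots); 500 splines with 5 knots each, synthetic data.
rng = np.random.default_rng(1)
spline_params = rng.normal(size=(500, 5))

# Cluster the splines; each class keeps one shared set of spline parameters.
shared_sets, class_of_spline = kmeans2(spline_params, 16, minit="points")

# A spline instance now only stores the index of its shared parameter set.
print(shared_sets.shape)        # (16, 5): shared spline parameter sets
print(class_of_spline[:10])     # class index of the first ten splines
```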
-
Patent number: 8086455
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definitions and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
Type: Grant
Filed: January 9, 2008
Date of Patent: December 27, 2011
Assignee: Microsoft Corporation
Inventors: Yifan Gong, Ye Tian
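One way to sketch such a dependence-driven build compiler, assuming Python's standard graphlib for the ordering and treating each processor's declared input/output files as the dependency edges (the processor and file names are hypothetical):

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical declaration: each processor lists its input and output files.
processors = {
    "extract_features": {"inputs": ["wav.list"], "outputs": ["feats.bin"]},
    "train_monophones": {"inputs": ["feats.bin", "dict.txt"], "outputs": ["mono.hmm"]},
    "train_triphones":  {"inputs": ["feats.bin", "mono.hmm"], "outputs": ["tri.hmm"]},
}

# Detect data produced by more than one action (an ill-defined process).
producer = {}
for name, proc in processors.items():
    for out in proc["outputs"]:
        if out in producer:
            raise ValueError(f"{out} is produced by both {producer[out]} and {name}")
        producer[out] = name

# Order the steps purely from the declared input/output relationships.
graph = {name: {producer[i] for i in proc["inputs"] if i in producer}
         for name, proc in processors.items()}
try:
    order = list(TopologicalSorter(graph).static_order())
except CycleError as err:
    raise ValueError(f"cyclic process definition: {err}")
print(order)   # ['extract_features', 'train_monophones', 'train_triphones']
```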
-
Patent number: 7860716
Abstract: A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors in a corpus, yielding improved results through the automatic detection of those errors. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcriptions (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment and obtain a word boundary. A speech recognizer is then utilized to generate a word lattice. Using the word boundary and the word lattice, error detection is computed using a word confidence score and a word duration probability.
Type: Grant
Filed: April 24, 2007
Date of Patent: December 28, 2010
Assignee: Microsoft Corporation
Inventors: Ye Tian, Yifan Gong, Frank K. Soong
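A toy sketch of the error-detection step, assuming a per-word Gaussian duration model and arbitrary thresholds; the real algorithm's scores and thresholds are not specified here:

```python
import math

def flag_transcription_errors(words, conf_threshold=0.5, dur_threshold=-6.0):
    """Flag aligned words whose confidence score or log duration probability
    falls below a threshold (thresholds here are illustrative assumptions).

    words: list of dicts with 'word', 'confidence' (0..1) and 'duration' (s).
    A per-word Gaussian (mean, std) stands in for the duration probability."""
    duration_model = {"hello": (0.35, 0.10), "world": (0.40, 0.12)}
    flagged = []
    for w in words:
        mean, std = duration_model.get(w["word"], (0.30, 0.15))
        log_dur_prob = (-0.5 * math.log(2 * math.pi * std ** 2)
                        - (w["duration"] - mean) ** 2 / (2 * std ** 2))
        if w["confidence"] < conf_threshold or log_dur_prob < dur_threshold:
            flagged.append(w["word"])
    return flagged

print(flag_transcription_errors([
    {"word": "hello", "confidence": 0.92, "duration": 0.33},
    {"word": "world", "confidence": 0.31, "duration": 1.50},   # likely an error
]))   # -> ['world']
```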
-
Publication number: 20100318355
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts and, on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between pairs of words in the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
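A small sketch of the segment-selection step, assuming the original and decoded transcriptions have already been time-aligned word-for-word:

```python
def matching_segments(original, decoded, q=5):
    """Return (start, end) index ranges where the aligned original and
    decoded transcriptions agree on at least q contiguous words (a sketch;
    both inputs are assumed to be aligned word-for-word already)."""
    segments, run_start = [], None
    for i, (a, b) in enumerate(zip(original, decoded)):
        if a == b:
            run_start = i if run_start is None else run_start
        else:
            if run_start is not None and i - run_start >= q:
                segments.append((run_start, i))
            run_start = None
    if run_start is not None and len(original) - run_start >= q:
        segments.append((run_start, len(original)))
    return segments

orig = "the cat sat on the mat and then ran away".split()
deco = "the cat sat on the mat but then ran away".split()
print(matching_segments(orig, deco, q=4))   # -> [(0, 6)]
```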
-
Publication number: 20100228548
Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using the same set of acoustic training data. The system may apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
Type: Application
Filed: March 9, 2009
Publication date: September 9, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Chaojun Liu, Yifan Gong
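As one plausible reading of the mapping idea, a least-squares affine correction fitted between the unsupervised and supervised training-set parameters and then applied to unsupervised test-set parameters might look like this; the affine form and the synthetic data are assumptions:

```python
import numpy as np

# Hypothetical adaptation parameter vectors estimated from the SAME training
# audio: once with reference transcripts (supervised) and once with
# recognized transcripts (unsupervised).
rng = np.random.default_rng(2)
unsup_train = rng.normal(size=(100, 8))
sup_train = 0.9 * unsup_train + 0.3 + 0.05 * rng.normal(size=(100, 8))

# Fit an affine correction  sup ~ W^T [unsup; 1]  by least squares.
X = np.hstack([unsup_train, np.ones((100, 1))])
W, *_ = np.linalg.lstsq(X, sup_train, rcond=None)

def correct(unsup_params):
    """Apply the learned correction to unsupervised test-time parameters."""
    return np.hstack([unsup_params, np.ones((len(unsup_params), 1))]) @ W

unsup_test = rng.normal(size=(5, 8))
print(correct(unsup_test).shape)   # corrected parameters used for adaptation
```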
-
Publication number: 20100153104
Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for each frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power-dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
Type: Application
Filed: December 16, 2008
Publication date: June 17, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
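A compact sketch of such a noise-level-dependent gain, using a log-linear interpolation between two thresholds and first-order smoothing with the previous frame's gain; all threshold and gain values are illustrative, not the publication's:

```python
import math

def noise_dependent_gain(noise_power, low_thresh=1e-4, high_thresh=1e-1,
                         min_gain=0.1, prev_gain=None, smooth=0.7):
    """Gain as a function of the frame's noise power (thresholds and the
    minimum gain are illustrative values, not those of the publication)."""
    if noise_power <= low_thresh:
        gain = 1.0                      # little or no suppression in low noise
    elif noise_power >= high_thresh:
        gain = min_gain                 # strong suppression in high noise
    else:
        # log-linear interpolation of the gain between the two thresholds
        frac = (math.log(noise_power) - math.log(low_thresh)) / \
               (math.log(high_thresh) - math.log(low_thresh))
        gain = 1.0 + frac * (min_gain - 1.0)
    if prev_gain is not None:           # smooth with the prior frame's gain
        gain = smooth * prev_gain + (1.0 - smooth) * gain
    return gain

print(noise_dependent_gain(1e-2))                   # between the thresholds
print(noise_dependent_gain(1e-2, prev_gain=0.95))   # smoothed value
```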
-
Publication number: 20100076757
Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
Type: Application
Filed: September 23, 2008
Publication date: March 25, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
-
Publication number: 20100076758
Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates, based on a phase-sensitive model, of distortions in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
Type: Application
Filed: September 24, 2008
Publication date: March 25, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
-
Publication number: 20100070280
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
Type: Application
Filed: September 16, 2008
Publication date: March 18, 2010
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
-
Publication number: 20100070279
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
Type: Application
Filed: September 16, 2008
Publication date: March 18, 2010
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
-
Publication number: 20090177471
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definitions and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
Type: Application
Filed: January 9, 2008
Publication date: July 9, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Yifan Gong, Ye Tian
-
Publication number: 20090144059
Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
Type: Application
Filed: December 3, 2007
Publication date: June 4, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
-
Patent number: 7516069
Abstract: A method for performing time and frequency Signal-to-Noise Ratio (SNR) dependent weighting in speech recognition is described. For each period t, the method includes estimating the SNR to obtain the time and frequency SNR information; calculating the time and frequency weighting to form the diagonal weighting matrix Gt; performing the back-and-forth weighted time-varying DCT transformation matrix computation MGtM−1 to get Tt; providing the transformation matrix Tt and the original MFCC feature ot, which contains the information about the SNR, to a recognizer including Viterbi decoding; and performing weighted Viterbi recognition bj(ot).
Type: Grant
Filed: April 13, 2004
Date of Patent: April 7, 2009
Assignee: Texas Instruments Incorporated
Inventors: Alexis P. Bernard, Yifan Gong
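A brief sketch of the weighted transform Tt = MGtM−1, assuming an orthonormal DCT-II for M and a simple sigmoid mapping from per-band SNR to the diagonal weights of Gt; the weighting rule itself is an assumption, not the patent's:

```python
import numpy as np
from scipy.fft import dct

n_bands = 24                                            # log-Mel filterbank size
M = dct(np.eye(n_bands), type=2, norm="ortho", axis=0)  # orthonormal DCT matrix
M_inv = M.T                                             # orthonormal: inverse = transpose

def weighted_transform(snr_db):
    """Build T_t = M @ G_t @ M^{-1} from per-band SNR-derived weights."""
    weights = 1.0 / (1.0 + np.exp(-snr_db / 5.0))   # higher SNR -> weight near 1
    G_t = np.diag(weights)
    return M @ G_t @ M_inv

snr_db = np.linspace(-5, 20, n_bands)          # per-band SNR for one frame
T_t = weighted_transform(snr_db)
mfcc_frame = np.random.default_rng(3).normal(size=n_bands)
print((T_t @ mfcc_frame).shape)                # weighted cepstral feature, (24,)
```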
-
Patent number: 7464033
Abstract: For a given sentence grammar, speech recognizers are often required to decode with M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here which needs a network only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
Type: Grant
Filed: February 4, 2005
Date of Patent: December 9, 2008
Assignee: Texas Instruments Incorporated
Inventor: Yifan Gong
-
Patent number: 7451082
Abstract: A method and detector for noise-resistant utterance detection are provided, based on extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) the speech signal to focus on the periodic excitation part of the signal, and spectral reshaping (19) to accentuate the separation between formants.
Type: Grant
Filed: August 27, 2003
Date of Patent: November 11, 2008
Assignee: Texas Instruments Incorporated
Inventors: Yifan Gong, Alexis P. Bernard
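A toy per-frame score combining the three ingredients named in the abstract, with the LPC order, the noise-subtraction step, and the squaring used for spectral reshaping all being assumptions rather than the patented design:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=10):
    """LPC coefficients via the autocorrelation method (for inverse filtering)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))           # A(z) = 1 - sum a_k z^-k

def utterance_score(frame, noise_psd, eps=1e-10):
    """Per-frame score: subtract the noise estimate, inverse-filter to focus
    on the excitation residual, and square the spectrum to sharpen peaks."""
    excitation = lfilter(lpc(frame), [1.0], frame)   # inverse-filtered residual
    psd = np.abs(np.fft.rfft(frame)) ** 2
    clean_psd = np.maximum(psd - noise_psd, eps)     # noise-estimate subtraction
    reshaped = clean_psd ** 2                        # reshaping accentuates formant peaks
    return reshaped.sum() / (excitation.var() + eps)

rng = np.random.default_rng(4)
noise_frames = rng.normal(scale=0.05, size=(50, 256))
noise_psd = (np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2).mean(axis=0)
n = np.arange(256)
speechlike = np.sin(0.3 * n) + 0.5 * np.sin(0.7 * n) + rng.normal(scale=0.05, size=256)
print(utterance_score(speechlike, noise_psd) >
      utterance_score(noise_frames[0], noise_psd))   # -> True
```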
-
Publication number: 20080270133
Abstract: A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors in a corpus, yielding improved results through the automatic detection of those errors. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcriptions (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment and obtain a word boundary. A speech recognizer is then utilized to generate a word lattice. Using the word boundary and the word lattice, error detection is computed using a word confidence score and a word duration probability.
Type: Application
Filed: April 24, 2007
Publication date: October 30, 2008
Applicant: Microsoft Corporation
Inventors: Ye Tian, Yifan Gong, Frank K. Soong
-
Publication number: 20080270118
Abstract: Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode being used for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than multiple times as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to languages of at least the Asian continent, such as Simplified Chinese, Traditional Chinese, and/or other Asian languages such as Japanese.
Type: Application
Filed: April 26, 2007
Publication date: October 30, 2008
Applicant: Microsoft Corporation
Inventors: Shiun-Zu Kuo, Kevin E. Feige, Yifan Gong, Taro Miwa, Arun Chitrapu
-
Publication number: 20070219777
Abstract: The language of origin of a word is determined by analyzing non-uniform letter sequence portions of the word.
Type: Application
Filed: March 20, 2006
Publication date: September 20, 2007
Applicant: Microsoft Corporation
Inventors: Min Chu, Yi Chen, Shiun-Zu Kuo, Xiaodong He, Megan Riley, Kevin Feige, Yifan Gong
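One way such an analysis could be sketched is to score a word's variable-length letter chunks against per-language letter-sequence profiles; the tiny profiles and the scoring rule below are invented for illustration and are not the published method:

```python
from collections import Counter

# Tiny hypothetical letter-sequence statistics per language of origin;
# a real system would learn these counts from large name/word lists.
PROFILES = {
    "English": Counter({"th": 5, "ing": 6, "ck": 3, "son": 4}),
    "German":  Counter({"sch": 7, "ein": 5, "tz": 4, "er": 3}),
    "Italian": Counter({"zz": 5, "ini": 6, "etto": 5, "gli": 4}),
}

def language_of_origin(word, min_len=2, max_len=4):
    """Score each language by the non-uniform-length letter chunks (2 to 4
    letters here) that the word shares with that language's profile."""
    word = word.lower()
    chunks = [word[i:i + n] for n in range(min_len, max_len + 1)
              for i in range(len(word) - n + 1)]
    scores = {lang: sum(profile[c] for c in chunks)
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores

print(language_of_origin("Schneider"))   # -> ('German', {...})
print(language_of_origin("Paparazzi"))   # -> ('Italian', {...})
```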