Patents by Inventor Yifan Gong

Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150310858
    Abstract: A framework for merging automatic speech recognition (ASR) systems that share a deep neural network (DNN) feature transformation is provided. A received utterance may be evaluated to generate a DNN-derived feature from the top hidden layer of the DNN. The top hidden layer output may then be utilized to generate a network including a bottleneck layer and an output layer. Weights representing a feature dimension reduction may then be extracted between the top hidden layer and the bottleneck layer. Scores may then be generated and combined to merge the ASR systems that share the DNN feature transformation.
    Type: Application
    Filed: April 29, 2014
    Publication date: October 29, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: JINYU LI, JIAN XUE, YIFAN GONG
  • Publication number: 20150255061
    Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to each of the original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
    Type: Application
    Filed: March 7, 2014
    Publication date: September 10, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
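The decomposition described in the abstract above can be sketched with a toy example. The layer sizes, the retained rank, and the use of truncated SVD as the decomposition are illustrative assumptions, not details taken from the patent text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 512x512 weight matrix from one layer of the original DNN.
W = rng.standard_normal((512, 512))

# Decompose W into two smaller matrices, W ~= A @ B (here via truncated SVD).
k = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]        # 512 x k
B = Vt[:k, :]               # k x 512

# Add a k x k square matrix between the factors, initialized to identity.
# Speaker-specific parameters live only in S, so adapting the model means
# updating k*k = 4096 values instead of the full 512*512 = 262144.
S = np.eye(k)
W_speaker = A @ S @ B

# With S = I the adapted layer equals the speaker-independent low-rank layer.
assert np.allclose(W_speaker, A @ B)
```

Because only S is updated per speaker, the per-speaker storage cost scales with k squared rather than with the full matrix size.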
  • Publication number: 20140372112
    Abstract: A Deep Neural Network (DNN) model used in an Automatic Speech Recognition (ASR) system is restructured. A restructured DNN model may include fewer parameters compared to the original DNN model. The restructured DNN model may include a monophone state output layer in addition to the senone output layer of the original DNN model. Singular value decomposition (SVD) can be applied to one or more weight matrices of the DNN model to reduce the size of the DNN model. The output layer of the DNN model may be restructured to include monophone states in addition to the senones (tied triphone states) which are included in the original DNN model. When the monophone states are included in the restructured DNN model, the posteriors of the monophone states are used to select a small subset of senones to be evaluated.
    Type: Application
    Filed: June 18, 2013
    Publication date: December 18, 2014
    Inventors: Jian Xue, Emilian Stoimenov, Jinyu Li, Yifan Gong
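As a rough sketch of the SVD step described above, one layer's weight matrix can be replaced by two low-rank factors. The layer size and retained rank below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((2048, 2048))   # hypothetical layer weight matrix

# Keep only the k largest singular values: W ~= (U_k * s_k) @ V_k.
k = 256
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                    # 2048 x k
B = Vt[:k, :]                           # k x 2048

# Storing the two factors replaces 2048*2048 = 4194304 parameters
# with 2*2048*256 = 1048576, a 4x reduction for this layer.
assert W.size == 4194304
assert A.size + B.size == 1048576
```

In a real acoustic model the weight matrices are far from random, so a small k typically preserves most of the layer's behavior.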
  • Publication number: 20140257814
    Abstract: A high-dimensional posterior-based feature with partial distance elimination may be utilized for speech recognition. The log likelihood values of a large number of Gaussians are needed to generate the high-dimensional posterior feature. Gaussians with very small log likelihoods are associated with zero posterior values. Log likelihoods of the Gaussians for a speech frame may be evaluated with a partial distance elimination method. If the partial distance of a Gaussian already indicates a negligible likelihood, the Gaussian is assigned a zero posterior value. The partial distance may be calculated by sequentially adding individual dimensions in a group of dimensions. Elimination occurs when fewer than all of the dimensions in the group have been added.
    Type: Application
    Filed: March 5, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Zhijie Yan, Qiang Huo, Yifan Gong
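The early-stopping idea above can be sketched as follows. The diagonal-covariance distance and the fixed threshold are illustrative assumptions:

```python
import numpy as np

def partial_distance(x, mean, inv_var, threshold):
    """Accumulate the squared distance one dimension at a time and stop
    as soon as it exceeds `threshold`: beyond that point the Gaussian's
    log likelihood is so small that its posterior rounds to zero."""
    d = 0.0
    for i in range(len(x)):
        d += inv_var[i] * (x[i] - mean[i]) ** 2
        if d > threshold:
            return None                 # eliminated after i+1 dimensions
    return d

x = np.zeros(39)                        # one speech frame (e.g., 39-dim MFCC)
near = partial_distance(x, np.full(39, 0.1), np.ones(39), threshold=25.0)
far = partial_distance(x, np.full(39, 5.0), np.ones(39), threshold=25.0)
assert near is not None                 # full distance computed
assert far is None                      # eliminated after two dimensions
```

The savings come from the eliminated case: a distant Gaussian is rejected after a handful of dimensions instead of all 39.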
  • Publication number: 20140257805
    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for the weight parameters of the hidden layers are learned during a training phase based upon raw acoustic features from multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layers trained jointly across the multiple source languages. The MDNN is adaptable: a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
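The shared-hidden-layers/per-language-softmax structure above can be sketched as a tiny forward pass. All layer sizes and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hidden layers shared across languages, trained jointly on multilingual data.
hidden = [rng.standard_normal((40, 64)), rng.standard_normal((64, 64))]

# One softmax output layer per target language, trained separately on top of
# the shared hidden stack; the output sizes here are arbitrary.
heads = {"en": rng.standard_normal((64, 120)),
         "fr": rng.standard_normal((64, 100))}

def forward(features, language):
    h = features
    for W in hidden:                    # shared multilingual representation
        h = relu(h @ W)
    return softmax(h @ heads[language])

# Adapting to a new target language only requires a new softmax layer.
heads["de"] = rng.standard_normal((64, 110))
probs = forward(rng.standard_normal(40), "de")
assert probs.shape == (110,) and np.isclose(probs.sum(), 1.0)
```

The key point is that `hidden` is reused untouched when `heads["de"]` is added, which is what makes the adaptation cheap.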
  • Publication number: 20140257804
    Abstract: Technologies pertaining to training a deep neural network (DNN) for use in a recognition system are described herein. The DNN is trained using heterogeneous data, the heterogeneous data including narrowband signals and wideband signals. The DNN, subsequent to being trained, receives an input signal that can be either a wideband or a narrowband signal. The DNN estimates the class posterior probability of the input signal regardless of whether the input signal is wideband or narrowband.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Dong Yu, Yifan Gong
  • Publication number: 20140214420
    Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device, and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM, and the utterance is assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector and residual noise estimation is performed, followed by hyperparameter estimation; the i-vector and hyperparameter estimations alternate until convergence.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaisheng Yao, Yifan Gong
  • Publication number: 20140207448
    Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained, until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: Microsoft Corporation
    Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
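The adaptive estimation loop above can be sketched as a simple hill climb over the warping factor. The search strategy and the quadratic stand-in for the acoustic score are illustrative assumptions, not the patent's actual estimator:

```python
def adapt_warping_factor(score_fn, alpha=1.0, step=0.02, max_adaptations=10):
    """Move the warping factor in whichever direction improves the score,
    stopping at convergence or after `max_adaptations` adjustments."""
    for _ in range(max_adaptations):
        best = max((alpha - step, alpha, alpha + step), key=score_fn)
        if best == alpha:
            break                       # converged: no neighbor scores higher
        alpha = best
    return alpha

# Toy score peaking at alpha = 0.92, standing in for an acoustic likelihood
# computed over the speech collected so far.
estimated = adapt_warping_factor(lambda a: -(a - 0.92) ** 2)
assert abs(estimated - 0.92) < 0.03
```

In the patent's setting, `step` would differ per speaker group, and `score_fn` would be re-evaluated as more speech accumulates.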
  • Patent number: 8731916
    Abstract: Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
    Type: Grant
    Filed: November 18, 2010
    Date of Patent: May 20, 2014
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jinyu Li, Dong Yu, Yifan Gong
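The core of an unscented transformation, the framework named in the abstract above, can be sketched as follows. This is a generic sigma-point formulation, not the specific parameterization used in the patent:

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a non-linear function f by
    pushing 2n+1 deterministically chosen sigma points through f and
    re-estimating the output mean and covariance from the results."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)   # columns set the spread
    points = np.vstack([mean, mean + L.T, mean - L.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(p) for p in points])
    y_mean = w @ y
    diff = y - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov

# Sanity check: a linear map should be propagated exactly.
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
A = np.array([[0.5, 0.1], [-0.2, 1.5]])
m, c = unscented_transform(lambda x: A @ x, mean, cov)
assert np.allclose(m, A @ mean) and np.allclose(c, A @ cov @ A.T)
```

In the patent's setting, `f` would be the non-linear mapping from clean speech plus noise and channel parameters to distorted speech features.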
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
  • Publication number: 20140067387
    Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech, generated from a transmission source for delivery to a receiver, may be received by a computer. The distortion may be caused by the noisy environment and by channel distortion. An algorithm using only scalar operations may then be executed to recognize the utterance. Because all of the computations are performed with scalar operations, the computational cost is very small in comparison to matrix and vector operations. A Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise-robust algorithm with scalar operations.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
  • Patent number: 8457946
    Abstract: Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than in multiple passes as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to Asian languages such as Simplified Chinese, Traditional Chinese, and Japanese.
    Type: Grant
    Filed: April 26, 2007
    Date of Patent: June 4, 2013
    Assignee: Microsoft Corporation
    Inventors: Shiun-Zu Kuo, Kevin E. Feige, Yifan Gong, Taro Miwa, Arun Chitrapu
  • Publication number: 20130080165
    Abstract: Online histogram equalization may be provided. Upon receiving a spoken phrase from a user, a histogram/frequency distribution may be estimated for the spoken phrase according to a prior distribution. The histogram distribution may be equalized and then provided to a spoken language understanding application.
    Type: Application
    Filed: September 24, 2011
    Publication date: March 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Shizhen Wang, Yifan Gong
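Histogram equalization of this kind can be sketched as a rank-based mapping onto a prior distribution. The standard normal prior and the midpoint CDF estimate below are illustrative choices:

```python
import numpy as np
from statistics import NormalDist

def histogram_equalize(values, prior=NormalDist()):
    """Replace each value by the prior's quantile at that value's empirical
    CDF position, so the output histogram matches the prior distribution."""
    ranks = np.argsort(np.argsort(values))      # 0..n-1 rank of each value
    cdf = (ranks + 0.5) / len(values)           # midpoint empirical CDF
    return np.array([prior.inv_cdf(p) for p in cdf])

x = np.random.default_rng(0).exponential(size=1000)   # heavily skewed input
y = histogram_equalize(x)
assert abs(y.mean()) < 1e-6                    # equalized toward N(0, 1)
assert abs(y.std() - 1.0) < 0.05
```

An online variant would update the empirical CDF incrementally as frames of the spoken phrase arrive, rather than sorting a complete batch.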
  • Patent number: 8306819
    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 9, 2009
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Chaojun Liu, Yifan Gong
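One way to realize an error-correction function like the one above is a least-squares linear map between the two parameter sets. The linear form, the dimensions, and the synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical adaptation parameters estimated two ways on the same acoustic
# training data: with reference transcripts (supervised) and with recognized
# transcripts (unsupervised).
unsup = rng.standard_normal((200, 8))
true_map = rng.standard_normal((8, 8))
sup = unsup @ true_map + 0.01 * rng.standard_normal((200, 8))

# Learn a linear error-correction function mapping unsupervised estimates
# toward their supervised counterparts.
C, *_ = np.linalg.lstsq(unsup, sup, rcond=None)

# Apply the learned correction to unsupervised parameters from test data,
# producing the corrected set used for speaker adaptation.
unsup_test = rng.standard_normal((10, 8))
corrected = unsup_test @ C
assert np.allclose(corrected, unsup_test @ true_map, atol=0.1)
```

The appeal of this scheme is that the correction is trained once, offline, and then applied to unsupervised estimates where no reference transcripts exist.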
  • Patent number: 8239195
    Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
    Type: Grant
    Filed: September 23, 2008
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
  • Publication number: 20120173240
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Application
    Filed: December 30, 2010
    Publication date: July 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Daniel Povey, Kaisheng YAO, Yifan Gong
  • Patent number: 8214215
    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: July 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
  • Publication number: 20120130710
    Abstract: Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
    Type: Application
    Filed: November 18, 2010
    Publication date: May 24, 2012
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Jinyu Li, Dong Yu, Yifan Gong
  • Patent number: 8185376
    Abstract: The language of origin of a word is determined by analyzing non-uniform letter sequence portions of the word.
    Type: Grant
    Filed: March 20, 2006
    Date of Patent: May 22, 2012
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Yi Ning Chen, Shiun-Zu Kuo, Xiaodong He, Megan Riley, Kevin E. Feige, Yifan Gong
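The letter-sequence analysis above can be sketched as a character n-gram classifier. The tiny training lists are toy examples (not real lexica), and counting n-gram overlap is a simple stand-in for a probabilistic model:

```python
from collections import Counter

def letter_ngrams(word, nmin=2, nmax=4):
    """Non-uniform letter sequences: all substrings of length nmin..nmax,
    with ^ and $ marking the word boundaries."""
    w = f"^{word.lower()}$"
    return [w[i:i + n] for n in range(nmin, nmax + 1)
            for i in range(len(w) - n + 1)]

# Toy illustrative training sets, one per candidate language of origin.
training = {"German": ["schmidt", "schneider", "schulz"],
            "Italian": ["rossi", "ricci", "russo"]}
models = {lang: Counter(g for w in words for g in letter_ngrams(w))
          for lang, words in training.items()}

def origin(word):
    """Pick the language whose n-gram model best covers the word."""
    grams = letter_ngrams(word)
    return max(models, key=lambda lang: sum(models[lang][g] for g in grams))

assert origin("schubert") == "German"    # "sch", "^sc", ... match German
```

A production system would smooth the counts and weight longer sequences more heavily, since they are more language-specific.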
  • Patent number: 8185389
    Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for each frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, the noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstral noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression when the noise level is above a high threshold. A noise-power-dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: May 22, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
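The thresholded, log-linearly interpolated gain described above can be sketched as follows. The threshold values, minimum gain, and smoothing weight are illustrative assumptions:

```python
import math

def suppression_gain(noise_power, low=1e-4, high=1e-1, g_min=0.1):
    """Gain as a function of frame noise power: no suppression below the
    low threshold, maximum suppression above the high threshold, and a
    log-linear interpolation in between."""
    if noise_power <= low:
        return 1.0
    if noise_power >= high:
        return g_min
    t = (math.log(noise_power) - math.log(low)) / (math.log(high) - math.log(low))
    return 1.0 + t * (g_min - 1.0)     # linear in log(noise_power)

def smooth(gain, prev_gain, beta=0.7):
    """Temporal smoothing: blend the gain with the previous frame's gain."""
    return beta * prev_gain + (1.0 - beta) * gain

assert suppression_gain(1e-5) == 1.0               # quiet frame: no suppression
assert suppression_gain(1.0) == 0.1                # noisy frame: max suppression
mid = math.sqrt(1e-4 * 1e-1)                       # geometric mean of thresholds
assert abs(suppression_gain(mid) - 0.55) < 1e-6    # halfway in the log domain
```

Interpolating in the log domain matches how noise power is perceived and measured (in dB), which is why the midpoint in gain falls at the geometric mean of the two thresholds.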