Patents by Inventor Kaisheng Yao
Kaisheng Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160307565
Abstract: Aspects of the technology described herein relate to a new type of deep neural network (DNN), described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM, jointly learning the SVM and DNN parameters under a maximum-margin criterion. The first training method is frame-level training, in which the new model is shown to be related to the multi-class SVM with DNN features. The second training method is sequence-level training, which is related to the structured SVM with DNN features and HMM state-transition features.
Type: Application
Filed: February 16, 2016
Publication date: October 20, 2016
Inventors: Chaojun Liu, Kaisheng Yao, Yifan Gong, Shixiong Zhang
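In the frame-level view, the abstract says the model relates to a multi-class SVM operating on DNN features. A minimal sketch of that frame-level max-margin objective, where the feature matrix, weight matrix `W`, and margin value are illustrative assumptions rather than the patented training procedure:

```python
import numpy as np

def multiclass_hinge_loss(features, labels, W, margin=1.0):
    # features: (T, D) DNN hidden activations for T frames; W: (D, C) SVM weights
    scores = features @ W                                  # (T, C) class scores
    correct = scores[np.arange(len(labels)), labels]       # score of the true class
    margins = np.maximum(0.0, scores - correct[:, None] + margin)
    margins[np.arange(len(labels)), labels] = 0.0          # no penalty on the true class
    return margins.sum(axis=1).mean()                      # average per-frame hinge loss
```

A frame whose true-class score beats every other class by the margin contributes zero loss, which is the max-margin criterion the abstract refers to.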
-
Publication number: 20160091965
Abstract: A "Natural Motion Controller" identifies various motions of one or more parts of a user's body to interact with electronic devices, thereby enabling various natural user interface (NUI) scenarios. The Natural Motion Controller constructs composite motion recognition windows by concatenating an adjustable number of sequential periods of inertial sensor data received from a plurality of separate sets of inertial sensors. Each of these separate sets of inertial sensors is coupled to, or otherwise provides sensor data relating to, a separate user-worn, carried, or held mobile computing device. Each composite motion recognition window is then passed to a motion recognition model trained by one or more machine-based deep learning processes. This motion recognition model is then applied to the composite motion recognition windows to identify a sequence of one or more predefined motions. Identified motions are then used as the basis for triggering execution of one or more application commands.
Type: Application
Filed: September 30, 2014
Publication date: March 31, 2016
Inventors: Jiaping Wang, Yujia Li, Xuedong Huang, Lingfeng Wu, Wei Xiong, Kaisheng Yao, Geoffrey Zweig
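The windowing step above fuses several devices' inertial streams and concatenates an adjustable number of sequential periods into each composite window. A rough sketch of that construction, with array shapes and the helper name chosen for illustration:

```python
import numpy as np

def composite_windows(streams, periods_per_window):
    # streams: one (num_periods, features_per_period) array per device's sensor set
    merged = np.concatenate(streams, axis=1)   # fuse all sensor sets, period by period
    n = merged.shape[0] - periods_per_window + 1
    # each composite window concatenates `periods_per_window` sequential periods
    return np.stack([merged[i:i + periods_per_window].ravel() for i in range(n)])
```

Each row of the result is one composite motion recognition window ready to be passed to a trained motion recognition model.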
-
Patent number: 9280969
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Grant
Filed: June 10, 2009
Date of Patent: March 8, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
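The selection step keeps only segments on which the original and decoded transcriptions agree for at least Q contiguous aligned words. A simplified sketch of that step over word lists that are assumed to be already time-aligned (the alignment itself is outside this snippet):

```python
def matching_segments(original, decoded, q):
    # Return (start, end) spans of at least q contiguous aligned words
    # on which the two transcriptions agree: candidate training segments.
    segments, start = [], None
    for i, (o, d) in enumerate(zip(original, decoded)):
        if o == d:
            if start is None:
                start = i          # open a new matching run
        else:
            if start is not None and i - start >= q:
                segments.append((start, i))
            start = None           # mismatch ends the run
    if start is not None and len(original) - start >= q:
        segments.append((start, len(original)))
    return segments
```

Runs shorter than Q are discarded, so likely transcription errors never reach the incremental acoustic model.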
-
Patent number: 9239828
Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for words in the sequence of words are then generated and each label is assigned to the appropriate one of the words in the sequence of words. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence of words and outputs RNN activation-layer data that is indicative of a semantic label. The CRF portion takes as input the RNN activation-layer data output by the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the words.
Type: Grant
Filed: March 7, 2014
Date of Patent: January 19, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
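Decoding in a model of this shape typically combines per-word RNN scores with CRF label-transition scores via Viterbi search. A minimal sketch under that assumption (the score matrices are illustrative, and Viterbi is a standard decoding choice rather than the claimed procedure):

```python
import numpy as np

def crf_viterbi(rnn_scores, transitions):
    # rnn_scores: (T, L) per-word label scores from the RNN activation layer
    # transitions: (L, L) CRF label-to-label scores
    T, L = rnn_scores.shape
    dp = rnn_scores[0].copy()               # best score ending in each label
    back = np.zeros((T, L), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        cand = dp[:, None] + transitions + rnn_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):           # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]                       # one label index per word
```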
-
Publication number: 20150364127
Abstract: The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze each letter of the text along with its surrounding letters. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text. The letter-to-sound conversion may then also be based on the contextual information that is received. The determined phonemes may be utilized to generate synthesized speech from the input text.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Mei-Yuh Hwang, Sheng Zhao, Bo Yan, Geoffrey Zweig, Fileno A. Alleva
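One simple way to give a letter-to-sound model each letter together with its surrounding letters is a fixed-radius context window; the padding symbol and helper below are illustrative assumptions, not details from the application:

```python
def letter_windows(word, radius):
    # One window per letter: the letter plus `radius` neighbors on each side,
    # padded with '_' at the word boundaries (padding symbol assumed).
    padded = "_" * radius + word + "_" * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(word))]
```

Reversing the word before windowing would give the reverse-order view the abstract also mentions.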
-
Publication number: 20150364128
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted into audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
-
Patent number: 9208777
Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence.
Type: Grant
Filed: January 25, 2013
Date of Patent: December 8, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaisheng Yao, Yifan Gong
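At test time each utterance's i-vector is assigned to the cluster with the closest centroid in the GMM. A sketch of that hard-assignment step, using Euclidean distance as an assumed metric (the patent does not pin down the distance here):

```python
import numpy as np

def assign_clusters(ivectors, centroids):
    # ivectors: (N, D) one i-vector per utterance; centroids: (K, D) GMM means.
    # Compute all pairwise distances, then pick the nearest centroid per utterance.
    d = np.linalg.norm(ivectors[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)
```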
-
Patent number: 9177550
Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
Type: Grant
Filed: March 6, 2013
Date of Patent: November 3, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
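One common way to constrain the deviation between adapted and unadapted outputs is to interpolate the adaptation targets with the unadapted DNN's posteriors, a KL-regularization-style scheme. The blend below is an illustrative assumption about how such a constraint can be realized, not necessarily the claimed mechanism:

```python
import numpy as np

def conservative_targets(hard_labels, unadapted_probs, rho):
    # hard_labels: (N,) adaptation labels; unadapted_probs: (N, C) posteriors
    # from the original DNN. rho=1 keeps the model unchanged; rho=0 ignores it.
    num_classes = unadapted_probs.shape[1]
    one_hot = np.eye(num_classes)[hard_labels]     # targets from adaptation data
    return rho * unadapted_probs + (1.0 - rho) * one_hot
```

Training the adapted DNN toward these blended targets keeps its outputs close to the unadapted network's, which is the conservative behavior the abstract describes.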
-
Publication number: 20150161101
Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for words in the sequence of words are then generated and each label is assigned to the appropriate one of the words in the sequence of words. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence of words and outputs RNN activation-layer data that is indicative of a semantic label. The CRF portion takes as input the RNN activation-layer data output by the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the words.
Type: Application
Filed: March 7, 2014
Publication date: June 11, 2015
Applicant: Microsoft Corporation
Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
-
Publication number: 20150066496
Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
Type: Application
Filed: September 2, 2013
Publication date: March 5, 2015
Applicant: Microsoft Corporation
Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
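Slot filling assigns semantic labels to words in a natural-language sequence. With IOB-style labels, a common convention assumed here for illustration, the labeled words can be collected into slot values like so:

```python
def extract_slots(words, labels):
    # Gather words tagged B-<slot> / I-<slot> into per-slot values;
    # 'O' marks words outside any slot.
    slots, current = {}, None
    for w, tag in zip(words, labels):
        if tag.startswith("B-"):
            current = tag[2:]                       # a new slot begins
            slots.setdefault(current, []).append(w)
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(w)                # continue the open slot
        else:
            current = None                          # 'O' or a stray I- tag
    return {k: " ".join(v) for k, v in slots.items()}
```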
-
Publication number: 20140257803
Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
Type: Application
Filed: March 6, 2013
Publication date: September 11, 2014
Applicant: Microsoft Corporation
Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
-
Publication number: 20140214420
Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence.
Type: Application
Filed: January 25, 2013
Publication date: July 31, 2014
Applicant: Microsoft Corporation
Inventors: Kaisheng Yao, Yifan Gong
-
Patent number: 8700400
Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
Type: Grant
Filed: December 30, 2010
Date of Patent: April 15, 2014
Assignee: Microsoft Corporation
Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
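A basis-matrix decomposition helps short utterances because only a few adaptation coefficients need estimating: the per-speaker transform is a weighted sum of basis matrices, truncated when data is scarce. A sketch under that reading, with the truncation rule and names chosen for illustration:

```python
import numpy as np

def adapted_transform(basis, coeffs, num_active):
    # basis: list of (D, D) basis matrices; coeffs: per-basis weights.
    # A short utterance supports fewer reliable coefficients, so only the
    # first `num_active` basis matrices contribute (truncation rule assumed).
    W = np.zeros_like(basis[0])
    for b, c in zip(basis[:num_active], coeffs[:num_active]):
        W += c * b
    return W
```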
-
Publication number: 20120173240
Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
Type: Application
Filed: December 30, 2010
Publication date: July 5, 2012
Applicant: Microsoft Corporation
Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
-
Patent number: 8180635
Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.
Type: Grant
Filed: December 31, 2008
Date of Patent: May 15, 2012
Assignee: Texas Instruments Incorporated
Inventors: Xiaodong Cui, Kaisheng Yao
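The two estimates are fused into a single scaling factor. A linear interpolation is one plausible combination; the weighting scheme below is an assumption for illustration, since the abstract only states that both estimates are used:

```python
def variance_scaling(noise, poly_coeffs, stats_estimate, weight):
    # Prior-driven estimate: a 2-order polynomial in the noise level,
    # encoding a priori knowledge of how variance scaling depends on noise.
    a, b, c = poly_coeffs
    prior_estimate = a * noise**2 + b * noise + c
    # Fuse with the statistics-driven estimate from prior signal portions.
    return weight * prior_estimate + (1.0 - weight) * stats_estimate
```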
-
Patent number: 7966183
Abstract: Automatic speech recognition verification using a combination of two or more confidence scores based on UV features that reuse computations from the original recognition.
Type: Grant
Filed: May 4, 2007
Date of Patent: June 21, 2011
Assignee: Texas Instruments Incorporated
Inventors: Kaisheng Yao, Lorin Paul Netsch, Vishu Viswanathan
-
Publication number: 20100318355
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
-
Publication number: 20100169090
Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.
Type: Application
Filed: December 31, 2008
Publication date: July 1, 2010
Inventors: Xiaodong Cui, Kaisheng Yao
-
Publication number: 20080300875
Abstract: A speech recognition method and system, the method comprising the steps of providing a speech model that includes Gaussians for at least a portion of its states, clustering the Gaussians of the speech model to give N clusters of Gaussians, wherein N is an integer, and utilizing the clustered Gaussians in recognizing an utterance.
Type: Application
Filed: June 4, 2008
Publication date: December 4, 2008
Inventors: Kaisheng Yao, Yu Tsao
-
Publication number: 20070233490
Abstract: A system for, and method of, text-to-phoneme (TTP) mapping and a digital signal processor (DSP) incorporating the system or the method. In one embodiment, the system includes: (1) a letter-to-phoneme (LTP) mapping generator configured to generate an LTP mapping by iteratively aligning a full training set with a set of correctly aligned entries based on statistics of phonemes and letters from the set of correctly aligned entries and redefining the full training set as a union of the set of correctly aligned entries and a set of incorrectly aligned entries created during the aligning, and (2) a model trainer configured to update prior probabilities of LTP mappings generated by the LTP mapping generator and evaluate whether the LTP mappings are suitable for training a decision-tree-based pronunciation model (DTPM).
Type: Application
Filed: April 3, 2006
Publication date: October 4, 2007
Applicant: Texas Instruments Incorporated
Inventor: Kaisheng Yao