Patents by Inventor Kaisheng Yao

Kaisheng Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160307565
    Abstract: Aspects of the technology described herein relate to a new type of deep neural network (DNN). The new DNN is described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM to learn the parameters of the SVM and DNN under a maximum-margin criterion. The first training method is frame-level training, in which the new model is shown to be related to the multi-class SVM with DNN features. The second training method is sequence-level training, which is related to the structured SVM with DNN features and HMM state transition features.
    Type: Application
    Filed: February 16, 2016
    Publication date: October 20, 2016
    Inventors: Chaojun Liu, Kaisheng Yao, Yifan Gong, Shixiong Zhang
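    Sketch: A minimal toy of the frame-level training described in the abstract above, assuming only what it states (a multi-class SVM loss on top of DNN features). All shapes, data, and names are illustrative, not from the patent.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy data: 300 "acoustic" frames of 20 dims, 3 classes.
        X = rng.normal(size=(300, 20))
        y = rng.integers(0, 3, size=300)

        W1 = rng.normal(scale=0.1, size=(20, 32))  # DNN feature extractor
        W2 = rng.normal(scale=0.1, size=(32, 3))   # SVM layer on top

        def forward(X):
            h = np.maximum(0.0, X @ W1)            # ReLU hidden layer -> DNN features
            return h @ W2                          # linear SVM scores per class

        def multiclass_hinge(scores, y):
            # Crammer-Singer hinge: the largest margin violation by a wrong class.
            n = len(y)
            correct = scores[np.arange(n), y]
            margins = np.maximum(0.0, scores - correct[:, None] + 1.0)
            margins[np.arange(n), y] = 0.0
            return margins.max(axis=1).mean()

        print("frame-level hinge loss:", multiclass_hinge(forward(X), y))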
  • Publication number: 20160091965
    Abstract: A “Natural Motion Controller” identifies various motions of one or more parts of a user's body to interact with electronic devices, thereby enabling various natural user interface (NUI) scenarios. The Natural Motion Controller constructs composite motion recognition windows by concatenating an adjustable number of sequential periods of inertial sensor data received from a plurality of separate sets of inertial sensors. Each of these separate sets of inertial sensors is coupled to, or otherwise provides sensor data relating to, a separate user-worn, carried, or held mobile computing device. Each composite motion recognition window is then passed to a motion recognition model trained by one or more machine-based deep learning processes. This motion recognition model is then applied to the composite motion recognition windows to identify a sequence of one or more predefined motions. Identified motions are then used as the basis for triggering execution of one or more application commands.
    Type: Application
    Filed: September 30, 2014
    Publication date: March 31, 2016
    Inventors: Jiaping Wang, Yujia Li, Xuedong Huang, Lingfeng Wu, Wei Xiong, Kaisheng Yao, Geoffrey Zweig
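    Sketch: A toy of the composite-window construction described above: concatenate an adjustable number of sequential periods of inertial data from several devices into one vector for a motion-recognition model. Device names, shapes, and the period length are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(1)

        # Two worn/held devices, each streaming 6-axis inertial samples.
        PERIODS, SAMPLES_PER_PERIOD, AXES = 4, 50, 6
        wrist = rng.normal(size=(PERIODS, SAMPLES_PER_PERIOD, AXES))
        phone = rng.normal(size=(PERIODS, SAMPLES_PER_PERIOD, AXES))

        def composite_window(streams, n_periods):
            """Concatenate the last n_periods of every device's sensor data
            into one flat composite motion recognition window."""
            return np.concatenate([s[-n_periods:].reshape(-1) for s in streams])

        window = composite_window([wrist, phone], n_periods=3)
        # A trained motion-recognition model would consume this vector; here
        # we only show its size: 2 devices * 3 periods * 50 samples * 6 axes.
        print(window.shape)  # (1800,)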
  • Patent number: 9280969
    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: March 8, 2016
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
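    Sketch: The segment-selection step above as a small stand-alone function: given time-aligned original and decoded transcriptions, keep spans with at least Q contiguous matching words. A hypothetical illustration, not the patented implementation.

        def matching_segments(original, decoded, q):
            """Spans (start, end) where the aligned transcriptions agree on at
            least q contiguous words; assumes a one-to-one time alignment."""
            spans, run_start = [], None
            for i, (a, b) in enumerate(zip(original, decoded)):
                if a == b:
                    if run_start is None:
                        run_start = i
                else:
                    if run_start is not None and i - run_start >= q:
                        spans.append((run_start, i))
                    run_start = None
            if run_start is not None and len(original) - run_start >= q:
                spans.append((run_start, len(original)))
            return spans

        orig = "the cat sat on the mat today".split()
        dec = "the cat sat on a mat today".split()
        print(matching_segments(orig, dec, q=3))  # [(0, 4)]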
  • Patent number: 9239828
    Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for the words in the sequence are then generated, and each label is assigned to the appropriate word in the sequence. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence and outputs RNN activation layer data that is indicative of a semantic label. The CRF portion inputs the RNN activation layer data output from the RNN for one or more words in the sequence and outputs label data indicative of a separate semantic label to be assigned to each of the words.
    Type: Grant
    Filed: March 7, 2014
    Date of Patent: January 19, 2016
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
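    Sketch: The CRF half of the model above, decoded with standard Viterbi dynamic programming over per-word scores standing in for the RNN activation layer output. Purely illustrative numbers; the patent's training and features are not reproduced.

        import numpy as np

        rng = np.random.default_rng(2)
        T, L = 5, 3  # words in the sequence, number of semantic labels

        emissions = rng.normal(size=(T, L))    # stand-in for RNN output scores
        transitions = rng.normal(size=(L, L))  # CRF label-transition scores

        def viterbi(emissions, transitions):
            """Best label sequence under emission + transition scores."""
            T, L = emissions.shape
            score = emissions[0].copy()
            back = np.zeros((T, L), dtype=int)
            for t in range(1, T):
                cand = score[:, None] + transitions + emissions[t][None, :]
                back[t] = cand.argmax(axis=0)
                score = cand.max(axis=0)
            labels = [int(score.argmax())]
            for t in range(T - 1, 0, -1):
                labels.append(int(back[t, labels[-1]]))
            return labels[::-1]

        print(viterbi(emissions, transitions))  # one semantic label per word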
  • Publication number: 20150364127
    Abstract: The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze each letter of the text along with the letters surrounding it. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text, in which case the letter-to-sound conversion is also based on that contextual information. The determined phonemes may be utilized to generate synthesized speech from the input text.
    Type: Application
    Filed: June 13, 2014
    Publication date: December 17, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Mei-Yuh Hwang, Sheng Zhao, Bo Yan, Geoffrey Zweig, Fileno A. Alleva
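    Sketch: The per-letter context described above, as a tiny helper: each letter with the letters surrounding it, plus the same windows in reverse order as a stand-in for the reverse-direction read. The window radius and padding character are assumptions.

        def letter_windows(word, radius=2, pad="_"):
            """Context windows for letter-to-sound: one window per letter,
            and the reversed sequence for a right-to-left analysis."""
            padded = pad * radius + word + pad * radius
            forward = [padded[i:i + 2 * radius + 1] for i in range(len(word))]
            return forward, forward[::-1]

        fwd, rev = letter_windows("phone")
        print(fwd)  # ['__pho', '_phon', 'phone', 'hone_', 'one__']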
  • Publication number: 20150364128
    Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from these RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
    Type: Application
    Filed: June 13, 2014
    Publication date: December 17, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
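    Sketch: The module layout above, with random arrays standing in for each RNN module's output and a single linear map standing in for the hyper-structure RNN. Shapes and names are illustrative only.

        import numpy as np

        rng = np.random.default_rng(3)
        T, H = 8, 4  # words in the input text, per-module output width

        pos_out = rng.normal(size=(T, H))      # part-of-speech module
        lts_out = rng.normal(size=(T, H))      # letter-to-sound module
        prosody_out = rng.normal(size=(T, H))  # linguistic prosody tagger
        context_out = rng.normal(size=(T, H))  # context awareness / semantic mining

        # The hyper-structure module consumes all module outputs at once.
        stacked = np.concatenate([pos_out, lts_out, prosody_out, context_out], axis=1)
        W_hyper = rng.normal(scale=0.1, size=(4 * H, H))
        generation_sequence = stacked @ W_hyper  # would then go to the synthesizer
        print(generation_sequence.shape)         # (8, 4)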
  • Patent number: 9208777
    Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device, and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector estimation and residual noise estimation are performed, along with hyperparameter estimation. The i-vector and hyperparameter estimation steps are repeated until convergence.
    Type: Grant
    Filed: January 25, 2013
    Date of Patent: December 8, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaisheng Yao, Yifan Gong
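    Sketch: The estimate-until-convergence loop above, reduced to its simplest form: alternate nearest-centroid assignment of per-utterance i-vectors with a re-estimation step until assignments stop changing. The real method re-estimates i-vectors and hyperparameters rather than running plain k-means; this shows only the control flow.

        import numpy as np

        rng = np.random.default_rng(4)

        ivectors = rng.normal(size=(100, 10))  # one per training utterance
        centroids = ivectors[rng.choice(100, size=3, replace=False)].copy()

        assign = None
        while True:
            d = np.linalg.norm(ivectors[:, None, :] - centroids[None, :, :], axis=2)
            new_assign = d.argmin(axis=1)      # assign to the closest centroid
            if assign is not None and (new_assign == assign).all():
                break                          # converged
            assign = new_assign
            for k in range(len(centroids)):    # re-estimation step
                if (assign == k).any():
                    centroids[k] = ivectors[assign == k].mean(axis=0)

        print(np.bincount(assign, minlength=3))  # utterances per cluster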
  • Patent number: 9177550
    Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for the particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between the outputs of the adapted DNN and the unadapted DNN is constrained.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: November 3, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
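    Sketch: One way to read "deviation between outputs ... is constrained" is a KL-regularized adaptation objective; the weight rho and all data below are assumed values, not the patent's.

        import numpy as np

        rng = np.random.default_rng(5)

        def softmax(z):
            e = np.exp(z - z.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        # Output distributions of the unadapted DNN and a candidate adapted
        # DNN on the same six frames, over ten context-dependent units.
        p_unadapted = softmax(rng.normal(size=(6, 10)))
        p_adapted = softmax(rng.normal(size=(6, 10)))
        labels = rng.integers(0, 10, size=6)

        rho = 0.5  # how conservative the adaptation is (assumed value)
        ce = -np.log(p_adapted[np.arange(6), labels]).mean()
        kl = (p_unadapted * (np.log(p_unadapted) - np.log(p_adapted))).sum(axis=1).mean()
        print("adaptation objective:", ce + rho * kl)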
  • Publication number: 20150161101
    Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for the words in the sequence are then generated, and each label is assigned to the appropriate word in the sequence. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence and outputs RNN activation layer data that is indicative of a semantic label. The CRF portion inputs the RNN activation layer data output from the RNN for one or more words in the sequence and outputs label data indicative of a separate semantic label to be assigned to each of the words.
    Type: Application
    Filed: March 7, 2014
    Publication date: June 11, 2015
    Applicant: Microsoft Corporation
    Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
  • Publication number: 20150066496
    Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
    Type: Application
    Filed: September 2, 2013
    Publication date: March 5, 2015
    Applicant: Microsoft Corporation
    Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
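    Sketch: Slot filling as per-word labeling, the framing the abstract describes. The sentence and labels are hand-written for illustration; the patent's networks would predict them.

        words = ["book", "a", "flight", "to", "boston", "tomorrow"]
        labels = ["O", "O", "O", "O", "B-destination", "B-date"]

        # Collect the filled slots from the per-word semantic labels.
        slots = {}
        for word, label in zip(words, labels):
            if label != "O":
                slots[label.split("-", 1)[1]] = word
        print(slots)  # {'destination': 'boston', 'date': 'tomorrow'}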
  • Publication number: 20140257803
    Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for the particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between the outputs of the adapted DNN and the unadapted DNN is constrained.
    Type: Application
    Filed: March 6, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
  • Publication number: 20140214420
    Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device, and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector estimation and residual noise estimation are performed, along with hyperparameter estimation. The i-vector and hyperparameter estimation steps are repeated until convergence.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaisheng Yao, Yifan Gong
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
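    Sketch: The basis idea above in miniature: a per-speaker transform expressed as a weighted sum of basis matrices, so a short utterance only has to determine a few coefficients instead of a full matrix. Dimensions and values are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(6)
        D, N_BASIS = 5, 3  # feature dimension and basis size

        bases = rng.normal(size=(N_BASIS, D, D))  # the set of basis matrices

        def speaker_transform(coeffs, bases):
            """Transform = sum_k coeffs[k] * bases[k]; only the coefficient
            vector must be estimated from the short utterance."""
            return np.tensordot(coeffs, bases, axes=1)

        coeffs = rng.normal(size=N_BASIS)  # would be estimated from test data
        A = speaker_transform(coeffs, bases)
        frame = rng.normal(size=D)
        print(A @ frame)                   # adapted feature for recognition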
  • Publication number: 20120173240
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Application
    Filed: December 30, 2010
    Publication date: July 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
  • Patent number: 8180635
    Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal; determining a first estimated variance scaling vector using an estimated second-order polynomial and the noise estimate, wherein the estimated second-order polynomial represents a priori knowledge of the dependency of a variance scaling vector on noise; determining a second estimated variance scaling vector using statistics from prior portions of the speech signal; determining a variance scaling factor using the first and second estimated variance scaling vectors; and using the variance scaling factor to adapt an acoustic model.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: May 15, 2012
    Assignee: Texas Instruments Incorporated
    Inventors: Xiaodong Cui, Kaisheng Yao
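    Sketch: The two estimates combined into one scaling factor, per the abstract. The polynomial coefficients and the interpolation weight are illustrative assumptions, not the patent's values.

        import numpy as np

        # A priori knowledge: a second-order polynomial mapping the noise
        # estimate to a variance scaling value (coefficients assumed).
        poly = np.array([0.02, -0.1, 1.0])  # a*n**2 + b*n + c

        def variance_scaling_factor(noise_level, stats_estimate, w_prior=0.6):
            prior_estimate = np.polyval(poly, noise_level)
            # Blend the polynomial-prior estimate with the estimate computed
            # from statistics of earlier portions of the speech signal.
            return w_prior * prior_estimate + (1.0 - w_prior) * stats_estimate

        scale = variance_scaling_factor(noise_level=2.5, stats_estimate=0.9)
        print("variance scaling factor:", scale)
        # The acoustic model's Gaussian variances would then be scaled by it.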
  • Patent number: 7966183
    Abstract: Automatic speech recognition verification using a combination of two or more confidence scores based on UV features, which reuse computations from the original recognition.
    Type: Grant
    Filed: May 4, 2007
    Date of Patent: June 21, 2011
    Assignee: Texas Instruments Incorporated
    Inventors: Kaisheng Yao, Lorin Paul Netsch, Vishu Viswanathan
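    Sketch: One plausible reading of "a combination of two or more confidence scores": a weighted sum squashed to a confidence in [0, 1]. The weights are assumptions; the patent's UV features and combination rule are not reproduced here.

        import math

        def combined_confidence(scores, weights):
            """Logistic combination of several per-utterance confidence scores."""
            z = sum(w * s for w, s in zip(weights, scores))
            return 1.0 / (1.0 + math.exp(-z))

        # e.g. one likelihood-ratio score and one UV-feature-based score
        print(combined_confidence(scores=[1.2, -0.3], weights=[0.8, 0.5]))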
  • Publication number: 20100318355
    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
  • Publication number: 20100169090
    Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal; determining a first estimated variance scaling vector using an estimated second-order polynomial and the noise estimate, wherein the estimated second-order polynomial represents a priori knowledge of the dependency of a variance scaling vector on noise; determining a second estimated variance scaling vector using statistics from prior portions of the speech signal; determining a variance scaling factor using the first and second estimated variance scaling vectors; and using the variance scaling factor to adapt an acoustic model.
    Type: Application
    Filed: December 31, 2008
    Publication date: July 1, 2010
    Inventors: Xiaodong Cui, Kaisheng Yao
  • Publication number: 20080300875
    Abstract: A speech recognition method and system, the method comprising the steps of: providing a speech model that includes Gaussians for at least a portion of its states; clustering the Gaussians of the speech model to give N clusters of Gaussians, wherein N is an integer; and utilizing the clustered Gaussians in recognizing an utterance.
    Type: Application
    Filed: June 4, 2008
    Publication date: December 4, 2008
    Inventors: Kaisheng Yao, Yu Tsao
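    Sketch: Clustering the model's Gaussians so that decoding can score only the clusters nearest a frame, a common use of such clustering. Plain k-means over the Gaussian means stands in for whatever clustering the patent intends; all sizes are illustrative.

        import numpy as np

        rng = np.random.default_rng(7)

        means = rng.normal(size=(200, 8))  # means of the model's Gaussians
        N = 16
        centroids = means[rng.choice(200, size=N, replace=False)].copy()

        for _ in range(10):  # a few k-means iterations over the means
            d = np.linalg.norm(means[:, None, :] - centroids[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for k in range(N):
                if (assign == k).any():
                    centroids[k] = means[assign == k].mean(axis=0)

        # At decode time, evaluate only Gaussians in the cluster closest
        # to the current frame instead of all 200.
        frame = rng.normal(size=8)
        nearest = np.linalg.norm(centroids - frame, axis=1).argmin()
        active = np.flatnonzero(assign == nearest)
        print(f"evaluating {active.size} of {len(means)} Gaussians")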
  • Publication number: 20070233490
    Abstract: A system for, and method of, text-to-phoneme (TTP) mapping, and a digital signal processor (DSP) incorporating the system or the method. In one embodiment, the system includes: (1) a letter-to-phoneme (LTP) mapping generator configured to generate an LTP mapping by iteratively aligning a full training set with a set of correctly aligned entries, based on statistics of phonemes and letters from the set of correctly aligned entries, and redefining the full training set as the union of the set of correctly aligned entries and the set of incorrectly aligned entries created during the aligning, and (2) a model trainer configured to update prior probabilities of the LTP mappings generated by the LTP mapping generator and to evaluate whether the LTP mappings are suitable for training a decision-tree-based pronunciation model (DTPM).
    Type: Application
    Filed: April 3, 2006
    Publication date: October 4, 2007
    Applicant: Texas Instruments Incorporated
    Inventor: Kaisheng Yao
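    Sketch: The iterative re-alignment loop above, shrunk to a toy: re-align the full training set on each pass using statistics gathered from the entries aligned so far. The align() rule is a deliberately crude stand-in for the statistics-driven letter-to-phoneme aligner, and the lexicon is made up.

        lexicon = [("cat", ["k", "ae", "t"]), ("go", ["g", "ow"]),
                   ("ox", ["aa", "k", "s"])]

        def align(word, phones, stats):
            """'Align' letters to phonemes one-to-one; succeed if the counts
            match or every letter-phoneme pair was seen in the statistics."""
            if len(word) == len(phones):
                return list(zip(word, phones))
            pairs = list(zip(word, phones))
            return pairs if all(p in stats for p in pairs) else None

        correct, stats = [], set()
        for _ in range(3):  # each pass re-aligns the full training set
            correct = []
            for word, phones in lexicon:
                a = align(word, phones, stats)
                if a is not None:
                    correct.append((word, a))
                    stats.update(a)
        print([w for w, _ in correct])  # correctly aligned entries so far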