Patents by Inventor Kaisheng Yao
Kaisheng Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160307565
Abstract: Aspects of the technology described herein relate to a new type of deep neural network (DNN), described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM, jointly learning the SVM and DNN parameters under a maximum-margin criterion. The first training method is frame-level training, in which the new model is shown to be related to the multi-class SVM with DNN features. The second training method is sequence-level training, which is related to the structured SVM with DNN features and HMM state-transition features.
Type: Application
Filed: February 16, 2016
Publication date: October 20, 2016
Inventors: Chaojun Liu, Kaisheng Yao, Yifan Gong, Shixiong Zhang
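In the frame-level view, the abstract says the model relates to a multi-class SVM operating on DNN features. A minimal sketch of that frame-level max-margin objective, where the feature matrix, weight matrix `W`, and margin value are illustrative assumptions rather than the patented training procedure:

```python
import numpy as np

def multiclass_hinge_loss(features, labels, W, margin=1.0):
    # features: (T, D) DNN hidden activations for T frames; W: (D, C) SVM weights
    scores = features @ W                                  # (T, C) class scores
    correct = scores[np.arange(len(labels)), labels]       # score of the true class
    margins = np.maximum(0.0, scores - correct[:, None] + margin)
    margins[np.arange(len(labels)), labels] = 0.0          # no penalty on the true class
    return margins.sum(axis=1).mean()                      # average per-frame hinge loss
```

A frame whose true-class score beats every other class by the margin contributes zero loss, which is the max-margin criterion the abstract refers to.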
-
Publication number: 20160091965
Abstract: A "Natural Motion Controller" identifies various motions of one or more parts of a user's body to interact with electronic devices, thereby enabling various natural user interface (NUI) scenarios. The Natural Motion Controller constructs composite motion recognition windows by concatenating an adjustable number of sequential periods of inertial sensor data received from a plurality of separate sets of inertial sensors. Each of these separate sets of inertial sensors is coupled to, or otherwise provides sensor data relating to, a separate user-worn, carried, or held mobile computing device. Each composite motion recognition window is then passed to a motion recognition model trained by one or more machine-based deep learning processes. This motion recognition model is then applied to the composite motion recognition windows to identify a sequence of one or more predefined motions. Identified motions are then used as the basis for triggering execution of one or more application commands.
Type: Application
Filed: September 30, 2014
Publication date: March 31, 2016
Inventors: Jiaping Wang, Yujia Li, Xuedong Huang, Lingfeng Wu, Wei Xiong, Kaisheng Yao, Geoffrey Zweig
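The windowing step above fuses several devices' inertial streams and concatenates an adjustable number of sequential periods into each composite window. A rough sketch of that construction, with array shapes and the helper name chosen for illustration:

```python
import numpy as np

def composite_windows(streams, periods_per_window):
    # streams: one (num_periods, features_per_period) array per device's sensor set
    merged = np.concatenate(streams, axis=1)   # fuse all sensor sets, period by period
    n = merged.shape[0] - periods_per_window + 1
    # each composite window concatenates `periods_per_window` sequential periods
    return np.stack([merged[i:i + periods_per_window].ravel() for i in range(n)])
```

Each row of the result is one composite motion recognition window ready to be passed to a trained motion recognition model.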
-
Patent number: 9280969
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Grant
Filed: June 10, 2009
Date of Patent: March 8, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
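The selection step keeps only segments on which the original and decoded transcriptions agree for at least Q contiguous aligned words. A simplified sketch of that step over word lists that are assumed to be already time-aligned (the alignment itself is outside this snippet):

```python
def matching_segments(original, decoded, q):
    # Return (start, end) spans of at least q contiguous aligned words
    # on which the two transcriptions agree: candidate training segments.
    segments, start = [], None
    for i, (o, d) in enumerate(zip(original, decoded)):
        if o == d:
            if start is None:
                start = i          # open a new matching run
        else:
            if start is not None and i - start >= q:
                segments.append((start, i))
            start = None           # mismatch ends the run
    if start is not None and len(original) - start >= q:
        segments.append((start, len(original)))
    return segments
```

Runs shorter than Q are discarded, so likely transcription errors never reach the incremental acoustic model.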
-
Patent number: 9239828
Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for words in the sequence of words are then generated and each label is assigned to the appropriate one of the words in the sequence of words. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence of words and outputs RNN activation-layer data that is indicative of a semantic label. The CRF portion takes as input the RNN activation-layer data output by the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the words.
Type: Grant
Filed: March 7, 2014
Date of Patent: January 19, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
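Decoding in a model of this shape typically combines per-word RNN scores with CRF label-transition scores via Viterbi search. A minimal sketch under that assumption (the score matrices are illustrative, and Viterbi is a standard decoding choice rather than the claimed procedure):

```python
import numpy as np

def crf_viterbi(rnn_scores, transitions):
    # rnn_scores: (T, L) per-word label scores from the RNN activation layer
    # transitions: (L, L) CRF label-to-label scores
    T, L = rnn_scores.shape
    dp = rnn_scores[0].copy()               # best score ending in each label
    back = np.zeros((T, L), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        cand = dp[:, None] + transitions + rnn_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):           # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]                       # one label index per word
```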
-
Publication number: 20150364127
Abstract: The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze each letter of the text along with its surrounding letters. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text. The letter-to-sound conversion may then also be based on the contextual information that is received. The determined phonemes may be utilized to generate synthesized speech from the input text.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Mei-Yuh Hwang, Sheng Zhao, Bo Yan, Geoffrey Zweig, Fileno A. Alleva
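One simple way to give a letter-to-sound model each letter together with its surrounding letters is a fixed-radius context window; the padding symbol and helper below are illustrative assumptions, not details from the application:

```python
def letter_windows(word, radius):
    # One window per letter: the letter plus `radius` neighbors on each side,
    # padded with '_' at the word boundaries (padding symbol assumed).
    padded = "_" * radius + word + "_" * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(word))]
```

Reversing the word before windowing would give the reverse-order view the abstract also mentions.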
-
Publication number: 20150364128
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted into audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Application
Filed: June 13, 2014
Publication date: December 17, 2015
Applicant: Microsoft Corporation
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
-
Patent number: 9208777
Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence.
Type: Grant
Filed: January 25, 2013
Date of Patent: December 8, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaisheng Yao, Yifan Gong
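At test time each utterance's i-vector is assigned to the cluster with the closest centroid in the GMM. A sketch of that hard-assignment step, using Euclidean distance as an assumed metric (the patent does not pin down the distance here):

```python
import numpy as np

def assign_clusters(ivectors, centroids):
    # ivectors: (N, D) one i-vector per utterance; centroids: (K, D) GMM means.
    # Compute all pairwise distances, then pick the nearest centroid per utterance.
    d = np.linalg.norm(ivectors[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)
```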
-
Patent number: 9177550
Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
Type: Grant
Filed: March 6, 2013
Date of Patent: November 3, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
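One common way to constrain the deviation between adapted and unadapted outputs is to interpolate the adaptation targets with the unadapted DNN's posteriors, a KL-regularization-style scheme. The blend below is an illustrative assumption about how such a constraint can be realized, not necessarily the claimed mechanism:

```python
import numpy as np

def conservative_targets(hard_labels, unadapted_probs, rho):
    # hard_labels: (N,) adaptation labels; unadapted_probs: (N, C) posteriors
    # from the original DNN. rho=1 keeps the model unchanged; rho=0 ignores it.
    num_classes = unadapted_probs.shape[1]
    one_hot = np.eye(num_classes)[hard_labels]     # targets from adaptation data
    return rho * unadapted_probs + (1.0 - rho) * one_hot
```

Training the adapted DNN toward these blended targets keeps its outputs close to the unadapted network's, which is the conservative behavior the abstract describes.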
-
Publication number: 20150161101
Abstract: Recurrent conditional random field (R-CRF) embodiments are described. In one embodiment, the R-CRF receives feature values corresponding to a sequence of words. Semantic labels for words in the sequence of words are then generated and each label is assigned to the appropriate one of the words in the sequence of words. The R-CRF used to accomplish these tasks includes a recurrent neural network (RNN) portion and a conditional random field (CRF) portion. The RNN portion receives feature values associated with a word in the sequence of words and outputs RNN activation-layer data that is indicative of a semantic label. The CRF portion takes as input the RNN activation-layer data output by the RNN for one or more words in the sequence of words and outputs label data that is indicative of a separate semantic label that is to be assigned to each of the words.
Type: Application
Filed: March 7, 2014
Publication date: June 11, 2015
Applicant: Microsoft Corporation
Inventors: Kaisheng Yao, Geoffrey Gerson Zweig, Dong Yu
-
Publication number: 20150066496
Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
Type: Application
Filed: September 2, 2013
Publication date: March 5, 2015
Applicant: Microsoft Corporation
Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
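Slot filling assigns semantic labels to words in a natural-language sequence. With IOB-style labels, a common convention assumed here for illustration, the labeled words can be collected into slot values like so:

```python
def extract_slots(words, labels):
    # Gather words tagged B-<slot> / I-<slot> into per-slot values;
    # 'O' marks words outside any slot.
    slots, current = {}, None
    for w, tag in zip(words, labels):
        if tag.startswith("B-"):
            current = tag[2:]                       # a new slot begins
            slots.setdefault(current, []).append(w)
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(w)                # continue the open slot
        else:
            current = None                          # 'O' or a stray I- tag
    return {k: " ".join(v) for k, v in slots.items()}
```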
-
Publication number: 20140257803
Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaptation is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
Type: Application
Filed: March 6, 2013
Publication date: September 11, 2014
Applicant: Microsoft Corporation
Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
-
Publication number: 20140214420
Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence.
Type: Application
Filed: January 25, 2013
Publication date: July 31, 2014
Applicant: Microsoft Corporation
Inventors: Kaisheng Yao, Yifan Gong
-
Patent number: 8700400
Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
Type: Grant
Filed: December 30, 2010
Date of Patent: April 15, 2014
Assignee: Microsoft Corporation
Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
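A basis-matrix decomposition helps short utterances because only a few adaptation coefficients need estimating: the per-speaker transform is a weighted sum of basis matrices, truncated when data is scarce. A sketch under that reading, with the truncation rule and names chosen for illustration:

```python
import numpy as np

def adapted_transform(basis, coeffs, num_active):
    # basis: list of (D, D) basis matrices; coeffs: per-basis weights.
    # A short utterance supports fewer reliable coefficients, so only the
    # first `num_active` basis matrices contribute (truncation rule assumed).
    W = np.zeros_like(basis[0])
    for b, c in zip(basis[:num_active], coeffs[:num_active]):
        W += c * b
    return W
```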
-
Publication number: 20120173240
Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
Type: Application
Filed: December 30, 2010
Publication date: July 5, 2012
Applicant: Microsoft Corporation
Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
-
Patent number: 8180635
Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.
Type: Grant
Filed: December 31, 2008
Date of Patent: May 15, 2012
Assignee: Texas Instruments Incorporated
Inventors: Xiaodong Cui, Kaisheng Yao
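The two estimates are fused into a single scaling factor. A linear interpolation is one plausible combination; the weighting scheme below is an assumption for illustration, since the abstract only states that both estimates are used:

```python
def variance_scaling(noise, poly_coeffs, stats_estimate, weight):
    # Prior-driven estimate: a 2-order polynomial in the noise level,
    # encoding a priori knowledge of how variance scaling depends on noise.
    a, b, c = poly_coeffs
    prior_estimate = a * noise**2 + b * noise + c
    # Fuse with the statistics-driven estimate from prior signal portions.
    return weight * prior_estimate + (1.0 - weight) * stats_estimate
```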
-
Patent number: 7966183
Abstract: Automatic speech recognition verification using a combination of two or more confidence scores based on UV features that reuse computations from the original recognition.
Type: Grant
Filed: May 4, 2007
Date of Patent: June 21, 2011
Assignee: Texas Instruments Incorporated
Inventors: Kaisheng Yao, Lorin Paul Netsch, Vishu Viswanathan
-
Publication number: 20100318355
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
-
Publication number: 20100169090
Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.
Type: Application
Filed: December 31, 2008
Publication date: July 1, 2010
Inventors: Xiaodong Cui, Kaisheng Yao
-
Publication number: 20080300875
Abstract: A speech recognition method and system, the method comprising the steps of providing a speech model that includes Gaussians for at least a portion of its states, clustering the Gaussians of the speech model to give N clusters of Gaussians, wherein N is an integer, and utilizing the clustered Gaussians in recognizing an utterance.
Type: Application
Filed: June 4, 2008
Publication date: December 4, 2008
Inventors: Kaisheng Yao, Yu Tsao
-
Publication number: 20070233490
Abstract: A system for, and method of, text-to-phoneme (TTP) mapping and a digital signal processor (DSP) incorporating the system or the method. In one embodiment, the system includes: (1) a letter-to-phoneme (LTP) mapping generator configured to generate an LTP mapping by iteratively aligning a full training set with a set of correctly aligned entries based on statistics of phonemes and letters from the set of correctly aligned entries and redefining the full training set as a union of the set of correctly aligned entries and a set of incorrectly aligned entries created during the aligning, and (2) a model trainer configured to update prior probabilities of LTP mappings generated by the LTP mapping generator and evaluate whether the LTP mappings are suitable for training a decision-tree-based pronunciation model (DTPM).
Type: Application
Filed: April 3, 2006
Publication date: October 4, 2007
Applicant: Texas Instruments Incorporated
Inventor: Kaisheng Yao