Patents by Inventor Zhi-Jie Yan

Zhi-Jie Yan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10579922
    Abstract: The use of the alternating direction method of multipliers (ADMM) algorithm to train a classifier may reduce classifier training time with little degradation in classifier accuracy. The training involves partitioning the training data for the classifier into multiple data blocks. The partitions may preserve the joint distribution of the input features and the output class of the training data. The training may further include performing an ADMM iteration on the multiple data blocks in an initial order using multiple worker nodes. The training of the classifier is complete if a stop criterion is satisfied following the ADMM iteration. Otherwise, one or more additional ADMM iterations may be performed on the multiple data blocks in different orders until the stop criterion is satisfied.
    Type: Grant
    Filed: April 8, 2014
    Date of Patent: March 3, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Qiang Huo, Zhi-Jie Yan, Kai Chen
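The block-partitioned ADMM training loop described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: ridge regression stands in for the classifier (its local updates have a closed form), the parallel consensus form of ADMM is used, and all names are illustrative.

```python
import numpy as np

def admm_ridge(blocks, lam=0.1, rho=1.0, max_iter=300, tol=1e-6):
    """Consensus ADMM over data blocks: each block (A_i, b_i) keeps a local
    model x_i, all tied to a shared consensus model z, with a stop criterion
    on the primal residual. Ridge regression stands in for the classifier."""
    n_blocks = len(blocks)
    d = blocks[0][0].shape[1]
    xs = [np.zeros(d) for _ in range(n_blocks)]
    us = [np.zeros(d) for _ in range(n_blocks)]
    z = np.zeros(d)
    # Pre-factor each block's local system (A_i^T A_i + rho I).
    facts = [np.linalg.inv(A.T @ A + rho * np.eye(d)) for A, _ in blocks]
    for _ in range(max_iter):
        # Local update on each "worker node".
        for i, (A, b) in enumerate(blocks):
            xs[i] = facts[i] @ (A.T @ b + rho * (z - us[i]))
        # Consensus update: minimises (lam/2)||z||^2 + (rho/2) sum ||x_i - z + u_i||^2.
        z = rho * sum(x + u for x, u in zip(xs, us)) / (lam + n_blocks * rho)
        # Dual updates.
        for i in range(n_blocks):
            us[i] = us[i] + xs[i] - z
        # Stop criterion: all local models agree with the consensus model.
        if np.sqrt(sum(np.sum((x - z) ** 2) for x in xs)) < tol:
            break
    return z
```

Reordering the blocks between iterations, as the abstract describes, matters for variants that sweep blocks sequentially; the parallel consensus form above updates every block in each iteration.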
  • Publication number: 20170147920
    Abstract: The use of the alternating direction method of multipliers (ADMM) algorithm to train a classifier may reduce classifier training time with little degradation in classifier accuracy. The training involves partitioning the training data for the classifier into multiple data blocks. The partitions may preserve the joint distribution of the input features and the output class of the training data. The training may further include performing an ADMM iteration on the multiple data blocks in an initial order using multiple worker nodes. The training of the classifier is complete if a stop criterion is satisfied following the ADMM iteration. Otherwise, one or more additional ADMM iterations may be performed on the multiple data blocks in different orders until the stop criterion is satisfied.
    Type: Application
    Filed: April 8, 2014
    Publication date: May 25, 2017
    Inventors: Qiang Huo, Zhi-Jie Yan, Kai Chen
  • Publication number: 20150199960
    Abstract: Methods and systems for i-vector based clustering of training data in speech recognition are described. An i-vector may be extracted from a segment of the speech training data to represent its acoustic information. The extracted i-vectors may then be clustered into multiple clusters using a hierarchical divisive clustering algorithm. Using a cluster of the multiple clusters, an acoustic model may be trained, and this trained acoustic model may be used in speech recognition.
    Type: Application
    Filed: August 24, 2012
    Publication date: July 16, 2015
    Inventors: Qiang Huo, Zhi-Jie Yan, Yu Zhang, Jian Xu
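The hierarchical divisive clustering named in the abstract above can be sketched as repeated two-way splits of the largest cluster. This is only an illustration of the top-down scheme; the patent's actual split and stop criteria are not disclosed in the abstract, and the deterministic 2-means initialisation here is an assumption.

```python
import numpy as np

def two_means(X, iters=20):
    """Split the rows of X into two clusters; initialise the centroids from
    the two most distant points so the split is deterministic."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    centers = np.stack([X[i], X[j]])
    for _ in range(iters):
        assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(0)
    return assign

def divisive_cluster(X, n_clusters):
    """Hierarchical divisive clustering of i-vectors: repeatedly split the
    largest remaining cluster with 2-means until n_clusters remain.
    Returns a list of index arrays, one per cluster."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        idx = clusters.pop()                    # largest cluster
        assign = two_means(X[idx])
        clusters += [idx[assign == 0], idx[assign == 1]]
    return clusters
```

Each resulting index set would correspond to one cluster of training segments from which a cluster-specific acoustic model could be trained.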
  • Publication number: 20130185070
    Abstract: A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using irrelevant variability normalization (IVN) based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech input that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained acoustic models. The speech recognition system may further perform unsupervised adaptation to adapt the feature transforms to the unknown speech and thus increase recognition accuracy.
    Type: Application
    Filed: January 12, 2012
    Publication date: July 18, 2013
    Applicant: Microsoft Corporation
    Inventors: Qiang Huo, Zhi-Jie Yan, Yu Zhang
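The abstract above does not disclose how the feature transforms are estimated or applied. As a loudly hypothetical stand-in, the sketch below estimates a single affine feature transform by least squares so that transformed frames land near their class means, which illustrates the general idea of a transform absorbing variability that is irrelevant to classification; it is not the patent's discriminative training procedure.

```python
import numpy as np

def estimate_affine_transform(X, targets):
    """Least-squares affine transform (A, b) minimising ||A x + b - target||^2
    over training frames X (rows) with per-frame target vectors (e.g. class
    means). A toy stand-in for a trained feature transform."""
    Xa = np.hstack([X, np.ones((len(X), 1))])          # append bias column
    W, *_ = np.linalg.lstsq(Xa, targets, rcond=None)   # W has shape (d+1, d)
    return W[:-1].T, W[-1]                             # A is (d, d), b is (d,)

def apply_transform(A, b, X):
    """Normalise observed features before recognition."""
    return X @ A.T + b
```

In this toy setting, if the observed features are a fixed affine distortion of class-centred features, the estimated transform approximately inverts the distortion, so nearest-class-mean scoring on the transformed features recovers the labels.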
  • Patent number: 8340965
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Grant
    Filed: December 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
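The abstract above says rich context models are refined "based on" decision-tree-tied HMMs but does not spell out the refinement. One common way such a refinement could work, shown purely as an assumption, is MAP-style interpolation: each rich-context unit's statistics are smoothed toward the tied HMM state it falls under, weighted by how much data the unit has.

```python
def refine_rich_context_mean(unit_sum, unit_count, tied_mean, tau=10.0):
    """MAP-style refinement (an assumed scheme, not the patent's): interpolate
    a rich-context unit's accumulated feature sum with the mean of its
    decision-tree-tied HMM state. With no unit data the tied mean is returned;
    with abundant data the unit's own sample mean dominates."""
    return [(s + tau * m) / (unit_count + tau)
            for s, m in zip(unit_sum, tied_mean)]
```

The smoothing weight tau is a hypothetical tuning parameter controlling how strongly sparse units fall back on the tied model.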
  • Publication number: 20120143611
    Abstract: Hidden Markov Model (HMM) trajectory tiling (HTT) based approaches may be used to synthesize speech from text. In operation, a set of HMMs and a set of waveform units may be obtained from a speech corpus. The set of HMMs is further refined via minimum generation error (MGE) training to generate a refined set of HMMs. Subsequently, a speech parameter trajectory may be generated by applying the refined set of HMMs to an input text. A unit lattice of candidate waveform units may be selected from the set of waveform units based at least on the speech parameter trajectory. A normalized cross-correlation (NCC) based search on the unit lattice may then be performed to obtain a minimal concatenation cost sequence of candidate waveform units, which is concatenated into a waveform sequence from which speech is synthesized.
    Type: Application
    Filed: December 7, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank Kao-Ping Soong
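The NCC-based lattice search in the abstract above amounts to dynamic programming over candidate waveform units. The sketch below is a minimal illustration: the concatenation cost between adjacent units is taken as 1 minus the zero-lag NCC of their boundary windows, and the HMM-trajectory target costs are omitted for brevity; window sizes and structure are assumptions, not from the patent.

```python
import numpy as np

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length windows."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def best_unit_path(lattice, window=32):
    """Pick one candidate waveform unit per lattice slot so that the summed
    concatenation cost (1 - NCC between adjacent units' boundary windows)
    is minimal, then concatenate the chosen units."""
    n = len(lattice)
    cost = [np.zeros(len(lattice[0]))]
    back = []
    for t in range(1, n):
        prev, cur = lattice[t - 1], lattice[t]
        # step[c, p]: cost of following candidate p with candidate c.
        step = np.array([[1.0 - ncc(p[-window:], c[:window]) for p in prev]
                         for c in cur])
        total = step + cost[-1][None, :]
        back.append(total.argmin(1))
        cost.append(total.min(1))
    # Backtrack the minimal-cost sequence of candidates.
    j = int(cost[-1].argmin())
    path = [j]
    for bk in reversed(back):
        j = int(bk[j])
        path.append(j)
    path.reverse()
    return path, np.concatenate([lattice[t][path[t]] for t in range(n)])
```

Units whose boundary windows line up (NCC near 1) incur near-zero concatenation cost, so the search prefers smoothly joinable sequences.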
  • Publication number: 20110071835
    Abstract: Embodiments of a small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
    Type: Application
    Filed: September 22, 2009
    Publication date: March 24, 2011
    Applicant: Microsoft Corporation
    Inventors: Yi-Ning Chen, Zhi-Jie Yan, Frank Kao-Ping Soong
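The two-stage pipeline in the abstract above (derive a jagged trajectory from static and delta parameters, then smooth it) can be illustrated as follows. This is a toy reading: the per-frame blend of static means with delta-predicted values and the short FIR smoother are assumptions, not the patent's derivation.

```python
import numpy as np

def sawtooth_trajectory(static_means, delta_means, alpha=0.5):
    """Frame-wise trajectory: each static mean is nudged toward the value
    the previous frame's delta (velocity) mean predicts, x_t ~ x_{t-1} + d_{t-1}.
    Without a full trajectory optimisation this yields a jagged,
    saw-tooth-like track."""
    c = np.asarray(static_means, dtype=float)
    d = np.asarray(delta_means, dtype=float)
    traj = c.copy()
    traj[1:] = (1 - alpha) * c[1:] + alpha * (c[:-1] + d[:-1])
    return traj

def smooth_trajectory(traj, taps=(0.25, 0.5, 0.25)):
    """Low-pass the saw-tooth track with a short symmetric FIR filter;
    the endpoints are repeated so the output keeps the input length."""
    padded = np.concatenate([traj[:1], traj, traj[-1:]])
    return np.convolve(padded, np.asarray(taps), mode="valid")
```

The smoothed track has visibly less frame-to-frame jitter than the saw-tooth one while following the same overall contour, which is the property the abstract's final synthesis step relies on.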
  • Publication number: 20110054903
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Application
    Filed: December 2, 2009
    Publication date: March 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong