Patents by Inventor Yifan Gong

Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Decoding multiple HMM sets using a single sentence grammar

Patent number: 7269558

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.

Type: Grant

Filed: July 26, 2001

Date of Patent: September 11, 2007

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
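
The (M-1)/M figure in the abstract of patent 7269558 above follows from simple arithmetic. The sketch below only illustrates that arithmetic; the function name and the example sizes are hypothetical and do not come from the patent.

```python
def network_storage(num_hmm_sets, subnetwork_size):
    """Compare storage for M sub-networks against a single shared sub-network.

    subnetwork_size is in arbitrary units (for example, number of nodes).
    """
    conventional = num_hmm_sets * subnetwork_size  # one sub-network per HMM set
    shared = subnetwork_size                       # a single sub-network reused for all sets
    saving = (conventional - shared) / conventional  # equals (M - 1) / M
    return conventional, shared, saving


# Example with assumed numbers: M = 4 HMM sets, sub-network of 1000 units.
conventional, shared, saving = network_storage(4, 1000)
print(conventional, shared, saving)
```

With M = 4 the fraction saved is 3/4; it approaches 1 as M grows.
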
Method to extend operating range of joint additive and convolutive compensating algorithms

Patent number: 7236930

Abstract: The operating range of a joint additive and convolutive compensating method is extended by an enhanced channel estimation procedure that adds SNR-dependent inertia and an SNR-dependent limit on the channel estimate.

Type: Grant

Filed: April 12, 2004

Date of Patent: June 26, 2007

Assignee: Texas Instruments Incorporated

Inventors: Alexis P. Bernard, Yifan Gong
Method of speech recognition resistant to convolutive distortion and additive distortion

Patent number: 7165028

Abstract: A speech recognizer operating in both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized, the recognizer system adapts HMM mean vectors with noise estimates calculated from the pre-utterance pause and a channel estimate calculated from previous utterances using an Expectation-Maximization algorithm.

Type: Grant

Filed: September 20, 2002

Date of Patent: January 16, 2007

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Implementing a high accuracy continuous speech recognizer on a fixed-point processor

Patent number: 7103547

Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noises, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise and this is used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming Window, FFT, complex FFT to power spectrum and Mel scale power spectrum into filter bank steps, a log filter bank step and after the log filter bank step performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.

Type: Grant

Filed: May 2, 2002

Date of Patent: September 5, 2006

Assignee: Texas Instruments Incorporated

Inventors: Yu-Hung Kao, Yifan Gong
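
The abstract of patent 7103547 above mentions a polynomial fit for log2 in the log filter bank step. The sketch below shows the general idea in floating point, not the patent's fixed-point arithmetic: the input is split into an exponent and a mantissa in [1, 2), and a quadratic approximates log2 of the mantissa. The coefficients (-1/3, 2, -5/3) are an assumption chosen for illustration because they are exact at the interval endpoints; the patent's coefficients and Q formats are not given in this listing.

```python
import math


def log2_poly(x):
    """Approximate log2(x) for x > 0 with a quadratic on the mantissa.

    Illustrative only: floating point, with assumed coefficients.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    frac, exp = math.frexp(x)  # x = frac * 2**exp with 0.5 <= frac < 1
    m = 2.0 * frac             # mantissa rescaled to [1, 2)
    e = exp - 1                # so that x = m * 2**e
    # Quadratic in Horner form: p(1) = 0 and p(2) = 1, so exact at the endpoints.
    p = ((-1.0 / 3.0) * m + 2.0) * m - 5.0 / 3.0
    return e + p


for value in (0.3, 1.0, 1.2, 8.0, 12345.678):
    print(value, log2_poly(value), math.log2(value))
```

With these coefficients the absolute error stays at roughly 0.01 or less over the whole positive range, since only the mantissa is approximated.
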
Accumulating transformations for hierarchical linear regression HMM adaptation

Patent number: 7089183

Abstract: A new iterative hierarchical linear regression method for generating a set of linear transforms to adapt HMM speech models to a new environment for improved speech recognition is disclosed. The method determines a new set of linear transforms at an iterative step by Estimate-Maximize (EM) estimation, and then combines the new set of linear transforms with the prior set of linear transforms to form a new merged set of linear transforms. An iterative step may include realignment of adaptation speech data to the adapted HMM models to further improve speech recognition performance.

Type: Grant

Filed: June 22, 2001

Date of Patent: August 8, 2006

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
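
The abstract of patent 7089183 above describes combining a newly estimated set of linear transforms with the prior set to form a merged set. One natural reading, assumed here and not confirmed by this listing, is that each transform is an affine map (A, b) on a mean vector and that merging means composing the two maps. The helper names below are hypothetical, and the EM estimation step itself is not shown.

```python
def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def mat_mul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def apply_transform(transform, mean):
    """Apply the affine map mean -> A @ mean + b."""
    a, b = transform
    return [y + b_i for y, b_i in zip(mat_vec(a, mean), b)]


def merge(prior, new):
    """Compose two affine maps so one merged map replaces prior followed by new."""
    a1, b1 = prior
    a2, b2 = new
    merged_a = mat_mul(a2, a1)                               # A2 @ A1
    merged_b = [y + b for y, b in zip(mat_vec(a2, b1), b2)]  # A2 @ b1 + b2
    return merged_a, merged_b


# Assumed example values for a 2-dimensional mean vector.
prior = ([[1.0, 0.5], [0.0, 2.0]], [1.0, -1.0])
new = ([[2.0, 0.0], [1.0, 1.0]], [0.5, 0.0])
mean = [3.0, 4.0]
merged = merge(prior, new)
print(apply_transform(merged, mean))
```

Keeping a single merged transform per class means the original model means never need to be overwritten between iterations.
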
Method of speech recognition with compensation for both channel distortion and background noise

Patent number: 7062433

Abstract: A method of speech recognition with compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. The mean MFCC vector over all speech utterances is calculated for the clean database. This mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance. The model mean vectors adapted to the noise are determined. The mean vector of the noisy data over the noisy speech space is determined, and this is removed from the model mean vectors adapted to noise to get the target model.

Type: Grant

Filed: January 18, 2002

Date of Patent: June 13, 2006

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Automatic utterance detector with high noise immunity

Patent number: 6980950

Abstract: An utterance detector for speech recognition is described. The detector consists of two components. The first component makes a speech/non-speech decision for each incoming speech frame. The decision is based on a frequency-selective autocorrelation function obtained by speech power spectrum estimation, frequency filtering, and an inverse Fourier transform. The second component makes the utterance detection decision, using a state machine that describes the detection process in terms of the speech/non-speech decisions made by the first component.

Type: Grant

Filed: September 21, 2000

Date of Patent: December 27, 2005

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, Yu-Hung Kao
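
The second component described in the abstract of patent 6980950 above is a state machine driven by per-frame speech/non-speech decisions. The sketch below is a generic two-state detector of that kind, not the patent's state machine: the states, the `min_speech` and `min_silence` thresholds, and the function name are all assumptions made for illustration.

```python
def detect_utterances(frame_is_speech, min_speech=3, min_silence=5):
    """Turn per-frame speech decisions into (start, end) utterance segments.

    An utterance starts after min_speech consecutive speech frames and ends
    after min_silence consecutive non-speech frames. end is exclusive.
    """
    segments = []
    in_utterance = False
    run = 0       # length of the current run of frames that would flip the state
    start = None
    for i, is_speech in enumerate(frame_is_speech):
        if not in_utterance:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                in_utterance = True
                start = i - min_speech + 1  # first frame of the speech run
                run = 0
        else:
            run = run + 1 if not is_speech else 0
            if run >= min_silence:
                segments.append((start, i - min_silence + 1))  # first silent frame
                in_utterance = False
                run = 0
    if in_utterance:
        segments.append((start, len(frame_is_speech)))  # input ended mid-utterance
    return segments


frames = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(detect_utterances(frames))
```

Requiring a run of frames before changing state keeps an isolated misclassified frame, such as the single speech frame near the end of the example, from triggering a detection.
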
Source normalization training for HMM modeling of speech

Patent number: 6980952

Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a–23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.

Type: Grant

Filed: June 7, 2000

Date of Patent: December 27, 2005

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Sequential variance adaptation for reducing signal mismatching

Publication number: 20050256714

Abstract: The mismatch between the distributions of acoustic models and features in speech recognition may cause performance degradation. A sequential variance adaptation (SVA) adapts the covariances dynamically based on a sequential EM algorithm. The original covariances in acoustic models are adjusted by scaling factors which are sequentially updated once new collection data is available.

Type: Application

Filed: March 29, 2004

Publication date: November 17, 2005

Inventors: Xiaodong Cui, Yifan Gong
Middle-end solution to robust speech recognition

Publication number: 20050228662

Abstract: A method for performing time and frequency Signal-to-Noise Ratio (SNR) dependent weighting in speech recognition is described that includes for each period t estimating the SNR to get time and frequency SNR information ?t,f; calculating the time and frequency weighting to get ?tf; performing the back and forth weighted time varying DCT transformation matrix computation M G_t M^-1 to get T_t; providing the transformation matrix computation T_t and the original MFCC feature o_t that contains the information about the SNR to a recognizer including the Viterbi decoding; and performing weighted Viterbi recognition b_j(o_t).

Type: Application

Filed: April 13, 2004

Publication date: October 13, 2005

Inventors: Alexis Bernard, Yifan Gong
Method to extend operating range of joint additive and convolutive compensating algorithms

Publication number: 20050228669

Abstract: The operating range of a joint additive and convolutive compensating method is extended by an enhanced channel estimation procedure that adds SNR-dependent inertia and an SNR-dependent limit on the channel estimate.

Type: Application

Filed: April 12, 2004

Publication date: October 13, 2005

Inventors: Alexis Bernard, Yifan Gong
Incremental adjustment of state-dependent bias parameters for adaptive speech recognition

Publication number: 20050216266

Abstract: The mismatch between the distributions of acoustic models and features in speech recognition may cause performance degradation. A sequential bias adaptation (SBA) applies state or class dependent biases to the original mean vectors in acoustic models to take into account the mismatch between features and the acoustic models.

Type: Application

Filed: March 29, 2004

Publication date: September 29, 2005

Inventors: Yifan Gong, Xiaodong Cui
Decoding multiple HMM sets using a single sentence grammar

Publication number: 20050187771

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.

Type: Application

Filed: February 4, 2005

Publication date: August 25, 2005

Inventor: Yifan Gong
Calibration of speech data acquisition path

Patent number: 6912497

Abstract: Calibration of a data acquisition path is achieved by applying a voice utterance to a first high quality microphone and reference path and to a test acquisition path including a test microphone, such as a lower quality one used in a car. The calibration includes detecting the power density of the reference signal YR through the reference path and detecting the power density of the signal YN through the acquisition path. A processor processes these signals to provide an output signal representing a noise estimate and a channel estimate. The processing uses equations derived by modeling convolutive and additive noise as polynomials with different orders, estimating the model parameters using a maximum likelihood criterion, and simultaneously solving linear equations for the different orders.

Type: Grant

Filed: January 18, 2002

Date of Patent: June 28, 2005

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Noise-resistant utterance detector

Publication number: 20050049863

Abstract: A method and detector for providing a noise resistant utterance detector is provided by extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) of the speech signal to focus on the periodic excitation part of the signal and spectral reshaping (19) to accentuate separation between formants.

Type: Application

Filed: August 27, 2003

Publication date: March 3, 2005

Inventors: Yifan Gong, Alexis Bernard
Speaker-dependent recognition of voice command embedded in arbitrary utterance

Publication number: 20050049871

Abstract: A method of speaker-dependent voice command recognition is provided that includes providing a hybrid of sentence network and Gaussian mixture models with a shared pool of distributions and performing an out-of-vocabulary procedure based on the score difference between a top candidate and a background model over the recognized in-vocabulary word. The network is a three-section network to represent speech embedded in extra speech, where the first and last sections are intended to absorb extra speech and the middle section to match in-vocabulary speech. An utterance is accepted as containing an in-vocabulary word based on a rejection parameter, which has several alternative forms.

Type: Application

Filed: August 26, 2003

Publication date: March 3, 2005

Inventor: Yifan Gong
Speech recognition using model parameters dependent on acoustic environment

Publication number: 20040181409

Abstract: To make speech recognition robust in a noisy environment, a variable parameter Gaussian Mixture HMM (GMHMM) is described which extends existing HMMs by allowing HMM parameters to change as a function of a continuous variable that depends on the environment. Specifically, in one embodiment the function is a polynomial and the environment is described by the signal-to-noise ratio. The use of parameter functions improves HMM discriminability during multi-condition training. In the recognition process, a set of HMM parameters is instantiated according to the parameter functions, based on the current environment. The model parameters are estimated using an Expectation-Maximization algorithm for the variable parameter GMHMM.

Type: Application

Filed: March 11, 2003

Publication date: September 16, 2004

Inventors: Yifan Gong, Xiaodong Cui
Method for transforming HMMs for speaker-independent recognition in a noisy environment

Patent number: 6658385

Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low noise environment R to provide rich information on co-articulation and speaker variation, and a smaller database in a noisier target environment T. A set H of HMMs is trained with data provided in the low noise environment R, and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of Classes C. For each subclass c of Classes C, the transformation Φc is found to maximize the likelihood of the utterances in T, given H. The HMMs are transformed and the steps repeated until the likelihood stabilizes.

Type: Grant

Filed: February 10, 2000

Date of Patent: December 2, 2003

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, John J. Godfrey
Speech recognition front-end feature extraction for noisy speech

Patent number: 6633842

Abstract: An estimate of a clean speech vector, typically a Mel-Frequency Cepstral Coefficient (MFCC) vector, given its noisy observation is provided. The method makes use of two Gaussian mixtures. The first one is trained on clean speech and the second is derived from the first one using some noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectation of clean speech given an observed noisy vector.

Type: Grant

Filed: September 21, 2000

Date of Patent: October 14, 2003

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
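
The abstract of patent 6633842 above estimates clean speech as a conditional expectation using a clean-speech Gaussian mixture and a noisy mixture derived from it. The one-dimensional sketch below illustrates a common simplification of that idea and is not the patent's formula: component posteriors are computed under the noisy mixture, and the estimate is the posterior-weighted sum of the clean component means. The function names and the example mixture values are assumptions.

```python
import math


def gaussian(y, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate_clean(y, weights, clean_means, noisy_means, noisy_vars):
    """Posterior-weighted clean mean given a noisy observation y.

    Component k of the noisy mixture is assumed to correspond to
    component k of the clean mixture.
    """
    likelihoods = [w * gaussian(y, m, v)
                   for w, m, v in zip(weights, noisy_means, noisy_vars)]
    total = sum(likelihoods)
    posteriors = [value / total for value in likelihoods]  # P(component | y)
    return sum(p * m for p, m in zip(posteriors, clean_means))


# Assumed two-component example: noise shifts the component means upward.
weights = [0.5, 0.5]
clean_means = [0.0, 10.0]
noisy_means = [1.0, 10.5]
noisy_vars = [1.0, 1.0]
print(estimate_clean(1.0, weights, clean_means, noisy_means, noisy_vars))
print(estimate_clean(10.5, weights, clean_means, noisy_means, noisy_vars))
```

An observation that sits on a noisy component's mean is mapped back to approximately the matching clean mean. A practical version would work on full feature vectors and in the log domain to avoid underflow.
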
Log-spectral compensation of PMC Gaussian mean vectors for noisy speech recognition using log-max assumption

Patent number: 6633843

Abstract: Reducing the mismatch between HMMs trained with clean speech and speech signals recorded under background noise can be approached by distribution adaptation using parallel model combination (PMC). Accurate PMC has no closed-form expression; therefore, simplifying assumptions must be made in implementation. Under a new log-max assumption, adaptation formulas for log-spectral parameters are presented, both for static and dynamic parameters.

Type: Grant

Filed: April 27, 2001

Date of Patent: October 14, 2003

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
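
The log-max assumption named in the abstract of patent 6633843 above is commonly stated as log(exp(s) + exp(n)) ≈ max(s, n) for a speech log-spectral value s and a noise log-spectral value n. The sketch below applies that approximation directly to static mean vectors, bin by bin, and compares it with the exact log-sum. Applying the max to the means is a simplification made here for illustration; the patent's actual formulas, including those for dynamic parameters, are not reproduced in this listing.

```python
import math


def log_add(s, n):
    """Exact combination of two log-spectral values: log(exp(s) + exp(n))."""
    return math.log(math.exp(s) + math.exp(n))


def log_max_means(speech_means, noise_means):
    """Log-max approximation applied bin by bin to static log-spectral means."""
    return [max(s, n) for s, n in zip(speech_means, noise_means)]


# Assumed example: speech dominates the first bin, noise dominates the second.
speech_means = [5.0, 1.0]
noise_means = [2.0, 3.0]
adapted = log_max_means(speech_means, noise_means)
exact = [log_add(s, n) for s, n in zip(speech_means, noise_means)]
print(adapted, exact)
```

The approximation never exceeds the exact value and underestimates it by at most log 2, with the largest gap occurring when speech and noise are equal in a bin.
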
