Patents by Inventor Yifan Gong

Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method of speech recognition resistant to convolutive distortion and additive distortion

Publication number: 20030115055

Abstract: A speech recognizer that operates under both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized, the recognizer system adapts HMM mean vectors with noise estimates calculated from the pre-utterance pause and a channel estimate calculated from previous utterances using an Expectation Maximization algorithm.

Type: Application

Filed: September 20, 2002

Publication date: June 19, 2003

Inventor: Yifan Gong
System and method of noise-dependent classification

Patent number: 6577997

Abstract: A noise-dependent classifier for a speech recognition system includes a recognizer (15) that provides scores and score differences of the two closest in-vocabulary words for a received utterance. A noise detector (17) detects the noise level of a pre-speech portion of the utterance. A classifier (19) responds to the detected noise level, the scores, and a noise-dependent model, and decides whether to accept or reject the utterance as a recognized word based on the noise-dependent model and the scores.

Type: Grant

Filed: April 27, 2000

Date of Patent: June 10, 2003

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Implementing a high accuracy continuous speech recognizer on a fixed-point processor

Publication number: 20020198706

Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noises, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise and this is used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming Window, FFT, complex FFT to power spectrum and Mel scale power spectrum into filter bank steps, a log filter bank step and after the log filter bank step performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.

Type: Application

Filed: May 2, 2002

Publication date: December 26, 2002

Inventors: Yu-Hung Kao, Yifan Gong
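The abstract above mentions a polynomial fit for log2 in the log filter bank step. As a rough illustration only, the sketch below fits a cubic to log2 over the mantissa range [0.5, 1) and adds the binary exponent back. It runs in floating point; the polynomial degree, the helper name poly_log2, and the use of NumPy are assumptions, and the Q-point fixed-point arithmetic described in the abstract is not reproduced.

```python
import math
import numpy as np

# Assumption: a cubic is used for illustration; the abstract gives no degree.
DEGREE = 3

# Fit log2(m) on the mantissa interval [0.5, 1), the range math.frexp returns.
_m = np.linspace(0.5, 1.0, 512, endpoint=False)
LOG2_COEFFS = np.polyfit(_m, np.log2(_m), DEGREE)


def poly_log2(x):
    """Approximate log2(x) for x > 0 as exponent + polynomial(mantissa)."""
    if x <= 0:
        raise ValueError("x must be positive")
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent
    return exponent + float(np.polyval(LOG2_COEFFS, mantissa))
```

Splitting the input into exponent and mantissa keeps the polynomial's domain narrow, which is why a low-degree fit can be accurate enough across a wide range of filter bank energies.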
Calibration of speech data acquisition path

Publication number: 20020177998

Abstract: A data acquisition path is calibrated by applying a voice utterance both to a reference path with a high-quality microphone and to a test acquisition path with a test microphone, such as a lower-quality one used in a car. The calibration device detects the power density of the reference signal YR through the reference path and the power density of the signal YN through the acquisition path. A processor processes these signals to provide an output signal representing a noise estimate and a channel estimate. The processing uses equations derived by modeling convolutive and additive noise as polynomials of different orders, estimating the model parameters under a maximum likelihood criterion, and simultaneously solving linear equations for the different orders.

Type: Application

Filed: January 18, 2002

Publication date: November 28, 2002

Inventor: Yifan Gong
Method of speech recognition with compensation for both channel distortion and background noise

Publication number: 20020173959

Abstract: A method of speech recognition with compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. For each speech utterance the MFCC vector is calculated for the clean database. This mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance. The model mean vectors adapted to the noise are determined. The mean vector of the noisy data over the noisy speech space is determined, and this is removed from the model mean vectors adapted to noise to get the target model.

Type: Application

Filed: January 18, 2002

Publication date: November 21, 2002

Inventor: Yifan Gong
Method and system for adaptive speech recognition in a noisy environment

Patent number: 6418411

Abstract: The system uses utterances recorded in a low-noise condition, such as in a car with the engine off, to optimally adapt speech acoustic models to transducer and speaker characteristics, and uses speech pauses to adjust the adapted models to a changing background noise, such as in a car with the engine running.

Type: Grant

Filed: February 10, 2000

Date of Patent: July 9, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Method of adapting speech recognition models for speaker, microphone, and noisy environment

Patent number: 6389393

Abstract: The recognition of hands-free speech in a car environment has to deal with variabilities from speaker, microphone channel, and background noise. A two-stage model adaptation scheme is presented. The first stage adapts a speaker-independent HMM seed model set to a speaker- and microphone-dependent model set. The second stage adapts the speaker- and microphone-dependent model set to a speaker-, microphone-, and noise-dependent model set, which is then used for speech recognition. Both adaptations are based on maximum-likelihood linear regression (MLLR).

Type: Grant

Filed: April 15, 1999

Date of Patent: May 14, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Sequential determination of utterance log-spectral mean by maximum a posteriori probability estimation

Patent number: 6381571

Abstract: Utterance-based mean removal in log-domain, or in any linear transformation of log-domain, e.g., cepstral domain, is known to improve substantially a recognizer's robustness to transducer difference, channel distortion, and speaker variation. Applicants teach a sequential determination of utterance log-spectral mean by a generalized maximum a posteriori estimation. The solution is generalized to a weighted sum of the prior mean and the mean estimated from available frames where the weights are a function of the number of available frames.

Type: Grant

Filed: April 16, 1999

Date of Patent: April 30, 2002

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, Coimbatore S. Ramalingam
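The abstract above describes the estimate as a weighted sum of a prior mean and the mean of the frames seen so far, with weights that depend on the frame count. A minimal sketch of that idea follows. The weight n / (n + tau), the parameter name tau, and the function name are assumptions chosen for illustration (a common conjugate-prior form); the abstract does not state the actual weighting function.

```python
import numpy as np


def sequential_map_mean(frames, prior_mean, tau=10.0):
    """Return the running mean estimate after each frame.

    frames:     array of shape (num_frames, dim), log-spectral vectors
    prior_mean: array of shape (dim,)
    tau:        hypothetical prior strength; larger values trust the prior longer
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    running_sum = np.zeros_like(prior_mean)
    estimates = []
    for n, frame in enumerate(frames, start=1):
        running_sum += np.asarray(frame, dtype=float)
        sample_mean = running_sum / n
        w = n / (n + tau)  # assumed weight: grows toward 1 as frames arrive
        estimates.append((1.0 - w) * prior_mean + w * sample_mean)
    return np.array(estimates)
```

With few frames the estimate stays near the prior mean; as more frames arrive it moves toward the utterance mean, so mean removal can start before the utterance ends.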
Method of enrolling phone-based speaker specific commands

Patent number: 6377924

Abstract: A method of enrolling phone-based speaker specific commands includes the first step of providing a set (H) of speaker-independent phone-based Hidden Markov Models (HMMs), a grammar (G) comprising a loop of phones with optional between-word silence (BWS), and two utterances U1 and U2 of the command produced by the enrollment speaker, wherein the first frames of the first utterance contain only background noise. The processor generates a sequence of phone-like HMMs and the number of HMMs in that sequence as output. The second step performs model mean adjustment to suit enrollment microphone and speaker characteristics and performs segmentation. The third step generates an HMM for each segment except for silence for utterance U1. The fourth step re-estimates the HMM using both utterances U1 and U2.

Type: Grant

Filed: February 10, 2000

Date of Patent: April 23, 2002

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, Coimbatore S. Ramalingam
Decoding multiple HMM sets using a single sentence grammar

Publication number: 20020042710

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs only one of the M sub-networks and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.

Type: Application

Filed: July 26, 2001

Publication date: April 11, 2002

Inventor: Yifan Gong
Accumulating transformations for hierarchical linear regression HMM adaptation

Publication number: 20020035473

Abstract: A new method, which builds the models at the m-th step directly from the models at the initial step, is provided to minimize storage and calculation. The method therefore merges the M×N transformations into a single transformation. The merge guarantees the exactness of the transformations and makes it possible for recognizers on mobile devices to have adaptation capability.

Type: Application

Filed: June 22, 2001

Publication date: March 21, 2002

Inventor: Yifan Gong
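The abstract above says that a chain of adaptation transformations is merged into a single one without loss of exactness. The sketch below shows how that can work if each step is an affine map on a mean vector, mu -> A mu + b. That form is an assumption based on the "linear regression" wording of the title; the function name and the data layout are illustrative and not taken from the application.

```python
import numpy as np


def merge_affine(transforms):
    """Compose a list of (A, b) affine maps, applied in list order, into one.

    Applying the result once gives the same vector as applying every
    (A, b) in sequence, so only one matrix and one bias need to be stored.
    """
    dim = transforms[0][0].shape[0]
    A_total = np.eye(dim)
    b_total = np.zeros(dim)
    for A, b in transforms:
        A_total = A @ A_total      # new linear part
        b_total = A @ b_total + b  # bias carried through the new step
    return A_total, b_total
```

Because the composition of affine maps is itself affine, the merged pair reproduces the step-by-step result up to floating-point rounding, and the adapted models can always be rebuilt from the initial models.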
Log-spectral compensation of gaussian mean vectors for noisy speech recognition

Publication number: 20020013697

Abstract: Reducing the mismatch between HMMs trained with clean speech and speech signals recorded under background noise can be approached by distribution adaptation using parallel model combination (PMC). Accurate PMC has no closed-form expression, so simplifying assumptions must be made in implementation. Under a new log-max assumption, adaptation formulas for log-spectral parameters are presented, both for static and dynamic parameters.

Type: Application

Filed: April 27, 2001

Publication date: January 31, 2002

Inventor: Yifan Gong
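The log-max assumption named in the abstract above is commonly understood as replacing the log of a sum of speech and noise powers with the larger of the two log values. The sketch below compares the two on plain log-spectral vectors. It is an illustration of the approximation only: the function names are hypothetical, and the application's actual adaptation formulas for Gaussian static and dynamic parameters are not reproduced here.

```python
import numpy as np


def combine_exact(speech_log, noise_log):
    """Exact log-domain sum of powers: log(exp(s) + exp(n))."""
    return np.logaddexp(speech_log, noise_log)


def combine_log_max(speech_log, noise_log):
    """Log-max approximation: keep whichever component dominates."""
    return np.maximum(speech_log, noise_log)
```

The approximation never exceeds the exact value and falls short by at most log(2), with the largest gap where speech and noise are equal in a given channel.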
Source normalization training for HMM modeling of speech

Patent number: 6151573

Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a-23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.

Type: Grant

Filed: August 15, 1998

Date of Patent: November 21, 2000

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
