Patents by Inventor Yifan Gong

Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7269558
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet gives the same recognition performance, thus reducing memory requirement for network storage by (M-1)/M.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: September 11, 2007
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 7236930
    Abstract: The operating range of a joint additive and convolutive compensating method is extended by an enhanced channel estimation procedure that adds SNR-dependent inertia and an SNR-dependent limit on the channel estimate.
    Type: Grant
    Filed: April 12, 2004
    Date of Patent: June 26, 2007
    Assignee: Texas Instruments Incorporated
    Inventors: Alexis P. Bernard, Yifan Gong
  • Patent number: 7165028
    Abstract: A speech recognizer operating in both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized the recognizer system adapts HMM mean vectors with noise estimates calculated from pre-utterance pause and a channel estimate calculated using an Estimation Maximization algorithm from previous utterances.
    Type: Grant
    Filed: September 20, 2002
    Date of Patent: January 16, 2007
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 7103547
    Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noises, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise and this is used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming Window, FFT, complex FFT to power spectrum and Mel scale power spectrum into filter bank steps, a log filter bank step and after the log filter bank step performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.
    Type: Grant
    Filed: May 2, 2002
    Date of Patent: September 5, 2006
    Assignee: Texas Instruments Incorporated
    Inventors: Yu-Hung Kao, Yifan Gong
  • Patent number: 7089183
    Abstract: A new iterative hierarchical linear regression method for generating a set of linear transforms to adapt HMM speech models to a new environment for improved speech recognition is disclosed. The method determines a new set of linear transforms at an iterative step by Estimate-Maximize (EM) estimation, and then combines the new set of linear transforms with the prior set of linear transforms to form a new merged set of linear transforms. An iterative step may include realignment of adaptation speech data to the adapted HMM models to further improve speech recognition performance.
    Type: Grant
    Filed: June 22, 2001
    Date of Patent: August 8, 2006
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 7062433
    Abstract: A method of speech recognition with compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. For all speech utterances the MFCC vector is calculated for the clean database. This mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance. The model mean vectors adapted to the noise are determined. The mean vector of the noisy data over the noisy speech space is determined and this is removed from model mean vectors adapted to noise to get the target model.
    Type: Grant
    Filed: January 18, 2002
    Date of Patent: June 13, 2006
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6980950
    Abstract: An utterance detector for speech recognition is described. The detector consists of two components. The first part makes a speech/non-speech decision for each incoming speech frame. The decision is based on a frequency-selective autocorrelation function obtained by speech power spectrum estimation, frequency filter, and inverse Fourier transform. The second component makes utterance detection decision, using a state machine that describes the detection process in terms of the speech/non-speech decision made by the first component.
    Type: Grant
    Filed: September 21, 2000
    Date of Patent: December 27, 2005
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Yu-Hung Kao
  • Patent number: 6980952
    Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a–23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: December 27, 2005
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Publication number: 20050256714
    Abstract: The mismatch between the distributions of acoustic models and features in speech recognition may cause performance degradation. A sequential variance adaptation (SVA) adapts the covariances dynamically based on a sequential EM algorithm. The original covariances in acoustic models are adjusted by scaling factors which are sequentially updated once new collection data is available.
    Type: Application
    Filed: March 29, 2004
    Publication date: November 17, 2005
    Inventors: Xiaodong Cui, Yifan Gong
  • Publication number: 20050228662
    Abstract: A method for performing time and frequency Signal-to-Noise Ratio (SNR) dependent weighting in speech recognition is described that includes for each period t estimating the SNR to get time and frequency SNR information ?t,f; calculating the time and frequency weighting to get ?tf; performing the back and forth weighted time varying DCT transformation matrix computation MGtM^-1 to get Tt; providing the transformation matrix computation Tt and the original MFCC feature ot that contains the information about the SNR to a recognizer including the Viterbi decoding; and performing weighted Viterbi recognition bj(ot).
    Type: Application
    Filed: April 13, 2004
    Publication date: October 13, 2005
    Inventors: Alexis Bernard, Yifan Gong
  • Publication number: 20050228669
    Abstract: The operating range of a joint additive and convolutive compensating method is extended by an enhanced channel estimation procedure that adds SNR-dependent inertia and an SNR-dependent limit on the channel estimate.
    Type: Application
    Filed: April 12, 2004
    Publication date: October 13, 2005
    Inventors: Alexis Bernard, Yifan Gong
  • Publication number: 20050216266
    Abstract: The mismatch between the distributions of acoustic models and features in speech recognition may cause performance degradation. A sequential bias adaptation (SBA) applies state or class dependent biases to the original mean vectors in acoustic models to take into account the mismatch between features and the acoustic models.
    Type: Application
    Filed: March 29, 2004
    Publication date: September 29, 2005
    Inventors: Yifan Gong, Xiaodong Cui
  • Publication number: 20050187771
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
    Type: Application
    Filed: February 4, 2005
    Publication date: August 25, 2005
    Inventor: Yifan Gong
  • Patent number: 6912497
    Abstract: A method and system for calibration of a data acquisition path is achieved by applying a voice utterance to a first high quality microphone and reference path and to a test acquisition path including a test microphone such as a lower quality one used in a car. The calibration device includes detecting the power density of the reference signal YR through the reference path and detecting the power density of the signal YN through the acquisition path. A processor processes these signals to provide an output signal representing a noise estimate and channel estimate. The processing uses equation derived by modeling convolutive and additive noise as polynomials with different orders and estimating model parameters using maximum likelihood criterion and simultaneously solving linear equations for the different orders.
    Type: Grant
    Filed: January 18, 2002
    Date of Patent: June 28, 2005
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Publication number: 20050049863
    Abstract: A method and detector for providing a noise resistant utterance detector is provided by extracting a noise estimate (15) to augment the signal-to-noise ratio of the speech signal, inverse filtering (17) of the speech signal to focus on the periodic excitation part of the signal and spectral reshaping (19) to accentuate separation between formants.
    Type: Application
    Filed: August 27, 2003
    Publication date: March 3, 2005
    Inventors: Yifan Gong, Alexis Bernard
  • Publication number: 20050049871
    Abstract: A method of speaker-dependent voice command recognition is provided that includes providing a hybrid of sentence network and Gaussian mixture models with a shared pool of distributions and performing an out-of-vocabulary procedure based on the score difference between a top candidate and background model over the recognized in-vocabulary word. The network is a three section network to represent speech embedded in extra speech where first and last sections are intended to absorb extra-speech and the middle section to match with in-vocabulary speech. An utterance is accepted as containing in-vocabulary word based on a rejection parameter, which has several alternative forms.
    Type: Application
    Filed: August 26, 2003
    Publication date: March 3, 2005
    Inventor: Yifan Gong
  • Publication number: 20040181409
    Abstract: To make speech recognition robust in a noisy environment, a variable parameter Gaussian Mixture HMM is described which extends existing HMMs by allowing HMM parameters to change as a function of a continuous variable that depends on the environment. Specifically, in one embodiment the function is a polynomial and the environment is described by signal-to-noise ratio. The use of the parameter functions improves the HMM discriminability during multi-condition training. In the recognition process, a set of HMM parameters is instantiated according to parameter functions, based on current environment. The model parameters are estimated using Expectation-Maximization algorithm for variable parameter GMHMM.
    Type: Application
    Filed: March 11, 2003
    Publication date: September 16, 2004
    Inventors: Yifan Gong, Xiaodong Cui
  • Patent number: 6658385
    Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low noise environment R to provide rich information on co-articulation and speaker variation and a smaller database in a more noisy target environment T. A set H of HMMs is trained with data provided in the low noise environment R and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of Classes C. For each subclass c of Classes C, the transformation Φc is found to maximize likelihood utterances in T, given H. The HMMs are transformed and steps repeated until likelihood stabilizes.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: December 2, 2003
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, John J. Godfrey
  • Patent number: 6633842
    Abstract: An estimate of a clean speech vector, typically a Mel-Frequency Cepstral Coefficient (MFCC) vector, given its noisy observation is provided. The method makes use of two Gaussian mixtures. The first one is trained on clean speech and the second is derived from the first one using some noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectancy of clean speech given an observed noisy vector.
    Type: Grant
    Filed: September 21, 2000
    Date of Patent: October 14, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6633843
    Abstract: Reducing mismatch between HMMs trained with clean speech and speech signals recorded under background noise can be approached by distribution adaptation using parallel model combination (PMC). Accurate PMC has no closed-form expression; therefore, simplification assumptions must be made in implementation. Under a new log-max assumption, adaptation formulas for log-spectral parameters are presented, both for static and dynamic parameters.
    Type: Grant
    Filed: April 27, 2001
    Date of Patent: October 14, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
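
As a rough illustration of the log-max idea named in the abstract of patent 6633843 above, the sketch below compares an exact log-domain sum of speech and noise energies with the simpler approximation log(exp(a) + exp(b)) ≈ max(a, b). This is only the generic approximation, not the adaptation formulas claimed in the patent, which operate on model distribution parameters. The function names (log_add, log_max, combine) and the sample values are assumptions made for this example.

```python
import math


def log_add(speech_log, noise_log):
    """Exact log-domain sum: log(exp(speech_log) + exp(noise_log))."""
    hi = max(speech_log, noise_log)
    lo = min(speech_log, noise_log)
    # Factor out the larger term for numerical stability.
    return hi + math.log1p(math.exp(lo - hi))


def log_max(speech_log, noise_log):
    """Log-max approximation: keep whichever source dominates the channel."""
    return max(speech_log, noise_log)


def combine(speech_logs, noise_logs, rule):
    """Apply a combination rule to each log-spectral channel (hypothetical helper)."""
    return [rule(s, n) for s, n in zip(speech_logs, noise_logs)]


if __name__ == "__main__":
    # Made-up log filter-bank energies for three channels.
    speech = [4.0, 1.5, -2.0]
    noise = [0.5, 1.5, 1.0]
    exact = combine(speech, noise, log_add)
    approx = combine(speech, noise, log_max)
    for e, a in zip(exact, approx):
        # The approximation error is always between 0 and log(2).
        print(f"exact={e:.4f}  log-max={a:.4f}  error={e - a:.4f}")
```

The approximation is tightest when one source clearly dominates a channel and worst (an error of log 2) when speech and noise energies are equal.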