Patents by Inventor Yifan Gong

Yifan Gong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20030115055
    Abstract: A speech recognizer operating under both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized, the recognizer system adapts HMM mean vectors with a noise estimate calculated from the pre-utterance pause and a channel estimate calculated from previous utterances using an Expectation-Maximization (EM) algorithm.
    Type: Application
    Filed: September 20, 2002
    Publication date: June 19, 2003
    Inventor: Yifan Gong
  • Patent number: 6577997
    Abstract: A noise-dependent classifier for a speech recognition system includes a recognizer (15) that provides the scores and score differences of the two closest in-vocabulary words for a received utterance. A noise detector (17) detects the noise level of a pre-speech portion of the utterance. A classifier (19) is responsive to the detected noise level, the scores, and a noise-dependent model, and decides whether to accept or reject the utterance as a recognized word.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: June 10, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Publication number: 20020198706
    Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noises, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise and this is used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming Window, FFT, complex FFT to power spectrum and Mel scale power spectrum into filter bank steps, a log filter bank step and after the log filter bank step performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.
    Type: Application
    Filed: May 2, 2002
    Publication date: December 26, 2002
    Inventors: Yu-Hung Kao, Yifan Gong
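The abstract above mentions a polynomial fit for log2 in the log filter bank step. The following is a minimal floating-point sketch of that general idea, not the patented fixed-point routine: the quadratic coefficients `A` and `B` and the function name `approx_log2` are illustrative assumptions, chosen so the polynomial matches log2(1 + f) at f = 0, 0.5 and 1.

```python
import math

# Quadratic p(f) = B*f + A*f*f, fitted so that p(0) = 0, p(0.5) = log2(1.5), p(1) = 1.
# These coefficients are an illustrative assumption, not the ones used in the patent.
A = 2.0 - 4.0 * math.log2(1.5)
B = 1.0 - A


def approx_log2(x: float) -> float:
    """Approximate log2(x) as an integer exponent plus a polynomial in the mantissa."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # frexp gives x = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(x)
    f = 2.0 * mantissa - 1.0  # fractional part in [0, 1), so x = (1 + f) * 2**(exponent - 1)
    return (exponent - 1) + B * f + A * f * f
```

A fixed-point DSP version would replace `frexp` with a leading-bit search and keep the coefficients in a chosen Q format; those details are not given in the abstract and are left out here.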
  • Publication number: 20020177998
    Abstract: A data acquisition path is calibrated by applying a voice utterance to a reference path with a high-quality microphone and to a test acquisition path that includes a test microphone, such as a lower-quality one used in a car. The calibration device detects the power density of the reference signal YR through the reference path and the power density of the signal YN through the acquisition path. A processor processes these signals to provide an output signal representing a noise estimate and a channel estimate. The processing uses equations derived by modeling convolutive and additive noise as polynomials of different orders, estimating the model parameters using a maximum-likelihood criterion, and simultaneously solving linear equations for the different orders.
    Type: Application
    Filed: January 18, 2002
    Publication date: November 28, 2002
    Inventor: Yifan Gong
  • Publication number: 20020173959
    Abstract: A method of speech recognition with compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. For each speech utterance the MFCC vector is calculated for the clean database. This mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance. The model mean vectors adapted to the noise are determined. The mean vector of the noisy data over the noisy speech space is determined, and this is removed from the model mean vectors adapted to noise to get the target model.
    Type: Application
    Filed: January 18, 2002
    Publication date: November 21, 2002
    Inventor: Yifan Gong
  • Patent number: 6418411
    Abstract: The system uses utterances recorded in a low-noise condition, such as in a car with the engine off, to optimally adapt speech acoustic models to transducer and speaker characteristics, and uses speech pauses to adjust the adapted models to a changing background noise, such as in a car with the engine running.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: July 9, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6389393
    Abstract: The recognition of hands-free speech in a car environment has to deal with variability in speaker, microphone channel and background noise. A two-stage model adaptation scheme is presented. The first stage adapts a speaker-independent HMM seed model set to a speaker- and microphone-dependent model set. The second stage adapts the speaker- and microphone-dependent model set to a speaker-, microphone-, and noise-dependent model set, which is then used for speech recognition. Both adaptations are based on maximum-likelihood linear regression (MLLR).
    Type: Grant
    Filed: April 15, 1999
    Date of Patent: May 14, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6381571
    Abstract: Utterance-based mean removal in the log domain, or in any linear transformation of the log domain, e.g., the cepstral domain, is known to substantially improve a recognizer's robustness to transducer difference, channel distortion, and speaker variation. Applicants teach a sequential determination of the utterance log-spectral mean by a generalized maximum a posteriori estimation. The solution is generalized to a weighted sum of the prior mean and the mean estimated from available frames, where the weights are a function of the number of available frames.
    Type: Grant
    Filed: April 16, 1999
    Date of Patent: April 30, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Coimbatore S. Ramalingam
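The weighted sum described in the abstract above can be sketched as follows. This assumes the standard conjugate-prior MAP weighting, in which the prior counts as `prior_weight` pseudo-frames; the abstract only says the weights depend on the number of available frames, so the exact weighting, the function name, and the `prior_weight` parameter are assumptions for illustration.

```python
from typing import List, Sequence


def sequential_map_mean(
    frames: Sequence[Sequence[float]],
    prior_mean: Sequence[float],
    prior_weight: float = 10.0,  # hypothetical: prior strength in pseudo-frames
) -> List[List[float]]:
    """Return the running mean estimate after each frame.

    After n frames the estimate is
        w_prior * prior_mean + w_data * sample_mean,
    with w_prior = prior_weight / (prior_weight + n) and w_data = n / (prior_weight + n).
    """
    running_sum = [0.0] * len(prior_mean)
    estimates = []
    for n, frame in enumerate(frames, start=1):
        running_sum = [s + x for s, x in zip(running_sum, frame)]
        w_prior = prior_weight / (prior_weight + n)
        w_data = n / (prior_weight + n)
        estimates.append(
            [w_prior * p + w_data * (s / n) for p, s in zip(prior_mean, running_sum)]
        )
    return estimates
```

With few frames the estimate stays close to the prior mean, and as more frames arrive it approaches the sample mean, so mean removal can start before the utterance ends.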
  • Patent number: 6377924
    Abstract: A method of enrolling phone-based speaker-specific commands includes the first step of providing a set (H) of speaker-independent phone-based Hidden Markov Models (HMMs), a grammar (G) comprising a loop of phones with optional between-word silence (BWS), and two utterances U1 and U2 of the command produced by the enrollment speaker, wherein the first frames of the first utterance contain only background noise. The processor generates a sequence of phone-like HMMs and the number of HMMs in that sequence as output. The second step performs model mean adjustment to suit enrollment microphone and speaker characteristics and performs segmentation. The third step generates an HMM for each segment except for silence for utterance U1. The fourth step re-estimates the HMM using both utterances U1 and U2.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: April 23, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Coimbatore S. Ramalingam
  • Publication number: 20020042710
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs only one of the M sub-networks and yet gives the same recognition performance, thus reducing the memory required for network storage by (M-1)/M.
    Type: Application
    Filed: July 26, 2001
    Publication date: April 11, 2002
    Inventor: Yifan Gong
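The storage saving stated in the abstract above can be checked with a small calculation. This sketch assumes all M sub-networks are the same size, so that keeping one of them saves a fraction (M-1)/M of the network storage; the function name is illustrative.

```python
def network_storage_saving(num_environments: int) -> float:
    """Fraction of network storage saved by keeping 1 of M equally sized sub-networks."""
    if num_environments < 1:
        raise ValueError("num_environments must be at least 1")
    return (num_environments - 1) / num_environments
```

For example, with four acoustic environments the saving is three quarters of the original network storage.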
  • Publication number: 20020035473
    Abstract: A new method, which builds the models at the m-th step directly from the models at the initial step, is provided to minimize storage and calculation. The method therefore merges the M×N transformations into a single transformation. The merge guarantees the exactness of the transformations and makes it possible for recognizers on mobile devices to have adaptation capability.
    Type: Application
    Filed: June 22, 2001
    Publication date: March 21, 2002
    Inventor: Yifan Gong
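The idea of merging successive transformations into one can be illustrated under an assumption the abstract does not state: that each adaptation step is an affine map x -> A x + b on a mean vector (as in MLLR-style adaptation). Under that assumption, a chain of steps composes exactly into a single affine map. All names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Affine = Tuple[Matrix, Vector]  # (A, b) representing x -> A x + b


def mat_vec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def apply(t: Affine, x: Vector) -> Vector:
    a, b = t
    return [u + v for u, v in zip(mat_vec(a, x), b)]


def compose(second: Affine, first: Affine) -> Affine:
    """Single affine map equal to applying `first`, then `second`."""
    a2, b2 = second
    a1, b1 = first
    return mat_mul(a2, a1), [u + v for u, v in zip(mat_vec(a2, b1), b2)]


def merge(transforms: Sequence[Affine]) -> Affine:
    """Merge transforms applied in list order into one affine map."""
    if not transforms:
        raise ValueError("need at least one transform")
    merged = transforms[0]
    for t in transforms[1:]:
        merged = compose(t, merged)
    return merged
```

Only the merged (A, b) pair needs to be stored and applied to the initial models, rather than every intermediate model set.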
  • Publication number: 20020013697
    Abstract: Reducing the mismatch between HMMs trained with clean speech and speech signals recorded under background noise can be approached by distribution adaptation using parallel model combination (PMC). Accurate PMC has no closed-form expression; therefore, simplifying assumptions must be made in implementation. Under a new log-max assumption, adaptation formulas for log-spectral parameters are presented, both for static and dynamic parameters.
    Type: Application
    Filed: April 27, 2001
    Publication date: January 31, 2002
    Inventor: Yifan Gong
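As a rough illustration of a log-max style approximation, the sketch below assumes that noisy speech power is the sum of speech and noise power, so the exact log-spectral combination is log(exp(s) + exp(n)), and that the approximation replaces it with max(s, n). It covers only static mean parameters; the patent's actual formulas, including those for dynamic parameters, are not given in the abstract, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_add(s: float, n: float) -> float:
    """Exact log-domain sum log(exp(s) + exp(n)), computed stably."""
    return max(s, n) + math.log1p(math.exp(-abs(s - n)))


def log_max_static_mean(
    speech_mean: Sequence[float], noise_mean: Sequence[float]
) -> List[float]:
    """Approximate noisy log-spectral mean per frequency bin as max(speech, noise)."""
    return [max(s, n) for s, n in zip(speech_mean, noise_mean)]
```

The approximation error per bin is at most log(2), reached when speech and noise have equal log-spectral energy, and it shrinks quickly as one of the two dominates.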
  • Patent number: 6151573
    Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation-maximization algorithm and an extension of the Baum-Welch forward and backward variables (Steps 23a-23d), a source normalization is achieved such that it is not necessary to label a database in terms of environment, such as speaker identity, channel, microphone and noise type.
    Type: Grant
    Filed: August 15, 1998
    Date of Patent: November 21, 2000
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong