Patents by Inventor Yu-Hung Kao

Yu-Hung Kao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Implementing a high accuracy continuous speech recognizer on a fixed-point processor

Patent number: 7103547

Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noise, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise, which are used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming window, FFT, complex FFT to power spectrum, and Mel scale power spectrum into filter bank steps; performing a log filter bank step; and, after the log filter bank step, performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.

Type: Grant

Filed: May 2, 2002

Date of Patent: September 5, 2006

Assignee: Texas Instruments Incorporated

Inventors: Yu-Hung Kao, Yifan Gong
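The abstract above mentions computing log2 with a polynomial fit on a fixed-point processor. The following is a minimal sketch of that general idea, not the patented method: the Q15 output format, the function name `log2_q15`, and the fit coefficient 0.3444 are all assumptions chosen for illustration. The input is split into an integer exponent and a fractional mantissa, and the mantissa term is approximated as log2(1 + f) ≈ f + c·f·(1 − f).

```python
Q = 15                                # fractional bits of the result (assumed Q15 format)
ONE = 1 << Q                          # the value 1.0 in Q15
C_Q15 = round(0.3444 * ONE)           # hypothetical fit coefficient, not taken from the patent


def log2_q15(x: int) -> int:
    """Approximate log2(x) for a positive integer x; the result is in Q15."""
    if x <= 0:
        raise ValueError("x must be positive")
    e = x.bit_length() - 1            # integer part: position of the leading one bit
    # Normalize so the mantissa lies in [1, 2), then keep its fraction f in Q15.
    if e >= Q:
        f = (x >> (e - Q)) - ONE
    else:
        f = (x << (Q - e)) - ONE
    # Second-order polynomial: log2(1 + f) ~ f + c * f * (1 - f), all in Q15.
    f_times_one_minus_f = (f * (ONE - f)) >> Q
    correction = (C_Q15 * f_times_one_minus_f) >> Q
    return (e << Q) + f + correction
```

With this particular coefficient the approximation stays within roughly 0.01 of the true log2 and is exact at powers of two. Python integers are unbounded, so a real 16-bit DSP port would also need to manage intermediate word widths, which this sketch does not attempt.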

Compact text-to-phone pronunciation dictionary

Patent number: 7080005

Abstract: A typical English pronunciation dictionary takes up to 1,826,302 bytes in ASCII to store. A five times compression while maintaining computability is achieved by prefix delta encoding of the word and error encoding of the pronunciation.

Type: Grant

Filed: June 8, 2000

Date of Patent: July 18, 2006

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao
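The abstract above refers to prefix delta encoding of the dictionary words. A minimal sketch of that general technique follows, assuming a sorted word list in which each entry stores only the number of leading characters it shares with the previous word plus the remaining suffix. The function names and the tuple representation are hypothetical; the patent's actual byte layout and its error encoding of pronunciations are not shown here.

```python
def prefix_delta_encode(words):
    """Encode a sorted word list as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = ""
    for word in words:
        # Count the leading characters shared with the previous word.
        n = 0
        limit = min(len(prev), len(word))
        while n < limit and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def prefix_delta_decode(encoded):
    """Rebuild the word list by reusing each previous word's prefix."""
    words = []
    prev = ""
    for n, suffix in encoded:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words
```

For example, `prefix_delta_encode(["speak", "speaker", "speech", "spell"])` yields `[(0, "speak"), (5, "er"), (3, "ech"), (3, "ll")]`, and decoding restores the original list. The saving comes from neighboring entries in a sorted dictionary sharing long prefixes.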

Automatic utterance detector with high noise immunity

Patent number: 6980950

Abstract: An utterance detector for speech recognition is described. The detector consists of two components. The first component makes a speech/non-speech decision for each incoming speech frame. The decision is based on a frequency-selective autocorrelation function obtained by speech power spectrum estimation, frequency filtering, and an inverse Fourier transform. The second component makes the utterance detection decision, using a state machine that describes the detection process in terms of the speech/non-speech decisions made by the first component.

Type: Grant

Filed: September 21, 2000

Date of Patent: December 27, 2005

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, Yu-Hung Kao
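The second component described in the abstract above, a state machine driven by per-frame speech/non-speech decisions, can be sketched roughly as follows. This is an illustration only: the two-state design, the function name `detect_utterances`, and the `min_speech` and `min_silence` thresholds are assumptions, and the first component (the frequency-selective autocorrelation) is taken as given in the form of a list of per-frame flags.

```python
def detect_utterances(frame_is_speech, min_speech=3, min_silence=5):
    """Return (start, end) frame index pairs for detected utterances (end exclusive).

    An utterance begins after `min_speech` consecutive speech frames and ends
    after `min_silence` consecutive non-speech frames (both thresholds assumed).
    """
    utterances = []
    in_utterance = False
    run = 0        # length of the current run of frames that would flip the state
    start = 0
    for i, is_speech in enumerate(frame_is_speech):
        if not in_utterance:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                in_utterance = True
                start = i - run + 1          # first frame of the speech run
                run = 0
        else:
            run = run + 1 if not is_speech else 0
            if run >= min_silence:
                utterances.append((start, i - run + 1))  # end before the silence run
                in_utterance = False
                run = 0
    if in_utterance:
        # Input ended mid-utterance: close it, dropping any trailing non-speech frames.
        utterances.append((start, len(frame_is_speech) - run))
    return utterances
```

Requiring a run of frames before changing state is what lets isolated misclassified frames pass without starting or ending an utterance.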

Implementing a high accuracy continuous speech recognizer on a fixed-point processor

Publication number: 20020198706

Abstract: A small vocabulary speech recognizer suitable for implementation on a 16-bit fixed-point DSP is described. The input speech xt is sampled at analog-to-digital (A/D) converter 11 and the digital samples are applied to MFCC (Mel-scaled cepstrum coefficients) front end processing 13. For robustness to background noise, PMC (parallel model combination) 15 is integrated. The MFCC and Gaussian mean vectors are applied to PMC 15. The MFCC and PMC provide speech features extracted in noise, which are used to modify the HMMs. The noise adapted HMMs excluding mean vectors are applied to the search procedure to recognize the grammar. A method of computing MFCC comprises the steps of: performing dynamic Q-point computation for the preemphasis, Hamming window, FFT, complex FFT to power spectrum, and Mel scale power spectrum into filter bank steps; performing a log filter bank step; and, after the log filter bank step, performing fixed Q-point computation. A polynomial fit is used to compute log2 in the log filter bank step.

Type: Application

Filed: May 2, 2002

Publication date: December 26, 2002

Inventors: Yu-Hung Kao, Yifan Gong

Minimization of search network in speech recognition

Patent number: 6456970

Abstract: The search network in a speech recognition system is reduced by parsing the incoming speech, expanding all active paths (101), comparing to speech models and scoring the paths, storing recognition level values at the slots (103), accumulating the scores, and discarding previous slots when a word end is detected, creating a word end slot (109).

Type: Grant

Filed: July 15, 1999

Date of Patent: September 24, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao

N-best search for continuous speech recognition using viterbi pruning for non-output differentiation states

Patent number: 6374220

Abstract: A method for N-best search for continuous speech recognition with limited storage space includes the steps of Viterbi pruning word level (same word, different time alignment, thus non-output differentiation) states and keeping the N-best sub-optimal paths for sentence level (output differentiation) states.

Type: Grant

Filed: July 15, 1999

Date of Patent: April 16, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao

Method of memory management in speech recognition

Patent number: 6374222

Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index, and by coding the last_time field so that one bit indicates that a slot is available for reuse and a second bit marks a backtrace update.

Type: Grant

Filed: July 16, 1999

Date of Patent: April 16, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao
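The slot-size reductions named in the abstract above (packing model_index with state_index, and reserving two bits of the last_time field as flags) can be illustrated with plain bit manipulation. The sketch below is hypothetical: the 16-bit word size, the 12/4 bit split, the choice of the top two bits for the flags, and all function names are assumptions, since the abstract does not give field widths.

```python
STATE_BITS = 4                          # assumed width of state_index
MODEL_BITS = 12                         # assumed width of model_index
STATE_MASK = (1 << STATE_BITS) - 1
MODEL_MASK = (1 << MODEL_BITS) - 1

AVAILABLE_BIT = 1 << 15                 # assumed flag: slot is available for reuse
BACKTRACE_BIT = 1 << 14                 # assumed flag: backtrace update
TIME_MASK = BACKTRACE_BIT - 1           # remaining 14 bits hold the time value


def pack_indices(model_index, state_index):
    """Pack model_index and state_index into one 16-bit word."""
    if not 0 <= model_index <= MODEL_MASK:
        raise ValueError("model_index out of range")
    if not 0 <= state_index <= STATE_MASK:
        raise ValueError("state_index out of range")
    return (model_index << STATE_BITS) | state_index


def unpack_indices(word):
    """Recover (model_index, state_index) from a packed word."""
    return word >> STATE_BITS, word & STATE_MASK


def pack_last_time(time, available=False, backtrace_update=False):
    """Store a time value plus the two flag bits in one 16-bit word."""
    if not 0 <= time <= TIME_MASK:
        raise ValueError("time out of range")
    word = time
    if available:
        word |= AVAILABLE_BIT
    if backtrace_update:
        word |= BACKTRACE_BIT
    return word


def unpack_last_time(word):
    """Recover (time, available, backtrace_update) from a packed word."""
    return word & TIME_MASK, bool(word & AVAILABLE_BIT), bool(word & BACKTRACE_BIT)
```

Under these assumed widths, two index fields and a time stamp with two flags fit in two 16-bit words, which is the kind of per-slot saving the abstract describes.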

Method of phonetic modeling using acoustic decision tree

Patent number: 6317712

Abstract: Phonetic modeling includes the steps of forming triphone grammars (11) from phonetic data, training triphone models (13), clustering triphones (14) that are acoustically close together and mapping unclustered triphone grammars into a clustered model (16). The clustering process includes using a decision tree based on the acoustic likelihood and allows sub-model clusters in user-definable units.

Type: Grant

Filed: January 21, 1999

Date of Patent: November 13, 2001

Assignee: Texas Instruments Incorporated

Inventors: Yu-Hung Kao, Kazuhiro Kondo

Speed up speech recognition search using macro evaluator

Patent number: 6285981

Abstract: A speed up speech recognition search method is provided wherein the number of HMM states is determined and a macroslot is allocated for Hidden Markov Models (HMMs) below a given threshold level of states. A macroslot treats a whole HMM as a basic unit. The lowest level of macroslot is a phone. If the number of states exceeds the threshold level, a microslot is allocated for this HMM.

Type: Grant

Filed: June 7, 1999

Date of Patent: September 4, 2001

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao

Speech recognition using clustered between word and/or phrase coarticulation

Patent number: 5819221

Abstract: Improved speech recognition is achieved according to the present invention by use of between word and/or between phrase coarticulation. The increase in the number of phonetic models required to model this additional vocabulary is reduced by clustering (19, 20) the inter-word/phrase models and grammar into only a few classes. By using one class for consonant inter-word context and two classes for vowel contexts, the accuracy for Japanese was almost as good as for unclustered models while the number of models was reduced by more than half.

Type: Grant

Filed: August 31, 1994

Date of Patent: October 6, 1998

Assignee: Texas Instruments Incorporated

Inventors: Kazuhiro Kondo, Ikuo Kudo, Yu-Hung Kao, Barbara J. Wheatley