Endpoint detection in a stand-alone real-time voice recognition system
A stand-alone, real-time voice recognition system converts an analog voice signal into a serial digital signal, preprocesses the digital signal in parallel to detect the end-points, and outputs fixed multi-order prediction coefficients. In a training mode, these multi-order prediction coefficients are stored as the reference pattern. In a recognition mode, the coefficients are adapted by a dynamic time warping method modified into a symmetric form. This symmetric form is implemented with a one-dimensional circular buffer for dynamic programming matching, instead of the traditional two-dimensional buffer, to save memory space. The adapted coefficients are then compared with the reference pattern to output the recognition result.
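The memory-saving point of the abstract, and of claim 1 below, is that the n*n dynamic-programming matrix is replaced by a small one-dimensional buffer. The following C sketch illustrates the idea by keeping only two rolling rows of the DTW cost matrix rather than the exact 2*n+1-cell circular buffer of the claimed symmetric form; the squared-Euclidean frame distance, the fixed 10th-order vectors, and all names are assumptions of the sketch, not the patent's implementation.

```c
#include <stddef.h>

#define LPC_ORDER 10   /* 10th-order LPC vectors, per claim 3 (assumed) */

/* Local distance between two LPC coefficient vectors (squared Euclidean). */
static double frame_dist(const double *a, const double *b)
{
    double d = 0.0;
    for (int k = 0; k < LPC_ORDER; k++) {
        double diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

/*
 * DTW between a test pattern (rows) and a reference pattern (cols).
 * Instead of a rows x cols cost matrix, only two one-dimensional rows
 * are kept; 'prev' and 'curr' are caller-supplied scratch buffers of
 * 'cols' doubles each.
 */
double dtw_distance(const double test[][LPC_ORDER], size_t rows,
                    const double ref[][LPC_ORDER], size_t cols,
                    double *prev, double *curr)
{
    /* First row: cumulative cost along the reference axis only. */
    prev[0] = frame_dist(test[0], ref[0]);
    for (size_t j = 1; j < cols; j++)
        prev[j] = prev[j - 1] + frame_dist(test[0], ref[j]);

    for (size_t i = 1; i < rows; i++) {
        curr[0] = prev[0] + frame_dist(test[i], ref[0]);
        for (size_t j = 1; j < cols; j++) {
            double best = prev[j];                       /* vertical step   */
            if (prev[j - 1] < best) best = prev[j - 1];  /* diagonal step   */
            if (curr[j - 1] < best) best = curr[j - 1];  /* horizontal step */
            curr[j] = best + frame_dist(test[i], ref[j]);
        }
        /* Roll the buffers: the current row becomes the previous row. */
        double *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    return prev[cols - 1];   /* after the final swap, 'prev' holds the last row */
}
```

However long the reference pattern is, the working dynamic-programming storage stays at two one-dimensional rows, which is the same order of saving the claim attributes to its 2*n+1-cell circular buffer.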
Claims
1. A method of recognizing voice in real time by converting a sampled voice signal, having voice portions and noise portions, in digital form to a reference pattern in a training mode and outputting recognition results in a recognition mode, comprising the steps of:
- preprocessing by prefiltering said sampled voice signal through a first-order filter to emphasize the high frequency components of the sampled voice signal in digital form and to obtain a prefiltered signal;
- feature extraction by framing said prefiltered signal to produce a framed signal, filtering said framed signal with a Hamming window function, and applying a Durbin algorithm to obtain multi-order fixed point linear prediction coefficients;
- voice end-point detection by computing said voice portions and eliminating said noise portions using the following steps:
- step 1: define a length L of time of said voice to be zero,
- step 2: fetch one frame to compute the energy coefficient E, given by ##EQU4## where S(i) is the amplitude of said sampled voice signal,
- step 3: test whether E>=a predetermined noise threshold, if "no", go to step 2,
- step 4: set Flag=0 where Flag is a Boolean variable to indicate that the sampled voice signal is a single tone,
- step 5: set a width D, the length of a single tone of said voice, D=0,
- step 6: increase D by 1 and fetch next frame to compute the energy coefficient E, and if E>=the predetermined noise threshold, stay at step 6 until E<the predetermined noise threshold,
- step 7: let L=L+D,
- step 8: if Flag=0 (as set in step 4) and D<8, go to step 1,
- if Flag=0 and D>=8, then BTW=0, where BTW is a distance between one said single tone and another said single tone, Flag=1, go to step 9,
- if Flag=1 and D<8, then BTW=BTW+D, go to step 9,
- if Flag=1 and D>=8, then BTW=0, go to step 9,
- step 9: if E<the predetermined noise threshold and BTW<16, then BTW=BTW+1, and fetch next frame to compute E, and go to step 9,
- step 10: if BTW<16, set L=L+BTW and go to step 5,
- step 11: set L=L-BTW, clear BTW and output L,
- step 12: end said end-point detection;
- in said training mode, storing said multi-order fixed point linear prediction coefficients as a reference pattern in a memory, and going back to said preprocessing step;
- in said recognition mode, storing said multi-order coefficients by a dynamic time warping method in a modified symmetric form, comparing said updated coefficients with said reference pattern obtained previously during said training mode, and outputting a result;
- said modified symmetric form using a one-dimensional circular buffer with only 2*n+1 spaces in said memory instead of the n*n spaces of a two-dimensional memory, where n is the adjustable size of said dynamic time warping window.
2. A method as described in claim 1, wherein said voice signal is sampled every 30 ms with 10 ms overlap.
3. A method as described in claim 1, wherein said multi-order fixed point linear prediction coefficients are 10th-order fixed point linear prediction coefficients.
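The preprocessing and feature-extraction steps of claim 1 (a first-order pre-emphasis prefilter, framing, a Hamming window, and the Durbin recursion yielding 10th-order linear prediction coefficients) follow a standard LPC front end. The C sketch below is an illustrative, floating-point reconstruction under that reading; the patent itself calls for fixed-point coefficients, and the pre-emphasis constant, sampling rate, frame length, and function names are choices of this sketch rather than values taken from the patent.

```c
#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define LPC_ORDER   10    /* claim 3: 10th-order prediction coefficients        */
#define FRAME_LEN   240   /* 30 ms frame at an assumed 8 kHz rate (claim 2)     */
#define FRAME_SHIFT 160   /* 20 ms shift, i.e. 10 ms overlap between frames     */

/* First-order pre-emphasis prefilter: y[i] = s[i] - a * s[i-1].
 * A coefficient near 0.95 is a common choice; the patent does not give one. */
void preemphasize(const double *s, double *y, size_t n, double a)
{
    y[0] = s[0];
    for (size_t i = 1; i < n; i++)
        y[i] = s[i] - a * s[i - 1];
}

/* Apply a Hamming window to one frame in place. */
void hamming_window(double *frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        frame[i] *= 0.54 - 0.46 * cos(2.0 * M_PI * (double)i / (double)(n - 1));
}

/* Autocorrelation of a windowed frame for lags 0..p. */
void autocorrelate(const double *frame, size_t n, double *r, int p)
{
    for (int lag = 0; lag <= p; lag++) {
        r[lag] = 0.0;
        for (size_t i = (size_t)lag; i < n; i++)
            r[lag] += frame[i] * frame[i - lag];
    }
}

/* Levinson-Durbin recursion: autocorrelation r[0..p] -> LPC a[1..p], p <= LPC_ORDER.
 * Returns the final prediction error energy. */
double durbin(const double *r, double *a, int p)
{
    double tmp[LPC_ORDER + 1];
    double e = r[0];

    for (int i = 1; i <= p; i++) {
        double k = r[i];
        for (int j = 1; j < i; j++)
            k -= a[j] * r[i - j];
        k /= e;

        a[i] = k;
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];

        e *= 1.0 - k * k;
    }
    return e;
}
```

A real implementation on the fixed-point hardware suggested by the claims would scale samples and coefficients to an integer format; that scaling is what distinguishes the claimed "fixed point" coefficients from the floating-point arithmetic used here for clarity.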
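Steps 1 to 12 of claim 1 amount to a frame-energy gate with two constants: a tone must last at least 8 frames to count as voice, and a silence gap of 16 frames ends the utterance. The C sketch below is a compact restatement of that control flow, not a verbatim transcription of the claim; it simplifies the Flag/BTW bookkeeping of step 8, reads the elided energy formula EQU4 as a sum of absolute amplitudes (one plausible choice), and all names and types are assumptions of the sketch.

```c
#include <stddef.h>

#define MIN_TONE_FRAMES 8    /* step 8: a run shorter than 8 frames is not a tone     */
#define MAX_GAP_FRAMES  16   /* steps 9-11: a 16-frame silence gap ends the utterance */

/* Frame energy E: sum of absolute sample amplitudes (one common definition;
 * the patent's exact formula is the elided EQU4). */
double frame_energy(const double *s, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; i++)
        e += s[i] < 0 ? -s[i] : s[i];
    return e;
}

/*
 * End-point detection over a sequence of per-frame energies.
 * Returns L, the detected voice length in frames: runs of frames above the
 * noise threshold are counted, short gaps between tones are bridged, and
 * the first long gap ends the utterance.
 */
size_t detect_voice_length(const double *energy, size_t frames,
                           double noise_threshold)
{
    size_t L = 0;        /* accumulated voice length in frames (step 1)  */
    size_t gap = 0;      /* BTW: silence frames since the last tone      */
    int    in_voice = 0; /* Flag: a valid tone has already been seen     */
    size_t i = 0;

    /* Steps 2-3: skip leading noise frames. */
    while (i < frames && energy[i] < noise_threshold)
        i++;

    while (i < frames) {
        /* Steps 5-6: measure one run of voiced frames, width D. */
        size_t D = 0;
        while (i < frames && energy[i] >= noise_threshold) {
            D++;
            i++;
        }

        if (!in_voice && D < MIN_TONE_FRAMES) {
            /* Step 8, first case: too short to be a tone; restart. */
            L = 0;
            while (i < frames && energy[i] < noise_threshold)
                i++;
            continue;
        }

        in_voice = 1;
        L += gap + D;    /* steps 7 and 10: add this tone plus the bridged gap */
        gap = 0;

        /* Step 9: count silence frames until the next tone or a long gap. */
        while (i < frames && energy[i] < noise_threshold && gap < MAX_GAP_FRAMES) {
            gap++;
            i++;
        }
        if (gap >= MAX_GAP_FRAMES)
            break;       /* steps 11-12: the utterance has ended */
    }
    return L;
}
```

Claim 1's own bookkeeping subtracts the trailing gap before outputting L (step 11); the sketch avoids that correction by only adding a gap to L once the next tone has been confirmed.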
U.S. Patent Documents:
Patent No. | Date | Inventor(s)
4509187 | April 2, 1985 | Ackland et al.
4712242 | December 8, 1987 | Rajasekaran et al.
4751737 | June 14, 1988 | Gerson et al.
4821325 | April 11, 1989 | Martin et al.
4882756 | November 21, 1989 | Watari
4918733 | April 17, 1990 | Daugherty
4956865 | September 11, 1990 | Lenning et al.
5073939 | December 17, 1991 | Vensko et al.
5309547 | May 3, 1994 | Niyada et al.
5327521 | July 5, 1994 | Savic et al.
Other Publications:
- H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-26, no. 1, Feb. 1978, pp. 43-49.
- Y.-C. Liu and G. A. Gibson, Microcomputer Systems: The 8086/8088 Family, Prentice Hall, Englewood Cliffs, NJ, 1986, pp. 349-352, 374-377, 424-427.
- C. S. Myers et al., "A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected-Word Recognition," The Bell System Technical Journal, Sep. 1981, 60(7):1389-1407.
Type: Grant
Filed: Apr 14, 1995
Date of Patent: Dec 1, 1998
Assignee: Industrial Technology Research Institute (Hsinchu)
Inventor: Chau-Kai Hsieh (Hsinchu)
Primary Examiner: Allen R. MacDonald
Assistant Examiner: Robert C. Mattson
Attorney: H. C. Lin Patent Agent
Application Number: 8/422,765
International Classification: G10L 5/06