Endpoint detection in a stand-alone real-time voice recognition system
A stand-alone, real-time voice recognition system converts an analog voice signal into a serial digital signal, preprocesses the digital signal in parallel to detect the end-points, and outputs fixed multi-order prediction coefficients. In a training mode, these multi-order prediction coefficients are stored as the reference pattern. In a recognition mode, the coefficients are adapted by a dynamic time warping method modified into a symmetric form. This symmetric form is implemented with a one-dimensional circular buffer for dynamic programming matching, instead of the traditional two-dimensional buffer, to save memory space. The adapted coefficients are then compared with the reference pattern to output the recognition result.
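The memory-saving point of the abstract, and of claim 1 below, is that the n*n dynamic-programming matrix is replaced by a small one-dimensional buffer. The following C sketch illustrates the idea by keeping only two rolling rows of the DTW cost matrix rather than the exact 2*n+1-cell circular buffer of the claimed symmetric form; the squared-Euclidean frame distance, the fixed 10th-order vectors, and all names are assumptions of the sketch, not the patent's implementation.

```c
#include <stddef.h>

#define LPC_ORDER 10   /* 10th-order LPC vectors, per claim 3 (assumed) */

/* Local distance between two LPC coefficient vectors (squared Euclidean). */
static double frame_dist(const double *a, const double *b)
{
    double d = 0.0;
    for (int k = 0; k < LPC_ORDER; k++) {
        double diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

/*
 * DTW between a test pattern (rows) and a reference pattern (cols).
 * Instead of a rows x cols cost matrix, only two one-dimensional rows
 * are kept; 'prev' and 'curr' are caller-supplied scratch buffers of
 * 'cols' doubles each.
 */
double dtw_distance(const double test[][LPC_ORDER], size_t rows,
                    const double ref[][LPC_ORDER], size_t cols,
                    double *prev, double *curr)
{
    /* First row: cumulative cost along the reference axis only. */
    prev[0] = frame_dist(test[0], ref[0]);
    for (size_t j = 1; j < cols; j++)
        prev[j] = prev[j - 1] + frame_dist(test[0], ref[j]);

    for (size_t i = 1; i < rows; i++) {
        curr[0] = prev[0] + frame_dist(test[i], ref[0]);
        for (size_t j = 1; j < cols; j++) {
            double best = prev[j];                       /* vertical step   */
            if (prev[j - 1] < best) best = prev[j - 1];  /* diagonal step   */
            if (curr[j - 1] < best) best = curr[j - 1];  /* horizontal step */
            curr[j] = best + frame_dist(test[i], ref[j]);
        }
        /* Roll the buffers: the current row becomes the previous row. */
        double *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    return prev[cols - 1];   /* after the final swap, 'prev' holds the last row */
}
```

However long the reference pattern is, the working dynamic-programming storage stays at two one-dimensional rows, which is the same order of saving the claim attributes to its 2*n+1-cell circular buffer.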
Claims
1. A method of recognizing voice in real time by converting a sampled voice signal, having voice portions and noise portions, in digital form to a reference pattern in a training mode and outputting recognition results in a recognition mode, comprising the steps of:
- preprocessing by prefiltering said sampled voice signal through a first-order filter to emphasize the high frequency components of the sampled voice signal in digital form and to obtain a prefiltered signal;
- feature extraction by framing said prefiltered signal to produce a framed signal, filtering said framed signal with a Hamming window function, and applying a Durbin algorithm to obtain multi-order fixed point linear prediction coefficients;
- voice end-point detection by computing said voice portions and eliminating said noise portions using the following steps:
- step 1: define a length L of time of said voice to be zero,
- step 2: fetch one frame to compute the energy coefficient E, given by ##EQU4## where S(i) is the amplitude of said sampled voice signal,
- step 3: test whether E>=a predetermined noise threshold, if "no", go to step 2,
- step 4: set Flag=0 where Flag is a Boolean variable to indicate that the sampled voice signal is a single tone,
- step 5: set a width D, the length of a single tone of said voice, D=0,
- step 6: increase D by 1 and fetch next frame to compute the energy coefficient E, and if E>=the predetermined noise threshold, stay at step 6 until E<the predetermined noise threshold,
- step 7: let L=L+D,
- step 8: if Flag=0 (as set in step 4) and D<8, go to step 1,
- if Flag=0 and D>=8, then BTW=0, where BTW is a distance between one said single tone and another said single tone, Flag=1, go to step 9,
- if Flag=1 and D<8, then BTW=BTW+D, go to step 9,
- if Flag=1 and D>=8, then BTW=0, go to step 9,
- step 9: if E<the predetermined noise threshold and BTW<16, then BTW=BTW+1, and fetch next frame to compute E, and go to step 9,
- step 10: if BTW<16, set L=L+BTW and go to step 5,
- step 11: set L=L-BTW, clear BTW and output L,
- step 12: end said end-point detection;
- in said training mode, storing said multi-order fixed point linear prediction coefficients as a reference pattern in a memory, and going back to said preprocessing step;
- in said recognition mode, storing said multi-order coefficients by a dynamic time warping method in a modified symmetric form, comparing said updated coefficients with said reference pattern obtained previously during said training mode, and outputting a result;
- said modified symmetric form using a one-dimensional circular buffer with only 2*n+1 spaces in said memory instead of the n*n spaces of a two-dimensional memory, where n is the adjustable size of said dynamic time warping window.
2. A method as described in claim 1, wherein said voice signal is sampled every 30 ms with 10 ms overlap.
3. A method as described in claim 1, wherein said multi-order fixed point linear prediction coefficients are 10th-order fixed point linear prediction coefficients.
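The preprocessing and feature-extraction steps of claim 1 (a first-order pre-emphasis prefilter, framing, a Hamming window, and the Durbin recursion yielding 10th-order linear prediction coefficients) follow a standard LPC front end. The C sketch below is an illustrative, floating-point reconstruction under that reading; the patent itself calls for fixed-point coefficients, and the pre-emphasis constant, sampling rate, frame length, and function names are choices of this sketch rather than values taken from the patent.

```c
#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define LPC_ORDER   10    /* claim 3: 10th-order prediction coefficients        */
#define FRAME_LEN   240   /* 30 ms frame at an assumed 8 kHz rate (claim 2)     */
#define FRAME_SHIFT 160   /* 20 ms shift, i.e. 10 ms overlap between frames     */

/* First-order pre-emphasis prefilter: y[i] = s[i] - a * s[i-1].
 * A coefficient near 0.95 is a common choice; the patent does not give one. */
void preemphasize(const double *s, double *y, size_t n, double a)
{
    y[0] = s[0];
    for (size_t i = 1; i < n; i++)
        y[i] = s[i] - a * s[i - 1];
}

/* Apply a Hamming window to one frame in place. */
void hamming_window(double *frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        frame[i] *= 0.54 - 0.46 * cos(2.0 * M_PI * (double)i / (double)(n - 1));
}

/* Autocorrelation of a windowed frame for lags 0..p. */
void autocorrelate(const double *frame, size_t n, double *r, int p)
{
    for (int lag = 0; lag <= p; lag++) {
        r[lag] = 0.0;
        for (size_t i = (size_t)lag; i < n; i++)
            r[lag] += frame[i] * frame[i - lag];
    }
}

/* Levinson-Durbin recursion: autocorrelation r[0..p] -> LPC a[1..p], p <= LPC_ORDER.
 * Returns the final prediction error energy. */
double durbin(const double *r, double *a, int p)
{
    double tmp[LPC_ORDER + 1];
    double e = r[0];

    for (int i = 1; i <= p; i++) {
        double k = r[i];
        for (int j = 1; j < i; j++)
            k -= a[j] * r[i - j];
        k /= e;

        a[i] = k;
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];

        e *= 1.0 - k * k;
    }
    return e;
}
```

A real implementation on the fixed-point hardware suggested by the claims would scale samples and coefficients to an integer format; that scaling is what distinguishes the claimed "fixed point" coefficients from the floating-point arithmetic used here for clarity.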
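Steps 1 to 12 of claim 1 amount to a frame-energy gate with two constants: a tone must last at least 8 frames to count as voice, and a silence gap of 16 frames ends the utterance. The C sketch below is a compact restatement of that control flow, not a verbatim transcription of the claim; it simplifies the Flag/BTW bookkeeping of step 8, reads the elided energy formula EQU4 as a sum of absolute amplitudes (one plausible choice), and all names and types are assumptions of the sketch.

```c
#include <stddef.h>

#define MIN_TONE_FRAMES 8    /* step 8: a run shorter than 8 frames is not a tone     */
#define MAX_GAP_FRAMES  16   /* steps 9-11: a 16-frame silence gap ends the utterance */

/* Frame energy E: sum of absolute sample amplitudes (one common definition;
 * the patent's exact formula is the elided EQU4). */
double frame_energy(const double *s, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; i++)
        e += s[i] < 0 ? -s[i] : s[i];
    return e;
}

/*
 * End-point detection over a sequence of per-frame energies.
 * Returns L, the detected voice length in frames: runs of frames above the
 * noise threshold are counted, short gaps between tones are bridged, and
 * the first long gap ends the utterance.
 */
size_t detect_voice_length(const double *energy, size_t frames,
                           double noise_threshold)
{
    size_t L = 0;        /* accumulated voice length in frames (step 1)  */
    size_t gap = 0;      /* BTW: silence frames since the last tone      */
    int    in_voice = 0; /* Flag: a valid tone has already been seen     */
    size_t i = 0;

    /* Steps 2-3: skip leading noise frames. */
    while (i < frames && energy[i] < noise_threshold)
        i++;

    while (i < frames) {
        /* Steps 5-6: measure one run of voiced frames, width D. */
        size_t D = 0;
        while (i < frames && energy[i] >= noise_threshold) {
            D++;
            i++;
        }

        if (!in_voice && D < MIN_TONE_FRAMES) {
            /* Step 8, first case: too short to be a tone; restart. */
            L = 0;
            while (i < frames && energy[i] < noise_threshold)
                i++;
            continue;
        }

        in_voice = 1;
        L += gap + D;    /* steps 7 and 10: add this tone plus the bridged gap */
        gap = 0;

        /* Step 9: count silence frames until the next tone or a long gap. */
        while (i < frames && energy[i] < noise_threshold && gap < MAX_GAP_FRAMES) {
            gap++;
            i++;
        }
        if (gap >= MAX_GAP_FRAMES)
            break;       /* steps 11-12: the utterance has ended */
    }
    return L;
}
```

Claim 1's own bookkeeping subtracts the trailing gap before outputting L (step 11); the sketch avoids that correction by only adding a gap to L once the next tone has been confirmed.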
U.S. Patent Documents:
Patent No. | Date | Inventor(s)
4509187 | April 2, 1985 | Ackland et al.
4712242 | December 8, 1987 | Rajasekaran et al.
4751737 | June 14, 1988 | Gerson et al.
4821325 | April 11, 1989 | Martin et al.
4882756 | November 21, 1989 | Watari
4918733 | April 17, 1990 | Daugherty
4956865 | September 11, 1990 | Lenning et al.
5073939 | December 17, 1991 | Vensko et al.
5309547 | May 3, 1994 | Niyada et al.
5327521 | July 5, 1994 | Savic et al.
Other Publications:
- H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-26, no. 1, Feb. 1978, pp. 43-49.
- Y.-C. Liu and G. A. Gibson, Microcomputer Systems: The 8086/8088 Family, Prentice Hall, Englewood Cliffs, NJ, 1986, pp. 349-352, 374-377, 424-427.
- C. S. Myers et al., "A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected-Word Recognition," The Bell System Technical Journal, Sep. 1981, 60(7):1389-1407.
Type: Grant
Filed: Apr 14, 1995
Date of Patent: Dec 1, 1998
Assignee: Industrial Technology Research Institute (Hsinchu)
Inventor: Chau-Kai Hsieh (Hsinchu)
Primary Examiner: Allen R. MacDonald
Assistant Examiner: Robert C. Mattson
Attorney: H. C. Lin Patent Agent
Application Number: 8/422,765
International Classification: G10L 5/06