Endpoint detection in a stand-alone real-time voice recognition system

A stand-alone, real-time voice recognition system converts an analog voice signal into a serial digital signal, preprocesses the digital signal in parallel to detect the end-point, and outputs fixed multi-order prediction coefficients. In the training mode, these multi-order prediction coefficients are stored as the reference pattern. In the recognition mode, these multi-order prediction coefficients are aligned by a dynamic time warping method, which is modified into a symmetric form. This symmetric form is implemented with a one-dimensional circular buffer for dynamic programming matching instead of the traditional two-dimensional buffer, saving memory space. Finally, the aligned coefficients are compared with the reference pattern to output the recognition result.
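The memory-saving idea in the abstract can be illustrated with a short sketch. This is not the patented implementation: where the patent uses a single one-dimensional circular buffer of 2n+1 cells in a modified symmetric form, the sketch below keeps two band-clipped rows of the dynamic-programming matrix, which captures the same linear-space idea; the function name, band shape, and local distance are illustrative assumptions.

```python
import math

def dtw_distance(ref, test, window):
    """Band-limited dynamic time warping with O(window) memory.

    Rather than an n*n dynamic-programming matrix, only two
    band-clipped rows are kept (the patent itself uses a single
    2*n+1-cell circular buffer; this two-row variant shows the same
    linear-space idea). `window` must be at least
    abs(len(ref) - len(test)) for the end cell to be reachable.
    """
    n, m = len(ref), len(test)
    INF = math.inf
    prev = [INF] * (m + 1)
    prev[0] = 0.0                    # the warp path starts at (0, 0)
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            cost = abs(ref[i - 1] - test[j - 1])   # local frame distance
            curr[j] = cost + min(prev[j],          # vertical step
                                 curr[j - 1],      # horizontal step
                                 prev[j - 1])      # diagonal step
        prev = curr                  # roll the buffer: O(m), not O(n*m)
    return prev[m]
```

In the real system each element of `ref` and `test` would be a vector of multi-order prediction coefficients and `cost` a vector distance; scalars are used here only to keep the sketch self-contained.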


Claims

1. A method of recognizing voice in real time by converting a sampled voice signal, having voice portions and noise portions, in digital form to a reference pattern in a training mode and outputting recognition results in recognition mode, comprising the steps of:

preprocessing by prefiltering said sampled voice signal through a first order filter to emphasize the high frequency components of the sampled voice signal in digital form and to obtain a prefiltered signal;
feature extraction by framing said prefiltered signal to produce a framed signal, filtering said framed signal by a Hamming window function and by a Durbin algorithm to result in multi-order fixed point linear prediction coefficients;
voice end-point detection by computing said voice portions and eliminating said noise portions using the following steps:
step 1: define a length L of time of said voice to be zero,
step 2: fetch one frame to compute the energy coefficient E, where E is the sum of S(i).sup.2 over all samples i of the frame, and S(i) is the amplitude of said sampled voice signal,
step 3: test whether E>=a predetermined noise threshold, and if "no", go to step 2,
step 4: set Flag=0 where Flag is a Boolean variable to indicate that the sampled voice signal is a single tone,
step 5: set a width D, the length of a single tone of said voice, D=0,
step 6: increase D by 1 and fetch next frame to compute the energy coefficient E, and if E>=the predetermined noise threshold, stay at step 6 until E<the predetermined noise threshold,
step 7: let L=L+D,
step 8: if Flag=0, as set in step 4, and D<8, go to step 1,
if Flag=0 and D>=8, then BTW=0, where BTW is a distance between one said single tone and another said single tone, Flag=1, go to step 9,
if Flag=1 and D<8, then BTW=BTW+D, go to step 9,
if Flag=1 and D>=8, then BTW=0, go to step 9,
step 9: if E<the predetermined noise threshold and BTW<16, then BTW=BTW+1, and fetch next frame to compute E, and go to step 9,
step 10: if BTW<16, set L=L+BTW and go to step 5,
step 11: set L=L-BTW, clear BTW and output L,
step 12: end said end-point detection;
in said training mode, storing said multi-order fixed point linear prediction coefficients as a reference pattern in a memory, and going back to said preprocessing step;
in said recognition mode, storing said multi-order coefficients by a dynamic time warping method in a modified symmetric form, comparing said updated coefficients with said reference pattern obtained previously during said training mode, and outputting a recognition result;
said modified symmetric form using a one-dimensional circular buffer with only 2*n+1 space in said memory instead of n*n space for a 2-dimensional memory, where n is the size of said dynamic time warping window and is adjustable.
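The end-point detection of steps 1 through 12 can be transcribed almost directly into code. A sketch under stated assumptions: per-frame energies are precomputed, the constants 8 and 16 are exposed as `min_tone` and `max_gap` (our names, not the claim's), trailing silence is padded so the scan always terminates, and the claim's step 11 is read as removing from L the trailing short-burst frames that step 7 had folded in.

```python
def detect_endpoint(energies, noise_threshold, min_tone=8, max_gap=16):
    """Frame-energy end-point detection following steps 1-12 of claim 1.

    Returns the detected voice length L in frames, or 0 when no tone of
    at least `min_tone` consecutive frames is present.
    """
    # Pad with silence so the scan always reaches step 11.
    frames = iter(list(energies) + [0.0] * (max_gap + 1))
    while True:                                # step 1: L = 0
        L = 0
        try:
            E = next(frames)                   # steps 2-3: skip leading noise
            while E < noise_threshold:
                E = next(frames)
        except StopIteration:
            return 0                           # input held no qualifying tone
        flag = False                           # step 4: no confirmed tone yet
        BTW = trail = 0
        while True:
            D = 0                              # step 5: width of current tone
            while E >= noise_threshold:        # step 6: count voiced frames
                D += 1
                E = next(frames)
            L += D                             # step 7
            if not flag:                       # step 8
                if D < min_tone:
                    break                      # isolated short burst: restart
                flag, BTW, trail = True, 0, 0
            elif D < min_tone:
                BTW += D                       # short burst inside a gap;
                trail += D                     # its frames are also in L
            else:
                BTW = trail = 0
            while E < noise_threshold and BTW < max_gap:   # step 9
                BTW += 1
                E = next(frames)
            if BTW < max_gap:                  # step 10: voice resumed
                L += BTW
                trail = 0
            else:                              # step 11: long gap ends the word;
                return L - trail               # drop trailing burst frames
```

For example, ten voiced frames, a five-frame gap, and ten more voiced frames yield L = 25, while a three-frame click followed by silence before the word is discarded by step 8.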

2. A method as described in claim 1, wherein said voice signal is sampled every 30 ms with 10 ms overlap.

3. A method as described in claim 1, wherein said multi-order fixed point linear prediction coefficients are 10th-order fixed point linear prediction coefficients.
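The feature-extraction chain of claims 1 and 3 (first-order pre-emphasis, Hamming window, Durbin recursion, 10th-order coefficients) can be sketched as below. The pre-emphasis factor 0.95 is our assumption (the claim specifies only a first-order high-frequency-emphasis filter), and floating point is used for clarity where the claim specifies fixed point.

```python
import math

def lpc_coefficients(frame, order=10, preemph=0.95):
    """First-order pre-emphasis, Hamming window, then the Durbin
    (Levinson-Durbin) recursion, yielding `order` linear prediction
    coefficients for one frame of samples.
    """
    # First-order pre-emphasis: s'[n] = s[n] - preemph * s[n-1]
    s = [frame[0]] + [frame[n] - preemph * frame[n - 1]
                      for n in range(1, len(frame))]
    # Hamming window over the frame
    N = len(s)
    s = [s[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
         for n in range(N)]
    # Autocorrelation R[0..order]
    R = [sum(s[n] * s[n - k] for n in range(k, N)) for k in range(order + 1)]
    # Levinson-Durbin recursion
    a = [0.0] * (order + 1)    # a[1..i] = prediction coefficients so far
    E = R[0]                   # prediction error energy
    for i in range(1, order + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        E *= (1 - k * k)
    return a[1:]               # the `order` prediction coefficients
```

With the 30 ms frames of claim 2 and an assumed 8 kHz sampling rate, each frame would hold 240 samples; a hardware implementation would run this recursion in fixed-point arithmetic as the claim recites.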

Referenced Cited
U.S. Patent Documents
4509187 April 2, 1985 Ackland et al.
4712242 December 8, 1987 Rajasekaran et al.
4751737 June 14, 1988 Gerson et al.
4821325 April 11, 1989 Martin et al.
4882756 November 21, 1989 Watari
4918733 April 17, 1990 Daugherty
4956865 September 11, 1990 Lenning et al.
5073939 December 17, 1991 Vensko et al.
5309547 May 3, 1994 Niyada et al.
5327521 July 5, 1994 Savic et al.
Other references
  • H. Sakoe, S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-26, No. 1, Feb. 1978, pp. 43-49.
  • Y.-C. Liu, G. A. Gibson, Microcomputer Systems: The 8086/8088 Family, Prentice Hall, Englewood Cliffs, NJ, 1986, pp. 349-352, 374-377, 424-427.
  • C. S. Myers, et al., "A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected-Word Recognition," The Bell System Technical Journal, Sep. 1981, 60(7):1389-1407.
Patent History
Patent number: 5845092
Type: Grant
Filed: Apr 14, 1995
Date of Patent: Dec 1, 1998
Assignee: Industrial Technology Research Institute (Hsinchu)
Inventor: Chau-Kai Hsieh (Hsinchu)
Primary Examiner: Allen R. MacDonald
Assistant Examiner: Robert C. Mattson
Attorney: H. C. Lin Patent Agent
Application Number: 8/422,765
Classifications
Current U.S. Class: 395/257
International Classification: G10L 5/06