Abstract: An interactive voice response system includes a server and a set of mobile clients. The server and clients include RF transceivers for exchanging messages over an RF channel. Each mobile client includes a microphone, a speaker or headset, a processor and a voice browser. The voice browser interprets voice pages received from the server. Upon receiving a particular voice page from the server, the voice browser outputs via the speaker voice prompts specified by the voice page. A speech recognition engine used by the voice browser converts voice responses from a user into a text response. The voice browser then performs an action based on the text response. The action taken may be to request a new voice page from the server, or to continue to interpret the current voice page.
Type: Grant
Filed: March 30, 2000
Date of Patent: December 9, 2003
Assignee: Voxware, Inc.
Inventors: Ray Albayrak, Sherri L. Meade, John D. Puterbaugh, Lee Stewart, Craig Vanderborgh, David C. Vetter
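The interpret loop described in the interactive voice response abstract above can be pictured with a small sketch. This is not the patented implementation: the VoicePage structure, the run_voice_browser function, and the fetch_page, recognize and speak callbacks are all hypothetical names chosen for illustration, and the RF transport and the speech recognition engine are replaced by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class VoicePage:
    """Hypothetical voice page: one prompt plus a map of responses to next pages."""
    name: str
    prompt: str
    # Recognized text -> name of the page to request from the server.
    # A response that is not listed keeps the browser on the current page.
    transitions: Dict[str, str] = field(default_factory=dict)


def run_voice_browser(
    fetch_page: Callable[[str], VoicePage],      # stands in for the server request
    recognize: Callable[[], Optional[str]],      # stands in for the speech recognizer
    speak: Callable[[str], None],                # stands in for the speaker output
    start: str,
    max_steps: int = 10,
) -> List[str]:
    """Interpret pages until the user stops responding; return the pages visited."""
    visited: List[str] = []
    page = fetch_page(start)
    for _ in range(max_steps):
        visited.append(page.name)
        speak(page.prompt)                 # output the prompt specified by the page
        text = recognize()                 # voice response converted to text
        if text is None:                   # assumption: None means the session ended
            break
        next_name = page.transitions.get(text)
        if next_name is not None:
            page = fetch_page(next_name)   # action 1: request a new voice page
        # action 2: otherwise keep interpreting the current page (re-prompt)
    return visited
```

With a two-page dictionary as the "server" and a fixed list of recognized responses, the loop re-prompts on an unrecognized response and requests a new page on a recognized one.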
Abstract: A modular system and method are provided for low bit rate encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment, the encoder of the system computes a model signal and subtracts the model signal from the original signal in the segment to obtain a residual excitation signal. Using the excitation signal, the system computes the signal pitch and a parameter related to the relative content of voiced and unvoiced portions in the spectrum of the excitation signal. This parameter is expressed as a ratio Pv, defined as a voicing probability. The voiced and the unvoiced portions of the excitation spectrum, as determined by the parameter Pv, are encoded using one or more parameters related to the energy of the excitation signal in a predetermined set of frequency bands.
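Two steps in the abstract above, subtracting a model signal to obtain the residual excitation and summarizing that residual by its energy in a set of frequency bands, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bands are assumed to be equal-width, and a naive DFT is used for clarity. The abstract does not say how the model signal, the pitch or Pv are computed, so those are left out.

```python
import cmath
import math
from typing import List


def residual_excitation(segment: List[float], model: List[float]) -> List[float]:
    """Residual = original segment minus the model signal, sample by sample."""
    return [s - m for s, m in zip(segment, model)]


def power_spectrum(x: List[float]) -> List[float]:
    """Squared DFT magnitude for bins 0..N/2 (naive O(N^2) transform)."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(x: List[float], n_bands: int) -> List[float]:
    """Sum the power spectrum over n_bands equal-width bands (an assumed layout)."""
    spectrum = power_spectrum(x)
    width = len(spectrum) / n_bands
    energies = [0.0] * n_bands
    for k, power in enumerate(spectrum):
        band = min(int(k / width), n_bands - 1)
        energies[band] += power
    return energies
```

For example, if the model captures a low-frequency tone exactly and the segment also contains a weaker high-frequency tone, the residual energy lands entirely in the band that contains the high-frequency bin.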
Abstract: A method and system are provided for encoding and decoding of speech signals at a low bit rate. The continuous input speech is divided into voiced and unvoiced time segments of a predetermined length. The encoder of the system uses a linear predictive coding (LPC) model for the unvoiced speech segments and harmonic frequency decomposition for the voiced speech segments. Only the magnitudes of the harmonic frequencies are determined, using the discrete Fourier transform of the voiced speech segments. The decoder synthesizes voiced speech segments using the magnitudes of the transmitted harmonics and estimates the phase of each harmonic from the signal in the preceding speech segments. Unvoiced speech segments are synthesized using LPC coefficients obtained from codebook entries for the poles of the LPC coefficient polynomial. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality.
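The idea of keeping only harmonic magnitudes for a voiced segment can be illustrated with a short sketch. It assumes the pitch frequency f0 is already known and evaluates the Fourier sum directly at each multiple of f0. The function name, its parameters and the 2/N scaling are illustrative choices, not details taken from the patent.

```python
import cmath
import math
from typing import List


def harmonic_magnitudes(
    segment: List[float], f0: float, fs: float, n_harmonics: int
) -> List[float]:
    """Magnitude of the segment's spectrum at f0, 2*f0, ..., n_harmonics*f0.

    The phase of each harmonic is discarded, mirroring the abstract's statement
    that only magnitudes are determined. The 2/N scaling makes a cosine of
    amplitude A at an exact DFT bin report a magnitude of A.
    """
    n = len(segment)
    mags = []
    for k in range(1, n_harmonics + 1):
        omega = 2 * math.pi * k * f0 / fs      # radians per sample for harmonic k
        acc = sum(segment[t] * cmath.exp(-1j * omega * t) for t in range(n))
        mags.append(2 * abs(acc) / n)
    return mags
```

When the segment holds a whole number of pitch periods, as in an 80-sample segment at 8000 Hz with f0 = 200 Hz, the harmonics fall on exact DFT bins and the returned values match the component amplitudes regardless of their phases.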
Abstract: A modular system and method are provided for encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment, the encoder of the system computes the signal pitch and a parameter related to the relative content of voiced and unvoiced portions in the spectrum of the signal. This parameter is expressed as a ratio Pv, defined as a voicing probability. The voiced portion of the signal spectrum, as determined by the parameter Pv, is encoded using a set of harmonically related amplitudes corresponding to the estimated pitch. The unvoiced portion of the signal is processed in a separate processing branch, which uses a modified linear predictive coding algorithm. Parameters representing both the voiced and the unvoiced portions of a speech segment are combined in data packets for transmission.
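The final step of the abstract above, combining voiced and unvoiced parameters into one data packet, might look like the sketch below. The packet layout is entirely hypothetical: the abstract does not specify field order, quantization or sizes, so the 16-bit pitch period, the 8-bit quantized Pv and the 32-bit float parameters are assumptions made only to give a concrete round trip.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentParams:
    """Hypothetical per-segment parameter set."""
    pitch_period: int            # pitch period in samples
    pv: float                    # voicing probability in [0, 1]
    harmonic_amps: List[float]   # voiced branch: harmonically related amplitudes
    lpc_coeffs: List[float]      # unvoiced branch: LPC-style coefficients


# Assumed header: uint16 pitch, uint8 quantized Pv, uint8 counts for each list.
HEADER = "<HBBB"


def pack_segment(p: SegmentParams) -> bytes:
    """Serialize both branches of one segment into a single packet."""
    pv_q = max(0, min(255, round(p.pv * 255)))   # quantize Pv to 8 bits
    header = struct.pack(
        HEADER, p.pitch_period, pv_q, len(p.harmonic_amps), len(p.lpc_coeffs)
    )
    count = len(p.harmonic_amps) + len(p.lpc_coeffs)
    body = struct.pack(f"<{count}f", *p.harmonic_amps, *p.lpc_coeffs)
    return header + body


def unpack_segment(data: bytes) -> SegmentParams:
    """Recover the parameters; Pv comes back at 8-bit resolution."""
    pitch, pv_q, n_h, n_l = struct.unpack_from(HEADER, data, 0)
    values = struct.unpack_from(f"<{n_h + n_l}f", data, struct.calcsize(HEADER))
    return SegmentParams(pitch, pv_q / 255, list(values[:n_h]), list(values[n_h:]))
```

A real low bit rate coder would quantize the amplitudes and coefficients far more aggressively than 32-bit floats. The sketch only shows that one packet carries the pitch, Pv and both parameter sets together.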