Abstract: A voice signal coding method is provided for processing the residual base-band signal produced by a Voice Excited Predictive Coder (VEPC). The residual signal is split into several sub-bands, and the contents of each sub-band are coded twice, simultaneously: by direct BCPCM coding and by differential APC coding, with the quantizing bits assigned dynamically according to the relative sub-band contents. This simultaneous coding is followed by a dynamic choice, for each sub-band, of the coded output that gives the lower coding error.
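The per-sub-band "code both ways, keep the better one" decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented coder: the sub-band split is assumed to be done already, a block-scaled uniform quantizer stands in for BCPCM, a plain first-order DPCM loop stands in for the adaptive predictive (APC) coder, and the greedy bit-allocation rule and every function name (`allocate_bits`, `bcpcm`, `dpcm`, `code_subbands`) are hypothetical.

```python
import math
from typing import List, Tuple

def quantize(value: float, scale: float, bits: int) -> float:
    """Mid-rise uniform quantizer over [-scale, scale] with 2**bits levels."""
    if bits == 0 or scale == 0.0:
        return 0.0
    levels = 2 ** bits
    step = 2.0 * scale / levels
    index = max(-levels // 2, min(levels // 2 - 1, math.floor(value / step)))
    return (index + 0.5) * step

def bcpcm(block: List[float], bits: int) -> List[float]:
    """Block-companded PCM: one scale factor (the block peak) per block."""
    scale = max(abs(x) for x in block)
    return [quantize(x, scale, bits) for x in block]

def dpcm(block: List[float], bits: int) -> List[float]:
    """First-order DPCM (stand-in for APC): predict each sample as the previous decoded one."""
    diffs = [block[0]] + [b - a for a, b in zip(block, block[1:])]
    scale = max(abs(d) for d in diffs)
    decoded, prev = [], 0.0
    for x in block:
        prev += quantize(x - prev, scale, bits)
        decoded.append(prev)
    return decoded

def sq_error(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def allocate_bits(bands: List[List[float]], total_bits: int) -> List[int]:
    """Hypothetical greedy rule: each extra bit cuts quantizing noise power by about 4x (6 dB)."""
    energy = [sum(x * x for x in band) / len(band) for band in bands]
    bits = [0] * len(bands)
    for _ in range(total_bits):
        k = max(range(len(bands)), key=lambda i: energy[i] / 4 ** bits[i])
        bits[k] += 1
    return bits

def code_subbands(bands: List[List[float]], total_bits: int) -> List[Tuple[str, int, List[float]]]:
    """Code every sub-band both ways and keep whichever output has the lower error."""
    coded = []
    for band, b in zip(bands, allocate_bits(bands, total_bits)):
        direct, differential = bcpcm(band, b), dpcm(band, b)
        if sq_error(band, differential) < sq_error(band, direct):
            coded.append(("DPCM", b, differential))
        else:
            coded.append(("BCPCM", b, direct))
    return coded
```

A real coder would also have to send the mode choice and scale factors to the decoder as side information; that framing is left out here.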
Abstract: A continuous speech recognition circuit has a data generating circuit and a recognition circuit. Each time a one-frame period elapses, the data generating circuit calculates similarity data between reference pattern data and feature pattern data, each feature pattern consisting of N frames of feature parameter data for one of a plurality of word periods, and sequentially outputs the maximum of the calculated similarity data. The recognition circuit uses these similarity data to detect the series of continuous word periods within a speech interval that gives the largest similarity sum, and recognizes the word series corresponding to that series of word periods as effective word data. The similarity data for each word period is obtained by calculating partial similarity data between the feature parameter data of each frame and each set of reference parameter data, and using the N partial similarity data obtained during the word period.
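The recognition circuit's search for the word series with the largest similarity sum is naturally a dynamic-programming problem over frame boundaries. The sketch below assumes the data generating circuit has already produced, for each candidate word period `(start, end)`, the best-matching word and its similarity; the `word_sim` dictionary layout and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def best_word_series(num_frames: int,
                     word_sim: Dict[Tuple[int, int], Tuple[str, float]]
                     ) -> Tuple[Optional[List[str]], float]:
    """word_sim[(start, end)] = (best word, similarity) for frames start..end-1."""
    neg = float("-inf")
    best = [neg] * (num_frames + 1)   # best[t]: largest similarity sum covering frames 0..t-1
    back: List[Optional[Tuple[int, str]]] = [None] * (num_frames + 1)
    best[0] = 0.0
    for end in range(1, num_frames + 1):
        for start in range(end):
            if best[start] == neg or (start, end) not in word_sim:
                continue
            word, sim = word_sim[(start, end)]
            if best[start] + sim > best[end]:
                best[end] = best[start] + sim
                back[end] = (start, word)
    if best[num_frames] == neg:
        return None, neg               # no word series covers the whole interval
    words, end = [], num_frames
    while end > 0:                     # trace the chosen word periods backwards
        start, word = back[end]
        words.append(word)
        end = start
    return words[::-1], best[num_frames]
```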
Abstract: A continuous speech signal is recognized using "rough" and "detail" parameters derived from prestored reference speech and from the current unknown speech. The detail parameters are 16 spectral coefficients; the rough parameters are 2 or 4 spectral coefficients representing the signal. A word interval detector decides the segmentation based on the similarity of the rough parameters.
Abstract: A voice-controlled computer has a speech recognition section that converts a spoken program keyword, which corresponds to a start number, into a digital code. The digital code identifying the keyword selects the corresponding start number according to the contents of a table stored in a program memory. The start number is then used to access the start address of the corresponding program, which is started and executed. Also disclosed is a system in which operating a chosen key of a key input device while voice operator guidance is being generated produces a key-in signal that forcibly stops the ongoing guidance. In particular, when a plurality of voice operator guidances are provided, the computer learns the state of the operator's operation from the manner of the forcible stop and, from the next processing onward, automatically stops generating the voice guidance for a specific item.
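The keyword-to-program dispatch and the guidance-stopping behaviour can be modelled with two lookup tables and a set. Everything here is hypothetical: the class and method names, and the simplifying rule that a single forced stop on an item suppresses that item's guidance from then on (the abstract does not say exactly how the manner of the stop is interpreted).

```python
from typing import Dict, Set

class VoiceLauncher:
    def __init__(self, keyword_table: Dict[str, int], start_addresses: Dict[int, int]):
        self.keyword_table = keyword_table        # recognized keyword code -> start number
        self.start_addresses = start_addresses    # start number -> program start address
        self.muted: Set[str] = set()              # items whose guidance is no longer spoken

    def start_address_for(self, keyword_code: str) -> int:
        """Map the recognizer's digital code to a start number, then to a start address."""
        return self.start_addresses[self.keyword_table[keyword_code]]

    def give_guidance(self, item: str, key_pressed: bool) -> str:
        """Speak guidance for `item` unless it is muted; a key press forcibly stops it."""
        if item in self.muted:
            return "skipped"
        if key_pressed:
            self.muted.add(item)                  # assume the operator already knows this step
            return "stopped"
        return "completed"
```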
Abstract: A method for detecting suicidal predisposition in a person by securing an utterance from the person, identifying the person as being suicidally predisposed if the utterance decays substantially non-instantaneously upon conclusion and identifying the person as being suicidally predisposed if signal amplitude modulation during the utterance is low.
Abstract: A digital speech processor operates in parallel with a programmable digital computer to generate sequences of variable-length speech phrases and pauses at the request of the computer. A speech memory region within the speech processor contains digitally encoded speech data segments of varying length. A separate command memory region can be loaded with a plurality of commands. When executed sequentially by the speech processor, these commands cause it to generate an arbitrary sequence of spoken phrases and pauses without intervention by the computer. When the computer is not using the speech processor to synthesize spoken words, the speech and command memory regions serve as auxiliary random access memory, increasing the size of the computer's memory space.
September 21, 1983
Date of Patent: June 23, 1987
Jostens Learning Systems, Inc.
William J. Raymond, Robert L. Morgan, Ricky L. Miller
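For the digital speech processor entry above, the command memory can be modelled as a list the host loads once, after which the processor steps through it unaided, emitting speech segments and silences. The two command forms, `("SPEAK", segment_id)` and `("PAUSE", ms)`, and the function name are hypothetical encodings; the abstract does not give the actual command format.

```python
from typing import Dict, List, Tuple, Union

Command = Tuple[str, int]
Event = Tuple[str, Union[bytes, int]]

def run_command_memory(commands: List[Command], speech_memory: Dict[int, bytes]) -> List[Event]:
    """Execute the loaded commands in order and return the resulting output timeline."""
    timeline: List[Event] = []
    for op, arg in commands:
        if op == "SPEAK":
            timeline.append(("speech", speech_memory[arg]))   # variable-length encoded segment
        elif op == "PAUSE":
            timeline.append(("silence_ms", arg))
        else:
            raise ValueError(f"unknown command {op!r}")
    return timeline
```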
Abstract: To produce and store a pattern of reference words for a speaker-dependent speech recognition system, the system prompts its operator to speak standard words in a predetermined sequence. For each word, the system speaks a prestored standard word with a predetermined length, power and rhythm, and the operator repeats it while trying to reproduce the same length, power and rhythm. The operator's repetition is detected and processed to determine whether it meets a resemblance criterion with respect to the standard word as spoken by the system. If it does not, the system repeats the same standard word to prompt the operator to try again; if it does, the repetition is stored as a reference word. This operation is repeated for each word in the sequence of prestored standard words.
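The prompt, repeat and accept loop can be sketched as below. Assumptions, all hypothetical: an utterance is reduced to a duration and a mean power (rhythm is ignored), the resemblance criterion is "both within ±20 % of the system's version", and a `max_tries` cap is added so the loop always terminates, although the abstract simply repeats until the criterion is met.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Utterance:
    duration: float   # seconds
    power: float      # mean power, arbitrary units

def resembles(model: Utterance, attempt: Utterance, tol: float = 0.2) -> bool:
    """Hypothetical criterion: duration and power each within +/- tol of the model."""
    return (abs(attempt.duration - model.duration) <= tol * model.duration
            and abs(attempt.power - model.power) <= tol * model.power)

def enroll(models: Dict[str, Utterance],
           play_and_record: Callable[[str], Utterance],
           max_tries: int = 5) -> Dict[str, Optional[Utterance]]:
    references: Dict[str, Optional[Utterance]] = {}
    for word, model in models.items():          # dict order = predetermined prompt sequence
        references[word] = None
        for _ in range(max_tries):
            attempt = play_and_record(word)     # system speaks the word, operator repeats it
            if resembles(model, attempt):
                references[word] = attempt      # accepted and stored as the reference word
                break
    return references
```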
Abstract: In an ADPCM (Adaptive Differential PCM) system, in which the signal is commonly coded as C_i, Q_n and σ parameters, a lower sampling rate, which would normally cause distortion, is made possible by deriving additional parameters A_k and B_k as a function of the error (distortion) between the original signal S_n and the sampled signal Y_n. The A_k and B_k coefficients control a distortion filter at the receiver.
Abstract: A method and apparatus for processing a signal to extract selected information therefrom is provided. According to the method, an input signal is converted into a sequence of data samples, and these data samples are applied sequentially through a convolver having first and second sections, the output of the first section forming the input of the second section. A data sample in the second convolver section is then compared to each data sample in the first convolver section to produce an output signal of a plurality of data points, each of the data points representative of a midpoint position within the convolver between a pair of compared data samples. This comparison step is repeated for each data sample in the second convolver section and the output signals form a histogram from which the selected information is extracted. This method may be advantageously utilized in a speech recognition process for extracting various features from an input speech signal.
Abstract: For improved pattern recognition, the reference pattern feature sequence contains control parameters (operators) that provide branching between, and/or omission of, those portions of words that may be non-standard because of speaker or dialect deformations. A pattern matching apparatus comprises a stack controller with two PUSH/POP stacks for addressing the reference patterns to be correlated against the detected pattern information. The reference patterns may contain control characters indicating alternative reference pattern segments or segments that may be omitted.
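How control items for alternative and omissible segments let one reference pattern cover several pronunciations can be sketched with a backtracking matcher driven by an explicit PUSH/POP stack. This is a simplified stand-in, not the two-stack controller itself: it uses a single stack, exact symbol equality instead of correlation, and a hypothetical encoding in which `("OPT", seq)` marks an omissible segment and `("ALT", [seq, ...])` marks alternatives. The phone symbols in the usage example are illustrative only.

```python
from typing import Sequence, Tuple, Union

Item = Union[str, Tuple[str, list]]   # a symbol, ("OPT", seq) or ("ALT", [seq, seq, ...])

def matches(pattern: Sequence[Item], observed: Sequence[str]) -> bool:
    """Backtracking match using an explicit stack of (remaining pattern, input position)."""
    stack = [(tuple(pattern), 0)]
    while stack:
        items, pos = stack.pop()
        if not items:
            if pos == len(observed):
                return True
            continue
        head, rest = items[0], items[1:]
        if isinstance(head, str):
            if pos < len(observed) and observed[pos] == head:
                stack.append((rest, pos + 1))
        elif head[0] == "OPT":                      # segment may be omitted
            stack.append((rest, pos))
            stack.append((tuple(head[1]) + rest, pos))
        elif head[0] == "ALT":                      # one of several alternative segments
            for alternative in head[1]:
                stack.append((tuple(alternative) + rest, pos))
        else:
            raise ValueError(f"unknown control item {head!r}")
    return False

tomato = ["t", ("ALT", [["ow"], ["ah"]]), "m", ("ALT", [["ey"], ["ah"]]), "t", ("OPT", ["ow"])]
```

Here `matches(tomato, ["t", "ah", "m", "ah", "t"])` succeeds through the alternatives and the omitted final segment.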
Abstract: A speech synthesizing apparatus has a first memory storing a plurality of phrase data each including speech data, an address designating circuit for designating an address of the first memory, a second memory for storing synthesizing condition data, and a synthesizer for synthesizing a speech signal based on speech data from the first memory in accordance with the synthesizing condition data stored in the second memory. Each phrase data stored in the first memory also includes the corresponding synthesizing condition data. When each phrase data is read out from the first memory, the synthesizing condition data is first read out and is stored in the second memory, and then the speech data is read out and is supplied to the synthesizer.
Abstract: An improved excitation signal is produced in a low bit-rate coding device that codes a discrete speech signal sequence into an output code sequence for exciting a synthesizing filter. For each segment, an autocorrelation function of the synthesizing filter's impulse response, calculated using a parameter sequence representative of the segment's spectral envelope, and a cross-correlation function between the segment and the impulse response are used to produce a sequence of excitation pulses. Pulse locations and amplitudes are decided one pulse at a time: the location of the pulse currently being processed is decided using the locations and amplitudes of the previously processed pulses, and the previously processed pulse amplitudes are renewed concurrently with the decision of the current pulse amplitude, using the locations of both the previously and currently processed pulses.
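The pulse search described above matches the general shape of multipulse excitation with amplitude re-optimization, sketched below under assumptions. The impulse response `h` is taken as given (the LPC analysis that yields it is omitted), the correlation matrix is built from the autocorrelation `R[|i - j|]`, which ignores frame-edge truncation, and all function names are hypothetical. The location search uses previous locations and amplitudes, and all amplitudes are then re-solved jointly.

```python
from typing import List, Tuple

def autocorr(h: List[float], n: int) -> List[float]:
    """R[k] = sum_i h[i] h[i+k] for lags 0..n-1 (zero beyond the response length)."""
    return [sum(h[i] * h[i + k] for i in range(len(h) - k)) for k in range(n)]

def crosscorr(x: List[float], h: List[float]) -> List[float]:
    """psi[m] = sum_n x[n] h[n-m]: correlation of the segment with a pulse response at m."""
    return [sum(x[n] * h[n - m] for n in range(m, min(len(x), m + len(h)))) for m in range(len(x))]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    g = [0.0] * n
    for r in range(n - 1, -1, -1):
        g[r] = (m[r][n] - sum(m[r][k] * g[k] for k in range(r + 1, n))) / m[r][r]
    return g

def multipulse(x: List[float], h: List[float], num_pulses: int) -> Tuple[List[int], List[float]]:
    r = autocorr(h, len(x))
    psi = crosscorr(x, h)
    locs: List[int] = []
    amps: List[float] = []
    for _ in range(num_pulses):
        # Choose the location whose correlation, after removing the contribution of the
        # pulses already placed (their locations and amplitudes), is largest in magnitude.
        best = max((m for m in range(len(x)) if m not in locs),
                   key=lambda m: abs(psi[m] - sum(a * r[abs(m - l)] for a, l in zip(amps, locs))))
        locs.append(best)
        # Renew all amplitudes jointly for the previous and current pulse locations.
        amps = solve([[r[abs(i - j)] for j in locs] for i in locs], [psi[l] for l in locs])
    return locs, amps
```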
Abstract: A speech pattern is partitioned into its syllabic subunits by generating signals representative of the speech energy and autocorrelation features of its time frame portions. The peak energy time frames are identified from the frame energy signals, and the minimum energy time frames between each pair of successive peak energy frames of the speech pattern are determined from the time frame energy and autocorrelation feature signals. Candidate syllabic subunits are formed in response to the peak and minimum energy frame characteristics and the autocorrelation feature signals. Signals corresponding to the duration of each candidate syllabic subunit and to the energy of its peak energy frame, relative to the energy of the other peak energy frames and of the maximum peak energy frame of the speech pattern, are formed, and these signals are combined to produce a figure of merit for each candidate syllabic subunit.
October 7, 1983
Date of Patent: May 12, 1987
American Telephone and Telegraph Company, AT&T Bell Laboratories
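For the syllabic-subunit entry above, the peak and minimum-energy framing can be sketched from frame energies alone. This is a simplified illustration: the autocorrelation features and the combined figure of merit are omitted, only the peak energy relative to the maximum peak is reported, and the `min_peak_ratio` threshold and the function name are hypothetical.

```python
from typing import List, Tuple

def syllable_candidates(energy: List[float],
                        min_peak_ratio: float = 0.1) -> List[Tuple[int, int, int, float]]:
    """Return (start, end, peak_frame, peak_energy / max_peak_energy) per candidate subunit."""
    top = max(energy)
    peaks = [i for i in range(1, len(energy) - 1)
             if energy[i - 1] < energy[i] >= energy[i + 1] and energy[i] >= min_peak_ratio * top]
    bounds = [0]
    for a, b in zip(peaks, peaks[1:]):
        # boundary = lowest-energy frame between two successive peaks
        bounds.append(min(range(a, b + 1), key=lambda i: energy[i]))
    bounds.append(len(energy))
    return [(s, e, p, energy[p] / top) for s, e, p in zip(bounds, bounds[1:], peaks)]
```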
Abstract: A method of and apparatus for processing audio signals in which a measure of amplitude of audio signals in a selected time period is obtained. The audio signals for the selected time period are delayed until the measure of amplitude is obtained, and then the delayed audio signals are normalized using the measure of amplitude. High-frequency emphasis may be employed prior to obtaining the measure of amplitude. Alternatively, a multi-channel system can be employed for processing audio signals in limited frequency bands. The method and apparatus are applicable in a variety of applications including hearing aids, audio storage media, broadcast and public address systems, and voice communications such as telephone systems.
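The delay-then-normalize idea can be sketched as block processing: each block is held back until its amplitude has been measured, then scaled. Assumptions, all hypothetical: the measure of amplitude is the block peak, the delay equals one block, the optional high-frequency emphasis is a first-order difference `s[n] - a*s[n-1]` applied only to the measurement path, and the function name and parameters are illustrative.

```python
from typing import List, Optional

def normalize_blocks(samples: List[float], block_len: int, target: float = 1.0,
                     emphasis: Optional[float] = None) -> List[float]:
    out: List[float] = []
    prev = 0.0
    for start in range(0, len(samples), block_len):
        block = samples[start:start + block_len]   # held back (delayed) until measured
        measured = block
        if emphasis is not None:                   # optional first-order high-frequency emphasis
            measured = []
            for s in block:
                measured.append(s - emphasis * prev)
                prev = s
        peak = max(abs(v) for v in measured)
        gain = target / peak if peak > 0 else 1.0
        out.extend(s * gain for s in block)        # normalize the delayed, un-emphasized audio
    return out
```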
September 20, 1985
Date of Patent: April 28, 1987
Larry K. Henrickson, Dorothy A. Huntington
Abstract: A linguistic coding system, and a keyboard for it, are described for people unable to use their own voices. The coding system and associated keyboard are based on the sentence rather than on the word, phoneme or letter. The keyboard is coupled to a computer that stores a plurality of multi-word messages or sentences in its memory for selective retrieval from the keyboard. The retrieved sentences are fed to a voice synthesizer, which converts them into audible spoken messages through a loudspeaker. The keys carry polysemic (many-meaning) symbols. By designating one selected key and its associated polysemic symbols as a primary message theme key, selected multi-word messages recorded in the computer memory can be retrieved by actuating combinations of the primary message theme key and other keys, which vary the context of the polysemic symbols.
December 27, 1985
Date of Patent: April 28, 1987
Bruce R. Baker, Richard D. Creech, Kenneth W. Smith
Abstract: An allophone vocoder exploits the inherent redundancy of spoken language, together with the automatic filtering that human listeners apply to speech, to obtain a speech compression and recognition system. An analog speech signal is broken up into its phoneme components, which are encoded for transmission. The encoded phoneme sequence achieves a much higher compression ratio than the analog speech signal. The phonemes are then transmitted, stored, or used directly to generate an analogous allophone sequence that approximates the original speech signal. Because of the inherent redundancy of spoken language and the filtering effect of the human ear, variations or errors in the approximations of the phonemes received from the original speech signal are inconsequential to the comprehensibility of the final allophone-synthesized speech.
Abstract: A fuel supplying apparatus comprises a fuel supplying nozzle having an opening and closing valve, a fuel supplying arrangement for supplying a fuel to the fuel supplying nozzle, a nozzle hook onto which the fuel supplying nozzle is hooked when the fuel supplying nozzle is not in use, a flow quantity signal generator for generating a flow quantity signal responsive to a quantity of fuel which has been supplied from the fuel supplying arrangement, a speech generating device for generating a speech which is in accordance with a completion of a fuel supplying operation, responsive to a hooking of the fuel supplying nozzle onto the nozzle hook after the fuel supplying operation is completed, and a resetting circuit for stopping the generation of speech by the speech generating means and for resetting the fuel supplying apparatus to a state where a subsequent fuel supplying operation can be started, responsive to an unhooking of the fuel supplying nozzle from the nozzle hook before the generation of speech by the s
Abstract: An integrated circuit device, or chip, digitally synthesizes human speech using a linear predictive filter and a variable frame rate. The variable frame rate yields more natural speech by slowing or speeding the frame rate as a particular application requires, in a system that constructs the speech data to be synthesized from allophone codes.
Abstract: A smooth transition between concatenated sound elements is achieved by an address control unit that is initialized to the point in time of maximum similarity between adjacent sound units. In a sound synthesizing apparatus that synthesizes by concatenating sound elements extracted from an analog sound waveform, the analog sound signal is converted into a digital signal; data near the trailing end of a preceding sound element and data near the leading end of a succeeding sound element are shifted relative to each other while arithmetic control means computes their similarity; and the data of the succeeding sound element are clocked out of a storage means so that the succeeding element joins the preceding element as smoothly as possible.
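The join-point search can be sketched by sliding the head of the succeeding element against the tail of the preceding one and starting playback just after the best-matching stretch. The normalized cross-correlation used as the similarity measure, the `window` and `search` lengths, and the function names are assumptions; the abstract only says that a similarity is computed while the data are shifted.

```python
import math
from typing import List

def best_join_offset(prev: List[float], nxt: List[float], window: int, search: int) -> int:
    """Offset into `nxt` whose next `window` samples best match the last `window` of `prev`."""
    ref = prev[-window:]
    ref_energy = sum(a * a for a in ref)
    best, best_score = 0, -math.inf
    for off in range(min(search, len(nxt) - window) + 1):
        cand = nxt[off:off + window]
        den = math.sqrt(ref_energy * sum(b * b for b in cand))
        if den == 0:
            continue
        score = sum(a * b for a, b in zip(ref, cand)) / den   # normalized cross-correlation
        if score > best_score:
            best, best_score = off, score
    return best

def concatenate(prev: List[float], nxt: List[float], window: int, search: int) -> List[float]:
    """Clock out `nxt` from just after the matched stretch, so it continues where `prev` ends."""
    off = best_join_offset(prev, nxt, window, search)
    return prev + nxt[off + window:]
```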
Abstract: Speaker verification is performed in a sequence of steps: speech recognition of the spoken identification code (key code) is followed by speaker verification using the sounds of that spoken code. If verification fails, a speech synthesizer prompts the speaker to say his or her name, which is then used for speaker verification.
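The three-step sequence can be sketched as plain control flow with the recognizer, the verifier and the synthesizer prompt passed in as callables. The callable signatures, the rejection when no key code is recognized, and the function name are hypothetical; only the ordering of steps comes from the abstract.

```python
from typing import Callable, Optional

def verify_speaker(code_utterance: object,
                   recognize: Callable[[object], Optional[str]],
                   verify: Callable[[str, object], bool],
                   ask_for_name: Callable[[], object]) -> bool:
    claimed_id = recognize(code_utterance)     # step 1: recognize the spoken key code
    if claimed_id is None:
        return False
    if verify(claimed_id, code_utterance):     # step 2: verify on the key-code sounds
        return True
    name_utterance = ask_for_name()            # step 3: synthesizer prompts for the name
    return verify(claimed_id, name_utterance)
```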