Abstract: A method for automatically converting a decision tree into one or more weighted finite-state transducers. Specifically, the method in accordance with an illustrative embodiment of the present invention processes one or more terminal (i.e., leaf) nodes of a given decision tree to generate one or more corresponding weighted rewrite rules. Then, these weighted rewrite rules are processed to generate weighted finite-state transducers corresponding to the one or more terminal nodes of the decision tree. In this manner, decision trees may be advantageously compiled into weighted finite-state transducers, and these transducers may then be used directly in various speech and natural language processing systems. The weighted rewrite rules employed herein comprise an extension of conventional rewrite rules, familiar to those skilled in the art.
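The leaf-to-rule step described above can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical toy tree of binary context questions whose leaves hold probability distributions over output phones, with rule weights taken as negative log probabilities; the patent's actual rule and weight formats are not reproduced here.

```python
import math

# Hypothetical toy decision tree: internal nodes ask a context question;
# leaves hold a probability distribution over output phones.
tree = {
    "question": "left_is_vowel",
    "yes": {"dist": {"t": 0.9, "d": 0.1}},
    "no": {
        "question": "right_is_nasal",
        "yes": {"dist": {"d": 0.8, "t": 0.2}},
        "no": {"dist": {"t": 0.6, "d": 0.4}},
    },
}

def leaves_to_rules(node, context=()):
    """Walk the tree; each leaf yields one weighted rewrite rule per output:
    (context conditions, output phone, weight = -log probability)."""
    if "dist" in node:
        return [(context, phone, -math.log(p))
                for phone, p in sorted(node["dist"].items())]
    rules = []
    rules += leaves_to_rules(node["yes"], context + ((node["question"], True),))
    rules += leaves_to_rules(node["no"], context + ((node["question"], False),))
    return rules

rules = leaves_to_rules(tree)
```

Each rule triple could then be compiled into a transducer and the results combined; that compilation step is beyond this sketch.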
Abstract: A novel approach to parameter encoding is presented which improves coding efficiency and performance by exploiting the variable rate nature of certain classes of signals. This is achieved using an interpolative variable frame-rate breakpointing scheme referred to as adaptive frame selection (AFS). In the approach described in this report, frame selection is achieved using a recursive dynamic programming algorithm; the resulting parameter encoding system is referred to as adaptive frame selection using dynamic programming (AFS/DP). The AFS/DP algorithm determines optimal breakpoint locations in the context of parameter encoding using an arbitrary objective performance measure, and operates in a fixed bit-rate, fixed-delay context with low computational requirements. When applied to the problem of low bit-rate coding of speech spectral and gain parameters, the AFS/DP algorithm is capable of improving the perceptual quality of coded speech and robustness to quantization errors over fixed frame-rate approaches.
Type:
Grant
Filed:
September 19, 1996
Date of Patent:
September 8, 1998
Assignee:
Texas Instruments Incorporated
Inventors:
E. Bryan George, Alan V. McCree, Vishu R. Viswanathan
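The recursive dynamic-programming frame selection in the AFS/DP abstract above can be sketched as follows. This is a minimal Python illustration under assumed simplifications: the objective measure is the squared error of linearly interpolating the parameter vectors between breakpoints (the patent allows an arbitrary measure), and the first and last frames are fixed as breakpoints to match a fixed-rate, fixed-delay budget.

```python
def seg_error(params, i, j):
    """Squared error when frames strictly between breakpoints i and j are
    replaced by linear interpolation of the breakpoint parameter vectors."""
    err = 0.0
    for m in range(i + 1, j):
        t = (m - i) / (j - i)
        interp = [(1 - t) * a + t * b for a, b in zip(params[i], params[j])]
        err += sum((x - y) ** 2 for x, y in zip(interp, params[m]))
    return err

def select_breakpoints(params, n_break):
    """DP over (frame index, breakpoints used so far); returns the optimal
    breakpoint locations and the total interpolation error."""
    n = len(params)
    INF = float("inf")
    cost = [[INF] * (n_break + 1) for _ in range(n)]
    back = [[0] * (n_break + 1) for _ in range(n)]
    cost[0][1] = 0.0                     # frame 0 is always a breakpoint
    for j in range(1, n):
        for k in range(2, n_break + 1):
            for i in range(j):
                if cost[i][k - 1] == INF:
                    continue
                c = cost[i][k - 1] + seg_error(params, i, j)
                if c < cost[j][k]:
                    cost[j][k], back[j][k] = c, i
    bps, j, k = [], n - 1, n_break       # trace back from the final frame
    while k >= 1:
        bps.append(j)
        j, k = back[j][k], k - 1
    return bps[::-1], cost[n - 1][n_break]
```

On a piecewise-linear parameter track the optimal breakpoints fall exactly at the slope changes, with zero error.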
Abstract: A tonal sound recognizer determines tones in a tonal language without the use of voicing recognizers or peak picking rules. The tonal sound recognizer computes feature vectors for a number of segments of a sampled tonal sound signal in a feature vector computing device, compares the feature vectors of a first of the segments with the feature vectors of another segment in a cross-correlator to determine a trend of a movement of a tone of the sampled tonal sound signal, and uses the trend as an input to a word recognizer to determine a word or part of a word of the sampled tonal sound signal.
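The cross-correlation trend test described above might look like the following sketch. It assumes, hypothetically, spectral-style feature vectors in which a rising tone shifts the dominant peak toward higher indices, so the lag of the cross-correlation peak between two segments indicates the direction of tone movement.

```python
def xcorr_peak_lag(a, b, max_lag):
    """Cross-correlate feature vectors a and b over a small lag range;
    the lag of the correlation peak shows which way the tone moved."""
    best_lag, best_val = 0, float("-inf")
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        s = sum(a[i] * b[i + lag]
                for i in range(n) if 0 <= i + lag < n)
        if s > best_val:
            best_val, best_lag = s, lag
    return best_lag

def tone_trend(feat_first, feat_later, max_lag=5):
    """Positive lag: the later segment's peak sits at higher indices."""
    lag = xcorr_peak_lag(feat_first, feat_later, max_lag)
    if lag > 0:
        return "rising"
    if lag < 0:
        return "falling"
    return "level"
```

The trend label would then feed the word recognizer as an additional input, as the abstract describes.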
Abstract: A biofeedback system for speech disorders is provided which is adapted to detect disfluent speech, to provide auditory feedback enabling immediate fluent speech, and to control the auditory feedback in accordance with the disfluent speech, to enable immediate and carryover fluency. The disfluent speech detector is preferably an electromyograph (EMG). The auditory feedback is preferably frequency-altered auditory feedback (FAF). The controller shifts the pitch of the user's voice in accordance with the user's disfluent speech. The biofeedback system may also be provided with delayed auditory feedback (DAF), which enables user control of speaking rate; with masking auditory feedback (MAF), which improves user awareness of the physical sensations of speech; and with a voice-operated switch (VOX) to switch the device off when the user stops talking. The biofeedback system may also include a timer on the DAF circuit to automatically vary the user's speaking rate at regular time intervals.
Abstract: A vocoder device and corresponding method characterizes and reconstructs speech excitation. An excitation analysis portion performs a cyclic excitation transformation process on a target excitation segment by rotating a peak amplitude to a beginning buffer location. The excitation phase representation is dealiased using multiple dealiasing passes based on the phase slope variance. Both primary and secondary excitation components are characterized, where the secondary excitation is characterized based on a computation of the error between the characterized primary excitation and the original excitation. Alternatively, an excitation pulse compression filter is applied to the target, resulting in a symmetric target. The symmetric target is characterized by normalizing half the symmetric target. The synthesis portion performs reconstruction and synthesis of the characterized excitation based on the characterization method employed by the analysis portion.
Type:
Grant
Filed:
September 13, 1996
Date of Patent:
August 11, 1998
Assignee:
Motorola, Inc.
Inventors:
Chad Scott Bergstrom, Bruce Alan Fette, Cynthia Ann Jaskie, Clifford Wood, Sean Sungsoo You
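Two steps from the vocoder abstract above, cyclically rotating the peak amplitude to the start of the buffer and a single dealiasing (phase-unwrapping) pass, can be sketched as:

```python
import math

def rotate_peak_to_start(seg):
    """Cyclically rotate an excitation segment so the sample with the
    largest magnitude lands at buffer position 0."""
    peak = max(range(len(seg)), key=lambda i: abs(seg[i]))
    return seg[peak:] + seg[:peak]

def unwrap(phases):
    """One dealiasing pass: remove 2*pi jumps between successive phase
    samples so the phase slope varies smoothly."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out
```

The patent applies multiple such passes guided by the phase slope variance; the stopping criterion is not modeled here.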
Abstract: The pitch of synthesized speech signals is varied by separating the speech signals into a spectral component and an excitation component. The latter is multiplied by a series of overlapping window functions synchronous, in the case of voiced speech, with pitch timing mark information corresponding at least approximately to instants of vocal excitation, to separate it into windowed speech segments which are added together again after the application of a controllable time-shift. The spectral and excitation components are then recombined. The multiplication employs at least two windows per pitch period, each having a duration of less than one pitch period. Alternatively each window has a duration of less than twice the pitch period between timing marks and is asymmetric about the timing mark.
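The windowed overlap-add pitch modification described above can be roughly sketched as follows. This simplified Python version uses one two-period Hann window per pitch mark instead of the abstract's two sub-period windows, and omits the spectral/excitation separation; mark positions and the window shape are assumptions.

```python
import math

def hann(n):
    """Raised-cosine window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def shift_pitch(excitation, marks, factor):
    """Window the excitation around each pitch mark and re-add the windowed
    segments at time-shifted mark positions; factor > 1 compresses the mark
    spacing and so raises the pitch."""
    out = [0.0] * (int(len(excitation) / factor) + 64)
    period = marks[1] - marks[0]
    win = hann(2 * period)
    for m in marks:
        new_m = int(m / factor)            # shifted position of this mark
        for i, w in enumerate(win):
            src = m - period + i
            dst = new_m - period + i
            if 0 <= src < len(excitation) and 0 <= dst < len(out):
                out[dst] += w * excitation[src]
    return out
```

In the full method the spectral component would be recombined with this modified excitation afterwards.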
Abstract: A vocoder for generating speech from a plurality of stored speech parameters which computes the excitation signals in the speech production model. The present invention generates a periodic excitation signal with flat frequency response and linear group delay. The present invention uses properties of the phase delay sequence being generated to calculate each of the parameters of the excitation signal in an efficient and optimized manner. Generation of the excitation signal requires computation of a first expression (EQU1; rendered as an image in the patent) that uses a second equation (EQU2) defining the phase relationship between the signals using a linear group delay, where φ′_I(x)* is the absolute phase offset from the first phase harmonic, I is an index for the harmonic, x is time, P is the pitch period, and k″ is a constant. The present invention performs the following iterations to compute the above sequence: 1) φ′_I(x)* = φ′_{I-1}(x)* + A_{I-1}(x); 2) A_I(x) = A_{I-1}(x) - B, where A…
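The additive iteration in the abstract above builds a phase track whose second difference across harmonics is a constant, i.e. a quadratic phase, which corresponds to linear group delay. A minimal sketch with hypothetical starting values:

```python
def harmonic_phases(n_harm, a0, b, phi0=0.0):
    """Incrementally generate per-harmonic phase offsets using only two
    additions per harmonic:
      1) phi_I = phi_{I-1} + A_{I-1}
      2) A_I   = A_{I-1} - B
    so the second difference of the phases is the constant -B."""
    phases, phi, a = [], phi0, a0
    for _ in range(n_harm):
        phases.append(phi)
        phi, a = phi + a, a - b
    return phases
```

The constant B would be derived from the pitch period and the desired group delay; those values are not reproduced here.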
Abstract: A system for controlling a device such as a television and for controlling access to broadcast information such as video, audio, and/or text information is disclosed. The system includes a first receiver for receiving utterances of a speaker, a second receiver for receiving vocabulary data defining a vocabulary of utterances, and a processor for executing a speech recognition algorithm using the received vocabulary data to recognize the utterances of the speaker and for controlling the device and the access to the broadcast information in accordance with the recognized utterances of the speaker.
Type:
Grant
Filed:
January 3, 1995
Date of Patent:
June 30, 1998
Assignee:
Scientific-Atlanta, Inc.
Inventors:
Peter B. Houser, Mark E. Schutte, Gloria J. Majid
Abstract: An encoded data decoding apparatus including: a circuit for sequentially receiving a series of encoded data frames, wherein each of the series of data frames includes a plurality of multiplexed band data classified into a plurality of predetermined frequency bands, each of the multiplexed band data including encoded information data belonging to a corresponding frequency band and processing data used for encoding the encoded information data, and for processing each of the data frames so as to separate the encoded information data and the processing data from each other; a circuit for decoding the encoded information data by using the processing data separated from the information data in each of the frequency bands; and any one of the following circuits, one is a circuit for determining whether each of the data frames is to be decoded or not on the basis of a level of the encoded information data belonging to at least one frequency band selected from the frequency bands included in the data frame and the oth
Abstract: A method and system for creating voice commands for inserting previously entered information is provided. In a preferred embodiment, a command to create a new voice command is received after a user has selected information to be inserted by the new voice command. In response, the selected information is retrieved and a new voice command name is received from the user. A new voice command is then created that, when invoked by speaking the new voice command name while a document is being edited, inserts the selected information into the document being edited.
Abstract: A dialogue-sound processing apparatus of the present invention generates a discourse structure representing the flow of dialogue from fragmentary spoken utterances. In the dialogue-sound processing apparatus, the speech fragments of the dialogue sound are input through a sound input section. A clue extraction section extracts a clue, that is, a word or prosodic feature representing the flow of dialogue, from the speech fragments. An utterance function rule memory section stores an utterance function rule, which is a correspondence relation between the clue and an utterance function representing a pragmatic effect on the flow of dialogue. An utterance function extraction section assigns the utterance function to the clue in accordance with the utterance function rule.
Abstract: A method for encoding information has a forward transform step of forward orthogonal transforming an input signal using a preset windowing function to form an output spectral signal, and an encoding step of encoding the output spectral signal resulting from the forward orthogonal transform. The preset windowing function employed has a smaller slope of its characteristic curve at both skirt ends. This allows efficient encoding when employing a transform, such as the MDCT, that reconstructs a waveform signal during inverse orthogonal transform by overlapping each waveform element with both neighboring waveform elements.
Abstract: A computer system for linearly encoding a pronunciation prefix tree. The pronunciation prefix tree has nodes such that each non-root and non-leaf node represents a phoneme and wherein each leaf node represents a word formed by the phonemes represented by the non-leaf nodes in a path from the root node to the leaf node. Each leaf node has a probability associated with the word of the leaf node. The computer system creates a tree node dictionary containing an indication of the phonemes that compose each word. The computer system then orders the child nodes of each non-leaf node based on the highest probability of descendent leaf nodes of the child node. Then, for each non-leaf node, the computer system sets the probability of the non-leaf node to a probability based on the probability of its child nodes, and for each node, sets a factor of the node to the probability of the node divided by the probability of the parent node of the node.
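The probability smearing and factoring described above can be sketched as follows. One assumption: the abstract says only that a non-leaf's probability is "based on the probability of its child nodes", so this sketch uses the maximum over children; the telescoping of the factors then reproduces each word's probability along its path.

```python
def smear(node):
    """Bottom-up: set each non-leaf probability to the max over its
    children, and order children best-first for the search."""
    if "children" in node:
        for c in node["children"]:
            smear(c)
        node["children"].sort(key=lambda c: -c["prob"])
        node["prob"] = node["children"][0]["prob"]
    return node

def set_factors(node, parent_prob=1.0):
    """factor(node) = prob(node) / prob(parent); multiplying the factors
    along a root-to-leaf path reproduces the leaf (word) probability."""
    node["factor"] = node["prob"] / parent_prob
    for c in node.get("children", []):
        set_factors(c, node["prob"])
```

Applying the factor incrementally at each node lets a recognizer prune paths early without waiting for the leaf.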
Abstract: The present invention relates to a method and apparatus for recording an audio signal on an integrated circuit (IC) memory card. The audio signal to be recorded is considered to have plural chapters (i.e., songs or distinct movements) with a mute section (i.e., a moment of silence of at least a predetermined length) between each pair of adjoining chapters in the audio signal to be recorded. The present invention provides for automatic partitioning between the chapters as they are recorded on a data area of the IC memory card, even when the audio signal is recorded continuously based upon a single press of the record button and terminated with a single press of the stop button. Since the chapters of the recorded audio signal are automatically partitioned, without the need for the user to start and stop the recording process, a recording is conveniently made on the IC memory card which allows random access to any one of the chapters for playback.
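The automatic chapter partitioning can be sketched as a frame-energy silence detector; the frame size, energy threshold, and minimum silent run below are hypothetical parameters, not values from the patent.

```python
def partition_chapters(samples, frame=160, silence_thresh=0.01,
                       min_silent_frames=5):
    """Scan frame energies; a run of at least min_silent_frames quiet
    frames ends the current chapter, and the next loud frame starts a
    new one. Returns (start, end) sample ranges, one per chapter."""
    chapters, start, quiet, in_chapter = [], 0, 0, False
    for f in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[f:f + frame]) / frame
        if energy < silence_thresh:
            quiet += 1
            if in_chapter and quiet >= min_silent_frames:
                # close the chapter at the first frame of the silent run
                chapters.append((start, f - (quiet - 1) * frame))
                in_chapter = False
        else:
            if not in_chapter:
                start, in_chapter = f, True
            quiet = 0
    if in_chapter:
        chapters.append((start, len(samples)))
    return chapters
```

Each returned range would become one independently addressable chapter on the memory card.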
Abstract: A speech synthesizing method which uses glottal modelling to determine and transform ten or fewer high-level parameters into thirty-nine low-level parameters using mapping relations. These parameters are input to a speech synthesizer to enable speech to be synthesized more simply than with prior art systems that required 50 to 60 parameters to be input to represent any particular speech.
Abstract: A system for comparing subjective dialogue quality in mobile telephone systems that include at least one mobile telephone exchange operating with a number of base stations, and at least one mobile radio unit for communicating with a respective base station. A first representation of the subjective dialogue quality experienced by a user of a first connection in the mobile telephone system is provided and compared with a representation produced for a second connection in a mobile telephone system. The system includes a transmitter for transmitting at least one predetermined and stored speech message that constitutes a second representation of a correct dialogue quality, and second means, including speech recognition means, for receiving and evaluating the transmitted speech message. The receiver produces, in accordance with the recognizable parts of the speech message, a third representation of the dialogue quality experienced by the user of the system.
Abstract: A solid state digital hand-held recording device having a multifunctional switch assembly. A printed circuit board including a microcontroller electrically coupled to switch terminals operates to control the processing of sound into electrical signals and to store said signals on a digital recording medium. The switch assembly actuates electrical signals coupled to said microcontroller, thereby activating a sequence of actions (a program) stored within a read-only memory device. A plurality of programs can be activated to instantaneously begin recording a message, verify the integrity of the recording medium, and index a message being recorded for rapid recall.
Type:
Grant
Filed:
March 3, 1997
Date of Patent:
April 21, 1998
Assignee:
Norris Communications Corporation
Inventors:
Norbert P. Daberko, Richard K. Davis, Richard D. Bridgewater
Abstract: A method for enhancing a block companding uniform midtread quantizer in the ATRAC (Adaptive Transform Acoustic Coding) for the quantization of digital audio signals, wherein the audio signals are represented by a plurality of frames of quantized spectral components. The method includes modifying all spectral sample values in a defined frequency interval by a constant factor, calculating a scale factor for the frequency interval from a maximum spectral sample value, and quantizing all the spectral samples in the modified frequency interval with a modified quantizer.
Type:
Grant
Filed:
December 6, 1993
Date of Patent:
March 31, 1998
Assignee:
Matsushita Electric Industrial Co., Ltd.
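The block companding uniform midtread quantization that the method above enhances can be sketched as follows. The constant `gain` parameter stands in for the abstract's modification of the spectral samples in a band, and the bit allocation is hypothetical.

```python
def quantize_band(samples, n_bits, gain=1.0):
    """Block companding: modify the band by a constant gain, derive the
    scale factor from the maximum modified sample, then apply a uniform
    midtread quantizer (symmetric levels including an exact zero)."""
    scaled = [s * gain for s in samples]
    scale = max(abs(s) for s in scaled) or 1.0
    steps = 2 ** (n_bits - 1) - 1          # e.g. 4 bits -> levels -7..+7
    q = [round(s / scale * steps) for s in scaled]
    dq = [v * scale / steps / gain for v in q]  # dequantized values
    return q, scale, dq
```

The enhancement in the abstract amounts to choosing the gain and quantizer so the band's quantization error is reduced at the same bit budget.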
Abstract: A low bit rate Codebook Excited Linear Predictor (CELP) communication system which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Type:
Grant
Filed:
April 18, 1994
Date of Patent:
March 31, 1998
Assignee:
Hughes Electronics
Inventors:
Kumar Swaminathan, Kalyan Ganesan, Prabhat K. Gupta
Abstract: A system for recognizing spoken sounds from continuous speech includes a plurality of classifiers and a selector. Each of the classifiers implements a discriminant function which is based on a polynomial expansion. By determining the polynomial coefficients of a discriminant function, the corresponding classifier is tuned to classify a specific spoken sound. The selector utilizes the classifier outputs to identify the spoken sounds. A method of using the system is also provided.
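A classifier bank of the kind described above can be sketched as follows; the second-order polynomial expansion is an assumption, since the abstract does not fix the order.

```python
def poly_expand(x):
    """Second-order polynomial expansion of a feature vector:
    [1, x_i, x_i * x_j for i <= j]."""
    terms = [1.0] + list(x)
    for i in range(len(x)):
        for j in range(i, len(x)):
            terms.append(x[i] * x[j])
    return terms

def discriminant(weights, x):
    """One classifier's output: the dot product of its tuned polynomial
    coefficients with the expansion of the input features."""
    return sum(w * t for w, t in zip(weights, poly_expand(x)))

def classify(all_weights, x):
    """Selector: the recognized sound is the classifier with the largest
    discriminant output."""
    scores = [discriminant(w, x) for w in all_weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

Determining the coefficient vectors (one per spoken sound) is the training step the abstract refers to as tuning each classifier.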