Abstract: A method of encoding time-discrete audio signals comprises the steps of weighting the time-discrete audio signal by means of window functions overlapping each other so as to form blocks, the window functions producing blocks of a first length for signals varying weakly with time and blocks of a second length for signals varying strongly with time. A start window sequence is selected for the transition from windowing with blocks of the first length to windowing with blocks of the second length, whereas a stop window sequence is selected for the opposite transition. The start window sequence is selected from at least two different start window sequences having different lengths, whereas the stop window sequence is selected from at least two different stop window sequences having different lengths. A method of decoding blocks of encoded audio signals selects a suitable inverse transformation as well as a suitable synthesis window as a reaction to side information associated with each block.
Type:
Grant
Filed:
July 11, 1996
Date of Patent:
December 8, 1998
Assignees:
Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung E.V., Dolby Laboratories Licensing Corp.
Inventors:
Marina Bosi, Grant Davidson, Charles Robinson, Martin Dietz, Uwe Gbur, Oliver Kunz, Karlheinz Brandenburg
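The transition rule in the window-switching abstract above can be illustrated with a minimal sketch. The function name `window_sequence` and the labels `LONG`, `SHORT`, `START`, and `STOP` are hypothetical, and the sketch assumes encoding begins with long blocks. It deliberately leaves out the choice among several start or stop sequences of different lengths, which the abstract mentions but does not specify, and it shows each transition as its own label rather than modelling how many samples the transition window covers.

```python
def window_sequence(strong):
    """Map per-block flags (True = signal varies strongly) to window labels.

    Hypothetical illustration only: a START label marks a long-to-short
    transition and a STOP label marks a short-to-long transition.
    """
    labels = []
    prev_short = False  # assumption: encoding begins with long blocks
    for is_strong in strong:
        if is_strong and not prev_short:
            labels.append("START")  # transition: first length -> second length
        elif not is_strong and prev_short:
            labels.append("STOP")   # transition: second length -> first length
        labels.append("SHORT" if is_strong else "LONG")
        prev_short = is_strong
    return labels


if __name__ == "__main__":
    print(window_sequence([False, True, True, False]))
```

A decoder following the abstract would read the chosen window type from side information rather than recompute it, so a function like this belongs on the encoder side only.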
Abstract: An apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.
Type:
Grant
Filed:
January 13, 1997
Date of Patent:
December 8, 1998
Assignee:
British Telecommunications Public Limited Company
Inventors:
Michael Peter Hollier, Philip John Sheppard
Abstract: A method, system and product are provided for selectively modifying an encoded audio signal. The method includes receiving the encoded audio signal, the encoded audio signal having a first frequency bandwidth, and identifying a delivery point for the encoded audio signal, the delivery point having a second frequency bandwidth. The method also includes selecting a plurality of subbands from the first frequency bandwidth based on the second frequency bandwidth, and modifying the encoded audio signal based on the plurality of subbands selected. The system includes control logic for performing the method. The product includes a storage medium having computer readable programmed instructions for performing the method.
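As a rough sketch of the subband selection step described above, the example below assumes the encoded signal is divided into equal-width subbands and keeps only those lying entirely within the delivery point's bandwidth. The function name `select_subbands`, the equal-width layout, and the "entirely within" rule are assumptions for illustration; the abstract does not state how the subbands are chosen.

```python
def select_subbands(first_bandwidth_hz, num_subbands, second_bandwidth_hz):
    """Return indices of subbands that fit inside the delivery bandwidth.

    Assumes num_subbands equal-width subbands spanning 0..first_bandwidth_hz.
    """
    kept = []
    for k in range(num_subbands):
        # Upper edge of subband k is (k + 1) * first_bandwidth_hz / num_subbands.
        # Cross-multiplying keeps the comparison in exact integer arithmetic
        # when the inputs are integers.
        if (k + 1) * first_bandwidth_hz <= second_bandwidth_hz * num_subbands:
            kept.append(k)
    return kept


if __name__ == "__main__":
    # 24 kHz encoded bandwidth, 32 subbands, 4 kHz delivery point
    print(select_subbands(24000, 32, 4000))
```

Modifying the encoded signal would then amount to dropping, or not transmitting, the data for every subband outside the returned list.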
Abstract: The invention relates to a method and apparatus for automatically generating a speech recognition vocabulary for a speech recognition system from a listing that contains a number of entries, each entry containing multi-word identification data that distinguishes that entry from other entries in the listing. The method comprises the step of creating, for each entry in the listing, a plurality of orthographies in the speech recognition vocabulary that are formed by combining selected words from the entry. The word combination is effected by applying a heuristic model that mimics the way users formulate requests to an automated directory assistance system. The method is particularly useful for generating speech recognition vocabularies for automated directory assistance systems.
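One simple way to picture "combining selected words from the entry" is sketched below. The abstract does not describe the actual heuristic model, so the rule used here (keep the original word order, and drop any combination made up solely of filler words) is an assumption, as are the names `orthographies` and `skip_words`.

```python
from itertools import combinations


def orthographies(entry, skip_words=("of", "the", "and")):
    """Generate candidate orthographies for one listing entry.

    Hypothetical heuristic: every order-preserving selection of words,
    except selections consisting only of filler words.
    """
    words = entry.split()
    results = []
    for size in range(1, len(words) + 1):
        # combinations() keeps the words in their original order
        for combo in combinations(words, size):
            if all(w.lower() in skip_words for w in combo):
                continue  # skip selections such as "of" on its own
            phrase = " ".join(combo)
            if phrase not in results:
                results.append(phrase)
    return results


if __name__ == "__main__":
    for phrase in orthographies("Bank of Montreal"):
        print(phrase)
```

A practical model would likely prune this set much further, since the number of combinations grows exponentially with the length of the entry.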
Abstract: There is provided a speaker-independent model generation apparatus and a speech recognition apparatus which require a processing unit to have less memory capacity and which allow its computation time to be reduced, as compared with a conventional counterpart. A single Gaussian HMM is generated with a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers. A search is made for the state that yields the maximum increase in likelihood when one state is split in the contextual or temporal domain. That state is then split in the contextual or temporal domain corresponding to the maximum increase in likelihood. Thereafter, a single Gaussian HMM is generated with the Baum-Welch training algorithm, and these steps are iterated until the states within the single Gaussian HMM can no longer be split or until a predetermined number of splits is reached. Thus, a speaker-independent HMM is generated.
Type:
Grant
Filed:
November 29, 1996
Date of Patent:
November 17, 1998
Assignee:
ATR Interpreting Telecommunications Research Laboratories
Abstract: Sequential digital vocal sound data are orthogonally transformed, a predetermined number of data at a time, to obtain power spectrum data. The power spectrum data are converted into a form from which a feature corresponding to a phoneme of the vocal sound data can be extracted. The converted data are compared with reference data patterns related to that feature to obtain correlation data between the converted data and the reference data. Pitches are extracted in the frequency direction based on the power spectrum data or the converted data. Power values are extracted based on the vocal sound data or the power spectrum data. The correlation data, pitches, and power values are then coded sequentially. The coded data are decoded, and signals related to each phoneme are formed based on the decoded power values and pitches. The signals are synthesized with each other to reproduce vocal sound signals.
Abstract: A device and method in which polyphones of speech in a first language are received and stored, and a movement pattern in a person's face and/or body is registered. The movement pattern is registered by measuring movement at a number of measuring points in the face/body of the speaker, the measurements being made at the same time that the polyphones are registered. In connection with translation of a person's speech from one language into another, the polyphones and corresponding movement patterns in the face are linked to a movement model of the face. A picture of the real person's face is then pasted over the model, whereby a movement pattern corresponding to the language is obtained. The invention consequently gives the impression that the person really speaks the language in question.
Abstract: A degrouping method for an MPEG 1 decoder for degrouping three consecutive subband samples (X, Y and Z) compressed into one codeword (C) by a step number (N) includes the steps of: determining whether the value of the step number is 3; determining whether the value of the step number is 5 if the value of the step number is not 3; determining whether the value of the step number is 9 if the value of the step number is not 5; searching corresponding values of the subband samples from a first look-up table in the sequence of Z, Y and X if the value of the step number is 3; searching corresponding values of the subband samples from a second look-up table in the sequence of Z, Y and X if the value of the step number is 5; and searching corresponding values of the subband samples from a third look-up table in the sequence of Z, Y and X if the value of the step number is 9, wherein the first, second and third look-up tables have the respective values of the subband samples corresponding to the codeword value.
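The table-driven degrouping described above can be sketched as follows. The sketch assumes the usual MPEG-1 Layer II packing, in which the codeword equals X + N*Y + N*N*Z, and it builds each look-up table from that formula. The actual table contents and layout used in the patent are not given in the abstract, so `build_table`, `TABLES`, and `degroup` are hypothetical names for illustration only.

```python
def build_table(steps):
    """Build a look-up table mapping each codeword to (Z, Y, X).

    Assumes the codeword was formed as X + steps*Y + steps*steps*Z.
    """
    table = []
    for codeword in range(steps ** 3):
        x = codeword % steps
        y = (codeword // steps) % steps
        z = codeword // (steps * steps)
        table.append((z, y, x))  # stored in the order Z, Y, X
    return table


# One table per supported step number, built once up front.
TABLES = {3: build_table(3), 5: build_table(5), 9: build_table(9)}


def degroup(codeword, steps):
    """Return (Z, Y, X) for a codeword, testing the step number as 3, 5, then 9."""
    if steps == 3:
        table = TABLES[3]
    elif steps == 5:
        table = TABLES[5]
    elif steps == 9:
        table = TABLES[9]
    else:
        raise ValueError("step number must be 3, 5 or 9")
    return table[codeword]


if __name__ == "__main__":
    print(degroup(5, 3))    # codeword 5 with step number 3
    print(degroup(124, 5))  # largest codeword for step number 5
```

The point of the tables is that each lookup replaces the two divisions and two modulo operations that a decoder would otherwise perform for every grouped codeword.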