Patents Examined by Michael Sartori
  • Patent number: 5649060
    Abstract: A method of automatically aligning a written transcript with speech in video and audio clips. The disclosed technique uses an automatic speech recognizer as a basic component. The automatic speech recognizer decodes speech (recorded on a tape) and produces a file with the decoded text. This decoded text is then matched with the original written transcript by identifying similar words or clusters of words. The result of this matching is an alignment of the speech with the original transcript. The method can be used (a) to index video clips, (b) for "teleprompting" (i.e., showing the next portion of text when someone is reading from a television screen), or (c) to enhance editing of a text that was dictated to a stenographer or recorded on a tape for subsequent transcription by a typist.
    Type: Grant
    Filed: October 23, 1995
    Date of Patent: July 15, 1997
    Assignee: International Business Machines Corporation
    Inventors: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny
  • Patent number: 5636324
    Abstract: A stereo audio encoding method that encodes left and right original signals into left and right reproduced signals while suppressing a loss of quality in the reproduced audio signal. The correlation between the right and left channel signals is determined, and the phases of the two signals are compared. If the two signals are in phase, a modified scale factor is calculated by a power equalization method; if they are in opposite phase, a different modified scale factor is calculated by an error minimization method. The modified scale factors are used to calculate the reproduced signals.
    Type: Grant
    Filed: March 31, 1995
    Date of Patent: June 3, 1997
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Do-Hui Teh, Ah-Peng Tan
  • Patent number: 5632003
    Abstract: The invention relates in general to low bit-rate encoding and decoding of information such as audio information. More particularly, the invention relates to computationally efficient adaptive bit allocation and quantization of encoded information useful in high-quality low bit-rate coding systems. In one embodiment, an audio split-band encoder splits an input signal into frequency subband signals, quantizes the subband signals according to values established by an allocation function, and assembles the quantized subband signals into an encoded signal. The allocation function establishes allocation values in accordance with psychoacoustic principles based upon a masking threshold.
    Type: Grant
    Filed: November 1, 1993
    Date of Patent: May 20, 1997
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Grant A. Davidson, Craig C. Todd, Mark F. Davis, Brian D. Link, Louis D. Fielder
  • Patent number: 5630016
    Abstract: A digital discontinuous cellular communication system has a transmitter that transmits two frames of data following detection of voice inactivity. A receiver includes a comfort noise generator that uses the two frames of data to output noise to the speaker during periods of voice inactivity. The comfort noise generator includes a synthesis codebook with samples scaled by the actual background noise and an excitation codebook with samples filtered and scaled by the background noise; these are combined to produce comfort noise having the attributes and loudness level of the received background noise prior to interruption of transmission. The scaled signals are weighted to vary the loudness level and spectral attributes.
    Type: Grant
    Filed: March 7, 1996
    Date of Patent: May 13, 1997
    Assignee: Hughes Electronics
    Inventors: Kumar Swaminathan, Brian M. McCarthy
  • Patent number: 5627939
    Abstract: A data compression system greatly compresses the stored data used by a speech recognition system employing hidden Markov models (HMMs). The speech recognition system vector quantizes the acoustic space of human speech by dividing it into a predetermined number of acoustic features that are stored as codewords in a vector quantization (output probability) table or codebook. For each spoken word, the speech recognition system calculates an output probability value for each codeword, the output probability value representing an estimated probability that the word will be spoken using the acoustic feature associated with the codeword. The probability values are stored in an output probability table indexed by each codeword and by each word in a vocabulary. The output probability table is arranged so that the probability values associated with each codeword can be compressed based on other probability values associated with the same codeword, thereby compressing the stored output probabilities.
    Type: Grant
    Filed: September 3, 1993
    Date of Patent: May 6, 1997
    Assignee: Microsoft Corporation
    Inventors: Xuedong Huang, Shenzhi Zhang
  • Patent number: 5623577
    Abstract: The invention relates in general to low bit-rate encoding and decoding of information such as audio information. More particularly, the invention relates to computationally efficient adaptive bit allocation and quantization of encoded information useful in high-quality low bit-rate coding systems. In audio applications, a digital split-band encoder splits an input signal into frequency subband signals having bandwidths commensurate with the critical bandwidths of the human auditory system, quantizes the subband signals according to values established by an allocation function, and assembles the quantized subband signals into an encoded signal. The allocation function establishes allocation values in accordance with psychoacoustic principles with allowance for decoding synthesis filter bank spectral distortions.
    Type: Grant
    Filed: January 28, 1994
    Date of Patent: April 22, 1997
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Louis D. Fielder
  • Patent number: 5621848
    Abstract: A method of analyzing data, particularly a speech signal, first pre-processes the signal by performing analog-to-digital conversion and cepstral analysis, producing a sequence of data frames. Then the sequence of data frames is partitioned into a plurality of data blocks. The data blocks may be subjected to further analysis, for example, by introducing them to a plurality of neural networks. The system may be implemented using either hardware or software or a combination thereof.
    Type: Grant
    Filed: June 6, 1994
    Date of Patent: April 15, 1997
    Assignee: Motorola, Inc.
    Inventor: Shay-Ping T. Wang
  • Patent number: 5617509
    Abstract: In a statistically based speech recognition system, one of the key issues is the selection of the Hidden Markov Model that best matches a given sequence of feature observations. The problem is usually addressed by calculating the maximum likelihood (ML) state sequence by means of a Viterbi or other decoder. Noise or inadequate training can produce an ML sequence associated with a Hidden Markov Model other than the correct model. The method of the present invention provides improved robustness by combining the standard ML state sequence score (416) with an additional path score (418) derived from the dynamics of the ML score as a function of time. These two scores, when combined, form a hybrid metric (420) that, when used with the decoder, optimizes selection of the correct Hidden Markov Model (422).
    Type: Grant
    Filed: March 29, 1995
    Date of Patent: April 1, 1997
    Assignee: Motorola, Inc.
    Inventors: William M. Kushner, Edward Srenger, Matthew A. Hartman
  • Patent number: 5615300
    Abstract: Synthesized speech is generated by a software-implemented system with a programmed central processing unit. Phonetic parameters are generated from a series of phonetic symbols of an input text to be converted into synthesized speech, and prosodic parameters are also generated from prosodic information of the input text. The activity ratio of the central processing unit is determined, and the order of phonetic parameters or the arrangement of a synthesis unit or filter for speech synthesis is determined depending on the determined activity ratio of the central processing unit. Synthesized speech sounds are generated and filtered based on the phonetic and prosodic parameters according to the determined order of phonetic parameters or the determined arrangement of the filter.
    Type: Grant
    Filed: May 26, 1993
    Date of Patent: March 25, 1997
    Assignee: Toshiba Corporation
    Inventors: Yoshiyuki Hara, Tsuneo Nitta
  • Patent number: 5613036
    Abstract: Maintaining dynamic categories for speech rules in a speech recognition system that has a plurality of speech rules, each comprising a language model and an action. Each speech rule indicates whether its language model includes a flag identifying whether the words in the language model are dynamic, that is, change according to data in the speech recognition system. At periodic intervals, such as system initialization or application program launch time, the words of each language model whose speech rule flags them as dynamic are updated depending upon the state of the system. Concurrently with the determination of acoustic features during speech recognition, a current language model can be created based upon the language models from these speech rules.
    Type: Grant
    Filed: April 25, 1995
    Date of Patent: March 18, 1997
    Assignee: Apple Computer, Inc.
    Inventor: Robert D. Strong
  • Patent number: 5583969
    Abstract: An apparatus for processing a speech signal includes a coefficient calculating circuit for receiving an input signal and for generating a first value for suppressing a change of level of the input signal; a first delay circuit for receiving the input signal and for delaying the input signal by a predetermined time; a feature extracting circuit for receiving the input signal and for deriving a feature value representing a feature of consonants from the input signal; a coefficient control circuit for receiving the first value from the coefficient calculating circuit and the feature value from the feature extracting circuit, and for changing the amplitude and the duration of the first value depending on the feature value, so as to generate a second value; and a multiplying circuit for receiving the delayed input signal from the first delay circuit and the second value from the coefficient control circuit, and for multiplying the delayed input signal by the second value.
    Type: Grant
    Filed: April 26, 1993
    Date of Patent: December 10, 1996
    Assignee: Technology Research Association of Medical and Welfare Apparatus
    Inventors: Yoshiyuki Yoshizumi, Tsuyoshi Mekata, Yoshinori Yamada, Ryoji Suzuki
  • Patent number: 5581652
    Abstract: A high-quality wideband speech signal (8 kHz, for example) is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz). The input narrowband speech signal is LPC-analyzed to obtain spectrum information parameters, and the parameters are vector-quantized using a narrowband speech signal codebook. For each code number of the narrowband speech signal codebook, the wideband speech waveform corresponding to that codevector is extracted over one pitch period for voiced speech and over one frame for unvoiced speech, and is prestored in a representative waveform codebook. Representative waveform segments corresponding to the respective output codevector numbers of the quantizer are extracted from the representative waveform codebook. Voiced speech is synthesized by pitch-synchronous overlapping of the extracted representative waveform segments, and unvoiced speech is synthesized by randomly using waveforms one frame in length. In this way, a wideband speech signal is produced.
    Type: Grant
    Filed: September 29, 1993
    Date of Patent: December 3, 1996
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Masanobu Abe, Yuki Yoshida
  • Patent number: 5579431
    Abstract: The device detects the beginning and ending portions of speech contained within an input signal based on the variance of frequency-band-limited energy within the signal. The use of the variance allows detection that is relatively independent of the absolute signal-to-noise ratio of the signal, and allows accurate detection against a wide variety of backgrounds, such as music, motor noise, and background noise such as other speakers. The device can be easily implemented using off-the-shelf hardware along with a high-speed special-purpose digital signal processor integrated circuit.
    Type: Grant
    Filed: October 5, 1992
    Date of Patent: November 26, 1996
    Assignees: Panasonic Technologies, Inc., Matsushita Electric Industrial Co. Ltd.
    Inventor: Benjamin K. Reaves
  • Patent number: 5572624
    Abstract: The speech recognition system disclosed herein obtains improved recognition accuracy by employing recognition models that are discriminatively trained from a database comprising training data from different sources, e.g., both male and female voices. A linear discriminant analysis is performed on the training data using expanded matrices in which the sources are identified or labelled. The linear discriminant analysis yields a separate transform for each source, but these transforms map the different sources onto a common vector space in which the vocabulary models are defined.
    Type: Grant
    Filed: January 24, 1994
    Date of Patent: November 5, 1996
    Assignee: Kurzweil Applied Intelligence, Inc.
    Inventor: Vladimir Sejnoha
  • Patent number: 5555343
    Abstract: A text parser for a text-to-speech processor accepts a text stream and parses it to detect non-spoken characters and spoken characters. The spoken characters are passed unaltered to the text-to-speech converter. A text generator generates pre-designated text sequences in response to non-spoken characters, such as special character sequences or character sequences that match format templates. A speech command generator generates speech commands in response to detecting non-spoken characters, such as those that affect text style, font, underlining, etc. The text-to-speech converter converts into speech both the spoken text parsed by the parser and the text generated by the text generator, and operates in response to the speech commands generated by the speech command generator.
    Type: Grant
    Filed: April 7, 1995
    Date of Patent: September 10, 1996
    Assignee: Canon Information Systems, Inc.
    Inventor: Willis J. Luther
  • Patent number: 5553191
    Abstract: A method of coding a sampled speech signal vector in an analysis-by-synthesis coding procedure includes the step of forming an optimum excitation vector comprising a linear combination of a code vector from a fixed code book and a long-term predictor vector. A first estimate of the long-term predictor vector is formed in an open-loop analysis. A second estimate of the long-term predictor vector is formed in a closed-loop analysis. Finally, each of the first and second estimates is combined in an exhaustive search with each code vector of the fixed code book to find the excitation vector that gives the best coding of the speech signal vector.
    Type: Grant
    Filed: January 26, 1993
    Date of Patent: September 3, 1996
    Assignee: Telefonaktiebolaget LM Ericsson
    Inventor: Tor B. Minde
  • Patent number: 5553192
    Abstract: The disclosure aims to eliminate discomforting background noise regenerated at the receive side (viz., a base station) in a mobile radio communications system that uses discontinuous transmission (DTX). When a speech pause is detected at the receive side, synthesis filter coefficients are produced using a background noise code that has been transmitted from a mobile unit. Subsequently, the Q value of the synthesis filter is measured using these synthesis filter coefficients. If the Q value is larger than a threshold level, each of the filter coefficients is lowered by a predetermined value. Thus, the regenerated discomforting background noise can be effectively reduced.
    Type: Grant
    Filed: October 12, 1993
    Date of Patent: September 3, 1996
    Assignee: NEC Corporation
    Inventor: Toshihiro Hayata
  • Patent number: 5539859
    Abstract: Fourier transform processing is applied to digital signals obtained by analog-to-digital conversion of signals supplied by two microphones spaced a fixed distance apart, producing two series of discrete data, each datum of which represents the energy and phase of a spectral frequency band of the received sound. A dominant angle of incidence, representing the angle of incidence of the speech signal component of the received sound relative to the two microphones, is determined from phase differences between the discrete data in the same frequency bands of the two series, and is used to combine the two series of discrete data into a single instantaneous spectrum in which any speech signal component is amplified relative to the noise.
    Type: Grant
    Filed: February 16, 1993
    Date of Patent: July 23, 1996
    Assignee: Alcatel N.V.
    Inventors: François Robbe, Luc Dartois
  • Patent number: 5537509
    Abstract: A digital discontinuous cellular communication system has a transmitter that transmits two frames of data following detection of voice inactivity. A receiver includes a comfort noise generator that uses the two frames of data to output noise to the speaker during periods of voice inactivity. The comfort noise generator includes a synthesis codebook with samples scaled by the actual background noise and an excitation codebook with samples filtered and scaled by the background noise; these are combined to produce comfort noise having the attributes and loudness level of the received background noise prior to interruption of transmission. The scaled signals are weighted to vary the loudness level and spectral attributes.
    Type: Grant
    Filed: May 28, 1992
    Date of Patent: July 16, 1996
    Assignee: Hughes Electronics
    Inventors: Kumar Swaminathan, Brian M. McCarthy
  • Patent number: 5528727
    Abstract: An adaptive pitch pulse enhancer and method, adaptive to a voicing measure of input speech, for modifying the adaptive codebook of a CELP search loop to enhance the pitch pulse structure of the adaptive codebook. The adaptive pitch pulse enhancer determines a voicing measure of an input signal, the voicing measure being voiced when the input signal includes voiced speech and the voicing measure being unvoiced when the input signal does not include voiced speech, modifies a total excitation vector produced by the CELP search loop in accordance with the voicing measure of the input signal, and updates the adaptive codebook of the CELP search loop by storing the modified total excitation vector in the adaptive codebook.
    Type: Grant
    Filed: May 3, 1995
    Date of Patent: June 18, 1996
    Assignee: Hughes Electronics
    Inventor: Yi-Sheng Wang
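The matching step in the 5649060 abstract (decoded text matched to the transcript via similar words or clusters of words) can be illustrated with a short sketch. It assumes the recognizer output is a hypothetical list of (word, start_seconds) pairs and uses Python's difflib to find matching runs of words; this is a generic stand-in, not the matching procedure claimed in the patent.

```python
import difflib

def align_transcript(transcript, decoded):
    """Attach recognizer timestamps to the words of a written transcript.

    transcript: the written transcript as a plain string.
    decoded: list of (word, start_seconds) pairs from a speech recognizer
             (a hypothetical input format assumed for this sketch).
    Returns one start time per transcript word, or None where no decoded
    word was matched.
    """
    ref = [w.lower() for w in transcript.split()]
    hyp = [w.lower() for w, _ in decoded]
    # SequenceMatcher finds in-order runs ("clusters") of identical words.
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    times = [None] * len(ref)
    for block in matcher.get_matching_blocks():
        # Each block states ref[a:a+size] == hyp[b:b+size].
        for k in range(block.size):
            times[block.a + k] = decoded[block.b + k][1]
    return times

decoded = [("the", 0.0), ("quick", 0.3), ("brow", 0.6), ("fox", 0.9), ("jumps", 1.2)]
print(align_transcript("The quick brown fox jumps", decoded))
# [0.0, 0.3, None, 0.9, 1.2]
```

The misrecognized word ("brow") stays unaligned; its time could be interpolated from its matched neighbours, which is what makes such an alignment usable for indexing or teleprompting.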
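The Dolby abstracts (5632003, 5623577) describe an allocation function that assigns quantizer bits to subbands according to a masking threshold. A common textbook way to do this is a greedy loop that repeatedly gives one bit to the subband whose quantization noise lies furthest above its mask. The sketch below assumes per-band signal-to-mask ratios are already available and that each bit lowers noise by about 6.02 dB; it is not Dolby's allocation function.

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Greedy subband bit allocation driven by signal-to-mask ratios.

    smr_db: per-subband signal-to-mask ratio in dB (signal level minus
            masking threshold), assumed to come from a psychoacoustic model.
    Each added bit is assumed to lower quantization noise by ~6.02 dB.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every band is already at its maximum word length
        # Noise-to-mask ratio: how far quantization noise sits above the mask.
        best = max(candidates, key=lambda i: smr_db[i] - bits[i] * db_per_bit)
        bits[best] += 1
    return bits

print(allocate_bits([20.0, 5.0, 12.0], total_bits=5))
# [3, 0, 2]
```

The band with the highest signal-to-mask ratio absorbs most of the budget, while the band whose noise is closest to being masked receives nothing.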
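The 5617509 abstract builds on the maximum likelihood (ML) state sequence computed by a Viterbi decoder. A minimal log-domain Viterbi decoder for a discrete-output HMM is sketched below; the patent's additional path score and hybrid metric are not reproduced, and the toy model probabilities are illustrative assumptions.

```python
import math

def viterbi(log_init, log_trans, log_emit, observations):
    """Maximum-likelihood state sequence of a discrete-output HMM.

    log_init[s], log_trans[s][t] and log_emit[s][o] are natural-log
    probabilities; observations is a list of symbol indices.
    Returns (best_log_score, state_path).
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[s][observations[0]] for s in range(n)]
    backptrs = []
    for obs in observations[1:]:
        prev, score, ptr = score, [], []
        for t in range(n):
            # Best predecessor for entering state t at this frame.
            s_best = max(range(n), key=lambda s: prev[s] + log_trans[s][t])
            ptr.append(s_best)
            score.append(prev[s_best] + log_trans[s_best][t] + log_emit[t][obs])
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

def to_log(rows):
    return [[math.log(p) for p in row] for row in rows]

score, path = viterbi([math.log(0.6), math.log(0.4)],
                      to_log([[0.7, 0.3], [0.4, 0.6]]),
                      to_log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
                      [0, 1, 2])
print(path, round(math.exp(score), 5))  # [0, 0, 1] 0.01512
```

A recognizer would run this once per candidate word model and pick the model with the highest score; the per-frame scores kept in `score` are the kind of time-varying ML values whose dynamics the patent exploits.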
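The 5579431 abstract bases speech endpoint detection on the variance of band-limited energy rather than its absolute level. The sketch below illustrates that idea under stated assumptions: the band edges (300 to 3000 Hz), 20 ms frames at 8 kHz, a five-frame window and a 4 dB² threshold are all hypothetical choices, and a direct DFT replaces the dedicated DSP hardware the abstract mentions.

```python
import cmath
import math
import statistics

def band_energy(frame, rate, lo_hz, hi_hz):
    """Energy of one frame between lo_hz and hi_hz, via a direct DFT of those bins."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * rate / n <= hi_hz:
            x = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            total += abs(x) ** 2
    return total

def detect_speech(signal, rate, frame_len=160, window=5, threshold_db2=4.0,
                  lo_hz=300.0, hi_hz=3000.0):
    """Return one flag per frame: True where the variance of the band-limited
    log energy over the last `window` frames exceeds the threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Energies in dB: a constant gain shifts every value equally and leaves the
    # variance unchanged, and stationary noise keeps the variance low.
    log_e = [10 * math.log10(band_energy(f, rate, lo_hz, hi_hz) + 1e-12)
             for f in frames]
    flags = []
    for j in range(len(log_e)):
        recent = log_e[max(0, j - window + 1): j + 1]
        var = statistics.pvariance(recent) if len(recent) > 1 else 0.0
        flags.append(var > threshold_db2)
    return flags

rate = 8000
signal = []
for m in range(30):  # 30 frames of 20 ms
    # Steady 1 kHz tone for 20 frames, then a tone whose level jumps by 20 dB
    # every frame, standing in for the fluctuating energy of speech.
    amp = 1.0 if m < 20 else (0.1 if (m - 20) % 2 == 0 else 1.0)
    signal += [amp * math.sin(2 * math.pi * 1000 * (m * 160 + n) / rate) for n in range(160)]
flags = detect_speech(signal, rate)
print(flags.index(True))  # 20
```

The loud but steady tone is never flagged, which shows why a variance criterion is less sensitive to the absolute signal-to-noise ratio than a plain energy threshold. The beginning and end of speech would be taken as the first and last flagged frames.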
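The exhaustive search in the 5553191 abstract combines each long-term predictor (LTP) estimate with each fixed code vector and keeps the combination that best codes the speech vector. The sketch below assumes an identity synthesis/weighting filter to stay short, takes the two LTP estimates as given inputs (hypothetically, the open-loop and closed-loop results), and computes jointly optimal gains by least squares for every pair; it is an illustration, not Ericsson's coder.

```python
def best_excitation(target, ltp_candidates, fixed_codebook):
    """Exhaustive joint search over LTP candidates x fixed code vectors.

    Returns (error, ltp_index, code_index, g_ltp, g_fixed), or None if every
    pair was degenerate. Assumes an identity synthesis/weighting filter.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    best = None
    for i, v in enumerate(ltp_candidates):
        for j, c in enumerate(fixed_codebook):
            # Normal equations for minimizing ||target - g1*v - g2*c||^2.
            vv, cc, vc = dot(v, v), dot(c, c), dot(v, c)
            xv, xc = dot(target, v), dot(target, c)
            det = vv * cc - vc * vc
            if abs(det) < 1e-12:
                continue  # v and c are (nearly) collinear; skip this pair
            g1 = (xv * cc - xc * vc) / det
            g2 = (xc * vv - xv * vc) / det
            residual = [t - g1 * a - g2 * b for t, a, b in zip(target, v, c)]
            err = dot(residual, residual)
            if best is None or err < best[0]:
                best = (err, i, j, g1, g2)
    return best

target = [0.0, 0.5, 0.0, 2.0]
ltp = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # e.g. open-loop and closed-loop estimates
codebook = [[0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
err, i, j, g_ltp, g_fixed = best_excitation(target, ltp, codebook)
print(i, j, round(g_ltp, 3), round(g_fixed, 3))  # 1 2 0.5 2.0
```

Searching both LTP estimates costs twice as many codebook evaluations as searching one, in exchange for not committing early to a possibly poor open-loop estimate.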
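The 5539859 abstract derives a dominant angle of incidence from per-band phase differences between two microphones. For a plane wave, a band at frequency f shows a phase difference of 2πfd·sin(θ)/c, so each band yields an angle estimate. The sketch below assumes c = 343 m/s, takes the median over bands as the "dominant" angle, and uses a standard delay-and-sum combination as a stand-in for the patent's own rule for combining the two spectra.

```python
import cmath
import math
import statistics

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees Celsius

def dominant_angle(spec_left, spec_right, freqs_hz, mic_distance_m):
    """Estimate the dominant angle of incidence (degrees) from per-band phase differences.

    spec_left/spec_right: complex spectra of the same frame at the two microphones.
    Positive angles mean the sound reaches the left microphone first.
    """
    angles = []
    for xl, xr, f in zip(spec_left, spec_right, freqs_hz):
        if f <= 0:
            continue
        dphi = cmath.phase(xl * xr.conjugate())  # phase lead of left over right
        s = SPEED_OF_SOUND * dphi / (2 * math.pi * f * mic_distance_m)
        if -1.0 <= s <= 1.0:  # skip bands whose phase difference is physically impossible
            angles.append(math.degrees(math.asin(s)))
    # The median is a simple, outlier-resistant choice of "dominant" angle.
    return statistics.median(angles) if angles else None

def steer_and_sum(spec_left, spec_right, freqs_hz, mic_distance_m, angle_deg):
    """Delay-and-sum: time-align the right channel to the given angle and average."""
    tau = mic_distance_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [0.5 * (xl + xr * cmath.exp(2j * math.pi * f * tau))
            for xl, xr, f in zip(spec_left, spec_right, freqs_hz)]

d, theta = 0.1, 30.0
tau = d * math.sin(math.radians(theta)) / SPEED_OF_SOUND
freqs = [250.0, 500.0, 1000.0, 1500.0]
left = [1 + 0j] * len(freqs)
right = [cmath.exp(-2j * math.pi * f * tau) for f in freqs]  # right mic hears it tau seconds later
angle = dominant_angle(left, right, freqs, d)
print(round(angle, 3))  # 30.0
```

After steering to the estimated angle, components arriving from that direction add in phase while sound from other directions partially cancels, which is the sense in which the speech component is amplified relative to the noise.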