Voiced Or Unvoiced Patents (Class 704/208)
  • Publication number: 20030086341
    Abstract: Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
    Type: Application
    Filed: July 22, 2002
    Publication date: May 8, 2003
    Applicant: GRACENOTE, INC.
    Inventors: Maxwell Wells, Vidya Venkatachalam, Luca Cazzanti, Kwan Fai Cheung, Navdeep Dhillon, Somsak Sukittanon
  • Patent number: 6556967
    Abstract: The present invention is a device for and method of detecting voice activity by receiving a signal; computing the absolute value of the signal; squaring the absolute value; low pass filtering the squared result; computing the mean of the filtered signal; subtracting the mean from the filtered result; padding the mean subtracted result with zeros to form a value that is a power of two if the result is not already a power of two; computing a DFFT of the power of two result; normalizing the DFFT result of the last step; computing a mean of the normalization; computing a variance of the normalization; computing a power ratio of the normalization; classifying the mean, variance and power ratio as speech or non-speech based on how this feature vector compares to similarly constructed feature vectors of known speech and non-speech. The voice activity detector includes an absolute value squarer; a low pass filter; a mean subtractor; a zero padder; a DFFT; a normalizer; and a classifier.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: April 29, 2003
    Assignee: The United States of America as represented by the National Security Agency
    Inventors: Douglas J. Nelson, David C. Smith, Jeffrey L. Townsend
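    Illustrative sketch (not taken from the patent): the feature-extraction steps listed in the abstract above can be approximated in a few lines. The moving-average low-pass filter, the sum-to-one normalization, and the peak-to-mean "power ratio" are all assumptions made for illustration, since the abstract does not define them; the final speech/non-speech classifier is omitted.

```python
import cmath


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitudes; fine for short illustrative inputs."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)))
        for m in range(n)
    ]


def vad_features(signal, smooth=4):
    """Return (mean, variance, power_ratio) of the normalized spectrum."""
    # Absolute value, then square.
    squared = [abs(s) ** 2 for s in signal]
    # Low-pass filter: causal moving average (assumed filter type).
    filtered = []
    for i in range(len(squared)):
        window = squared[max(0, i - smooth + 1): i + 1]
        filtered.append(sum(window) / len(window))
    # Subtract the mean of the filtered signal.
    mean = sum(filtered) / len(filtered)
    centered = [v - mean for v in filtered]
    # Zero-pad to a power-of-two length.
    padded = centered + [0.0] * (next_pow2(len(centered)) - len(centered))
    mags = dft_magnitudes(padded)
    total = sum(mags)
    if total == 0.0:
        return (0.0, 0.0, 0.0)
    # Normalize so the magnitudes sum to one (assumed normalization).
    norm = [m / total for m in mags]
    mu = sum(norm) / len(norm)
    var = sum((v - mu) ** 2 for v in norm) / len(norm)
    # Hypothetical power ratio: largest bin relative to the mean bin.
    ratio = max(norm) / mu
    return (mu, var, ratio)
```

    The resulting three-element vector would then be compared against vectors built the same way from known speech and non-speech.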
  • Publication number: 20030061036
    Abstract: A system and method for transmitting speech activity in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. Indications of voice activity are transmitted ahead of features from the subscriber unit to the server.
    Type: Application
    Filed: December 14, 2001
    Publication date: March 27, 2003
    Inventors: Harinath Garudadri, Michael Stuart Phillips
  • Publication number: 20030053639
    Abstract: A method for detecting voice activity comprises receiving audio signals on a plurality of channels and processing the audio signals on the channels to improve the signal-to-noise ratio thereof. The processed audio signals on each channel are then fed to associated voice activity detection algorithms and further processed. A voice or silence determination is then rendered based on at least the output of the voice activity detection algorithms. A voice activity detector is also provided.
    Type: Application
    Filed: August 15, 2002
    Publication date: March 20, 2003
    Applicant: Mitel Knowledge Corporation
    Inventors: Franck Beaucoup, Michael Tetelbaum
  • Patent number: 6535847
    Abstract: A speech coder is operable to compress digital data representing speech using a Waveform Interpolation speech coding method. The coding method is carried out on the residual signal from a Linear Predictive Coding stage. On the basis of a series of overlapping frames of the residual signal, a series of respective spectra are found. The evolution of the spectra is filtered in a multi-stage filtering process, the filtered phase data being replaced with the original phase data at the end of each stage. This is found to result in the decoder being better able to approximate the original speech signal. This is of particular utility in relation to mobile telephony.
    Type: Grant
    Filed: September 14, 1999
    Date of Patent: March 18, 2003
    Assignee: British Telecommunications public limited company
    Inventor: David F. Marston
  • Patent number: 6519279
    Abstract: Transceiver circuitry 1 comprises a first portion 10,20,30,41,50,100, having a first modulation means 41 operating at a first order of modulation, for transmitting and receiving voice signals; a second portion 20,30,42,50,100, having a second modulation means 42 operating at a second order of modulation, for transmitting and receiving digital signals at a higher data rate than is achievable by the first portion; and a data conversion means 20,30,100 operable to convert from or into voice signals intended for processing by the first portion into or from digital signals for processing by the second portion.
    Type: Grant
    Filed: January 5, 2000
    Date of Patent: February 11, 2003
    Assignee: Motorola, Inc.
    Inventors: Ouelid Abdesselem, Lydie Desperben
  • Patent number: 6510407
    Abstract: A speech encoding method using analysis-by-synthesis includes sampling an input speech and dividing the resulting speech samples into frames and subframes. The frames are analyzed to determine coefficients for the synthesis filter. The subframes are categorized into unvoiced, voiced and onset categories. Based on the category, a different coding scheme is used. The coded speech is fed into the synthesis filter, the output of which is compared to the input speech samples to produce an error signal. The coding is then adjusted per the error signal.
    Type: Grant
    Filed: October 19, 1999
    Date of Patent: January 21, 2003
    Assignee: Atmel Corporation
    Inventor: Shihua Wang
  • Publication number: 20020193989
    Abstract: The present invention includes a method, apparatus and system for STANDARD VOICE USER INTERFACE AND VOICE CONTROLLED DEVICES as described in the claims. Briefly, a standard voice user interface is provided to control various devices by using standard speech commands. The standard VUI provides a set of standard VUI commands and syntax for the interface between a user and the voice controlled device. The standard VUI commands include an identification phrase to determine if voice controlled devices are available in an environment. Other standard VUI commands provide for determining the names of the voice controlled devices and altering them.
    Type: Application
    Filed: May 21, 1999
    Publication date: December 19, 2002
    Inventors: Michael Geilhufe, David MacMillan, Avraham Barel, Amos Brown, Karin Lissette Bootsma, Lawrence Kent Gaddy, Phillip Paul Pyo
  • Publication number: 20020188442
    Abstract: A method of detecting voice activity in a signal smoothes the “voice” or “noise” decision to avoid loss of speech segments. The method is particularly suitable for situations in which the noise level is high. Unlike the prior art method which favors optimizing traffic, this method favors the intelligibility of the signal reproduced after decoding. The signal to be coded is divided into frames. A “voice” or “noise” initial decision is made for each signal frame. The method makes the “voice” decision as soon as there is any increase in the energy of the signal relative to the frame preceding the current frame, even if the increase is slight. The method makes the “noise” decision only if the characteristics of the signal correspond to the characteristics of the noise for at least i consecutive frames (for example i=6). The method has applications in telephony.
    Type: Application
    Filed: May 10, 2002
    Publication date: December 12, 2002
    Applicant: ALCATEL
    Inventors: Raymond Gass, Richard Atzenhoffer
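    Illustrative sketch (not taken from the application): the smoothing rule in the abstract above can be expressed as a small state machine. The per-frame `noise_like` flags stand in for a hypothetical initial noise test, and starting in the "voice" state is an assumption; `hold` plays the role of i in the abstract.

```python
def smooth_vad(energies, noise_like, hold=6):
    """Return a 'voice' or 'noise' label for each frame.

    energies   -- per-frame energy values
    noise_like -- per-frame booleans from a hypothetical initial noise test
    hold       -- consecutive noise-like frames required before 'noise'
    """
    decisions = []
    run = 0       # length of the current run of noise-like, non-rising frames
    prev = None   # energy of the preceding frame
    for energy, is_noise in zip(energies, noise_like):
        rising = prev is not None and energy > prev
        if rising or not is_noise:
            # Any energy increase, however slight, forces 'voice'.
            run = 0
        else:
            run += 1
        decisions.append("noise" if run >= hold else "voice")
        prev = energy
    return decisions
```

    With `hold=6`, a single frame whose energy rises slightly resets the counter, so the next "noise" decision needs six more noise-like frames.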
  • Publication number: 20020184007
    Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.
    Type: Application
    Filed: July 17, 2002
    Publication date: December 5, 2002
    Inventors: Amitava Das, Sharath Manjunath
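    Illustrative sketch (not taken from the application): the decoder side of the technique above shapes random noise with an energy envelope interpolated from a few per-subframe energies. The quantization step is omitted, and the subframe-center interpolation scheme and Gaussian noise source are assumptions.

```python
import math
import random


def subframe_energies(frame, n_sub):
    """Mean energy of each of n_sub equal-length subframes."""
    size = len(frame) // n_sub
    return [
        sum(v * v for v in frame[i * size:(i + 1) * size]) / size
        for i in range(n_sub)
    ]


def interpolate_envelope(energies, length):
    """Linearly interpolate subframe energies to one value per sample."""
    n_sub = len(energies)
    size = length / n_sub
    envelope = []
    for i in range(length):
        # Position in subframe units, measured from subframe centers.
        pos = (i + 0.5) / size - 0.5
        pos = min(max(pos, 0.0), n_sub - 1)
        lo = int(pos)
        hi = min(lo + 1, n_sub - 1)
        frac = pos - lo
        envelope.append((1 - frac) * energies[lo] + frac * energies[hi])
    return envelope


def synthesize_unvoiced(energies, length, seed=0):
    """Shape unit-variance Gaussian noise with the interpolated envelope."""
    rng = random.Random(seed)
    envelope = interpolate_envelope(energies, length)
    return [rng.gauss(0.0, 1.0) * math.sqrt(e) for e in envelope]
```

    Only the handful of energy values needs to be transmitted; the noise itself is regenerated at the decoder.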
  • Publication number: 20020177997
    Abstract: A digital system and method of operation is provided in which musical notes and melodies are synthesized. The operation done for music synthesis is based on time domain processing of prerecorded waveforms, referred to as analysis waveforms. The computations are done using time-marks, which is a set of digital sample positions of the analysis waveform indicating the starting position of each period of the fundamental frequency or an arbitrary position for non-periodic analysis waveforms. The algorithm defines on a time scale the time-marks of the synthesis waveform. The synthesis is based on making a relation between the analysis time-marks and the synthesis time-marks. The synthesis waveforms are built with the extraction of small portions of signal located at corresponding time-mark positions of the analysis waveform and adding them to the corresponding synthesis time-marks on the synthesis time-scale.
    Type: Application
    Filed: September 26, 2001
    Publication date: November 28, 2002
    Inventors: Laurent Le-Faucheur, Gilles Dassot
  • Publication number: 20020173950
    Abstract: The speech intelligibility of an audio signal of unchanged volume is improved by raising the total audio signal by a constant factor and lowering the amplitude of this raised signal by a high-pass filter. The corner frequency fc of the high-pass filter is adjusted such that the output amplitude of the audio signal at the end of the processing segment is equal or proportional to the input amplitude of the audio signal.
    Type: Application
    Filed: May 20, 2002
    Publication date: November 21, 2002
    Inventor: Matthias Vierthaler
  • Patent number: 6470311
    Abstract: In a speech processing system, an optimal filter frequency is determined and used to filter an unfiltered signal. The optimum filter is chosen by passing the largest voice area greater than 50 ms through multiple filters. The average energy output for each filter and differences between the filter averages (DeltaEnergy) are calculated. The first peak in DeltaEnergy above the average DeltaEnergy determines the optimal filter for filtering the signal. The filtered signal is divided into segments and voiced periods are determined. The unfiltered signal is divided into pitch synchronous frames based on the filtered signal.
    Type: Grant
    Filed: October 15, 1999
    Date of Patent: October 22, 2002
    Assignee: Fonix Corporation
    Inventor: Robert Brian Moncur
  • Patent number: 6466904
    Abstract: There is provided a speech decoder comprising a means for generating an excitation signal and a means for performing harmonic analysis and synthesis on the excitation signal in order to generate a smooth, periodic speech signal. The speech decoder further comprises a mixing means for mixing the excitation signal with the smooth, periodic signal and a synthesizing means for synthesizing the modified excitation signal into a speech signal that can be played to a user through a listening means. There is also provided a receiver that incorporates a speech decoder such as the decoder described above as well as a method for speech decoding.
    Type: Grant
    Filed: July 25, 2000
    Date of Patent: October 15, 2002
    Assignee: Conexant Systems, Inc.
    Inventors: Yang Gao, Huan-yu Su
  • Patent number: 6463407
    Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.
    Type: Grant
    Filed: November 13, 1998
    Date of Patent: October 8, 2002
    Assignee: Qualcomm Inc.
    Inventors: Amitava Das, Sharath Manjunath
  • Publication number: 20020138255
    Abstract: The invention relates to a voice activity detecting device and a voice activity detecting method. An object of the invention is to adapt to various characteristics of noise which may possibly be superimposed on an aural signal to thereby reliably discriminate between an active voice segment and a non-active voice segment. For this purpose, the voice activity detecting device comprises: a speech-segment inferring section 11 for determining the probability that each of active voice frames given in order of time sequence belongs to the active voice segment, based on the statistical characteristic of the aural signal; a quality monitoring section 12 for monitoring the quality of the aural signal for each active voice frame, and a speech-segment determining section 13 for weighting the determined probability with the above quality to obtain for each active voice frame the accuracy that the active voice frame belongs to the active voice segment.
    Type: Application
    Filed: March 28, 2002
    Publication date: September 26, 2002
    Inventors: Kaori Endo, Yasuji Ota
  • Publication number: 20020138254
    Abstract: A speech processing apparatus comprises a speech input section which receives multi-channel signals, a beam former processing section for performing beam former processing on the multi-channel signals to suppress a signal arriving from a target speech source, a target source direction estimation section for estimating the direction of the target source from filter coefficients resulting from the beam former processing, and a voiced/unvoiced speech determination section for determining a speech interval of a speech signal on the basis of the estimated direction of the target source.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 26, 2002
    Inventors: Takehiko Isaka, Yoshifumi Nagata
  • Publication number: 20020133333
    Abstract: A sound separation apparatus for separating a target signal from a mixed input signal, wherein the mixed input signal includes the target signal and one or more sound signals emitted from different sound sources. The sound separation apparatus comprises a frequency analyzer for performing a frequency analysis on the mixed input signal and calculating spectrum and frequency component candidate points at each time. The apparatus further comprises feature extraction means for extracting feature parameters which are estimated to correspond with the target signal, comprising a local layer for analyzing local feature parameters using the spectrum and the frequency component candidate points and one or more global layers for analyzing global feature parameters using the feature parameters extracted by the local layer. The apparatus further comprises a signal regenerator for regenerating a waveform of the target signal using the feature parameters extracted by the feature extraction means.
    Type: Application
    Filed: January 17, 2002
    Publication date: September 19, 2002
    Inventors: Masashi Ito, Hiroshi Tsujino
  • Patent number: 6453284
    Abstract: For tracking multiple, simultaneous voices, predicted tracking is used to follow individual voices through time, even when the voices are very similar in fundamental frequency. An acoustic waveform comprised of a group of voices is submitted to a frequency estimator, which may employ an average magnitude difference function (AMDF) calculation to determine the voice fundamental frequencies that are present for each voice. These frequency estimates are then used as input values to a recurrent neural network that tracks each of the frequencies by predicting the current fundamental frequency value for each voice present based on past fundamental frequency values in order to disambiguate any fundamental frequency trajectories that may be converging in frequency.
    Type: Grant
    Filed: July 26, 1999
    Date of Patent: September 17, 2002
    Assignee: Texas Tech University Health Sciences Center
    Inventor: D. Dwayne Paschall
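    Illustrative sketch (not taken from the patent): the average magnitude difference function (AMDF) mentioned in the abstract above is small at lags that match the signal period. The version below estimates a single fundamental frequency only; the search range defaults are assumptions, and the multi-voice recurrent-network tracking stage is not shown.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return the frequency whose period (in samples) minimizes the AMDF."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag
```

    The frame must be longer than the largest lag searched, which at an 8 kHz sample rate and a 60 Hz floor means more than 133 samples.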
  • Publication number: 20020128825
    Abstract: The signal discrimination result is prevented from becoming the “voice” during the transmission of the modem signal by the V.34 modulation system.
    Type: Application
    Filed: March 6, 2002
    Publication date: September 12, 2002
    Inventor: Yukimasa Sugino
  • Patent number: 6438517
    Abstract: A “multi-stage” method of estimating pitch in a speech encoder (FIG. 2). In a first stage of the method, a set of candidate pitch values is selected, such as by using a cost function that operates on said speech signal (steps 21-23). In a second stage of the method, a best candidate is selected. Specifically, in the second stage, pitch values calculated from previous speech segments are used to calculate an average pitch value (step 25). Then, depending on whether the average pitch value is short or long, one of two different analysis-by-synthesis (ABS) processes is repeated for each candidate, such that for each iteration, a synthesized signal is derived from that pitch candidate and compared to a reference signal to provide an error value. A time domain ABS process is used if the average pitch is short (step 27), whereas a frequency domain ABS process is used if the average pitch is long (step 28).
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: August 20, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Suat Yeldener
  • Patent number: 6427134
    Abstract: A voice activity detector suitable for deployment in a mobile phone apparatus is disclosed. An advantage of the voice activity detector is that it is better able to provide a decision (79) as to whether an input signal (19) consists of noise (which it is not desired to transmit) or comprises speech or information tones (which are required to be transmitted), especially in noisy environments. The voice activity detector includes a number of components, in particular an auxiliary voice activity detector (3). The auxiliary voice activity detector (3) distinguishes between noise and speech on the basis that the spectrum of speech changes more rapidly than that of noise. This results in the auxiliary detector (3) rarely mistaking a speech signal to be a noise signal. Hence, a very reliable noise template (421) is obtained. For this reason, the auxiliary detector (3) is also useful in noise reduction applications. The voice activity detector also uses a neural net classifier (7).
    Type: Grant
    Filed: September 26, 1998
    Date of Patent: July 30, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Neil Robert Garner, Paul Alexander Barrett
  • Patent number: 6427135
    Abstract: A method for encoding speech wherein an input speech signal is separated by a component separator into a first component mainly constituted by speech and a second component mainly constituted by a background noise at each predetermined unit of time, a bit allocation selector selects bit allocation for each component based on the first and second components from among a plurality of predetermined candidates for bit allocation, a speech encoder and a noise encoder encode the first and second components from the component separator based on the bit allocation according to predetermined different methods for encoding, and a multiplexer multiplexes encoded data of the first and second components and information on the bit allocation and outputs them as transmitted encoded data.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: July 30, 2002
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kimio Miseki, Masahiro Oshikiri, Tadashi Amada, Masami Akamine
  • Patent number: 6415252
    Abstract: Bits are allocated to short-term repetition information for unvoiced input signals. Stated differently, more bits are allocated for pitch information during unvoiced input speech than in the prior art. The improved method and apparatus in an encoder (300) and decoder (700) result in improved consistency of amplitude pulses compared to prior art methods which indicates improved stability due to increased search resolution. Also, the improved method and apparatus result in higher energy compared to prior art methods which indicates that the synthesized waveform matches the target waveform more closely, resulting in a higher fixed codebook (FCB) gain.
    Type: Grant
    Filed: May 28, 1998
    Date of Patent: July 2, 2002
    Assignee: Motorola, Inc.
    Inventors: Weimin Peng, James Patrick Ashley
  • Publication number: 20020062209
    Abstract: A voiced/unvoiced information estimation system uses an input spectrum and a synthetic spectrum to produce a voicing level spectrum. The estimation system uses a spectrum difference calculation unit to normalize the spectrum difference energy of each harmonic band, and further uses a voicing level calculation unit to calculate a voicing level. The voicing level of each harmonic band has a continuous value between 0 and 1. The estimation system is effective in vector quantization of voiced/unvoiced information at a low bit rate. Because it is unnecessary to calculate a threshold for deciding voiced/unvoiced information, decision anomalies caused by the threshold are eliminated, and the accuracy of the voicing level is improved. Furthermore, since a spectrum is represented by mixing a voiced element and an unvoiced element in a harmonic band, the estimation system improves the audio quality of a combined sound.
    Type: Application
    Filed: July 3, 2001
    Publication date: May 23, 2002
    Applicant: LG Electronics Inc.
    Inventor: Yong Soo Choi
  • Patent number: 6385570
    Abstract: An apparatus and method for detecting transitional parts of speech, and a method of synthesizing transitional parts of speech, are provided. This apparatus includes a residual signal preprocessor for emphasizing a period of a speech residual signal which includes a peak value, a relative peak value calculation unit for obtaining a peak value of a preprocessed residual signal and a relative peak value using a predetermined reference peak value, and a transitional part detector for detecting transitional parts of speech on the basis of the relative peak value.
    Type: Grant
    Filed: May 1, 2000
    Date of Patent: May 7, 2002
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Moo-young Kim
  • Patent number: 6377915
    Abstract: A decoder compares a spectral envelope value y8 on a frequency axis with a predetermined threshold f9 to identify a voiced region and an unvoiced region. An excitation signal is produced by using excitations suitable for respective frequency regions. An encoder applies the nonuniform quantization to the period of the aperiodic pitch in accordance with its frequency of occurrence. The result of the nonuniform quantization is transmitted together with the quantization result of the unvoiced state and the periodic pitch as one code. A decoder obtains spectral envelope amplitude l8′ from the spectral envelope information, and identifies a frequency band e10′ where the spectral envelope amplitude value is maximized in each of respective bands divided on the frequency axis.
    Type: Grant
    Filed: March 14, 2000
    Date of Patent: April 23, 2002
    Assignee: YRP Advanced Mobile Communication Systems Research Laboratories Co., Ltd.
    Inventor: Seishi Sasaki
  • Patent number: 6377916
    Abstract: A speech signal is encoded into a set of encoded bits by digitizing the speech signal to produce a sequence of digital speech samples that are divided into a sequence of frames, each of which spans multiple digital speech samples. A set of speech model parameters are estimated for a frame. The speech model parameters include voicing parameters dividing the frame into voiced and unvoiced regions, at least one pitch parameter representing pitch for at least the voiced regions of the frame, and spectral parameters representing spectral information for at least the voiced regions of the frame. The speech model parameters are quantized to produce parameter bits. The frame is also divided into one or more subframes for which transform coefficients are computed. The transform coefficients for unvoiced regions of the frame are quantized to produce transform bits. The parameter bits and the transform bits are included in the set of encoded bits.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: April 23, 2002
    Assignee: Digital Voice Systems, Inc.
    Inventor: John C. Hardwick
  • Patent number: 6377920
    Abstract: A voicing probability determination method is provided for estimating a percentage of unvoiced and voiced energy for each harmonic within each of a plurality of bands of a speech signal spectrum. Initially, a synthetic speech spectrum is generated based on the assumption that speech is purely voiced. The original and synthetic speech spectra are then divided into a plurality of bands. The synthetic and original speech spectra are compared harmonic by harmonic, and a voicing determination is made based on this comparison. In one embodiment, each harmonic of the original speech spectrum is assigned a voicing decision as either completely voiced or unvoiced by comparing the difference with an adaptive threshold. If the difference for each harmonic is less than the adaptive threshold, the corresponding harmonic is declared as voiced; otherwise the harmonic is declared as unvoiced. The voicing probability for each band is then computed based on the amount of energy in the voiced harmonics in that decision band.
    Type: Grant
    Filed: February 28, 2001
    Date of Patent: April 23, 2002
    Assignee: Comsat Corporation
    Inventor: Suat Yeldener
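    Illustrative sketch (not taken from the patent): the per-band computation in the abstract above can be outlined as follows. The normalized squared difference used as the error measure and the fixed threshold are assumptions; the patent describes an adaptive threshold whose rule is not given in the abstract.

```python
def band_voicing_probability(original, synthetic, threshold=0.1):
    """Fraction of a band's energy carried by harmonics judged voiced.

    original  -- harmonic magnitudes of the original spectrum in one band
    synthetic -- matching magnitudes from the purely voiced synthetic spectrum
    threshold -- fixed stand-in for the adaptive threshold (assumption)
    """
    voiced_energy = 0.0
    total_energy = 0.0
    for o, s in zip(original, synthetic):
        energy = o * o
        total_energy += energy
        if energy == 0.0:
            continue
        # Normalized squared difference between original and synthetic harmonic.
        error = (o - s) ** 2 / energy
        if error < threshold:
            voiced_energy += energy
    if total_energy == 0.0:
        return 0.0
    return voiced_energy / total_energy
```

    A band whose harmonics all match the synthetic spectrum closely returns 1.0, and a band with no close matches returns 0.0.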
  • Patent number: 6370500
    Abstract: A technique is used in a speech encoder (107) that reduces non-speech activity of a low bit rate digital voice message. Speech model parameters that include quantized speech spectral parameter vectors are generated in a sequence of frames. A determination is made as to which frames of the sequence of frames are voiced frames and which frames are unvoiced frames. A consecutive sequence of frames of unvoiced frames is identified (2330) as an unvoiced burst when a length, NUV, of the consecutive sequence of frames exceeds a predetermined length, Ns. A non-speech activity portion of the unvoiced burst is identified (2335-2365) and removed.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: April 9, 2002
    Assignee: Motorola, Inc.
    Inventors: Jian-Cheng Huang, Sunil Satyamurti, Floyd Simpson, Kenneth Finlon
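    Illustrative sketch (not taken from the patent): locating unvoiced bursts as described in the abstract above amounts to finding runs of unvoiced frames longer than a minimum length. The function and parameter names are hypothetical; `min_len` corresponds to Ns, and the later step that picks out the non-speech portion of each burst is not shown.

```python
def find_unvoiced_bursts(voiced_flags, min_len):
    """Return (start, end) index pairs, end exclusive, for each unvoiced burst.

    voiced_flags -- per-frame booleans, True for voiced frames
    min_len      -- a run counts as a burst only if its length exceeds this
    """
    bursts = []
    start = None  # index where the current unvoiced run began, if any
    for i, voiced in enumerate(voiced_flags):
        if not voiced and start is None:
            start = i
        elif voiced and start is not None:
            if i - start > min_len:
                bursts.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(voiced_flags) - start > min_len:
        bursts.append((start, len(voiced_flags)))
    return bursts
```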
  • Publication number: 20020025048
    Abstract: A method of transmitting voice information from an electronic communications device (1) comprises the steps of receiving the voice information from the environment of the device together with a first background sound, generating a sound signal having a first signal part representing the voice information and a second signal part representing the first background sound, reducing the signal part representing the first background sound, and transmitting the sound signal through a communications channel to which the device is connected. The method further comprises the step of adding to the sound signal an additional signal representing a second background sound. In this way background noise can be removed, while a natural and comfortable conversation can be maintained without revealing the location of the user of the device.
    Type: Application
    Filed: March 30, 2001
    Publication date: February 28, 2002
    Inventors: Harald Gustafsson, Ulf Lindgren, Ingvar Claesson, Mattias Dahl
  • Publication number: 20010049598
    Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.
    Type: Application
    Filed: November 13, 1998
    Publication date: December 6, 2001
    Inventor: Amitava Das
  • Patent number: 6289311
    Abstract: A method and apparatus for sound synthesizing and sound band expanding of a narrow band input signal uses wide-band voiced and unvoiced sound code books and also uses narrow-band voiced and unvoiced sound code books. Coded input sound parameters are decoded and quantized using the narrow-band voiced and unvoiced sound code books and are then de-quantized using the wide-band voiced and unvoiced sound code books. The sound is synthesized based on the de-quantized data and a so-called innovation-related parameter formed by a zero-filling circuit filling zeros between samples of the framed input signal, so that the result is an upsampled aliased wide-band signal used with the de-quantized data to synthesize the sound.
    Type: Grant
    Filed: October 20, 1998
    Date of Patent: September 11, 2001
    Assignee: Sony Corporation
    Inventors: Shiro Omori, Masayuki Nishiguchi
  • Patent number: 6285979
    Abstract: Phoneme analysis is carried out in real time by detecting a voiced component in the range of 200 Hz to 1 kHz and simultaneously detecting voiceless components having frequencies greater than about 2.4 kHz and greater than about 3.4 kHz, respectively, to produce respective outputs which are logically combined to produce two-bit logic signals which can be used to control a speech processing device.
    Type: Grant
    Filed: February 22, 1999
    Date of Patent: September 4, 2001
    Assignee: AVR Communications Ltd.
    Inventors: Boris Ginzburg, Barak Dar
  • Patent number: 6275794
    Abstract: A method and apparatus for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice for a speech encoder in a speech communications system. A predetermined set of parameters is extracted from the incoming speech signal, including a pitch gain and a pitch lag. A frame voicing decision is made for each frame of the incoming speech signal according to values calculated from the extracted parameters. The predetermined set of parameters further includes a partial residual frame full band energy, and a set of spectral parameters called Line Spectral Frequencies (LSF). A signal-to-noise value is estimated and tracked to adaptively set threshold values, thereby improving performance under various noise conditions.
    Type: Grant
    Filed: December 22, 1998
    Date of Patent: August 14, 2001
    Assignee: Conexant Systems, Inc.
    Inventors: Adil Benyassine, Eyal Shlomot
  • Patent number: 6269332
    Abstract: A method of coding speech is disclosed in which the speech signal is sampled and divided into a plurality of frames upon which multi-band excitation analysis is performed to derive a fundamental pitch, a plurality of voiced/unvoiced decisions and amplitudes of harmonics within the bands. The harmonic amplitudes are split into a first group of a fixed number of harmonics and a second group of the remainder of harmonics and these are separately transformed using the Discrete Cosine Transform for the first group and Non-Square Transform for the second group, the resulting transform coefficients being vector quantized to form a plurality of output indices. A decoding method and apparatus for performing both encoding and decoding methods are also disclosed.
    Type: Grant
    Filed: May 28, 1999
    Date of Patent: July 31, 2001
    Assignee: Siemens Aktiengesellschaft
    Inventors: Wee Boon Choo, Soo Ngee Koh
  • Publication number: 20010007973
    Abstract: A voice encoding device according to the present invention includes an encoder having a first quantizing block suitable for voice encoding and a second quantizing block suitable for non-voice encoding and compressively encoding input signals, a voice/non-voice signal identification unit for identifying whether a signal input to the encoder is a voice signal or a non-voice signal and outputting a determination result, and a multiplexer portion for multiplexing respective outputs from the first quantizing block and the second quantizing block in order to output to a transmission path. In this case, the encoder has a selector for selecting either one of the first quantizing block or the second quantizing block in accordance with the determination result from the voice/non-voice signal identification unit, and the first quantizing block and the second quantizing block compressively encode signals by using a same quantization table.
    Type: Application
    Filed: December 20, 2000
    Publication date: July 12, 2001
    Applicant: MITSUBISHI DENKI KABUSHIKI KAISHA
    Inventor: Hisashi Yajima
  • Patent number: 6253171
    Abstract: A voicing probability determination method is provided for estimating a percentage of unvoiced and voiced energy for each harmonic within each of a plurality of bands of a speech signal spectrum. Initially, a synthetic speech spectrum is generated based on the assumption that speech is purely voiced. The original and synthetic speech spectra are then divided into a plurality of bands. The synthetic and original speech spectra are compared harmonic by harmonic, and a voicing determination is made based on this comparison. In one embodiment, each harmonic of the original speech spectrum is assigned a voicing decision as either completely voiced or unvoiced by comparing the difference with an adaptive threshold. If the difference for each harmonic is less than the adaptive threshold, the corresponding harmonic is declared as voiced; otherwise the harmonic is declared as unvoiced. The voicing probability for each band is then computed based on the amount of energy in the voiced harmonics in that decision band.
    Type: Grant
    Filed: February 23, 1999
    Date of Patent: June 26, 2001
    Assignee: Comsat Corporation
    Inventor: Suat Yeldener
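The per-harmonic decision and per-band probability described in the abstract above might be sketched as follows. Several details are assumptions, not taken from the patent: the difference measure (a squared error normalized by the harmonic's energy), the use of a fixed `threshold` in place of the adaptive one, and the definition of the band probability as voiced energy divided by total band energy. All names are hypothetical.

```python
def band_voicing_probability(original, synthetic, threshold):
    """Return (decisions, probability) for one band.

    original, synthetic: harmonic magnitudes of the band, same length.
    threshold: fixed stand-in for the adaptive threshold (assumption).
    """
    if len(original) != len(synthetic):
        raise ValueError("spectra must have the same number of harmonics")
    decisions = []
    voiced_energy = 0.0
    total_energy = 0.0
    for o, s in zip(original, synthetic):
        energy = o * o
        # Assumed difference measure: squared error relative to the
        # energy of the original harmonic.
        diff = (o - s) ** 2 / energy if energy > 0 else float("inf")
        voiced = diff < threshold
        decisions.append(voiced)
        total_energy += energy
        if voiced:
            voiced_energy += energy
    probability = voiced_energy / total_energy if total_energy > 0 else 0.0
    return decisions, probability


# Made-up band with three harmonics; the third does not match the
# purely voiced synthetic spectrum.
decisions, probability = band_voicing_probability(
    original=[2.0, 1.0, 1.0], synthetic=[2.0, 1.0, 0.0], threshold=0.25
)
print(decisions, probability)
```

Weighting by energy means that a band dominated by one strong voiced harmonic scores close to 1 even if several weak harmonics are declared unvoiced.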
  • Patent number: 6233551
    Abstract: A method and an apparatus for determining multiband voicing levels using a frequency moving method in a vocoder are provided. The method for determining the multiband voicing levels using the frequency moving method according to the present invention in the vocoder includes the steps of (a) applying a window to an input voice signal and obtaining a power spectrum from a voice spectrum obtained by Fourier converting a windowed signal, (b) moving the frequency of each subband to an origin after dividing the power spectrum into a predetermined number of subbands, (c) obtaining autocorrelation values of the respective subbands by inverse Fourier converting the power spectrum the frequency of which is moved to the origin, and (d) normalizing the respective autocorrelation values and determining the voicing levels of the subbands from the normalized autocorrelation values.
    Type: Grant
    Filed: April 22, 1999
    Date of Patent: May 15, 2001
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yong-duk Cho, Moo-young Kim
  • Patent number: 6233550
    Abstract: A method and apparatus for encoding speech for communication to a decoder for reproduction of the speech where the speech signal is classified into steady state voiced (harmonic), stationary unvoiced, and “transitory” or “transition” speech, and a particular type of coding scheme is used for each class. Harmonic coding is used for steady state voiced speech, “noise-like” coding is used for stationary unvoiced speech, and a special coding mode is used for transition speech, designed to capture the location, the structure, and the strength of the local time events that characterize the transition portions of the speech. The compression schemes can be applied to the speech signal or to the LP residual signal.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: May 15, 2001
    Assignee: The Regents of the University of California
    Inventors: Allen Gersho, Eyal Shlomot, Vladimir Cuperman, Chunyan Li
  • Patent number: 6226606
    Abstract: In a method for tracking pitch in a speech signal, first and second window vectors are created from samples taken across first and second windows of the speech signal. The first window is separated from the second window by a test pitch period. The energy of the speech signal in the first window is combined with the correlation between the first window vector and the second window vector to produce a predictable energy factor. The predictable energy factor is then used to determine a pitch score for the test pitch period. Based in part on the pitch score, a portion of the pitch track is identified.
    Type: Grant
    Filed: November 24, 1998
    Date of Patent: May 1, 2001
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, James G. Droppo, III
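As a rough illustration of the two-window comparison in the abstract above, the sketch below computes the normalized cross-correlation between two windows that are one test pitch period apart and picks the candidate period with the highest value. The abstract does not give the formula that combines window energy with the correlation into the predictable energy factor, so that step is omitted. The function names, window placement, and test signal are all assumptions.

```python
import math


def window_correlation(signal, start, length, period):
    """Normalized cross-correlation of two windows `period` samples apart."""
    first = signal[start:start + length]
    second = signal[start + period:start + period + length]
    if len(first) != length or len(second) != length:
        raise ValueError("windows fall outside the signal")
    dot = sum(a * b for a, b in zip(first, second))
    e1 = sum(a * a for a in first)
    e2 = sum(b * b for b in second)
    if e1 == 0 or e2 == 0:
        return 0.0
    return dot / math.sqrt(e1 * e2)


def best_period(signal, start, length, candidates):
    """Pick the candidate test pitch period with the highest correlation."""
    return max(
        candidates,
        key=lambda p: window_correlation(signal, start, length, p),
    )


# Synthetic test signal: a sinusoid with a period of 10 samples.
signal = [math.sin(2 * math.pi * n / 10) for n in range(60)]
print(best_period(signal, start=0, length=20, candidates=[5, 7, 10, 13]))
```

A correlation-only score cannot tell a period from its integer multiples, which is one reason a practical tracker combines it with other evidence when identifying the pitch track.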
  • Patent number: 6208960
    Abstract: An audio equivalent input signal is divided into a sequence of overlapping or adjacent signal segments. A lengthened signal is synthesized by systematically maintaining or repeating respective signal segments of the sequence of segments. Repeating non-periodic segments, such as a voiceless part of a speech signal or noise in music, results in audible artefacts. The introduced periodicity is broken by dividing a signal section originating from one non-periodic source signal segment into a second sequence of signal segments with at least one of the signal segments having a duration not equal to a duration of the source signal segment and not equal to a multiple of the duration of the source signal segment. Signal segments of the second sequence are shuffled.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: March 27, 2001
    Assignee: U.S. Philips Corporation
    Inventor: Ercan F. Gigi
  • Patent number: 6199037
    Abstract: Speech is encoded into a frame of bits. A speech signal is digitized into a sequence of digital speech samples that are then divided into a sequence of subframes. A set of model parameters is estimated for each subframe. The model parameters include a set of voicing metrics that represent voicing information for the subframe. Two or more subframes from the sequence of subframes are designated as corresponding to a frame. The voicing metrics from the subframes within the frame are jointly quantized. The joint quantization includes forming predicted voicing information from the quantized voicing information from the previous frame, computing the residual parameters as the difference between the voicing information and the predicted voicing information, combining the residual parameters from both of the subframes within the frame, and quantizing the combined residual parameters into a set of encoded voicing information bits which are included in the frame of bits.
    Type: Grant
    Filed: December 4, 1997
    Date of Patent: March 6, 2001
    Assignee: Digital Voice Systems, Inc.
    Inventor: John C. Hardwick
  • Patent number: 6173256
    Abstract: Speech is received as a sequence of segments that are coded according to an LPC principle. The segments are reproduced for concatenated read-out in audio reproduction, by exciting an all-pole filter with recurrent signals in case of voiced speech and by white noise in case of unvoiced speech. In particular, the recurrent signals are globally represented as an accumulated series of periodic signals on the basis of mutually overlapping time windows. The recurrent signals are supplemented by noise for filtering through an amended LPC filter derived from the original LPC-filter by using information of pitch and formants, and of a voiced-unvoiced dichotomy. The filter is determined as depending on at least a subset of the four quantities Global Noise Scaling, Pitch Dependent Noise Scaling, Amplitude Dependent Noise Scaling, and Inter-Formant Noise Scaling.
    Type: Grant
    Filed: October 27, 1998
    Date of Patent: January 9, 2001
    Assignee: U.S. Philips Corporation
    Inventor: Ercan F. Gigi
  • Patent number: 6161005
    Abstract: A remote door locking/unlocking system includes both telephone receiver/DTMF decoder circuitry and a wireless radio frequency or infrared sensor for enabling the system to be actuated either by a portable short range transmitter or by any telephone. The telephone circuitry or handset may be removably installed in the door so that it can be used with a variety of different cellular or satellite communications systems. As an alternative to DTMF and/or radio frequency or infrared activation, speech recognition could be used to interpret voice commands transmitted over the network or picked up by a microphone.
    Type: Grant
    Filed: August 10, 1998
    Date of Patent: December 12, 2000
    Inventor: Brian W. Pinzon
  • Patent number: 6157908
    Abstract: An order point communication system and method employs a noise reduction circuit for enhancing the messages communicated to complete an order to improve substantially the speech to noise ratio thereof so that the perceived sound quality is enhanced in noisy environments. The circuit includes a device for separating noise content of portions of the messages, and another device for reconstructing the noise-removed portion of the messages to form a single substantially pure speech signal to serve as a reconstructed message for completing the order.
    Type: Grant
    Filed: January 27, 1998
    Date of Patent: December 5, 2000
    Assignee: HM Electronics, Inc.
    Inventor: David C. O'Gwynn
  • Patent number: 6148282
    Abstract: A multimodal code-excited linear prediction (CELP) speech coder determines a pitch-lag-periodicity-independent peakiness measure from the input speech. If the measure is greater than a peakiness threshold the encoder classifies the speech in a first coding mode. In one embodiment only frames having an open-loop pitch prediction gain not greater than a threshold, a zero-crossing rate not less than a threshold, and a peakiness measure not greater than the peakiness threshold will be classified as unvoiced speech. Accordingly, the beginning or end of a voiced utterance will be properly coded as voiced speech and speech quality improved. In another embodiment, gain-match scaling matches coded speech energy to input speech energy. A target vector (the portion of input speech with any effects of previous signals removed) is approximated using the precomputed gain for excitation vectors while minimizing perceptually-weighted error.
    Type: Grant
    Filed: December 29, 1997
    Date of Patent: November 14, 2000
    Assignee: Texas Instruments Incorporated
    Inventors: Erdal Paksoy, Alan V. McCree
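The three-condition unvoiced test in the abstract above can be sketched as a simple predicate. The abstract does not define the peakiness measure; the ratio of RMS value to mean absolute value used here is a common choice in speech coding and is an assumption, as are the function names and all threshold values.

```python
import math


def peakiness(frame):
    """RMS divided by mean absolute value (assumed definition)."""
    if not frame:
        raise ValueError("frame must not be empty")
    n = len(frame)
    mean_abs = sum(abs(x) for x in frame) / n
    if mean_abs == 0:
        return 0.0
    rms = math.sqrt(sum(x * x for x in frame) / n)
    return rms / mean_abs


def is_unvoiced(pitch_gain, zcr, peak, gain_thr, zcr_thr, peak_thr):
    """A frame is unvoiced only if all three conditions hold."""
    return pitch_gain <= gain_thr and zcr >= zcr_thr and peak <= peak_thr


# Hypothetical thresholds, for illustration only.
GAIN_THR, ZCR_THR, PEAK_THR = 0.3, 0.3, 1.8

noise_like = [1, -1, 1, -1, 1, -1, 1, -1]   # flat envelope
onset_like = [0, 0, 0, 8, 0, 0, 0, 0]       # one isolated pulse

print(is_unvoiced(0.1, 0.5, peakiness(noise_like), GAIN_THR, ZCR_THR, PEAK_THR))
print(is_unvoiced(0.1, 0.5, peakiness(onset_like), GAIN_THR, ZCR_THR, PEAK_THR))
```

With identical pitch gain and zero-crossing rate, the pulse-like frame fails the peakiness condition and is therefore not classified as unvoiced, which matches the abstract's goal of coding the beginning or end of a voiced utterance as voiced speech.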
  • Patent number: 6138092
    Abstract: A speech coding system and associated method relies on a speech encoder (15) and a speech decoder (20). The speech decoder (20) includes an LPC synthesis filter (90), a Gaussian noise generator (80) for generating unvoiced excitation, an epoch-adaptive harmonic generator (70) for generating voiced excitation for pitch harmonics below a voicing cutoff frequency, and an excitation summer (72) for summing the voiced and unvoiced excitation generated by the Gaussian noise generator (80) and the harmonic generator (70). The output of the excitation summer (72) is provided to the LPC synthesis filter (90) to generate synthesized speech. The system and method provide natural sounding synthesized speech at a low bit rate.
    Type: Grant
    Filed: July 13, 1998
    Date of Patent: October 24, 2000
    Assignee: Lockheed Martin Corporation
    Inventors: Richard Louis Zinser, Jr., Mark Lewis Grabb, Steven Robert Koch, Glen William Brooksby
  • Patent number: 6138089
    Abstract: The invention provides a system, apparatus, and method for compressing a speech signal by decimating or removing somewhat redundant portions of the signal while retaining reference signal portions sufficient to reconstruct the signal without noticeable loss in quality, thereby permitting storage and transmission of high quality speech with minimal storage volume or transmission bandwidth requirements. Speech pitch waveform decimation is used to reduce data to produce an encoded speech signal during compression, and time based interpolative speech reconstruction is used on the encoded signal to reconstruct the original speech signal.
    Type: Grant
    Filed: March 10, 1999
    Date of Patent: October 24, 2000
    Assignee: Infolio, Inc.
    Inventor: Shelia Guberman
  • Patent number: 6134519
    Abstract: A voice encoder using a VOX (voice operated transmission) control has a pitch analyzer and a high-efficiency encoder. When a voiced state is detected in an input audio signal, the input audio signal and pitch information extracted therefrom are encoded by the high-efficiency encoder and transmitted to a voice decoder. When an unvoiced state is detected, the high-efficiency encoder encodes the input audio signal without a gain of the pitch information. The encoded data without using the gain information is transmitted after a post-amble signal to obtain natural background noise.
    Type: Grant
    Filed: June 8, 1998
    Date of Patent: October 17, 2000
    Assignee: NEC Corporation
    Inventor: Satoshi Aihara