Voiced Or Unvoiced Patents (Class 704/208)
  • Patent number: 7555310
    Abstract: An electronic apparatus configured, on the assumption of its sharing by a plurality of users, to provide a plurality of functions, in which normal operations and interrupt operations based on a user's voice can be performed is provided. The apparatus comprises: a voice inputting unit for inputting a user's voice; a storing unit for storing at least a predetermined type of operation for each of a plurality of users; a voice decoding unit for decoding input voice information; a determining unit for determining if the decoded voice describes a type of operation or describes a type of operation and a word “interrupt”; a command outputting unit for outputting: a command which allows a processing involved in the type of operation to be executed, at least when the determining unit determines that the decoded voice describes a type of operation in the storing unit.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: June 30, 2009
    Assignee: Kyocera Mita Corporation
    Inventors: Kentarou Sakuramoto, Hideki Hayashi
  • Patent number: 7542787
    Abstract: The present invention provides an apparatus and method for providing hands-free operation of a device. A hands-free adapter is provided that communicates with a device and a headset. The hands-free adapter allows a user to use voice commands so that the user does not have to handle the device. The hands-free adapter receives voice commands from the headset and translates the voice commands to commands recognized by the device. The hands-free adapter also monitors the device to detect device events and provides notice of the events to the user via the headset.
    Type: Grant
    Filed: February 14, 2006
    Date of Patent: June 2, 2009
    Assignee: AT&T Intellectual Property I, L. P.
    Inventors: Lan Zhang, Joseph E. Page, Jr., Barrett M. Kreiner
  • Patent number: 7536298
    Abstract: An embodiment of the invention improves upon the International Telecommunication Union's ITU-T G.729 Annex B comfort noise generation algorithm by reducing the computational complexity of the comfort noise generation algorithm. The computational complexity is reduced by reusing pre-computed random Gaussian noise samples for each non active voice frame versus calculating new random Gaussian noise samples for each non active voice frame as described by Annex B.
    Type: Grant
    Filed: March 15, 2004
    Date of Patent: May 19, 2009
    Assignee: Intel Corporation
    Inventors: Permachanahalli S Ramkumar, Shashi Shankar Hosur
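The reuse idea in this abstract can be illustrated with a minimal sketch. It is not the G.729 Annex B procedure or the patented method: the table size, the seed, the per-frame start offset, and the gain are all assumptions made for the example, and `make_noise_table` and `comfort_noise_frame` are hypothetical names.

```python
import random


def make_noise_table(size, seed=0):
    """Compute a table of unit-variance Gaussian samples once, up front."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def comfort_noise_frame(table, frame_len, start, gain):
    """Build one inactive-frame excitation by reading the shared table.

    No new random numbers are drawn here; the frame is a scaled,
    wrapped-around slice of the pre-computed table (assumed scheme).
    """
    n = len(table)
    return [gain * table[(start + i) % n] for i in range(frame_len)]


table = make_noise_table(256)
frame = comfort_noise_frame(table, frame_len=80, start=250, gain=0.5)
```

The per-frame cost drops to one multiply per sample, at the price of the noise repeating with the period of the table.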
  • Publication number: 20090125301
    Abstract: The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.
    Type: Application
    Filed: November 3, 2008
    Publication date: May 14, 2009
    Applicant: Melodis Inc.
    Inventors: Aaron Master, Seyed Majid Emami
  • Patent number: 7529664
    Abstract: An approach for improving quality of synthesized speech is presented. The input speech or residual is first separated into a voiced portion and a noise portion. The voice portion is coded using CELP methods. The noise portion of the input speech may be estimated at the decoder since it contains minimal voiced speech components. The separation is frequency dependent and is adaptive to the input speech. The separation may be accomplished using a lowpass/highpass filter combination. The information regarding bandwidth of the lowpass/highpass is presented to the decoder to facilitate reproduction of the noise portion of the speech.
    Type: Grant
    Filed: March 11, 2004
    Date of Patent: May 5, 2009
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
  • Patent number: 7505950
    Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.
    Type: Grant
    Filed: April 26, 2006
    Date of Patent: March 17, 2009
    Assignee: Nokia Corporation
    Inventors: Jilei Tian, Jani Nurminen, Victor Popa
  • Patent number: 7505594
    Abstract: A method and apparatus for controlling a discontinuous transmission process. Audio information is digitized and provided to a vocoder. A voice activity level is determined from the digitized audio signal, and if voice activity is present, active vocoder frames are generated at a predetermined output rate. If voice activity is not detected, inactive vocoder frames are generated. During transitions between periods of speech activity and speech inactivity, transition frames are generated, the transition frames comprising background noise information.
    Type: Grant
    Filed: December 19, 2000
    Date of Patent: March 17, 2009
    Assignee: QUALCOMM Incorporated
    Inventor: Anthony Mauro
  • Patent number: 7478041
    Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.
    Type: Grant
    Filed: March 12, 2003
    Date of Patent: January 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7478040
    Abstract: A method for adaptive long-term filtering of an audio signal, such as a decoded speech signal. The method includes measuring a smoothed periodicity of an audio signal segment, such as an audio frame, wherein the smoothed periodicity is measured by low-pass filtering an instantaneous periodicity of the audio signal segment. The periodicity of the audio signal segment is then increased in a manner that depends upon whether the smoothed periodicity is less than a predetermined threshold. By utilizing a smoothed periodicity measurement in this fashion, more accurate control of the post-filter is provided as compared to conventional solutions. Additionally, the method includes deriving filters by interpolating between filter responses of adjacent audio signal segments to minimize distortion at segment boundaries.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: January 13, 2009
    Assignee: Broadcom Corporation
    Inventors: Jes Thyssen, Juin-Hwey Chen
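The smoothing step described in this abstract can be sketched with a one-pole low-pass filter. This is only an illustration under assumptions: the abstract does not give the filter form, the smoothing constant `alpha`, or the threshold value, so all three are hypothetical here, as are the function names.

```python
def smooth_periodicity(instantaneous, alpha=0.9, initial=0.0):
    """Low-pass filter a sequence of per-segment periodicity values.

    Uses s[n] = alpha * s[n-1] + (1 - alpha) * p[n], an assumed filter form.
    """
    smoothed = []
    state = initial
    for p in instantaneous:
        state = alpha * state + (1.0 - alpha) * p
        smoothed.append(state)
    return smoothed


def below_threshold(smoothed, threshold=0.5):
    """Flag segments whose smoothed periodicity is under the threshold."""
    return [s < threshold for s in smoothed]


flags = below_threshold(smooth_periodicity([0.2, 0.9, 0.95, 0.9]))
```

Because the decision is made on the smoothed value, a single outlier segment does not flip the flag on its own.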
  • Patent number: 7457746
    Abstract: There is provided a pitch lag predictor for use by a speech decoder to generate a predicted pitch lag parameter. The pitch lag predictor comprises a summation calculator configured to generate a first summation based on a plurality of previous pitch lag parameters, and a second summation based on a plurality of previous pitch lag parameters and a position of each of the plurality of previous pitch lag parameters with respect to the predicted pitch lag parameter; a coefficient calculator configured to generate a first coefficient using a first equation based on the first summation and the second summation, and a second coefficient using a second equation based on the first summation and the second summation, wherein the first equation is different than the second equation; and a predictor configured to generate the predicted pitch lag parameter based on the first coefficient and the second coefficient.
    Type: Grant
    Filed: March 20, 2006
    Date of Patent: November 25, 2008
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
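One way to read the two summations and two coefficients in this abstract is as an ordinary least-squares line fit through the previous pitch lags, extrapolated one step ahead. The abstract does not state the actual equations, so the sketch below is an assumption for illustration only; `predict_pitch_lag` is a hypothetical name.

```python
def predict_pitch_lag(previous_lags):
    """Extrapolate the next pitch lag from a straight-line fit.

    previous_lags[i] is taken to sit at position i, and the prediction
    is made for position len(previous_lags). Needs at least two lags.
    """
    n = len(previous_lags)
    if n < 2:
        raise ValueError("need at least two previous pitch lags")
    # First summation: the lags alone.
    s0 = sum(previous_lags)
    # Second summation: each lag weighted by its position.
    s1 = sum(i * lag for i, lag in enumerate(previous_lags))
    sx = n * (n - 1) / 2                   # sum of positions 0..n-1
    sxx = (n - 1) * n * (2 * n - 1) / 6    # sum of squared positions
    slope = (n * s1 - sx * s0) / (n * sxx - sx * sx)
    intercept = (s0 - slope * sx) / n
    return intercept + slope * n


predicted = predict_pitch_lag([40, 42, 44, 46])
```

Both coefficients come from the same two sums but through different formulas, which matches the structure the abstract describes.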
  • Patent number: 7424427
    Abstract: An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.
    Type: Grant
    Filed: October 16, 2003
    Date of Patent: September 9, 2008
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Daben Liu, Francis G. Kubala
  • Patent number: 7412379
    Abstract: Techniques utilising Time Scale Modification (TSM) of signals are described. The signal is analysed and divided into frames of similar signal types. Techniques specific to the signal type are then applied to the frames thereby optimising the modification process. The method of the present invention enables TSM of different audio signal parts to be realized using different methods, and a system for effecting said method is also described.
    Type: Grant
    Filed: April 2, 2002
    Date of Patent: August 12, 2008
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Rakesh Taori, Andreas Johannes Gerrits, Dzevdet Burazerovic
  • Publication number: 20080167863
    Abstract: The present invention relates to an apparatus and method of improving intelligibility of a voice signal. A method of improving intelligibility of a voice signal according to an embodiment of the present invention includes analyzing a background noise signal on a call receiving side, classifying a received voice signal into a silence signal, an unvoiced sound signal, and a voiced sound signal, and intensifying the classified unvoiced sound signal and voiced sound signal on the basis of the analyzed background noise signal on the call receiving side.
    Type: Application
    Filed: November 16, 2007
    Publication date: July 10, 2008
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chang-kyu Choi, Kwang-il Hwang, Sun-gi Hong, Young-hun Sung, Yeun-bae Kim, Yong Kim, Sang-hoon Lee, Hong Jeong
  • Patent number: 7386444
    Abstract: Hybrid linear predictive speech coding system with phase alignment predictive quantization. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.
    Type: Grant
    Filed: January 30, 2004
    Date of Patent: June 10, 2008
    Assignee: Texas Instruments Incorporated
    Inventor: Jacek Stachurski
  • Patent number: 7376557
    Abstract: A privacy apparatus adds a privacy sound based on a speaker's own voice into the environment, thereby confusing listeners as to which of the sounds is the real source. This permits disruption of the ability to understand the source speech of the user by eliminating segregation cues that the auditory system uses to interpret speech. The privacy apparatus minimizes segregation cues. The privacy apparatus is relatively quiet and thus easily acceptable in a typical open floor design office space. The privacy apparatus contains an A/D converter that converts the speech into a digital signal, a DSP that converts the digital signal into a privacy signal with pre-recorded speech fragments that are summed so that the speech fragments at least partly overlap one another, a D/A converter that converts the privacy signal into an output signal and one or more loudspeakers from which the output signal is emitted.
    Type: Grant
    Filed: January 4, 2006
    Date of Patent: May 20, 2008
    Assignee: Herman Miller, Inc.
    Inventors: Jeffrey Specht, Daniel Mapes-Riordan, William DeKruif
  • Publication number: 20080109218
    Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.
    Type: Application
    Filed: September 13, 2007
    Publication date: May 8, 2008
    Inventors: Jani Nurminen, Sakari Himanen
  • Publication number: 20080109217
    Abstract: An apparatus for providing control of voicing in processed speech includes a spectra approximation element and a comparing element. The spectra approximation element may be configured to compute a voiced contribution and an unvoiced contribution for each of a reference speech sample and a processed speech sample. The comparing element may be configured to compare indications of voiced and unvoiced contributions of the reference speech sample and indications of voiced and unvoiced contributions of the processed speech sample, and to determine whether to correct at least one of the voiced or unvoiced contributions of the processed speech sample based on the comparison.
    Type: Application
    Filed: November 8, 2006
    Publication date: May 8, 2008
    Inventor: Jani K. Nurminen
  • Patent number: 7366658
    Abstract: An enhanced noise pre-processor in a speech codec smooths a channel energy estimate, moving toward a first smoothing constant if the prior signal to noise ratio estimates for more than five channels are above a threshold and toward a second smaller smoothing constant otherwise. Forming a signal to noise ratio estimate for each channel includes conditionally boosting if a signal energy estimate is more than a predetermined factor of a noise energy estimate and signal to noise ratio estimates are above a threshold for more than five channels. The estimated signal to noise ratio is conditionally modified if two long term prediction coefficients are above a predetermined factor. The estimated signal to noise ratio is not modified and a voice metric is set greater than a voice metric threshold upon matching templates corresponding to the fricative and nasal speech sounds. An adaptive minimum channel gain is chosen based on a current signal to noise ratio estimate.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: April 29, 2008
    Assignee: Texas Instruments Incorporated
    Inventors: Pratibha Moogi, Chanaveeragouda Virupaxagouda Goudar
  • Publication number: 20080077399
    Abstract: There is provided a low-frequency-band voice reconstructing device. A voice signal from which a signal in a low-frequency band is removed is inputted to the device and the device reconstructs the signal in the low frequency band based on the input voice signal. The device comprises a first portion for extracting part of harmonic components of a pitch signal of voice from the input voice signal, a second portion for squaring a signal extracted by the first portion, a third portion for extracting a signal of a pitch frequency and harmonic signals of a lower limit frequency or below of the input voice signal, from the signal obtained by the second portion, and a fourth portion for correcting an amplitude level of the signal extracted by the third portion.
    Type: Application
    Filed: September 20, 2007
    Publication date: March 27, 2008
    Applicant: Sanyo Electric Co., Ltd.
    Inventor: Masahiro Yoshida
  • Patent number: 7343284
    Abstract: A method for discriminating noise from signal in a noise-contaminated signal involves decomposing a frame of samples of the signal into decorrelated components, and using a difference between probability distributions of the noise contributions and the signal contributions to identify signal and noise. A Gaussian distribution is used to determine whether the components are only noise whereas a Laplacian distribution is used to determine whether the components contain the signal. Such discrimination may be used in speech enhancement or voice activity detection apparatus.
    Type: Grant
    Filed: July 17, 2003
    Date of Patent: March 11, 2008
    Assignee: Nortel Networks Limited
    Inventors: Saeed Gazor, Mohamed El-Hennawey
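The Gaussian-versus-Laplacian test in this abstract can be sketched as a log-likelihood comparison. The sketch assumes zero-mean components that have already been decorrelated, fits each distribution by maximum likelihood, and labels the frame by whichever fits better. The decision rule, the `floor` guard, and the name `classify_components` are assumptions, not details taken from the patent.

```python
import math


def classify_components(components, floor=1e-12):
    """Return "signal" if a Laplacian fits better, otherwise "noise"."""
    if not components:
        raise ValueError("components must not be empty")
    n = len(components)
    # Maximum-likelihood parameters for zero-mean models.
    variance = max(sum(x * x for x in components) / n, floor)
    scale = max(sum(abs(x) for x in components) / n, floor)
    # Log-likelihoods evaluated at those parameters.
    gauss_ll = -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)
    laplace_ll = -n * (math.log(2.0 * scale) + 1.0)
    return "signal" if laplace_ll > gauss_ll else "noise"


label = classify_components([0.0] * 9 + [10.0])
```

Heavy-tailed, spiky data favors the Laplacian model, while evenly spread data favors the Gaussian.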
  • Patent number: 7337107
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Grant
    Filed: October 2, 2001
    Date of Patent: February 26, 2008
    Assignee: The Regents of the University of California
    Inventors: Kenneth Rose, Liang Gu
  • Patent number: 7337108
    Abstract: An adaptive “temporal audio scaler” is provided for automatically stretching and compressing frames of audio signals received across a packet-based network. Prior to stretching or compressing segments of a current frame, the temporal audio scaler first computes a pitch period for each frame for sizing signal templates used for matching operations in stretching and compressing segments. Further, the temporal audio scaler also determines the type or types of segments comprising each frame. These segment types include “voiced” segments, “unvoiced” segments, and “mixed” segments which include both voiced and unvoiced portions. The stretching or compression methods applied to segments of each frame are then dependent upon the type of segments comprising each frame. Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.
    Type: Grant
    Filed: September 10, 2003
    Date of Patent: February 26, 2008
    Assignee: Microsoft Corporation
    Inventors: Dinei Florencio, Philip Chou, Li-Wei He
  • Patent number: 7295982
    Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
    Type: Grant
    Filed: November 19, 2001
    Date of Patent: November 13, 2007
    Assignee: AT&T Corp.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
  • Patent number: 7277916
    Abstract: A system for emulating interaction with an interactive voice response unit is provided. The system comprises a client node connected to the network, the client node soliciting interaction with the interactive voice response unit, and a proxy server node connected to the network, the server node accessible to the client node, the interactive voice response unit accessible to the server node. A connection is established between the client node and the proxy server node, the proxy server node accepts data from the client node and translates the data into a format for interacting with the interactive voice response unit whereupon the data is then propagated to the interactive voice response unit. Response data resulting from the input data at the interactive voice response unit is propagated to the proxy server node whereupon the response data is translated into a format for dissemination at the client node and propagated thereto.
    Type: Grant
    Filed: October 5, 2004
    Date of Patent: October 2, 2007
    Assignee: Genesys Telecommunications Laboratories, Inc.
    Inventor: Marcialito Nuestro
  • Patent number: 7272551
    Abstract: Estimating a speech signal pitch frequency by determining a speech signal frame line spectrum including spectral lines having respective line amplitudes and frequencies, selecting a predefined number of spectral lines having highest amplitudes, fewer than the total number of the spectral lines, calculating a preliminary utility function over a pitch frequency range to provide a preliminary utility function value for each pitch frequency in the range measuring the compatibility of the selected spectral lines with the pitch frequency, identifying a predefined number of preliminary pitch frequency candidates at least partly responsive to the preliminary utility function, where each candidate is a local maximum of the preliminary utility function, calculating a final utility score for each of the candidates, and selecting any of the candidates to be an estimated pitch frequency of the speech signal at least partly responsive to any of the final utility scores.
    Type: Grant
    Filed: February 24, 2003
    Date of Patent: September 18, 2007
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 7266493
    Abstract: There is provided a method of selecting a pitch lag value from a plurality of pitch lag candidates for coding a speech signal. The method comprises identifying the plurality of pitch lag candidates from a frame of the speech signal using correlation; classifying the speech signal to obtain a voice classification; determining whether one or more of the plurality of pitch lag candidates are in a temporal neighborhood of one or more previous pitch lag values; favoring the one or more of the plurality of pitch lag candidates determined to be in the temporal neighborhood of the one or more previous pitch lag values, by adaptive weighting, over other ones of the plurality of pitch lag candidates; and selecting the pitch lag value based on the voice classification and the one or more of the plurality of pitch lag candidates favored by the adaptive weighting.
    Type: Grant
    Filed: October 13, 2005
    Date of Patent: September 4, 2007
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Huan-Yu Su, Yang Gao
  • Publication number: 20070185709
    Abstract: A method and apparatus of estimating a voicing for speech recognition by using local spectral information. The voicing estimation method for speech recognition includes performing a Fourier transform on input voice signals after performing pre-processing on the input voice signals. The method further includes detecting peaks in the input voice signals after smoothing the input voice signals. The method also includes computing every frequency bound associated with the detected peaks, and determining a class of a voicing according to each computed frequency bound.
    Type: Application
    Filed: January 25, 2007
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kwang Cheol Oh, Jae-Hoon Jeong
  • Patent number: 7246059
    Abstract: The invention provides a method and system for dynamically estimating background noise. The system includes a portable communication device, a vocoder, and a voice activated detector. Based on information received by the portable communication device, the vocoder determines parameters related to incoming information including a voicing mode indicative of the periodicity of incoming information. The voice activated detector then compares the voicing mode to a threshold to determine whether a background noise estimate should be updated.
    Type: Grant
    Filed: July 24, 2003
    Date of Patent: July 17, 2007
    Assignee: Motorola, Inc.
    Inventors: Ali Behboodian, Pratik Desai, Chin Pan Wong
  • Patent number: 7243062
    Abstract: A method (200) and apparatus (100) for segmenting a sequence of audio samples into homogeneous segments (550 and 555) are disclosed. The method (200) forms a sequence of frames (701 to 704) along the sequence of audio samples, and extracts, for each frame, a data feature. The data features form a sequence of data features. Transition points in the sequence of data features are then detected by applying the Bayesian Information Criterion to the sequence of data features. The transition points define the homogeneous segments (550 and 555). Preferably the data feature is single-dimensional and a leptokurtic distribution is used as an event model in the Bayesian Information Criterion.
    Type: Grant
    Filed: October 25, 2002
    Date of Patent: July 10, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventor: Timothy John Wark
  • Patent number: 7233899
    Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented, and for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and symbolic and numeric data for other segments are obtained in the record and in the dictionary entries.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: June 19, 2007
    Inventors: Vitaliy S. Fain, Samuel V. Fain
  • Patent number: 7233894
    Abstract: A pitch estimation system including a low-frequency band noise detector (LBND) operative to detect the presence of low-frequency band noise in a first audio frame, a frequency-domain pitch estimator operative to calculate a pitch estimation of a second audio frame from at least one spectral peak in the second audio frame, and a pitch estimator controller operative to cause the pitch estimator to exclude from the spectrum of the second audio frame at least one low-frequency spectral peak below a predefined threshold where low-frequency band noise is present in the first audio frame.
    Type: Grant
    Filed: February 24, 2003
    Date of Patent: June 19, 2007
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 7228271
    Abstract: The telephone apparatus of the present invention comprises a first voice band expander for generating a voiced signal frequency component by shifting the frequency of the voice signal received, a second voice band expander for generating a voiceless signal frequency component by shifting the frequency of the voice signal received, and a voice composer for composing the voice signal received, the output of the first voice band expander, and the output of the second voice band expander, which is able to output clear voices in aural communication.
    Type: Grant
    Filed: December 23, 2002
    Date of Patent: June 5, 2007
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Toshimichi Tokuda, Takashi Kimura
  • Patent number: 7222070
    Abstract: Linear predictive speech coding system with classification of frames and a hybrid coder using both waveform coding and parametric coding for different classes of frames. Phase alignment for a parametric coder aligns synthesized speech frames with adjacent waveform coder synthesized frames. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: May 22, 2007
    Assignee: Texas Instruments Incorporated
    Inventors: Jacek Stachurski, Alan V. McCree
  • Patent number: 7206739
    Abstract: A method for searching an excitation (or fixed) codebook in a speech coding system. In a speech coding system including a synthesis filter for synthesizing a speech signal, a fixed codebook searcher according to the present invention segments a speech signal frame into a plurality of subframes to generate an excitation signal to be used in a synthesis filter, segments again each of the subframes into a plurality of subgroups, and searches the respective subframes each comprised of a plurality of pulse position/amplitude combinations for pulses. The fixed codebook searcher searches the respective subgroups for a predetermined number of pulses having non-zero amplitude, and generates the searched pulses as an initial vector. Next, the fixed codebook searcher selects a pulse combination including at least one pulse among the pulses of the initial vector, and then substitutes pulses of the selected pulse combination for pulses in other positions in the subgroups.
    Type: Grant
    Filed: May 23, 2002
    Date of Patent: April 17, 2007
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Dae-Ryong Lee
  • Patent number: 7191123
    Abstract: The gain smoothing method and device modify the amplitude of an innovative codevector in relation to background noise present in a previously sampled wideband signal. The gain smoothing device comprises a gain smoothing calculator for calculating a smoothing gain in response to a factor representative of voicing in the sampled wideband signal, a factor representative of the stability of a set of linear prediction filter coefficients, and an innovative codebook gain. The gain smoothing device also comprises an amplifier for amplifying the innovative codevector with the smoothing gain to thereby produce a gain-smoothed innovative codevector. The function of the gain-smoothing device improves the perceived synthesized signal when background noise is present in the sampled wideband signal.
    Type: Grant
    Filed: November 17, 2000
    Date of Patent: March 13, 2007
    Assignee: Voiceage Corporation
    Inventors: Bruno Bessette, Redwan Salami, Roch Lefebvre
  • Patent number: 7191128
    Abstract: The present invention relates to a method and system for distinguishing speech from music in a digital audio signal in real time.
    Type: Grant
    Filed: February 21, 2003
    Date of Patent: March 13, 2007
    Assignee: LG Electronics Inc.
    Inventors: Mikhael A. Sall, Sergei N. Gramnitskiy, Alexandr L. Maiboroda, Victor V. Redkov, Anatoli I. Tikhotsky, Andrei B. Viktorov
  • Patent number: 7171357
    Abstract: A voice activity detector (100) filters (204) out noise energy and then computes a high-frequency (2400 Hz to 4000 Hz) versus low-frequency (100 Hz to 2400 Hz) signal energy ratio (224), total voiceband (100 Hz to 4000 Hz) signal energy (214), and signal periodicity (208) on successive frames of signal samples. Signal periodicity is determined by estimating the pitch period (206) of the signal, determining a gain value of the signal over the pitch period as a function of the estimated pitch period, and estimating a periodicity of the signal over the pitch period as a function of the estimated pitch period and the gain value.
    Type: Grant
    Filed: March 21, 2001
    Date of Patent: January 30, 2007
    Assignee: Avaya Technology Corp.
    Inventor: Simon Daniel Boland
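
The band-energy features named in the abstract above (high-band versus low-band energy ratio and total voiceband energy) can be illustrated with a minimal sketch. It assumes an 8 kHz sampling rate and a plain DFT over one frame; the function names `band_energy` and `frame_features` are hypothetical, and the noise filtering and pitch-based periodicity measure described in the abstract are left out.

```python
import cmath
import math


def band_energy(frame, fs, lo_hz, hi_hz):
    """Sum squared DFT magnitudes for bins with lo_hz <= f < hi_hz."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if lo_hz <= freq < hi_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)
            )
            energy += abs(coeff) ** 2
    return energy


def frame_features(frame, fs=8000):
    """Return (high/low energy ratio, total voiceband energy) for one frame.

    Band edges follow the abstract: 100-2400 Hz (low), 2400-4000 Hz (high).
    The sampling rate is an assumption, not taken from the patent.
    """
    low = band_energy(frame, fs, 100, 2400)
    high = band_energy(frame, fs, 2400, 4000)
    ratio = high / low if low > 0 else float("inf")
    return ratio, low + high


# Usage: a 500 Hz tone sits in the low band, a 3000 Hz tone in the high band.
low_tone = [math.sin(2 * math.pi * 500 * i / 8000) for i in range(160)]
high_tone = [math.sin(2 * math.pi * 3000 * i / 8000) for i in range(160)]
low_ratio, low_total = frame_features(low_tone)
high_ratio, high_total = frame_features(high_tone)
```

How a detector would combine these features with periodicity into a speech/non-speech decision is not specified in the abstract, so the sketch stops at feature extraction.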
  • Patent number: 7149683
    Abstract: The present invention relates to a method and device for quantizing linear prediction parameters in variable bit-rate sound signal coding, in which an input linear prediction parameter vector is received, a sound signal frame corresponding to the input linear prediction parameter vector is classified, a prediction vector is computed, the computed prediction vector is removed from the input linear prediction parameter vector to produce a prediction error vector, and the prediction error vector is quantized. Computation of the prediction vector comprises selecting one of a plurality of prediction schemes in relation to the classification of the sound signal frame, and processing the prediction error vector through the selected prediction scheme. The present invention further relates to a method and device for dequantizing linear prediction parameters in variable bit-rate sound signal decoding.
    Type: Grant
    Filed: January 19, 2005
    Date of Patent: December 12, 2006
    Assignee: Nokia Corporation
    Inventor: Milan Jelinek
  • Patent number: 7146310
    Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: December 5, 2006
    Assignee: Qualcomm, Incorporated
    Inventors: Amitava Das, Sharath Manjunath
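
The steps listed in the abstract above (extract energy coefficients, quantize them, interpolate an envelope, shape random noise) can be sketched as follows. This is an illustration under stated assumptions, not the patented coder: the per-segment RMS measure, the uniform quantizer step, and all function names are hypothetical, and the post-processing check is omitted.

```python
import math
import random


def energy_coefficients(frame, segments):
    """RMS energy of each equal-length sub-segment of the frame (assumed measure)."""
    seg_len = len(frame) // segments
    return [
        math.sqrt(sum(x * x for x in frame[s * seg_len:(s + 1) * seg_len]) / seg_len)
        for s in range(segments)
    ]


def quantize(values, step):
    """Uniform scalar quantizer; the step size is an assumption."""
    return [round(v / step) * step for v in values]


def interpolate_envelope(coeffs, length):
    """Linearly interpolate between segment centers to get a per-sample envelope."""
    seg_len = length // len(coeffs)
    centers = [(s + 0.5) * seg_len for s in range(len(coeffs))]
    envelope = []
    for i in range(length):
        if i <= centers[0]:
            envelope.append(coeffs[0])
        elif i >= centers[-1]:
            envelope.append(coeffs[-1])
        else:
            s = int((i - centers[0]) // seg_len)
            t = (i - centers[s]) / seg_len
            envelope.append((1 - t) * coeffs[s] + t * coeffs[s + 1])
    return envelope


def shape_noise(envelope, rng):
    """Scale a random Gaussian noise vector by the envelope, sample by sample."""
    return [e * rng.gauss(0.0, 1.0) for e in envelope]


# Usage: a frame whose energy rises halfway through.
frame = [0.1] * 80 + [1.0] * 80
quantized = quantize(energy_coefficients(frame, 2), 0.05)
envelope = interpolate_envelope(quantized, len(frame))
residue = shape_noise(envelope, random.Random(0))
```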
  • Patent number: 7139700
    Abstract: Linear predictive speech coding system with classification of frames and a hybrid coder using both waveform coding and parametric coding for different classes of frames. Phase alignment for a parametric coder aligns synthesized speech frames with adjacent waveform coder synthesized frames. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: November 21, 2006
    Assignee: Texas Instruments Incorporated
    Inventors: Jacek Stachurski, Alan V. McCree
  • Patent number: 7127392
    Abstract: The present invention is a device for and method of detecting voice activity. First, the AM envelope of a segment of a signal of interest is determined. Next, the number of times the AM envelope crosses a user-definable threshold is determined. If there are no crossings, the segment is identified as non-speech. Next, the number of points on the AM envelope within a user-definable range is determined. If there are fewer than a user-definable number of points within the range, the segment is identified as non-speech. Next, the mean, variance, and power ratio of the normalized spectral content of the AM envelope are found and compared to the same for known speech and non-speech. The segment is identified as being of the same type as the known speech or non-speech to which it most closely compares. These steps are repeated for each signal segment of interest.
    Type: Grant
    Filed: February 12, 2003
    Date of Patent: October 24, 2006
    Assignee: The United States of America as represented by the National Security Agency
    Inventor: David C. Smith
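
The first two screening steps in the abstract above (threshold crossings of the AM envelope, then points within a range) can be sketched as below. The envelope estimator here, rectification followed by a moving average, is an assumption, since the abstract does not say how the envelope is obtained. The function names are hypothetical, and the final comparison against known speech and non-speech statistics is omitted.

```python
def am_envelope(segment, window=8):
    """Approximate the AM envelope: rectify, then apply a moving average (assumed)."""
    rectified = [abs(x) for x in segment]
    envelope = []
    acc = 0.0
    for i, value in enumerate(rectified):
        acc += value
        if i >= window:
            acc -= rectified[i - window]
        envelope.append(acc / min(i + 1, window))
    return envelope


def count_crossings(envelope, threshold):
    """Count how often consecutive envelope points fall on opposite sides of the threshold."""
    return sum(
        1 for a, b in zip(envelope, envelope[1:])
        if (a < threshold) != (b < threshold)
    )


def count_in_range(envelope, lo, hi):
    """Count envelope points inside the closed range [lo, hi]."""
    return sum(1 for v in envelope if lo <= v <= hi)


def prescreen(segment, threshold, lo, hi, min_points):
    """Return "non-speech" if either screening step rejects the segment."""
    envelope = am_envelope(segment)
    if count_crossings(envelope, threshold) == 0:
        return "non-speech"
    if count_in_range(envelope, lo, hi) < min_points:
        return "non-speech"
    return "candidate"  # would go on to the spectral comparison step


# Usage: silence versus a short burst surrounded by silence.
silence = [0.0] * 150
burst = [0.0] * 50 + [1.0 if i % 2 == 0 else -1.0 for i in range(50)] + [0.0] * 50
silence_label = prescreen(silence, 0.5, 0.25, 1.0, 10)
burst_label = prescreen(burst, 0.5, 0.25, 1.0, 10)
```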
  • Patent number: 7120576
    Abstract: A method for detecting music in a speech signal having a plurality of frames. The method comprises defining a music threshold value for a first parameter extracted from a frame of the speech signal, defining a background noise threshold value for the first parameter, and defining an unsure threshold value for the first parameter. The unsure threshold value falls between the music threshold value and the background noise threshold value. If the first parameter falls between the music threshold value and the background noise threshold value, the speech signal is classified as music or background noise based on analyzing a plurality of first parameters extracted from the plurality of frames.
    Type: Grant
    Filed: November 4, 2004
    Date of Patent: October 10, 2006
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
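
The three-threshold decision described in the abstract above might look roughly like the following sketch. Several details are assumptions not stated in the abstract: that larger parameter values indicate music, that the multi-frame analysis is a simple average over recent frames, and that the average is compared against the unsure threshold. The function name `classify_frame` is hypothetical.

```python
def classify_frame(param, recent_params, music_thr, noise_thr, unsure_thr):
    """Classify one frame as "music" or "noise".

    Assumes noise_thr < unsure_thr < music_thr and that a larger parameter
    value is more music-like; both are illustrative assumptions.
    """
    if param >= music_thr:
        return "music"
    if param <= noise_thr:
        return "noise"
    # Ambiguous region: fall back on parameters from several frames.
    if recent_params:
        average = sum(recent_params) / len(recent_params)
    else:
        average = param
    return "music" if average >= unsure_thr else "noise"


# Usage: clear cases are decided at once, ambiguous ones use frame history.
clear_music = classify_frame(0.9, [], 0.8, 0.2, 0.5)
clear_noise = classify_frame(0.1, [], 0.8, 0.2, 0.5)
unsure_music = classify_frame(0.6, [0.7, 0.6, 0.65], 0.8, 0.2, 0.5)
unsure_noise = classify_frame(0.6, [0.3, 0.3, 0.4], 0.8, 0.2, 0.5)
```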
  • Patent number: 7120578
    Abstract: Speech coding systems include multi-rate speech codecs having an encoder and a decoder. Silence description coding for multi-rate speech coding systems that employ discontinued transmission is performed in either the encoder or the decoder of the multi-rate speech codec. It may also be performed in a distributed manner wherein it is performed partially in the encoder and partially in the decoder. The silence description coding is performed on a speech signal having a substantially non-speech-like characteristic. Voice activity detection classifies the speech signal as being either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, the silence description coding is a source coding mode that operates at a bit rate that fits within a bit rate budget as determined by all of the available source coding modes within the plurality of coding modes.
    Type: Grant
    Filed: April 24, 2001
    Date of Patent: October 10, 2006
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Jes Thyssen, Huan-yu Su, Adil Benyassine, Eyal Shlomot
  • Patent number: 7117150
    Abstract: A first filter (2061 in FIG. 1) calculates a long-time average of first change quantities based on a difference between a line spectral frequency of an input voice signal and a long-time average thereof. A second filter (2062 in FIG. 1) calculates a long-time average of second change quantities based on a difference between a whole band energy of the input voice signal and a long-time average thereof. A third filter (2063 in FIG. 1) calculates a long-time average of third change quantities based on a difference between a low band energy of the input voice signal and a long-time average thereof. A fourth filter (2064 in FIG. 1) calculates a long-time average of fourth change quantities based on a difference between a zero cross number of the input voice signal and a long-time average thereof. A voice/non-voice determining circuit (1040 in FIG.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: October 3, 2006
    Assignee: NEC Corporation
    Inventor: Atsushi Murashima
  • Patent number: 7113522
    Abstract: Wideband speech signals must be converted to narrowband speech signals if the transmission medium or the destination terminal is constructed with narrowband constraints. A typical wideband-to-narrowband conversion method is the elimination of frequencies above 3400 Hz using a low pass filter and a down sampler. However, this method produces a muffled speech sound since the resulting narrowband signal has a flat frequency response. Methods and apparatus are presented herein to enhance the acoustic quality of a wideband-to-narrowband converted signal. A bandwidth switching filter is used to emphasize a mid-range frequency portion of the wideband signal so that the resulting narrowband signal has a non-flat frequency spectrum.
    Type: Grant
    Filed: January 24, 2001
    Date of Patent: September 26, 2006
    Assignee: QUALCOMM, Incorporated
    Inventors: Khaled H. El-Maleh, Arasanipalai K. Ananthapadmanabhan, Andrew P. DeJaco
  • Patent number: 7103349
    Abstract: The invention is a system, a method of transmitting messages selectively as text or non-text from an entity (104) in a network (100 and 102), and an entity in a network. A system in accordance with the invention includes at least one terminal (16); a network containing the at least one terminal; an entity in the network which provides messages selectively as text or non-text to the network in a speech encoded form; and wherein the messages are transmitted in the speech encoded form by the network to the at least one terminal which reproduces the messages to a user thereof in either a text form or by a sound reproduction device of the at least one terminal.
    Type: Grant
    Filed: May 2, 2002
    Date of Patent: September 5, 2006
    Assignee: Nokia Corporation
    Inventors: Teemu Himanen, Pasi Ylinen
  • Patent number: 7099823
    Abstract: A coded voice signal format converting apparatus is provided that can convert the signal format of a coded voice signal with a reduced amount of computation. In the apparatus, a second coding device employs a quantizing accuracy information converting section, which receives first quantizing accuracy information output from a quantizing accuracy information decoding section in a first decoding device. A second mapping signal is quantized by a mapped signal coding section to produce a coded voice signal, and the first quantizing accuracy information is converted so that the mapped signal coding section can use it to determine second quantizing accuracy information.
    Type: Grant
    Filed: February 27, 2001
    Date of Patent: August 29, 2006
    Assignee: NEC Corporation
    Inventor: Yuichiro Takamizawa
  • Patent number: 7089178
    Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provide benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: August 8, 2006
    Assignee: Qualcomm Inc.
    Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
  • Patent number: 7080017
    Abstract: A frequency compander for improving the frequency response of a telephone line when used for remote broadcasting. The inventive device comprises an encoder for compressing the frequency spectrum of an audio signal and a decoder for expanding the signal back to its original spectrum. Preferably the encoder comprises: an anti-aliasing filter; an A/D converter for digitizing incoming audio; a DSP for compressing the audio; and a D/A converter for outputting compressed audio to the phone line. The decoder comprises: an anti-aliasing filter; an A/D converter for digitizing the incoming compressed signal; a DSP for restoring the original audio; and a D/A converter for outputting program audio. In a preferred embodiment, encoding and decoding are performed in the frequency domain. In another preferred embodiment, encoding and decoding are performed in the time domain using trigonometric transformations.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: July 18, 2006
    Inventors: Ken Scott Fisher, Kevin Cotton Baxter, Fred H. Holmes
  • Patent number: 7065485
    Abstract: The method and preprocessor enhance the intelligibility of narrowband speech without essentially lengthening the overall time duration of the signal. Both spectral enhancements and variable-rate time-scaling procedures are implemented to improve the salience of initial consonants, particularly the perceptually important formant transitions. Emphasis is transferred from the dominating vowel to the preceding consonant through adaptation of the phoneme timing structure. In a further embodiment, the technique is applied as a preprocessor to a speech coder.
    Type: Grant
    Filed: January 9, 2002
    Date of Patent: June 20, 2006
    Assignee: AT&T Corp
    Inventors: Nicola R. Chong-White, Richard Vandervoort Cox