Voiced Or Unvoiced Patents (Class 704/208)

Electronic apparatus and computer readable medium recorded voice operating program

Patent number: 7555310

Abstract: An electronic apparatus configured, on the assumption of its sharing by a plurality of users, to provide a plurality of functions, in which normal operations and interrupt operations based on a user's voice can be performed is provided. The apparatus comprises: a voice inputting unit for inputting a user's voice; a storing unit for storing at least a predetermined type of operation for each of a plurality of users; a voice decoding unit for decoding input voice information; a determining unit for determining if the decoded voice describes a type of operation or describes a type of operation and a word “interrupt”; a command outputting unit for outputting: a command which allows a processing involved in the type of operation to be executed, at least when the determining unit determines that the decoded voice describes a type of operation in the storing unit.

Type: Grant

Filed: November 9, 2006

Date of Patent: June 30, 2009

Assignee: Kyocera Mita Corporation

Inventors: Kentarou Sakuramoto, Hideki Hayashi
Apparatus and method for providing hands-free operation of a device

Patent number: 7542787

Abstract: The present invention provides an apparatus and method for providing hands-free operation of a device. A hands-free adapter is provided that communicates with a device and a headset. The hands-free adapter allows a user to use voice commands so that the user does not have to handle the device. The hands-free adapter receives voice commands from the headset and translates the voice commands to commands recognized by the device. The hands-free adapter also monitors the device to detect device events and provides notice of the events to the user via the headset.

Type: Grant

Filed: February 14, 2006

Date of Patent: June 2, 2009

Assignee: AT&T Intellectual Property I, L. P.

Inventors: Lan Zhang, Joseph E. Page, Jr., Barrett M. Kreiner
Method of comfort noise generation for speech communication

Patent number: 7536298

Abstract: An embodiment of the invention improves upon the International Telecommunication Union's ITU-T G.729 Annex B comfort noise generation algorithm by reducing the computational complexity of the comfort noise generation algorithm. The computational complexity is reduced by reusing pre-computed random Gaussian noise samples for each non active voice frame versus calculating new random Gaussian noise samples for each non active voice frame as described by Annex B.

Type: Grant

Filed: March 15, 2004

Date of Patent: May 19, 2009

Assignee: Intel Corporation

Inventors: Permachanahalli S Ramkumar, Shashi Shankar Hosur
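
As a rough illustration of the reuse idea in the abstract above (not the patented method and not the G.729 Annex B reference code), the Python sketch below fills a table of Gaussian samples once and reads from it for every non-active frame instead of drawing fresh samples. The class name, table size, seed, and per-frame gain are all hypothetical choices made for this example.

```python
import random


class ComfortNoiseGenerator:
    """Hypothetical sketch: reuse one pre-computed Gaussian table for all frames."""

    def __init__(self, table_size=1024, seed=0):
        rng = random.Random(seed)
        # Computed once, up front; no Gaussian draws happen per frame.
        self._table = [rng.gauss(0.0, 1.0) for _ in range(table_size)]
        self._pos = 0

    def next_frame(self, frame_len, gain):
        """Return frame_len noise samples scaled by gain, read cyclically from the table."""
        out = []
        for _ in range(frame_len):
            out.append(gain * self._table[self._pos])
            self._pos = (self._pos + 1) % len(self._table)
        return out


generator = ComfortNoiseGenerator(table_size=160)
quiet_frame = generator.next_frame(frame_len=80, gain=0.01)
```

Reading cyclically from a fixed table trades some randomness for lower per-frame cost, which is the trade-off the abstract describes.
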
VOICING DETECTION MODULES IN A SYSTEM FOR AUTOMATIC TRANSCRIPTION OF SUNG OR HUMMED MELODIES

Publication number: 20090125301

Abstract: The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.

Type: Application

Filed: November 3, 2008

Publication date: May 14, 2009

Applicant: Melodis Inc.

Inventors: Aaron Master, Seyed Majid Emami
Signal decomposition of voiced speech for CELP speech coding

Patent number: 7529664

Abstract: An approach for improving quality of synthesized speech is presented. The input speech or residual is first separated into a voiced portion and a noise portion. The voice portion is coded using CELP methods. The noise portion of the input speech may be estimated at the decoder since it contains minimal voiced speech components. The separation is frequency dependent and is adaptive to the input speech. The separation may be accomplished using a lowpass/highpass filter combination. The information regarding bandwidth of the lowpass/highpass is presented to the decoder to facilitate reproduction of the noise portion of the speech.

Type: Grant

Filed: March 11, 2004

Date of Patent: May 5, 2009

Assignee: Mindspeed Technologies, Inc.

Inventor: Yang Gao
Soft alignment based on a probability of time alignment

Patent number: 7505950

Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.

Type: Grant

Filed: April 26, 2006

Date of Patent: March 17, 2009

Assignee: Nokia Corporation

Inventors: Jilei Tian, Jani Nurminen, Victor Popa
Discontinuous transmission (DTX) controller system and method

Patent number: 7505594

Abstract: A method and apparatus for controlling a discontinuous transmission process. Audio information is digitized and provided to a vocoder. A voice activity level is determined from the digitized audio signal, and if voice activity is present, active vocoder frames are generated at a predetermined output rate. If voice activity is not detected, inactive vocoder frames are generated. During transitions between periods of speech activity and speech inactivity, transition frames are generated, the transition frames comprising background noise information.

Type: Grant

Filed: December 19, 2000

Date of Patent: March 17, 2009

Assignee: QUALCOMM Incorporated

Inventor: Anthony Mauro
Speech recognition apparatus, speech recognition apparatus and program thereof

Patent number: 7478041

Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.

Type: Grant

Filed: March 12, 2003

Date of Patent: January 13, 2009

Assignee: International Business Machines Corporation

Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
Method for adaptive filtering

Patent number: 7478040

Abstract: A method for adaptive long-term filtering of an audio signal, such as a decoded speech signal. The method includes measuring a smoothed periodicity of an audio signal segment, such as an audio frame, wherein the smoothed periodicity is measured by low-pass filtering an instantaneous periodicity of the audio signal segment. The periodicity of the audio signal segment is then increased in a manner that depends upon whether the smoothed periodicity is less than a predetermined threshold. By utilizing a smoothed periodicity measurement in this fashion, more accurate control of the post-filter is provided as compared to conventional solutions. Additionally, the method includes deriving filters by interpolating between filter responses of adjacent audio signal segments to minimize distortion at segment boundaries.

Type: Grant

Filed: October 20, 2004

Date of Patent: January 13, 2009

Assignee: Broadcom Corporation

Inventors: Jes Thyssen, Juin-Hwey Chen
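
The smoothing step in the abstract above can be pictured with a first-order low-pass (exponential) filter over per-frame periodicity values. This is a minimal sketch under assumptions: the filter form, the `alpha` coefficient, the threshold value, and the function names are hypothetical, and the abstract does not say how the post-filter responds on either side of the threshold, so the sketch only reports the comparison.

```python
def smooth_periodicity(instantaneous, alpha=0.9, initial=0.0):
    """Low-pass filter a sequence of instantaneous periodicity values.

    alpha is an assumed smoothing coefficient in [0, 1); larger means smoother.
    """
    smoothed = []
    state = initial
    for value in instantaneous:
        state = alpha * state + (1.0 - alpha) * value
        smoothed.append(state)
    return smoothed


def below_threshold(instantaneous, threshold=0.5, alpha=0.9):
    """Flag, per frame, whether the smoothed periodicity is under the threshold."""
    return [s < threshold for s in smooth_periodicity(instantaneous, alpha)]
```

Because the flag follows the smoothed value rather than the raw one, a single outlier frame is less likely to flip the decision.
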
Pitch prediction for packet loss concealment

Patent number: 7457746

Abstract: There is provided a pitch lag predictor for use by a speech decoder to generate a predicted pitch lag parameter. The pitch lag predictor comprises a summation calculator configured to generate a first summation based on a plurality of previous pitch lag parameters, and a second summation based on a plurality of previous pitch lag parameters and a position of each of the plurality of previous pitch lag parameters with respect to the predicted pitch lag parameter; a coefficient calculator configured to generate a first coefficient using a first equation based on the first summation and the second summation, and a second coefficient using a second equation based on the first summation and the second summation, wherein the first equation is different than the second equation; and a predictor configured to generate the predicted pitch lag parameter based on the first coefficient and the second coefficient.

Type: Grant

Filed: March 20, 2006

Date of Patent: November 25, 2008

Assignee: Mindspeed Technologies, Inc.

Inventor: Yang Gao
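
One common way to turn "a sum of past values" and "a position-weighted sum of past values" into two coefficients is a least-squares line fit, extrapolated one step ahead. The Python sketch below takes that reading as an assumption; it is not claimed to be the equations used in the patent, and the function name is hypothetical.

```python
def predict_pitch_lag(previous_lags):
    """Extrapolate the next pitch lag from a straight-line fit to past lags.

    previous_lags[i] is the lag at position i; the prediction is for position n.
    """
    n = len(previous_lags)
    if n == 0:
        raise ValueError("need at least one previous pitch lag")
    if n == 1:
        return float(previous_lags[0])

    # First summation: plain sum of the previous lags.
    sum_lags = sum(previous_lags)
    # Second summation: each lag weighted by its position.
    sum_pos_lags = sum(i * lag for i, lag in enumerate(previous_lags))

    # Closed-form sums of positions 0..n-1 and of their squares.
    sum_pos = n * (n - 1) / 2
    sum_pos_sq = (n - 1) * n * (2 * n - 1) / 6

    denom = n * sum_pos_sq - sum_pos ** 2
    slope = (n * sum_pos_lags - sum_pos * sum_lags) / denom   # second coefficient
    intercept = (sum_lags - slope * sum_pos) / n              # first coefficient
    return intercept + slope * n
```

For lags that rise steadily, such as 50, 52, 54, 56, the fitted line continues the trend; for a constant history it returns the same value.
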
Systems and methods for classifying audio into broad phoneme classes

Patent number: 7424427

Abstract: An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.

Type: Grant

Filed: October 16, 2003

Date of Patent: September 9, 2008

Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.

Inventors: Daben Liu, Francis G. Kubala
Time-scale modification of signals

Patent number: 7412379

Abstract: Techniques utilising Time Scale Modification (TSM) of signals are described. The signal is analysed and divided into frames of similar signal types. Techniques specific to the signal type are then applied to the frames thereby optimising the modification process. The method of the present invention enables TSM of different audio signal parts to be realized using different methods, and a system for effecting said method is also described.

Type: Grant

Filed: April 2, 2002

Date of Patent: August 12, 2008

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Rakesh Taori, Andreas Johannes Gerrits, Dzevdet Burazerovic
Apparatus and method of improving intelligibility of voice signal

Publication number: 20080167863

Abstract: The present invention relates to an apparatus and method of improving intelligibility of a voice signal. A method of improving intelligibility of a voice signal according to an embodiment of the present invention includes analyzing a background noise signal on a call receiving side, classifying a received voice signal into a silence signal, an unvoiced sound signal, and a voiced sound signal, and intensifying the classified unvoiced sound signal and voiced sound signal on the basis of the analyzed background noise signal on the call receiving side.

Type: Application

Filed: November 16, 2007

Publication date: July 10, 2008

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Chang-kyu Choi, Kwang-il Hwang, Sun-gi Hong, Young-hun Sung, Yeun-bae Kim, Yong Kim, Sang-hoon Lee, Hong Jeong
Hybrid speech coding and system

Patent number: 7386444

Abstract: Hybrid linear predictive speech coding system with phase alignment predictive quantization. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.

Type: Grant

Filed: January 30, 2004

Date of Patent: June 10, 2008

Assignee: Texas Instruments Incorporated

Inventor: Jacek Stachurski
Method and apparatus of overlapping and summing speech for an output that disrupts speech

Patent number: 7376557

Abstract: A privacy apparatus adds a privacy sound based on a speaker's own voice into the environment, thereby confusing listeners as to which of the sounds is the real source. This permits disruption of the ability to understand the source speech of the user by eliminating segregation cues that the auditory system uses to interpret speech. The privacy apparatus minimizes segregation cues. The privacy apparatus is relatively quiet and thus easily acceptable in a typical open floor design office space. The privacy apparatus contains an A/D converter that converts the speech into a digital signal, a DSP that converts the digital signal into a privacy signal with pre-recorded speech fragments that are summed so that the speech fragments at least partly overlap one another, a D/A converter that converts the privacy signal into an output signal and one or more loudspeakers from which the output signal is emitted.

Type: Grant

Filed: January 4, 2006

Date of Patent: May 20, 2008

Assignee: Herman Miller, Inc.

Inventors: Jeffrey Specht, Daniel Mapes-Riordan, William DeKruif
SYSTEM AND METHOD FOR MODELING SPEECH SPECTRA

Publication number: 20080109218

Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.

Type: Application

Filed: September 13, 2007

Publication date: May 8, 2008

Inventors: Jani Nurminen, Sakari Himanen
Method, Apparatus and Computer Program Product for Controlling Voicing in Processed Speech

Publication number: 20080109217

Abstract: An apparatus for providing control of voicing in processed speech includes a spectra approximation element and a comparing element. The spectra approximation element may be configured to compute a voiced contribution and an unvoiced contribution for each of a reference speech sample and a processed speech sample. The comparing element may be configured to compare indications of voiced and unvoiced contributions of the reference speech sample and indications of voiced and unvoiced contributions of the processed speech sample, and to determine whether to correct at least one of the voiced or unvoiced contributions of the processed speech sample based on the comparison.

Type: Application

Filed: November 8, 2006

Publication date: May 8, 2008

Inventor: Jani K. Nurminen
Noise pre-processor for enhanced variable rate speech codec

Patent number: 7366658

Abstract: An enhanced noise pre-processor in a speech codec smoothes channel energy estimate moving toward a first smoothing constant if a prior signal to noise ratio estimate for more than five channels are above a threshold and toward a second smaller smoothing constant otherwise. Forming a signal to noise ratio estimate for each channel includes conditionally boosting if a signal energy estimate is more than a predetermined factor of a noise energy estimate and signal to noise ratio estimates are above a threshold for more than five channels. The estimated signal to noise ratio is conditionally modified if two long term prediction coefficients are above a predetermined factor. The estimated signal to noise ratio is not modified and a voice metric is set greater than a voice metric threshold upon matching templates corresponding to the fricative and nasal speech sounds. An adaptive minimum channel gain is chosen based on a current signal to noise ratio estimate.

Type: Grant

Filed: December 11, 2006

Date of Patent: April 29, 2008

Assignee: Texas Instruments Incorporated

Inventors: Pratibha Moogi, Chanaveeragouda Virupaxagouda Goudar
Low-frequency-band voice reconstructing device, voice signal processor and recording apparatus

Publication number: 20080077399

Abstract: There is provided a low-frequency-band voice reconstructing device. A voice signal from which a signal in a low-frequency band is removed is inputted to the device and the device reconstructs the signal in the low frequency band based on the input voice signal. The device comprises a first portion for extracting part of harmonic components of a pitch signal of voice from the input voice signal, a second portion for squaring a signal extracted by the first portion, a third portion for extracting a signal of a pitch frequency and harmonic signals of a lower limit frequency or below of the input voice signal, from the signal obtained by the second portion, and a fourth portion for correcting an amplitude level of the signal extracted by the third portion.

Type: Application

Filed: September 20, 2007

Publication date: March 27, 2008

Applicant: Sanyo Electric Co., Ltd.

Inventor: Masahiro Yoshida
Method and system for speech processing for enhancement and detection

Patent number: 7343284

Abstract: A method for discriminating noise from signal in a noise-contaminated signal involves decomposing a frame of samples of the signal into decorrelated components, and using a difference between probability distributions of the noise contributions and the signal contributions to identify signal and noise. A Gaussian distribution is used to determine whether the components are only noise whereas a Laplacian distribution is used to determine whether the components contain the signal. Such discrimination may be used in speech enhancement or voice activity detection apparatus.

Type: Grant

Filed: July 17, 2003

Date of Patent: March 11, 2008

Assignee: Nortel Networks Limited

Inventors: Saeed Gazor, Mohamed El-Hennawey
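
To make the Gaussian-versus-Laplacian idea concrete, the sketch below compares maximum-likelihood fits of the two zero-mean distributions to a set of decorrelated components. This is an assumption-laden simplification: the abstract uses each distribution for a separate test, whereas this example collapses the decision into a single likelihood comparison, and all function names are hypothetical.

```python
import math


def gaussian_log_likelihood(components):
    """Log-likelihood under a zero-mean Gaussian with ML variance."""
    n = len(components)
    variance = sum(x * x for x in components) / n
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)


def laplacian_log_likelihood(components):
    """Log-likelihood under a zero-mean Laplacian with ML scale."""
    n = len(components)
    scale = sum(abs(x) for x in components) / n
    return -n * (math.log(2.0 * scale) + 1.0)


def looks_like_signal(components):
    """True when the heavier-tailed Laplacian explains the data better."""
    if not any(components):
        return False  # an all-zero frame carries neither signal nor noise
    return laplacian_log_likelihood(components) > gaussian_log_likelihood(components)
```

Sparse, peaky components favor the Laplacian fit, while components of roughly uniform magnitude favor the Gaussian fit.
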
Perceptual harmonic cepstral coefficients as the front-end for speech recognition

Patent number: 7337107

Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.

Type: Grant

Filed: October 2, 2001

Date of Patent: February 26, 2008

Assignee: The Regents of the University of California

Inventors: Kenneth Rose, Liang Gu
System and method for providing high-quality stretching and compression of a digital audio signal

Patent number: 7337108

Abstract: An adaptive “temporal audio scaler” is provided for automatically stretching and compressing frames of audio signals received across a packet-based network. Prior to stretching or compressing segments of a current frame, the temporal audio scaler first computes a pitch period for each frame for sizing signal templates used for matching operations in stretching and compressing segments. Further, the temporal audio scaler also determines the type or types of segments comprising each frame. These segment types include “voiced” segments, “unvoiced” segments, and “mixed” segments which include both voiced and unvoiced portions. The stretching or compression methods applied to segments of each frame are then dependent upon the type of segments comprising each frame. Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.

Type: Grant

Filed: September 10, 2003

Date of Patent: February 26, 2008

Assignee: Microsoft Corporation

Inventors: Dinei Florencio, Philip Chou, Li-Wei He
System and method for automatic verification of the understandability of speech

Patent number: 7295982

Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.

Type: Grant

Filed: November 19, 2001

Date of Patent: November 13, 2007

Assignee: AT&T Corp.

Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
Dynamic translation between data network-based protocol in a data-packet-network and interactive voice response functions of a telephony network

Patent number: 7277916

Abstract: A system for emulating interaction with an interactive voice response unit is provided. The system comprises, a client node connected to the network, the client node soliciting interaction with the interactive voice response unit and a proxy server node connected to the network, the server node accessible to client node, the interactive voice response unit accessible to the server node. A connection is established between the client node and the proxy server node, the proxy server node accepts data from the client node and translates the data into a format for interacting with the interactive voice response unit whereupon the data is then propagated to the interactive voice response unit. Response data resulting from the input data at the interactive voice response unit is propagated to the proxy server node whereupon the response data is translated into a format for dissemination at the client node and propagated thereto.

Type: Grant

Filed: October 5, 2004

Date of Patent: October 2, 2007

Assignee: Genesys Telecommunications Laboratories, Inc.

Inventor: Marcialito Nuestro
Computational effectiveness enhancement of frequency domain pitch estimators

Patent number: 7272551

Abstract: Estimating a speech signal pitch frequency by determining a speech signal frame line spectrum including spectral lines having respective line amplitudes and frequencies, selecting a predefined number of spectral lines having highest amplitudes, fewer than the total number of the spectral lines, calculating a preliminary utility function over a pitch frequency range to provide a preliminary utility function value for each pitch frequency in the range measuring the compatibility of the selected spectral lines with the pitch frequency, identifying a predefined number of preliminary pitch frequency candidates at least partly responsive to the preliminary utility function, where each candidate is a local maximum of the preliminary utility function, calculating a final utility score for each of the candidates, and selecting any of the candidates to be an estimated pitch frequency of the speech signal at least partly responsive to any of the final utility scores.

Type: Grant

Filed: February 24, 2003

Date of Patent: September 18, 2007

Assignee: International Business Machines Corporation

Inventor: Alexander Sorin
Pitch determination based on weighting of pitch lag candidates

Patent number: 7266493

Abstract: There is provided a method of selecting a pitch lag value from a plurality of pitch lag candidates for coding a speech signal. The method comprises identifying the plurality of pitch lag candidates from a frame of the speech signal using correlation; classifying the speech signal to obtain a voice classification; determining whether one or more of the plurality of pitch lag candidates are in a temporal neighborhood of one or more previous pitch lag values; favoring the one or more of the plurality of pitch lag candidates determined to be in the temporal neighborhood of the one or more previous pitch lag values, by adaptive weighting, over other ones of the plurality of pitch lag candidates; and selecting the pitch lag value based on the voice classification and the one or more of the plurality of pitch lag candidates favored by the adaptive weighting.

Type: Grant

Filed: October 13, 2005

Date of Patent: September 4, 2007

Assignee: Mindspeed Technologies, Inc.

Inventors: Huan-Yu Su, Yang Gao
Voicing estimation method and apparatus for speech recognition by using local spectral information

Publication number: 20070185709

Abstract: A method and apparatus of estimating a voicing for speech recognition by using local spectral information. The voicing estimation method for speech recognition includes performing a Fourier transform on input voice signals after performing pre-processing on the input voice signals. The method further includes detecting peaks in the input voice signals after smoothing the input voice signals. The method also includes computing every frequency bound associated with the detected peaks, and determining a class of a voicing according to each computed frequency bound.

Type: Application

Filed: January 25, 2007

Publication date: August 9, 2007

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Kwang Cheol Oh, Jae-Hoon Jeong
Method for fast dynamic estimation of background noise

Patent number: 7246059

Abstract: The invention provides a method and system for dynamically estimating background noise. The system includes a portable communication device, a vocoder, and a voice activated detector. Based on information received by the portable communication device, the vocoder determines parameters related to incoming information including a voicing mode indicative of the periodicity of incoming information. The voice activated detector then compares the voicing mode to a threshold to determine whether a background noise estimate should be updated.

Type: Grant

Filed: July 24, 2003

Date of Patent: July 17, 2007

Assignee: Motorola, Inc.

Inventors: Ali Behboodian, Pratik Desai, Chin Pan Wong
Audio segmentation with energy-weighted bandwidth bias

Patent number: 7243062

Abstract: A method (200) and apparatus (100) for segmenting a sequence of audio samples into homogeneous segments (550 and 555) are disclosed. The method (200) forms a sequence of frames (701 to 704) along the sequence of audio samples, and extracts, for each frame, a data feature. The data features form a sequence of data features. Transition points in the sequence of data features are then detected by applying the Bayesian Information Criterion to the sequence of data features. The transition points define the homogeneous segments (550 and 555). Preferably the data feature is single-dimensional and a leptokurtic distribution is used as an event model in the Bayesian Information Criterion.

Type: Grant

Filed: October 25, 2002

Date of Patent: July 10, 2007

Assignee: Canon Kabushiki Kaisha

Inventor: Timothy John Wark
Speech recognition system using normalized voiced segment spectrogram analysis

Patent number: 7233899

Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented, and for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and symbolic and numeric data for other segments are obtained in the record and in the dictionary entries.

Type: Grant

Filed: March 7, 2002

Date of Patent: June 19, 2007

Inventors: Vitaliy S. Fain, Samuel V. Fain
Low-frequency band noise detection

Patent number: 7233894

Abstract: A pitch estimation system including a low-frequency band noise detector (LBND) operative to detect the presence of low-frequency band noise in a first audio frame, a frequency-domain pitch estimator operative to calculate a pitch estimation of a second audio frame from at least one spectral peak in the second audio frame, and a pitch estimator controller operative to cause the pitch estimator to exclude from the spectrum of the second audio frame at least one low-frequency spectral peak below a predefined threshold where low-frequency band noise is present in the first audio frame.

Type: Grant

Filed: February 24, 2003

Date of Patent: June 19, 2007

Assignee: International Business Machines Corporation

Inventor: Alexander Sorin
Telephone apparatus

Patent number: 7228271

Abstract: The telephone apparatus of the present invention comprises a first voice band expander for generating a voiced signal frequency component by shifting the frequency of the voice signal received, a second voice band expander for generating a voiceless signal frequency component by shifting the frequency of the voice signal received, and a voice composer for composing the voice signal received, the output of the first voice band expander, and the output of the second voice band expander, which is able to output clear voices in aural communication.

Type: Grant

Filed: December 23, 2002

Date of Patent: June 5, 2007

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Toshimichi Tokuda, Takashi Kimura
Hybrid speech coding and system

Patent number: 7222070

Abstract: Linear predictive speech coding system with classification of frames and a hybrid coder using both waveform coding and parametric coding for different classes of frames. Phase alignment for a parametric coder aligns synthesized speech frames with adjacent waveform coder synthesized frames. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.

Type: Grant

Filed: September 22, 2000

Date of Patent: May 22, 2007

Assignee: Texas Instruments Incorporated

Inventors: Jacek Stachurski, Alan V. McCree
Excitation codebook search method in a speech coding system

Patent number: 7206739

Abstract: A method for searching an excitation (or fixed) codebook in a speech coding system. In a speech coding system including a synthesis filter for synthesizing a speech signal, a fixed codebook searcher according to the present invention segments a speech signal frame into a plurality of subframes to generate an excitation signal to be used in a synthesis filter, segments again each of the subframes into a plurality of subgroups, and searches the respective subframes each comprised of a plurality of pulse position/amplitude combinations for pulses. The fixed codebook searcher searches the respective subgroups for a predetermined number of pulses having non-zero amplitude, and generates the searched pulses as an initial vector. Next, the fixed codebook searcher selects a pulse combination including at least one pulse among the pulses of the initial vector, and then substitutes pulses of the selected pulse combination for pulses in other positions in the subgroups.

Type: Grant

Filed: May 23, 2002

Date of Patent: April 17, 2007

Assignee: Samsung Electronics Co., Ltd.

Inventor: Dae-Ryong Lee
Gain-smoothing in wideband speech and audio signal decoder

Patent number: 7191123

Abstract: The gain smoothing method and device modify the amplitude of an innovative codevector in relation to background noise present in a previously sampled wideband signal. The gain smoothing device comprises a gain smoothing calculator for calculating a smoothing gain in response to a factor representative of voicing in the sampled wideband signal, a factor representative of the stability of a set of linear prediction filter coefficients, and an innovative codebook gain. The gain smoothing device also comprises an amplifier for amplifying the innovative codevector with the smoothing gain to thereby produce a gain-smoothed innovative codevector. The function of the gain-smoothing device improves the perceived synthesized signal when background noise is present in the sampled wideband signal.

Type: Grant

Filed: November 17, 2000

Date of Patent: March 13, 2007

Assignee: Voiceage Corporation

Inventors: Bruno Bessette, Redwan Salami, Roch Lefebvre
Method and system for distinguishing speech from music in a digital audio signal in real time

Patent number: 7191128

Abstract: The present invention relates to a method and system for distinguishing speech from music in a digital audio signal in real time.

Type: Grant

Filed: February 21, 2003

Date of Patent: March 13, 2007

Assignee: LG Electronics Inc.

Inventors: Mikhael A. Sall, Sergei N. Gramnitskiy, Alexandr L. Maiboroda, Victor V. Redkov, Anatoli I. Tikhotsky, Andrei B. Viktorov
Voice-activity detection using energy ratios and periodicity

Patent number: 7171357

Abstract: A voice activity detector (100) filters (204) out noise energy and then computes a high-frequency (2400 Hz to 4000 Hz) versus low-frequency (100 Hz to 2400 Hz) signal energy ratio (224), total voiceband (100 Hz to 4000 Hz) signal energy (214), and signal periodicity (208) on successive frames of signal samples. Signal periodicity is determined by estimating the pitch period (206) of the signal, determining a gain value of the signal over the pitch period as a function of the estimated pitch period, and estimating a periodicity of the signal over the pitch period as a function of the estimated pitch period and the gain value.

Type: Grant

Filed: March 21, 2001

Date of Patent: January 30, 2007

Assignee: Avaya Technology Corp.

Inventor: Simon Daniel Boland
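
The periodicity step described in the abstract of patent 7171357 above can be illustrated with a minimal Python sketch. It estimates a pitch period by normalized autocorrelation and reports the correlation at that lag as a periodicity score. The function name `periodicity`, the lag range, and the choice of normalized autocorrelation are assumptions made for illustration; the abstract does not say how the patent computes these quantities, and the energy-ratio features are not shown.

```python
import math

def periodicity(frame, min_lag=20, max_lag=160):
    """Return (best_lag, score), where score is the normalized
    autocorrelation of `frame` at the best lag in [min_lag, max_lag].
    Hypothetical helper, not taken from the patent."""
    best_lag, best_score = min_lag, 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        a = frame[:-lag]          # frame without its last `lag` samples
        b = frame[lag:]           # frame delayed by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        # Small margin so that multiples of the true period do not win
        # on floating-point noise alone.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag, best_score

# A 100 Hz sine sampled at 8 kHz has a pitch period of 80 samples.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(320)]
lag, score = periodicity(tone)
```

A score close to 1 suggests a strongly periodic (voiced-like) frame, while noise-like frames tend to score much lower.
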
Method and device for robust predictive vector quantization of linear prediction parameters in variable bit rate speech coding

Patent number: 7149683

Abstract: The present invention relates to a method and device for quantizing linear prediction parameters in variable bit-rate sound signal coding, in which an input linear prediction parameter vector is received, a sound signal frame corresponding to the input linear prediction parameter vector is classified, a prediction vector is computed, the computed prediction vector is removed from the input linear prediction parameter vector to produce a prediction error vector, and the prediction error vector is quantized. Computation of the prediction vector comprises selecting one of a plurality of prediction schemes in relation to the classification of the sound signal frame, and processing the prediction error vector through the selected prediction scheme. The present invention further relates to a method and device for dequantizing linear prediction parameters in variable bit-rate sound signal decoding.

Type: Grant

Filed: January 19, 2005

Date of Patent: December 12, 2006

Assignee: Nokia Corporation

Inventor: Milan Jelinek
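
The flow in the abstract of patent 7149683 above (select a prediction scheme from the frame class, remove the prediction from the input vector, quantize the prediction error) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the class names, the prediction coefficients in `PREDICTION_COEFF`, the prediction from the previous quantized vector, and the uniform scalar quantizer standing in for a vector quantizer are all hypothetical and are not taken from the patent.

```python
# Hypothetical per-class prediction strength (assumed values).
PREDICTION_COEFF = {"stationary": 0.6, "transient": 0.0}

def quantize_lp_vector(x, prev_quantized, frame_class, step=0.05):
    """Return quantizer indices for the prediction error of `x`."""
    coeff = PREDICTION_COEFF[frame_class]
    prediction = [coeff * p for p in prev_quantized]
    error = [xi - pi for xi, pi in zip(x, prediction)]
    # Uniform scalar quantization stands in for the real vector quantizer.
    return [round(e / step) for e in error]

def dequantize_lp_vector(indices, prev_quantized, frame_class, step=0.05):
    """Rebuild the parameter vector from indices and the same prediction."""
    coeff = PREDICTION_COEFF[frame_class]
    prediction = [coeff * p for p in prev_quantized]
    return [pi + i * step for pi, i in zip(prediction, indices)]

x = [0.31, 0.52, 0.77]
prev = [0.30, 0.50, 0.80]
idx = quantize_lp_vector(x, prev, "stationary")
x_hat = dequantize_lp_vector(idx, prev, "stationary")
```

Because the decoder forms the same prediction from the same class, only the quantized error needs to be transmitted.
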
Low bit-rate coding of unvoiced segments of speech

Patent number: 7146310

Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.

Type: Grant

Filed: September 29, 2004

Date of Patent: December 5, 2006

Assignee: Qualcomm, Incorporated

Inventors: Amitava Das, Sharath Manjunath
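
The steps in the abstract of patent 7146310 above can be sketched in Python: extract per-subframe energies, interpolate them linearly into a per-sample envelope, and shape random noise with that envelope. This is a minimal sketch, assuming RMS energy per subframe and Gaussian noise; the function names are hypothetical, and the quantization and post-processing steps are omitted.

```python
import math
import random

def subframe_energies(frame, n_sub):
    """RMS energy of each of n_sub equal-length subframes."""
    size = len(frame) // n_sub
    return [math.sqrt(sum(x * x for x in frame[i * size:(i + 1) * size]) / size)
            for i in range(n_sub)]

def interpolate_envelope(energies, length):
    """Linearly interpolate subframe energies to one gain per sample."""
    if len(energies) == 1:
        return [energies[0]] * length
    size = length // len(energies)
    env = []
    for n in range(length):
        # Position in units of subframes, measured from subframe centers.
        pos = (n - (size - 1) / 2) / size
        pos = min(max(pos, 0.0), len(energies) - 1)
        i = min(int(pos), len(energies) - 2)
        frac = pos - i
        env.append((1 - frac) * energies[i] + frac * energies[i + 1])
    return env

def synthesize_unvoiced(energies, length, seed=0):
    """Shape a random noise vector with the interpolated envelope."""
    rng = random.Random(seed)
    env = interpolate_envelope(energies, length)
    return [g * rng.gauss(0.0, 1.0) for g in env]

frame = [1.0] * 80 + [3.0] * 80
energies = subframe_energies(frame, 2)
envelope = interpolate_envelope(energies, 160)
residue = synthesize_unvoiced(energies, 160, seed=1)
```

Only the few energy values need to be coded per frame; the decoder regenerates the noise itself, which is what keeps the bit rate low.
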
Hybrid speech coding and system

Patent number: 7139700

Abstract: Linear predictive speech coding system with classification of frames and a hybrid coder using both waveform coding and parametric coding for different classes of frames. Phase alignment for a parametric coder aligns synthesized speech frames with adjacent waveform coder synthesized frames. Zero phase alignment of speech prior to waveform coding aligns synthesized speech frames of a waveform coder with frames synthesized with a parametric coder. Inter-frame interpolation of LP coefficients suppresses artifacts in resultant synthesized speech frames.

Type: Grant

Filed: September 22, 2000

Date of Patent: November 21, 2006

Assignee: Texas Instruments Incorporated

Inventors: Jacek Stachurski, Alan V. McCree
Device for and method of detecting voice activity

Patent number: 7127392

Abstract: The present invention is a device for and method of detecting voice activity. First, the AM envelope of a segment of a signal of interest is determined. Next, the number of times the AM envelope crosses a user-definable threshold is determined. If there are no crossings, the segment is identified as non-speech. Next, the number of points on the AM envelope within a user-definable range is determined. If there are fewer than a user-definable number of points within the range, the segment is identified as non-speech. Next, the mean, variance, and power ratio of the normalized spectral content of the AM envelope are found and compared to the same for known speech and non-speech. The segment is identified as being of the same type as the known speech or non-speech to which it most closely compares. These steps are repeated for each signal segment of interest.

Type: Grant

Filed: February 12, 2003

Date of Patent: October 24, 2006

Assignee: The United States of America as represented by the National Security Agency

Inventor: David C. Smith
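
The first two screening steps in the abstract of patent 7127392 above (threshold crossings of the AM envelope, then a count of envelope points within a range) can be sketched as follows. This is an illustration under assumptions: the envelope is approximated by full-wave rectification and a moving average, which the abstract does not specify, and all function names and parameter values are hypothetical. The final spectral comparison against known speech and non-speech is not shown.

```python
def am_envelope(segment, window=8):
    """Approximate the AM envelope: rectify, then moving-average.
    The method is an assumption; the abstract does not define it."""
    rect = [abs(x) for x in segment]
    env, acc = [], 0.0
    for n, x in enumerate(rect):
        acc += x
        if n >= window:
            acc -= rect[n - window]
        env.append(acc / min(n + 1, window))
    return env

def count_crossings(env, threshold):
    """Number of times the envelope crosses the threshold."""
    above = [v > threshold for v in env]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)

def count_in_range(env, low, high):
    """Number of envelope points lying within [low, high]."""
    return sum(1 for v in env if low <= v <= high)

def passes_screening(segment, threshold, low, high, min_points):
    """False means non-speech; True means the segment would go on to
    the spectral comparison stage."""
    env = am_envelope(segment)
    if count_crossings(env, threshold) == 0:
        return False
    if count_in_range(env, low, high) < min_points:
        return False
    return True

silence = [0.0] * 100
burst = [0.0] * 50 + [1.0, -1.0] * 25 + [0.0] * 50
```

For example, `passes_screening(burst, 0.5, 0.9, 1.1, 10)` lets the burst through, while the all-zero segment is rejected at the first step because its envelope never crosses the threshold.
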
Low-complexity music detection algorithm and system

Patent number: 7120576

Abstract: A method for detecting music in a speech signal having a plurality of frames. The method comprises defining a music threshold value for a first parameter extracted from a frame of the speech signal, defining a background noise threshold value for the first parameter, and defining an unsure threshold value for the first parameter. The unsure threshold value falls between the music threshold value and the background noise threshold value. If the first parameter falls between the music threshold value and the background noise threshold value, the speech signal is classified as music or background noise based on analyzing a plurality of first parameters extracted from the plurality of frames.

Type: Grant

Filed: November 4, 2004

Date of Patent: October 10, 2006

Assignee: Mindspeed Technologies, Inc.

Inventor: Yang Gao
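
The three-threshold decision in the abstract of patent 7120576 above can be sketched as follows. This is a minimal sketch, assuming that larger parameter values indicate music and that the ambiguous zone is resolved by comparing the mean over recent frames with the unsure threshold; both choices, the function name, and the threshold values are assumptions, since the abstract does not state how the plurality of parameters is analyzed.

```python
def classify_frame(param, history, music_thr, noise_thr, unsure_thr):
    """Classify one frame as "music" or "noise".

    `history` holds the same parameter from earlier frames and is used
    only when `param` falls between the two outer thresholds."""
    if param >= music_thr:
        return "music"
    if param <= noise_thr:
        return "noise"
    # Ambiguous zone: decide from several frames rather than one.
    values = list(history) + [param]
    mean = sum(values) / len(values)
    return "music" if mean >= unsure_thr else "noise"

clear_music = classify_frame(0.9, [], 0.8, 0.2, 0.5)
clear_noise = classify_frame(0.1, [], 0.8, 0.2, 0.5)
unsure_high = classify_frame(0.45, [0.7, 0.7], 0.8, 0.2, 0.5)
unsure_low = classify_frame(0.55, [0.3, 0.3], 0.8, 0.2, 0.5)
```

The last two calls show why history matters: a single frame at 0.45 or 0.55 is inconclusive, but the surrounding frames tip the decision.
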
Silence description coding for multi-rate speech codecs

Patent number: 7120578

Abstract: Speech coding systems include multi-rate speech codecs having an encoder and a decoder. Silence description coding for multi-rate speech coding systems that employ discontinued transmission is performed in either the encoder or the decoder of the multi-rate speech codec. It may also be performed in a distributed manner wherein it is performed partially in the encoder and partially in the decoder. The silence description coding is performed on a speech signal having a substantially non-speech-like characteristic. Voice activity detection classifies the speech signal as being either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, the silence description coding is a source coding mode that operates at a bit rate that fits within a bit rate budget as determined by all of the available source coding modes within the plurality of coding modes.

Type: Grant

Filed: April 24, 2001

Date of Patent: October 10, 2006

Assignee: Mindspeed Technologies, Inc.

Inventors: Jes Thyssen, Huan-yu Su, Adil Benyassine, Eyal Shlomot
Voice detecting method and apparatus using a long-time average of the time variation of speech features, and medium thereof

Patent number: 7117150

Abstract: A first filter (2061 in FIG. 1) calculates a long-time average of first change quantities based on a difference between a line spectral frequency of an input voice signal and a long-time average thereof. A second filter (2062 in FIG. 1) calculates a long-time average of second change quantities based on a difference between a whole band energy of the input voice signal and a long-time average thereof. A third filter (2063 in FIG. 1) calculates a long-time average of third change quantities based on a difference between a low band energy of the input voice signal and a long-time average thereof. A fourth filter (2064 in FIG. 1) calculates a long-time average of fourth change quantities based on a difference between a zero cross number of the input voice signal and a long-time average thereof. A voice/non-voice determining circuit (1040 in FIG.

Type: Grant

Filed: May 31, 2001

Date of Patent: October 3, 2006

Assignee: NEC Corporation

Inventor: Atsushi Murashima
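
Each filter in the abstract of patent 7117150 above tracks a long-time average of a change quantity, where the change is the difference between a feature and that feature's own long-time average. A rough Python sketch of one such filter follows; the first-order recursive smoothing, the absolute difference, the class name `ChangeTracker`, and the value of `alpha` are assumptions for illustration and are not taken from the patent.

```python
class ChangeTracker:
    """Tracks a feature's long-time average and the long-time average
    of its deviation from that average (hypothetical helper)."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha          # closer to 1 means a longer memory
        self.feature_avg = None
        self.change_avg = 0.0

    def update(self, value):
        if self.feature_avg is None:
            self.feature_avg = value
        # Change quantity: distance from the current long-time average.
        change = abs(value - self.feature_avg)
        self.change_avg += (1 - self.alpha) * (change - self.change_avg)
        self.feature_avg += (1 - self.alpha) * (value - self.feature_avg)
        return self.change_avg

steady = ChangeTracker()
for _ in range(10):
    steady_result = steady.update(5.0)

jumpy = ChangeTracker()
for _ in range(5):
    jumpy.update(0.0)
jump_result = jumpy.update(10.0)
```

One tracker per feature (line spectral frequency, whole-band energy, low-band energy, zero-crossing count) would give the four smoothed change quantities that the abstract feeds into its voice/non-voice decision.
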
Enhanced conversion of wideband signals to narrowband signals

Patent number: 7113522

Abstract: Wideband speech signals must be converted to narrowband speech signals if the transmission medium or the destination terminal is constructed with narrowband constraints. A typical wideband-to-narrowband conversion method is the elimination of frequencies above 3400 Hz using a low pass filter and a down sampler. However, this method produces a muffled speech sound since the resulting narrowband signal has a flat frequency response. Methods and apparatus are presented herein to enhance the acoustic quality of a wideband-to-narrowband converted signal. A bandwidth switching filter is used to emphasize a mid-range frequency portion of the wideband signal so that the resulting narrowband signal has a non-flat frequency spectrum.

Type: Grant

Filed: January 24, 2001

Date of Patent: September 26, 2006

Assignee: QUALCOMM, Incorporated

Inventors: Khaled H. El-Maleh, Arasanipalai K. Ananthapadmanabhan, Andrew P. DeJaco
Method, system and network entity for providing text telephone enhancement for voice, tone and sound-based network services

Patent number: 7103349

Abstract: The invention is a system, a method of transmitting messages selectively as text or non-text from an entity (104) in a network (100 and 102), and an entity in a network. A system in accordance with the invention includes at least one terminal (16); a network containing the at least one terminal; an entity in the network which provides messages selectively as text or non-text to the network in a speech encoded form; and wherein the messages are transmitted in the speech encoded form by the network to the at least one terminal which reproduces the messages to a user thereof in either a text form or by a sound reproduction device of the at least one terminal.

Type: Grant

Filed: May 2, 2002

Date of Patent: September 5, 2006

Assignee: Nokia Corporation

Inventors: Teemu Himanen, Pasi Ylinen
Coded voice signal format converting apparatus

Patent number: 7099823

Abstract: A coded voice signal format converting apparatus is provided which is capable of converting the signal format of a coded voice signal with a reduced amount of computation. In the coded voice signal format converting apparatus, a second coding device employs a quantizing accuracy information converting section, which receives first quantizing accuracy information output from a quantizing accuracy information decoding section in a first decoding device. A second mapping signal is quantized by a mapped signal coding section to produce a coded voice signal, and the first quantizing accuracy information is converted so that the mapped signal coding section can use it to determine second quantizing accuracy information.

Type: Grant

Filed: February 27, 2001

Date of Patent: August 29, 2006

Assignee: NEC Corporation

Inventor: Yuichiro Takamizawa
Multistream network feature processing for a distributed speech recognition system

Patent number: 7089178

Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.

Type: Grant

Filed: April 30, 2002

Date of Patent: August 8, 2006

Assignee: Qualcomm Inc.

Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
Frequency compander for a telephone line

Patent number: 7080017

Abstract: A frequency compander for improving the frequency response of a telephone line when used for remote broadcasting. The inventive device comprises an encoder for compressing the frequency spectrum of an audio signal and a decoder for expanding the signal back to its original spectrum. Preferably the encoder comprises: an anti-aliasing filter; an A/D converter for digitizing incoming audio; a DSP for compressing the audio; and a D/A converter for outputting compressed audio to the phone line. The decoder comprises: an anti-aliasing filter; an A/D converter for digitizing the incoming compressed signal; a DSP for restoring the original audio; and a D/A converter for outputting program audio. In a preferred embodiment, encoding and decoding are performed in the frequency domain. In another preferred embodiment, encoding and decoding are performed in the time domain using trigonometric transformations.

Type: Grant

Filed: May 31, 2002

Date of Patent: July 18, 2006

Inventors: Ken Scott Fisher, Kevin Cotton Baxter, Fred H. Holmes
Enhancing speech intelligibility using variable-rate time-scale modification

Patent number: 7065485

Abstract: The method and preprocessor enhance the intelligibility of narrowband speech without essentially lengthening the overall time duration of the signal. Both spectral enhancements and variable-rate time-scaling procedures are implemented to improve the salience of initial consonants, particularly the perceptually important formant transitions. Emphasis is transferred from the dominating vowel to the preceding consonant through adaptation of the phoneme timing structure. In a further embodiment, the technique is applied as a preprocessor to a speech coder.

Type: Grant

Filed: January 9, 2002

Date of Patent: June 20, 2006

Assignee: AT&T Corp

Inventors: Nicola R. Chong-White, Richard Vandervoort Cox
