Voiced Or Unvoiced Patents (Class 704/208)
  • Patent number: 7835905
    Abstract: In order to detect a degree of voicing of a speech signal, an input speech signal is converted to a speech signal in the frequency domain, a pitch value is calculated from the speech signal, a plurality of harmonic peaks existing in the speech signal are detected, and a difference obtained by comparing the pitch value to an interval between adjacent harmonic peaks among the detected harmonic peaks is detected as the degree of voicing included in the speech signal.
    Type: Grant
    Filed: April 4, 2007
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
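The comparison described in the abstract above can be illustrated with a small sketch. It assumes the harmonic peak frequencies and the pitch value are already available in Hz, and it averages the absolute differences over all adjacent peak pairs; that averaging, and the name `voicing_difference`, are illustrative assumptions rather than the patented method.

```python
def voicing_difference(pitch_hz, harmonic_peaks_hz):
    """Mean absolute gap between the pitch and adjacent harmonic-peak spacing.

    A value near 0 means the peaks are spaced almost exactly one pitch
    apart, which is what a strongly voiced frame would show.
    """
    peaks = sorted(harmonic_peaks_hz)
    if len(peaks) < 2:
        raise ValueError("need at least two harmonic peaks")
    # Spacing between each pair of neighbouring peaks.
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(abs(iv - pitch_hz) for iv in intervals) / len(intervals)

# Hypothetical peak lists: regular harmonics versus irregular peaks.
regular = voicing_difference(100.0, [100.0, 200.0, 300.0, 400.0])
irregular = voicing_difference(100.0, [130.0, 210.0, 390.0, 450.0])
```

Here `regular` is zero because every interval equals the pitch, while `irregular` is large because the peak spacing does not follow it.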
  • Patent number: 7835908
    Abstract: A method and apparatus for robust speaker localization and a camera control system employing the same are provided. The apparatus for speaker localization includes: a difference spectrum obtaining section which obtains a difference spectrum of a first pseudo-power spectrum for a speech section and a second pseudo-power spectrum for a non-speech section detected in a voice signal output from a microphone array; and a speaker direction estimation section which detects a peak value in any one of the difference spectrum and the first pseudo-power spectrum, and estimates the direction of a speaker based on the direction angle corresponding to the detected peak value.
    Type: Grant
    Filed: October 13, 2004
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Changkyu Choi, Donggeon Kong, Bonyoung Lee, Sookwon Rang
  • Patent number: 7818168
    Abstract: A method of measuring the degree of enhancement made to a voice signal by receiving the voice signal, identifying formant regions in the voice signal, computing stationarity for each identified formant region, enhancing the voice signal, identifying formant regions in the enhanced voice signal that correspond to those identified in the received voice signal, computing stationarity for each formant region identified in the enhanced voice signal, comparing corresponding stationarity results for the received and enhanced voice signals, and calculating at least one user-definable statistic of the comparison results as the degree of enhancement made to the received voice signal.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: October 19, 2010
    Assignee: The United States of America as represented by the Director, National Security Agency
    Inventor: Adolf Cusmariu
  • Patent number: 7809554
    Abstract: An apparatus, method, and medium for detecting a voiced sound and an unvoiced sound. The apparatus includes a blocking unit for dividing an input signal into block units; a parameter calculator for calculating a first parameter to determine the voiced sound and a second parameter to determine the unvoiced sound by using a slope and spectral flatness measure (SFM) of a mel-scaled filter bank spectrum of an input signal existing in a block; and a determiner for determining a voiced sound zone and an unvoiced sound zone in the block by comparing the first and second parameters to predetermined threshold values.
    Type: Grant
    Filed: February 7, 2005
    Date of Patent: October 5, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Kwangcheol Oh
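The spectral flatness measure (SFM) named in the abstract above is conventionally defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The following is a minimal sketch under that standard definition only; the patent's slope computation, its two parameters, and its thresholds are not given here, and the band energies below are made-up values.

```python
import math

def spectral_flatness(band_energies):
    """Geometric mean divided by arithmetic mean of positive band energies.

    Values near 1.0 indicate a flat, noise-like spectrum; values near 0.0
    indicate a spectrum dominated by a few bands.
    """
    if not band_energies or any(e <= 0 for e in band_energies):
        raise ValueError("band energies must be a non-empty list of positive values")
    n = len(band_energies)
    # Geometric mean computed in the log domain to avoid overflow/underflow.
    geometric_mean = math.exp(sum(math.log(e) for e in band_energies) / n)
    arithmetic_mean = sum(band_energies) / n
    return geometric_mean / arithmetic_mean

flat = [1.0] * 8                # hypothetical noise-like filter-bank output
peaky = [100.0] + [0.01] * 7    # hypothetical output dominated by one band
sfm_flat = spectral_flatness(flat)
sfm_peaky = spectral_flatness(peaky)
```

A determiner of the kind the abstract describes would compare such values with thresholds; which threshold values to use is left open here.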
  • Patent number: 7809555
    Abstract: Provided is a speech signal classification system and method. The speech signal classification system includes a primary recognition unit for determining using characteristics extracted from a speech frame whether the speech frame is a voice sound, a non-voice sound, or background noise and a secondary recognition unit for determining using at least one other speech frame whether a determination-reserved speech frame is a non-voice sound or background noise, if it is determined according to a primary recognition result that an input speech frame is not a voice sound.
    Type: Grant
    Filed: March 19, 2007
    Date of Patent: October 5, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
  • Publication number: 20100250246
    Abstract: A speech signal evaluation apparatus includes: an acquisition unit that acquires, as a first frame, a speech signal of a specified length from speech signals; a first detection unit that detects, on the basis of a speech condition, whether the first frame is voiced or unvoiced; a variation calculation unit that, when the first frame is unvoiced, calculates a variation in a spectrum associated with the first frame on the basis of a spectrum of the first frame and a spectrum of a second frame that is unvoiced and precedes the first frame in time; and a second detection unit that detects, on the basis of a non-stationary condition based on the variation in spectrum, whether the variation of the first frame satisfies the non-stationary condition.
    Type: Application
    Filed: March 24, 2010
    Publication date: September 30, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Chikako MATSUMOTO
  • Patent number: 7805295
    Abstract: The present invention relates to a method of synthesizing a signal comprising the steps of: a) determining of a required pitch bell location on the signal to be synthesized, b) mapping of the required pitch bell location onto an original signal to provide a first pitch bell location, c) randomizing the first pitch bell location to provide a second pitch bell location, d) windowing of the original signal on the second pitch bell location to provide a pitch bell, e) placing the resulting pitch bell at the required pitch bell location within the domain of the signal to be synthesized, f) repeating of the steps a) to e) for all required pitch bell locations and performing an overlap and add operation with respect to the pitch bells in order to synthesize the signal.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: September 28, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Ercan Ferit Gigi
  • Patent number: 7801726
    Abstract: A speech processing apparatus includes a sound input unit that receives an input of a sound including a voice of one of an operator and a person other than the operator; a designation-duration accepting unit that accepts a designation-duration designated by the operator as a time interval that is a target of a speech processing within the input sound; a voice-duration detecting unit that detects a voice-duration that is a time interval in which the voice is present from the input sound; a speaker determining unit that determines whether a speaker of the voice is the operator or the person based on the input sound; and a deciding unit that detects an overlapping period between the designation-duration and the voice-duration, and decides that the voice-duration including the overlapping period is a processing duration, when the overlapping period is detected and the speaker is determined to be the person.
    Type: Grant
    Filed: October 17, 2006
    Date of Patent: September 21, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masahide Ariu
  • Patent number: 7792669
    Abstract: A method and apparatus of estimating a voicing for speech recognition by using local spectral information. The voicing estimation method for speech recognition includes performing a Fourier transform on input voice signals after performing pre-processing on the input voice signals. The method further includes detecting peaks in the input voice signals after smoothing the input voice signals. The method also includes computing every frequency bound associated with the detected peaks, and determining a class of a voicing according to each computed frequency bound.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: September 7, 2010
    Assignee: Samsung Electronics Co., Inc.
    Inventors: Kwang Cheol Oh, Jae-Hoon Jeong
  • Patent number: 7792672
    Abstract: A method for converting a voice signal from a source speaker into a converted voice signal with acoustic characteristics similar to those of a target speaker includes the steps of determining (1) at least one function for transforming source speaker acoustic characteristics into acoustic characteristics similar to those of the target speaker using target and source speaker voice samples; and transforming acoustic characteristics of the source speaker voice signal to be converted by applying the transformation function(s). The method is characterized in that the transformation (2) includes the step (44) of applying only a predetermined portion of at least one transformation function to said signal to be converted.
    Type: Grant
    Filed: March 14, 2005
    Date of Patent: September 7, 2010
    Assignee: France Telecom
    Inventors: Olivier Rosec, Taoufik En-Najjary
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Patent number: 7778825
    Abstract: An apparatus and method for extracting precise voiced/unvoiced classification information from a voice signal is provided. The apparatus extracts voiced/unvoiced classification information by analyzing a ratio of a harmonic component to a non-harmonic (or residual) component. The apparatus uses a harmonic to residual ratio (HRR), a harmonic to noise component ratio (HNR), and a sub-band harmonic to noise component ratio (SB-HNR), which are feature extracting schemes obtained based on a harmonic component analysis, thereby precisely classifying voiced/unvoiced sounds. Therefore, the apparatus and method can be used for voice coding, recognition, composition, reinforcement, etc. in all voice signal processing systems.
    Type: Grant
    Filed: July 13, 2006
    Date of Patent: August 17, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
  • Publication number: 20100185440
    Abstract: The embodiments of a transcoding method, a transcoding device, and a communication apparatus are provided. The embodiment of a method includes: receiving a bit stream input from a sending end; determining an attribute of discontinuous transmission (DTX) used by a receiving end and a frame type of the input bit stream; and transcoding the input bit stream in a corresponding processing manner according to a determination result. Thereby, a corresponding transcoding operation is performed on the input bit stream according to the attribute of DTX used by the receiving end and the frame type of the input bit stream. In such a manner, input bit streams of various types can be processed, and the input bit streams can be correspondingly transcoded according to the requirements of the receiving end. Therefore, the average computational complexity and peak computational complexity can be effectively decreased without decreasing the quality of the synthesized speech.
    Type: Application
    Filed: January 21, 2010
    Publication date: July 22, 2010
    Inventors: Changchun Bao, Hao Xu, Fanrong Tang, Xiangyu Hu
  • Patent number: 7756700
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighing function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 13, 2010
    Assignee: The Regents of the University of California
    Inventors: Kenneth Rose, Liang Gu
  • Patent number: 7756709
    Abstract: A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.
    Type: Grant
    Filed: February 2, 2004
    Date of Patent: July 13, 2010
    Assignee: Applied Voice & Speech Technologies, Inc.
    Inventor: Karl D. Gierach
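The counter logic in the abstract above can be sketched as a small loop. The window labels are assumed to come from an external discriminator, and the parameter names and default values (`speech_reset`, `silence_weight`, `noise_weight`, `non_voice_limit`) are hypothetical; the abstract does not state the actual weights or limits.

```python
def find_end_of_speech(window_classes, speech_reset=2, silence_weight=1.0,
                       noise_weight=0.5, non_voice_limit=3.0):
    """Return the index of the window at which processing stops, or None."""
    speech = silence = noise = 0
    for i, cls in enumerate(window_classes):
        # Increment the counter that matches this window's class.
        if cls == "speech":
            speech += 1
        elif cls == "silence":
            silence += 1
        elif cls == "noise":
            noise += 1
        else:
            raise ValueError(f"unknown window class: {cls!r}")
        # Enough speech windows seen: clear all counters and keep listening.
        if speech >= speech_reset:
            speech = silence = noise = 0
            continue
        # Otherwise weight and add the non-voice counters, then test the limit.
        weighted = silence_weight * silence + noise_weight * noise
        if weighted >= non_voice_limit:
            return i
    return None

windows = ["speech", "speech", "silence", "noise", "noise", "silence", "silence"]
stop_index = find_end_of_speech(windows)
still_talking = find_end_of_speech(["speech"] * 6)
```

Weighting noise below silence, as in these defaults, makes noisy windows count less toward the non-voice limit; that choice is an assumption made for the example.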
  • Patent number: 7752037
    Abstract: A method of determining a pitch period of an audio signal using a correlation-based signal derived from the audio signal. The correlation-based signal includes known peaks each corresponding to a respective one of known time lags. The known peaks include a global maximum peak. The method comprises: (a) determining if a candidate peak among the local peaks exceeds a peak threshold; (b) determining if a candidate time lag corresponding to the candidate peak is within a predetermined range of at least one integer sub-multiple of the time lag corresponding to the global maximum peak; and (c) setting the pitch period equal to the candidate time lag when the determinations of both steps (a) and (b) are true.
    Type: Grant
    Filed: October 31, 2002
    Date of Patent: July 6, 2010
    Assignee: Broadcom Corporation
    Inventor: Juin-Hwey Chen
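Steps (a) to (c) of the abstract above can be sketched as follows. The peaks are assumed to be given as `(time_lag, peak_value)` pairs; scanning candidates from the shortest lag, limiting the divisors to `max_divisor`, and falling back to the global-maximum lag when no candidate qualifies are all assumptions made for this illustration.

```python
def select_pitch_period(peaks, peak_threshold, tolerance, max_divisor=4):
    """Pick a pitch period from (time_lag, peak_value) correlation peaks."""
    # Lag of the global maximum peak.
    global_lag, _ = max(peaks, key=lambda p: p[1])
    for lag, value in sorted(peaks):
        if lag >= global_lag:
            break
        # Step (a): the candidate peak must exceed the peak threshold.
        if value <= peak_threshold:
            continue
        # Step (b): the candidate lag must lie near global_lag / k for some integer k.
        for k in range(2, max_divisor + 1):
            if abs(lag - global_lag / k) <= tolerance:
                # Step (c): both tests passed, so use the candidate lag.
                return lag
    # Assumed fallback: keep the lag of the global maximum.
    return global_lag

halved = select_pitch_period([(40, 0.85), (57, 0.3), (80, 0.9), (120, 0.6)],
                             peak_threshold=0.7, tolerance=2)
unchanged = select_pitch_period([(40, 0.5), (80, 0.9)],
                                peak_threshold=0.7, tolerance=2)
```

In the first call the peak at lag 40 is strong and sits at half the global-maximum lag of 80, so it is chosen; in the second it fails the threshold and the global-maximum lag is kept.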
  • Patent number: 7747432
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In a speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7747433
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In a speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7742917
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In a speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 22, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7742914
    Abstract: A method of reducing noise in an audio signal, comprising the steps of: using a furrow filter to select spectral components that are narrow in frequency but relatively broad in time; using a bar filter to select spectral components that are broad in frequency but relatively narrow in time; analyzing the relative energy distribution between the output of the furrow and bar filters to determine the optimal proportion of spectral components for the output signal; and reconstructing the audio signal to generate the output signal. A second pair of time-frequency filters may be used to further improve intelligibility of the output signal. The temporal relationship between the furrow filter output and the bar filter output may be monitored so that the fricative components are allowed primarily at boundaries between intervals with no voiced signal present and intervals with voice components. A noise reduction system for an audio signal.
    Type: Grant
    Filed: March 7, 2005
    Date of Patent: June 22, 2010
    Inventors: Daniel A. Kosek, Robert Crawford Maher
  • Publication number: 20100145688
    Abstract: An apparatus and a method to encode and decode a speech signal using an encoding mode are provided. An encoding apparatus may select an encoding mode of a frame included in an input speech signal, and encode a frame having an unvoiced mode for an unvoiced speech as the selected encoding mode.
    Type: Application
    Filed: December 4, 2009
    Publication date: June 10, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ho Sang Sung, Ki Hyun Choo, Jung Hoe Kim, Eun Mi Oh
  • Patent number: 7734463
    Abstract: The present invention is directed to systems and methods in which a speaker records strings of numbers in different string lengths. Advantage is taken of the fact that speakers typically break numbers into group sizes of two, three, or four. Thus, by way of example, a recorder records two 0s, two 2s, two 3s, etc. Then the recorder records three 1s, three 2s, three 3s, etc., followed by four 1s, four 2s, four 3s, etc. The spoken number values for each string are broken apart and stored as individual numbers corresponding to the string length of the recording. When a number string is to be spoken (for example, the number 782), the system retrieves from the three digit string a first 7, a middle 8, and an end 2. When these retrieved values are communicated to a recipient, proper inflections are achieved for each digit.
    Type: Grant
    Filed: October 13, 2004
    Date of Patent: June 8, 2010
    Assignee: Intervoice Limited Partnership
    Inventor: Forrest McKay
  • Publication number: 20100125452
    Abstract: A method of refining a pitch period estimation of a signal, the method comprising: for each of a plurality of portions of the signal, scanning over a predefined range of time offsets to find an estimate of the pitch period of the portion within the predefined range of time offsets; identifying the average pitch period of the estimated pitch periods of the portions; determining a refined range of time offsets in dependence on the average pitch period, the refined range of time offsets being narrower than the predefined range of time offsets; and for a subsequent portion of the signal, scanning over the refined range of time offsets to find an estimate of the pitch period of the subsequent portion.
    Type: Application
    Filed: November 19, 2008
    Publication date: May 20, 2010
    Applicant: Cambridge Silicon Radio Limited
    Inventor: Xuejing Sun
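The range-narrowing idea in the abstract above can be sketched with a plain autocorrelation search. The estimator, the fixed `half_width` around the average, and the synthetic sine test signal are assumptions chosen for illustration; the abstract does not say how each estimate is computed or how the refined range is derived from the average.

```python
import math

def estimate_period(portion, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that maximises the unnormalised autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(portion[n] * portion[n - lag] for n in range(lag, len(portion)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def refine_range(estimates, half_width, min_lag, max_lag):
    """Narrower lag range centred on the average of earlier estimates."""
    centre = round(sum(estimates) / len(estimates))
    return max(min_lag, centre - half_width), min(max_lag, centre + half_width)

# Synthetic signal with a known period of 50 samples, split into three portions.
PERIOD = 50
signal = [math.sin(2 * math.pi * n / PERIOD) for n in range(600)]
portions = [signal[i:i + 200] for i in range(0, 600, 200)]

wide = (20, 120)                                   # predefined range of time offsets
estimates = [estimate_period(p, *wide) for p in portions[:2]]
narrow = refine_range(estimates, half_width=5, min_lag=wide[0], max_lag=wide[1])
next_estimate = estimate_period(portions[2], *narrow)  # scans far fewer lags
```

The later portion is scanned over 11 lags instead of 101, which is the saving the refinement is meant to provide.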
  • Patent number: 7720679
    Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: May 18, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
  • Publication number: 20100098064
    Abstract: A method and apparatus for dynamically enabling the activation and deactivation of comfort noise over a VoIP media path or channel are disclosed. The present method detects all sound levels in the media path and only activates the comfort noise in the absence of sound and when the background noise level or the telephone line noise level is low rather than only in the absence of speech.
    Type: Application
    Filed: December 26, 2009
    Publication date: April 22, 2010
    Inventors: Marian Croak, Hossein Eslambolchi
  • Publication number: 20100094620
    Abstract: First encoded voice bits are transcoded into second encoded voice bits by dividing the first encoded voice bits into one or more received frames, with each received frame containing multiple ones of the first encoded voice bits. First parameter bits for at least one of the received frames are generated by applying error control decoding to one or more of the encoded voice bits contained in the received frame, speech parameters are computed from the first parameter bits, and the speech parameters are quantized to produce second parameter bits. Finally, a transmission frame is formed by applying error control encoding to one or more of the second parameter bits, and the transmission frame is included in the second encoded voice bits.
    Type: Application
    Filed: December 14, 2009
    Publication date: April 15, 2010
    Applicant: DIGITAL VOICE SYSTEMS, INC.
    Inventor: John C. Hardwick
  • Publication number: 20100088089
    Abstract: Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations are determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state.
    Type: Application
    Filed: August 21, 2009
    Publication date: April 8, 2010
    Applicant: DIGITAL VOICE SYSTEMS, INC.
    Inventor: John C. Hardwick
  • Patent number: 7693398
    Abstract: High audibility output is realized when audio output is provided in special playback. In special playback with audio output, skip/repeat control is done so that decoding and outputting of the audio data is periodically repeated/skipped during part of one frame. The output level may be corrected so as to emphasize appropriate frequency components. This realizes good audio output. In addition, the skip/repeat control and output level correcting methods are changed according to characteristics of the audio data to be reproduced. Also, this realizes good audio output.
    Type: Grant
    Filed: March 8, 2005
    Date of Patent: April 6, 2010
    Assignee: Hitachi, Ltd.
    Inventors: Takashi Kanemaru, Sadao Tsuruga
  • Patent number: 7693710
    Abstract: The present invention relates to a method and device for improving concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder (106) to a decoder (110), and for accelerating recovery of the decoder after non-erased frames of the encoded sound signal have been received. For that purpose, concealment/recovery parameters are determined in the encoder or decoder. When determined in the encoder (106), the concealment/recovery parameters are transmitted to the decoder (110). In the decoder, erasure frame concealment and decoder recovery are conducted in response to the concealment/recovery parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter.
    Type: Grant
    Filed: May 30, 2003
    Date of Patent: April 6, 2010
    Assignee: VoiceAge Corporation
    Inventors: Milan Jelinek, Philippe Gournay
  • Patent number: 7660716
    Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
    Type: Grant
    Filed: October 3, 2007
    Date of Patent: February 9, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
  • Patent number: 7657427
    Abstract: Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps each of them discriminating a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal) then the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminate unvoiced frames. If the classifier classifies the frame as unvoiced speech signal, the classification chain ends, and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the “stable voiced” classification module. If the frame is classified as stable voiced frame, then the frame is encoded using a coding method optimized for stable voiced signals.
    Type: Grant
    Filed: January 19, 2005
    Date of Patent: February 2, 2010
    Assignee: Nokia Corporation
    Inventor: Milan Jelinek
  • Patent number: 7653536
    Abstract: A signal processing system which discriminates between voice signals and data signals modulated by a voiceband carrier. The signal processing system includes a voice exchange, a data exchange and a call discriminator. The voice exchange is capable of exchanging voice signals between a switched circuit network and a packet based network. The signal processing system also includes a data exchange capable of exchanging data signals modulated by a voiceband carrier on the switched circuit network with unmodulated data signal packets on the packet based network. The data exchange is performed by demodulating data signals from the switched circuit network for transmission on the packet based network, and modulating data signal packets from the packet based network for transmission on the switched circuit network. The call discriminator is used to selectively enable the voice exchange and data exchange.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: January 26, 2010
    Assignee: Broadcom Corporation
    Inventors: Onur Tackin, Scott Branden
  • Publication number: 20100017203
    Abstract: A method and apparatus for processing audio signals. The method includes receiving an audio signal as a sequence of digital samples, said audio signal containing a speech portion and a non-speech portion, dividing said sequence of digital samples into a sequence of sub-frames, selecting a set of sub-frames from said sequence of sub-frames, said set including a current sub-frame, determining whether a difference of peak values for any pair of sub-frames is greater than a pre-determined threshold, wherein said pair of sub-frames are contained in said set of sub-frames, and concluding that said current sub-frame represents said speech portion if said difference of peak values exceeds said pre-determined threshold.
    Type: Application
    Filed: May 27, 2009
    Publication date: January 21, 2010
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Fitzgerald John Archibald
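The peak-difference test in the abstract above can be sketched as below. Two points are assumptions: the "peak value" of a sub-frame is taken to be its maximum absolute sample, and the selected set is taken to be the current sub-frame plus a few preceding ones (`history`). Checking whether any pair differs by more than the threshold is equivalent to comparing the largest and smallest peaks in the set.

```python
def is_speech_subframe(samples, subframe_len, current_index, history, threshold):
    """True if peak values within the selected sub-frames differ by more than threshold."""
    # Divide the sample sequence into consecutive, non-overlapping sub-frames.
    subframes = [samples[i:i + subframe_len]
                 for i in range(0, len(samples) - subframe_len + 1, subframe_len)]
    # Select the current sub-frame and up to `history` preceding ones.
    start = max(0, current_index - history)
    selected = subframes[start:current_index + 1]
    # Assumed peak value: the largest absolute sample in each sub-frame.
    peaks = [max(abs(s) for s in sf) for sf in selected]
    # Some pair differs by more than the threshold iff max - min does.
    return max(peaks) - min(peaks) > threshold

# Hypothetical samples: two quiet sub-frames followed by a loud one.
samples = [1, -2, 1, -1, 2, -1, 1, -2, 50, -40, 30, -20]
quiet = is_speech_subframe(samples, 4, current_index=1, history=2, threshold=10)
onset = is_speech_subframe(samples, 4, current_index=2, history=2, threshold=10)
```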
  • Publication number: 20100017202
    Abstract: Provided is a method and apparatus for determining a signal coding mode. The signal coding mode may be determined or changed according to whether a current frame corresponds to a silence period and by using a history of speech or music presence possibilities.
    Type: Application
    Filed: July 9, 2009
    Publication date: January 21, 2010
    Applicant: SAMSUNG ELECTRONICS Co., LTD
    Inventors: Ho-sang Sung, Jie Zhan, Ki-hyun Choo
  • Patent number: 7643988
    Abstract: A method for analyzing fundamental frequency information contained in voice samples includes at least one analysis step (2) for the voice samples which are grouped together in frames in order to obtain information relating to the spectrum and information relating to the fundamental frequency for each sample frame; a step (20) for the determination of a model representing the common characteristics of the spectrum and fundamental frequency of all samples; and a step (30) for determination of a fundamental frequency prediction function exclusively according to spectrum-related information on the basis of the model and voice samples.
    Type: Grant
    Filed: March 2, 2004
    Date of Patent: January 5, 2010
    Assignee: France Telecom
    Inventors: Taoufik En-Najjary, Olivier Rosec
  • Patent number: 7636659
    Abstract: In accordance with the present invention, computer implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation is performed. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
    Type: Grant
    Filed: March 25, 2005
    Date of Patent: December 22, 2009
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Marios Athineos, Daniel P. W. Ellis
  • Patent number: 7630396
    Abstract: Multichannel signal coding equipment is provided for presenting a high quality sound at a low bit rate. In the multichannel signal coding equipment (2), a down mix part (10) generates monaural reference channel signals for N number of channel signals. A coding part (11) codes the generated reference channel signal. A signal analyzing part (12) extracts parameters indicating characteristics of each of the N number of channel signals. An MUX part (13) multiplexes the coded reference channel signal with the extracted parameters.
    Type: Grant
    Filed: August 24, 2005
    Date of Patent: December 8, 2009
    Assignee: Panasonic Corporation
    Inventors: Michiyo Goto, Chun Woei Teo, Sua Hong Neo, Koji Yoshida
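    The down-mix and analysis stages of patent 7630396 can be pictured with a minimal sketch. It assumes the monaural reference is a plain average of the N channels and that the per-channel "parameters" are energy ratios against that reference; the abstract specifies neither choice, and the function name downmix_and_analyze is hypothetical. The coding and multiplexing stages are left out.

```python
def downmix_and_analyze(channels):
    """Down-mix N equal-length channels to mono and extract one parameter per channel.

    Assumptions (not from the patent): the reference is the sample-wise mean,
    and each parameter is the channel energy divided by the reference energy.
    """
    n = len(channels)
    length = len(channels[0])

    # Monaural reference channel: sample-wise average over all channels.
    reference = [sum(ch[t] for ch in channels) / n for t in range(length)]

    # Per-channel parameter: energy relative to the reference channel.
    ref_energy = sum(x * x for x in reference)
    params = [
        (sum(x * x for x in ch) / ref_energy) if ref_energy > 0 else 0.0
        for ch in channels
    ]
    return reference, params


if __name__ == "__main__":
    ref, params = downmix_and_analyze([[1.0, 1.0], [3.0, 3.0]])
    print(ref, params)
```

    A decoder working from this sketch would receive the coded reference plus one small parameter per channel, which is the bit-rate saving the abstract points to.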
  • Patent number: 7627468
    Abstract: An apparatus enabling automatic determination of a portion that reliably represents a feature of a speech waveform includes: an acoustic/prosodic analysis unit calculating, from data, distribution of an energy of a prescribed frequency range of the speech waveform on a time axis, and extracting, among various syllables of the speech waveform, a range that is generated stably, based on the distribution and the pitch of the speech waveform; a cepstral analysis unit estimating, based on the spectral distribution of the speech waveform on the time axis, a range of the speech waveform of which change is well controlled by a speaker; and a pseudo-syllabic center extracting unit extracting, as a portion of high reliability of the speech waveform, that range which has been estimated to be the stably generated range and of which change is estimated to be well controlled by the speaker.
    Type: Grant
    Filed: February 21, 2003
    Date of Patent: December 1, 2009
    Assignees: Japan Science and Technology Agency, Advanced Telecommunication Research Institute International
    Inventors: Nick Campbell, Parham Mokhtari
  • Patent number: 7617094
    Abstract: One aspect of the invention is a method of using a computer to identify a conversation. Another aspect is a method for an audio processing system that identifies conversations and enhances each conversation for each user in the conversation.
    Type: Grant
    Filed: April 16, 2003
    Date of Patent: November 10, 2009
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Paul M. Aoki, Margaret H. Szymanski, James D. Thornton, Daniel H. Wilson, Allison G. Woodruff
  • Patent number: 7596488
    Abstract: An “adaptive audio playback controller” operates by decoding and reading received packets of an audio signal into a signal buffer. Samples of the decoded audio signal are then played out of the signal buffer according to the needs of a player device. Jitter control and packet loss concealment are accomplished by continuously analyzing buffer content in real-time, and determining whether to provide unmodified playback from the buffer contents, whether to compress buffer content, stretch buffer content, or whether to provide for packet loss concealment for overly delayed or lost packets as a function of buffer content. Further, the adaptive audio playback controller also determines where to stretch or compress particular frames or signal segments in the signal buffer, and how much to stretch or compress such segments in order to optimize perceived playback quality.
    Type: Grant
    Filed: September 15, 2003
    Date of Patent: September 29, 2009
    Assignee: Microsoft Corporation
    Inventors: Dinei Florencio, Philip Chou, Li-Wei He
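    The per-step choice described in patent 7596488 (play unmodified, stretch, compress, or conceal, as a function of buffer content) might look roughly like the sketch below. The thresholds low_ms and high_ms, the function name playout_action, and the use of buffered duration as the only input are all assumptions for illustration; the patent's controller also decides where and how much to stretch or compress, which is not modeled here.

```python
def playout_action(buffered_ms, next_packet_missing, low_ms=40, high_ms=120):
    """Pick a playback action from the amount of audio currently buffered.

    Hypothetical thresholds: below low_ms the buffer is at risk of running
    dry, above high_ms it is adding avoidable delay.
    """
    if buffered_ms <= 0 and next_packet_missing:
        return "conceal"   # nothing left to play and the packet is late or lost
    if buffered_ms < low_ms:
        return "stretch"   # slow playout down to let the buffer refill
    if buffered_ms > high_ms:
        return "compress"  # speed playout up to cut latency
    return "play"          # buffer is in the comfortable range


if __name__ == "__main__":
    for level in (0, 20, 80, 200):
        print(level, playout_action(level, next_packet_missing=(level == 0)))
```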
  • Patent number: 7596487
    Abstract: A method of detecting voice activity in a signal smoothes the “voice” or “noise” decision to avoid loss of speech segments. The method is particularly suitable for situations in which the noise level is high. Unlike the prior art method which favors optimizing traffic, this method favors the intelligibility of the signal reproduced after decoding. The signal to be coded is divided into frames. A “voice” or “noise” initial decision is made for each signal frame. The method makes the “voice” decision as soon as there is any increase in the energy of the signal relative to the frame preceding the current frame, even if the increase is slight. The method makes the “noise” decision only if the characteristics of the signal correspond to the characteristics of the noise for at least i consecutive frames (for example i=6). The method has applications in telephony.
    Type: Grant
    Filed: May 10, 2002
    Date of Patent: September 29, 2009
    Assignee: Alcatel
    Inventors: Raymond Gass, Richard Atzenhoffer
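    The asymmetric smoothing rule in patent 7596487 can be sketched as a small state machine: switch to "voice" on any energy increase over the previous frame, and switch to "noise" only after i consecutive noise-like frames. The sketch assumes the per-frame "noise-like" test is already available as a boolean, that the stream starts in the "noise" state, and that an energy increase resets the consecutive-frame counter; none of these details are given in the abstract, and the name smooth_vad is hypothetical.

```python
def smooth_vad(energies, noise_like, hangover=6):
    """Return a smoothed 'voice'/'noise' label for each frame.

    energies:   per-frame signal energy
    noise_like: per-frame bool, True when the frame resembles the noise
                (how this is measured is outside this sketch)
    hangover:   consecutive noise-like frames needed before deciding 'noise'
                (the value i in the abstract, for example 6)
    """
    labels = []
    state = "noise"   # assumed initial state
    run = 0           # length of the current run of noise-like frames
    prev = None
    for energy, is_noise in zip(energies, noise_like):
        if prev is not None and energy > prev:
            # Any increase in energy, even a slight one, means 'voice'.
            state = "voice"
            run = 0
        elif is_noise:
            run += 1
            if run >= hangover:
                state = "noise"
        else:
            run = 0
        labels.append(state)
        prev = energy
    return labels


if __name__ == "__main__":
    e = [1.0, 1.0, 1.1, 1.0, 1.0, 1.0, 1.0]
    n = [True, True, False, True, True, True, True]
    print(smooth_vad(e, n, hangover=3))
```

    The delay before returning to "noise" is what protects trailing speech segments, at the cost of coding a few extra frames.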
  • Patent number: 7590524
    Abstract: The present invention relates to enhancing a quality of speech wherein speech quality degradation is reduced by removing noise from an unvoiced speech. The present invention comprises dividing an input speech into a voiced speech and an unvoiced speech, performing adaptive filtering on the voiced speech to remove a noise of the voiced speech, and performing spectral subtraction on the unvoiced speech.
    Type: Grant
    Filed: September 6, 2005
    Date of Patent: September 15, 2009
    Assignee: LG Electronics Inc.
    Inventor: Chan Woo Kim
  • Patent number: 7577564
    Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production—whispered speech and normally phonated speech. Speech classified in this manner will yield increased performance of automated speech processing systems because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech such as whispered speech, will be avoided.
    Type: Grant
    Filed: March 3, 2003
    Date of Patent: August 18, 2009
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventors: Stanley J. Wenndt, Edward J. Cupples
  • Patent number: 7577565
    Abstract: Packetized CELP-encoded speech playout with frame truncation during silence and frame expansion method dependent upon voicing classification with voiced frame expansion maintaining phase alignment.
    Type: Grant
    Filed: June 10, 2008
    Date of Patent: August 18, 2009
    Assignee: Texas Instruments Incorporated
    Inventors: Krishnasamy Anandakumar, Alan McCree, Erdal Paksoy
  • Patent number: 7574451
    Abstract: A “Media Identifier” operates on concurrent media streams to provide large numbers of clients with real-time server-side identification of media objects embedded in streaming media, such as radio, television, or Internet broadcasts. Such media objects may include songs, commercials, jingles, station identifiers, etc. Identification of the media objects is provided to clients by comparing client-generated traces computed from media stream samples to a large database of stored, pre-computed traces (i.e., “fingerprints”) of known identification. Further, given a finite number of media streams and a much larger number of clients, many of the traces sent to the server are likely to be almost identical. Therefore, a searchable dynamic trace cache is used to limit the database queries necessary to identify particular traces. This trace cache caches only one copy of recent traces along with the database search results, either positive or negative. Cache entries are then removed as they age.
    Type: Grant
    Filed: November 2, 2004
    Date of Patent: August 11, 2009
    Assignee: Microsoft Corporation
    Inventors: Chris Burges, John Platt
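    The trace cache in patent 7574451 keeps one copy of each recent trace together with its database result, positive or negative, and drops entries as they age. A minimal sketch of that idea follows. It simplifies in one important way: traces are matched by exact equality, whereas the abstract describes traces that are "almost identical", which would need a similarity search. The class name TraceCache, the max_age parameter, and the lookup callable are hypothetical.

```python
import time


class TraceCache:
    """Cache database results per trace and evict entries older than max_age."""

    def __init__(self, lookup, max_age=60.0, clock=time.monotonic):
        self._lookup = lookup      # callable: trace -> identification or None
        self._max_age = max_age    # seconds an entry may stay cached
        self._clock = clock        # injectable clock, useful for testing
        self._entries = {}         # trace -> (result, time stored)
        self.db_queries = 0        # number of lookups that reached the database

    def identify(self, trace):
        now = self._clock()
        self._evict(now)
        hit = self._entries.get(trace)
        if hit is not None:
            return hit[0]          # cached result, which may be a negative (None)
        self.db_queries += 1
        result = self._lookup(trace)
        self._entries[trace] = (result, now)
        return result

    def _evict(self, now):
        expired = [t for t, (_, stored) in self._entries.items()
                   if now - stored > self._max_age]
        for t in expired:
            del self._entries[t]


if __name__ == "__main__":
    database = {"abc": "Song A"}
    cache = TraceCache(database.get, max_age=10.0)
    print(cache.identify("abc"), cache.identify("abc"), cache.db_queries)
```

    Caching negative results matters as much as caching positive ones here, since many clients listening to the same unidentified stream would otherwise repeat the same failed query.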
  • Patent number: 7574357
    Abstract: Method and system for generating electromyographic or sub-audible signals (“SAWPs”) and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.
    Type: Grant
    Filed: June 24, 2005
    Date of Patent: August 11, 2009
    Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration (NASA)
    Inventors: C. Charles Jorgensen, Bradley J. Betts
  • Patent number: 7567908
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: January 13, 2004
    Date of Patent: July 28, 2009
    Assignee: International Business Machines Corporation
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Daniel Mark Schumacher, Thomas J. Watson
  • Patent number: 7567900
    Abstract: A harmonic structure acoustic signal detection device not depending on the level fluctuation of the input signal including: an FFT unit which performs FFT on an input signal and calculates a power spectrum component for each frame; a harmonic structure extraction unit which leaves only a harmonic structure from the power spectrum component; a voiced feature evaluation unit which evaluates correlation between the frames of harmonic structures extracted by the harmonic structure extraction unit, thereby evaluates whether or not the segment is a vowel segment, and extracts the voiced segment; and a speech segment determination unit which determines a speech segment according to the continuity and durability of the output of the voiced feature evaluation unit.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: July 28, 2009
    Assignee: Panasonic Corporation
    Inventors: Tetsu Suzuki, Takeo Kanamori, Takashi Kawamura
  • Patent number: 7565286
    Abstract: A method for lost speech samples recovery in speech transmission systems is disclosed. The method employs a waveform coder operating on digital speech samples. It exploits the composite model of speech, wherein each speech segment contains both periodic and colored noise components, and separately estimates these two components of the unreliable samples. First, adaptive FIR filters computed from received signal statistics are used to interpolate estimates of the periodic component for the unreliable samples. These FIR filters are inherently stable and typically short, since only strongly correlated elements of the signal corresponding to pitch offset samples are used to compute the estimate. These periodic estimates are also computed for sample times corresponding to reliable samples adjacent to the unreliable sample interval. The differences between these reliable samples and the corresponding periodic estimates are considered as samples of the noise component.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: July 21, 2009
    Assignee: Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, Through the Communications Research Centre Canada
    Inventors: Ken Gracie, John Lodge
  • Publication number: 20090182556
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, a method of processing a signal representing speech can comprise receiving a frame of the signal representing speech, classifying the frame as a voiced frame, and parsing the voiced frame into one or more regions based on occurrence of one or more events within the voiced frame. For example, the one or more events can comprise one or more glottal pulses. The one or more regions may collectively represent less than all of the voiced frame.
    Type: Application
    Filed: October 23, 2008
    Publication date: July 16, 2009
    Applicant: Red Shift Company, LLC
    Inventors: Erik N. Reckase, John F. Remillard