Voiced Or Unvoiced Patents (Class 704/214)
  • Publication number: 20100280824
    Abstract: Systems and methods to reduce the negative impact of wind on an electronic system include use of a first detector that receives a first signal and a second detector that receives a second signal. A voice activity detector (VAD) coupled to the first detector generates a VAD signal when the first signal corresponds to voiced speech. A wind detector coupled to the second detector correlates signals received at the second detector and derives from the correlation wind metrics that characterize wind noise that is acoustic disturbance corresponding to at least one of air flow and air pressure in the second detector. The wind detector controls a configuration of the second detector according to the wind metrics. The wind detector uses the wind metrics to dynamically control mixing of the first signal and the second signal to generate an output signal for transmission.
    Type: Application
    Filed: May 3, 2010
    Publication date: November 4, 2010
    Inventors: Nicolas Petit, Gregory Burnett, Michael Goertz
  • Patent number: 7822408
    Abstract: Systems and methods for processing information are disclosed and may include comparing a plurality of signals having identical time stamps. The method may further include identifying which of the plurality of signals comprise voice signals and which of the plurality of signals comprise corresponding non-voice signals based on the comparison. Data, which includes the voice signals and the corresponding non-voice signals, may be sequentially arranged for storage. The data, which includes the voice signals and the corresponding non-voice signals, may be arranged in a single file. The data, which includes the voice signals and the corresponding non-voice signals, may be recorded to a single file in memory element. The voice signals may be arranged so that the data including the corresponding non-voice signals is sequential. The non-voice signals may include video data, text-messaging data, and/or e-mail data.
    Type: Grant
    Filed: October 5, 2007
    Date of Patent: October 26, 2010
    Inventor: Fei Xie
  • Publication number: 20100268532
    Abstract: A system for voice detection includes a feature value calculation unit that calculates a feature value from an input signal sliced on a per frame basis, a provisional voice/non-voice decision unit that provisionally decides a voiced interval and a non-voiced interval from the feature value calculated on a per frame basis, and a voice/non-voice decision unit that determines a voiced interval duration threshold value or a non-voiced interval duration threshold value, using a ratio of the feature value found on a per frame basis to a threshold value for the feature value and that re-decides the voiced interval and the non-voiced interval, using the voiced interval duration threshold value determined and the non-voiced interval duration threshold value determined.
    Type: Application
    Filed: November 26, 2008
    Publication date: October 21, 2010
    Inventors: Takayuki Arakawa, Masanori Tsujikawa
  • Patent number: 7809560
    Abstract: In a method and system for identifying speech sound and non-speech sound in an environment, a speech signal and other non-speech signals are identified from a mixed sound source having a plurality of channels. The method includes the following steps: (a) using a blind source separation (BSS) unit to separate the mixed sound source into a plurality of sound signals; (b) storing spectrum of each of the sound signals; (c) calculating spectrum fluctuation of each of the sound signals in accordance with stored past spectrum information and current spectrum information sent from the blind source separation unit; and (d) identifying one of the sound signals that has a largest spectrum fluctuation as the speech signal.
    Type: Grant
    Filed: January 26, 2006
    Date of Patent: October 5, 2010
    Assignee: Panasonic Corporation
    Inventors: Chia-Shin Yen, Chien-Ming Wu, Che-Ming Lin
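
Step (d) of the abstract above, picking the separated signal whose spectrum fluctuates most, can be illustrated with a short sketch. The abstract does not say how fluctuation is measured, so the mean absolute difference between the stored and current spectra is used here purely as an assumption. The function names and sample spectra are hypothetical, and the blind source separation stage is taken as already done.

```python
def spectrum_fluctuation(past, current):
    """Mean absolute change between a stored past spectrum and the current one.

    Assumption: the patent abstract does not define the fluctuation metric.
    """
    return sum(abs(c - p) for p, c in zip(past, current)) / len(current)


def pick_speech_source(past_spectra, current_spectra):
    """Return (index of the signal with the largest fluctuation, all scores)."""
    scores = [
        spectrum_fluctuation(p, c) for p, c in zip(past_spectra, current_spectra)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical magnitude spectra (4 bins) for three already-separated signals.
past = [[1.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.5, 0.1], [0.2, 0.2, 0.2, 0.2]]
curr = [[1.0, 1.1, 1.0, 0.9], [2.0, 0.5, 1.5, 1.1], [0.2, 0.3, 0.2, 0.2]]

speech_index, scores = pick_speech_source(past, curr)
print(speech_index, scores)
```

In this toy data the second signal changes far more between frames than the two nearly stationary ones, so it is the one selected as speech.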
  • Publication number: 20100250246
    Abstract: A speech signal evaluation apparatus includes: an acquisition unit that acquires, as a first frame, a speech signal of a specified length from speech signals; a first detection unit that detects, on the basis of a speech condition, whether the first frame is voiced or unvoiced; a variation calculation unit that, when the first frame is unvoiced, calculates a variation in a spectrum associated with the first frame on the basis of a spectrum of the first frame and a spectrum of a second frame that is unvoiced and precedes the first frame in time; and a second detection unit that detects, on the basis of a non-stationary condition based on the variation in spectrum, whether the variation of the first frame satisfies the non-stationary condition.
    Type: Application
    Filed: March 24, 2010
    Publication date: September 30, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Chikako MATSUMOTO
  • Patent number: 7805297
    Abstract: A system and method for performing frame loss concealment (FLC) when portions of a bit stream representing an audio signal are lost within the context of a digital communication system. The system and method utilizes a plurality of different FLC techniques, wherein each technique is tuned or designed for a different kind of audio signal. When a frame is lost, a previously-decoded audio signal corresponding to one or more previously-received good frames is analyzed. Based on the result of the analysis, the FLC technique that is most likely to perform well for the previously-decoded audio signal is chosen to perform the FLC operation for the current lost frame. In one implementation, the plurality of different FLC techniques include an FLC technique designed for music, such as a frame repeat FLC technique, and an FLC technique designed for speech, such as a periodic waveform extrapolation (PWE) technique.
    Type: Grant
    Filed: November 23, 2005
    Date of Patent: September 28, 2010
    Assignee: Broadcom Corporation
    Inventor: Juin-Hwey Chen
  • Patent number: 7805295
    Abstract: The present invention relates to a method of synthesizing a signal comprising the steps of: a) determining of a required pitch bell location on the signal to be synthesized, b) mapping of the required pitch bell location onto an original signal to provide a first pitch bell location, c) randomizing the first pitch bell location to provide a second pitch bell location, d) windowing of the original signal on the second pitch bell location to provide a pitch bell, e) placing the resulting pitch bell at the required pitch bell location within the domain of the signal to be synthesized, f) repeating of the steps a) to e) for all required pitch bell locations and performing an overlap and add operation with respect to the pitch bells in order to synthesize the signal.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: September 28, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Ercan Ferit Gigi
  • Publication number: 20100208918
    Abstract: A volume correction device includes: a variable gain means for controlling a gain, given to an input audio signal, according to a gain control signal; a consecutive relevant sounds interval detection means for detecting a consecutive relevant sounds interval, during which a group of temporally adjoining consecutive relevant sounds is present, in the input audio signal; a mean level detection means for detecting the mean level of the input audio signal attained during the consecutive relevant sounds interval, and whose time constant for mean level detection is set to a smaller value during the leading period of the consecutive relevant sounds interval than during the remaining period; a gain control signal production means for producing the gain control signal, so that the mean level will be equal to a reference level, and feeding the gain control signal to the variable gain means.
    Type: Application
    Filed: February 8, 2010
    Publication date: August 19, 2010
    Applicant: Sony Corporation
    Inventor: Masayoshi Noguchi
  • Publication number: 20100211385
    Abstract: The present invention relates to a voice activity detector (VAD) comprising at least a first primary voice detector. The voice activity detector is configured to output a speech decision “vad_flag” indicative of the presence of speech in an input signal based on at least a primary speech decision “vad_prim_A” produced by said first primary voice detector. The voice activity detector further comprises a short term activity detector and the voice activity detector is further configured to produce a music decision “vad_music” indicative of the presence of music in the input signal based on a short term primary activity signal “vad_act_prim_A” produced by said short term activity detector based on the primary speech decision “vad_prim_A” produced by the first voice detector. The short term primary activity signal “vad_act_prim_A” is proportional to the presence of music in the input signal. The invention also relates to a node, e.g. a terminal, in a communication system comprising such a VAD.
    Type: Application
    Filed: April 18, 2008
    Publication date: August 19, 2010
    Inventor: Martin Sehlstedt
  • Publication number: 20100198590
    Abstract: A signal processing system which discriminates between voice signals and data signals modulated by a voiceband carrier. The signal processing system includes a voice exchange, a data exchange and a call discriminator. The voice exchange is capable of exchanging voice signals between a switched circuit network and a packet based network. The signal processing system also includes a data exchange capable of exchanging data signals modulated by a voiceband carrier on the switched circuit network with unmodulated data signal packets on the packet based network. The data exchange is performed by demodulating data signals from the switched circuit network for transmission on the packet based network, and modulating data signal packets from the packet based network for transmission on the switched circuit network. The call discriminator is used to selectively enable the voice exchange and data exchange.
    Type: Application
    Filed: January 25, 2010
    Publication date: August 5, 2010
    Inventors: Onur Tackin, Scott Branden
  • Patent number: 7765099
    Abstract: A band recovering device recovers frequency components lying in a frequency band lost due to band-limitation of a sound signal. The device includes a peak-limiting amplifier for amplifying an input narrow-band signal while preventing the resulting amplified signal from exceeding a maximum amplitude. A peak-limitation detector detects the level of the amplified signal output. An amplification controller increases the amplification factor and/or the amount of amplification of the peak-limiting amplifier in accordance with the level of the amplified signal. A band recovering circuit generates, based on the amplified signal output from the peak-limiting amplifier and input narrow-band signal, a band-recovered signal including the frequency components lying in the missing band.
    Type: Grant
    Filed: March 12, 2008
    Date of Patent: July 27, 2010
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Masashi Takada
  • Patent number: 7747433
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7747432
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7742917
    Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. In speech coding method according to a code-excited linear prediction (CELP) speech coding, a noise level of a speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks are used based on an evaluation result.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: June 22, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7710982
    Abstract: The present invention prevents a receiving buffer from becoming empty by: storing received packets in the receiving buffer; detecting the largest arrival delay jitter of the packets and the buffer level of the receiving buffer by a state detecting part; obtaining an optimum buffer level for the largest delay jitter using a predetermined table by a control part; determining, based on the detected buffer level and the optimum buffer level, the level of urgency about the need to adjust the buffer level; expanding or reducing the waveform of a decoded audio data stream of the current frame decoded from a packet read out of the receiving buffer by a consumption adjusting part to adjust the consumption of reproduction frames on the basis of the urgency level, the detected buffer level, and the optimum buffer level.
    Type: Grant
    Filed: May 25, 2005
    Date of Patent: May 4, 2010
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Hitoshi Ohmuro, Takeshi Mori, Yusuke Hiwasaki, Akitoshi Kataoka
  • Patent number: 7693398
    Abstract: High audibility output is realized when audio output is provided in special playback. In special playback with audio output, skip/repeat control is done so that decoding and outputting of the audio data is periodically repeated/skipped during part of one frame. The output level may be corrected so as to emphasize appropriate frequency components. This realizes good audio output. In addition, the skip/repeat control and output level correcting methods are changed according to characteristics of the audio data to be reproduced. Also, this realizes good audio output.
    Type: Grant
    Filed: March 8, 2005
    Date of Patent: April 6, 2010
    Assignee: Hitachi, Ltd.
    Inventors: Takashi Kanemaru, Sadao Tsuruga
  • Patent number: 7680655
    Abstract: A method and apparatus are provided for determining the quality of a speech transmission, including temporal clipping, delay and jitter, using a carefully constructed test signal (300) and digital signal processing techniques. The test signal that is to be transmitted through a speech transmission system (100) is created (700). Then the test signal is transmitted through the speech transmission system such that the speech transmission system creates an output signal that corresponds to the input signal, as modified by the speech transmission system (702). The test signal includes multiple segments (500) of speech signals interleaved with periods of silence. The periods of silence vary in duration according to a predefined pattern. Each segment of speech signals includes multiple predefined speech samples or symbols (400, 402, 404, 406, 408, 410, 412, 414) interleaved with a plurality of silence gaps. The speech samples have a common period of duration, but the silence gaps do not.
    Type: Grant
    Filed: May 20, 2005
    Date of Patent: March 16, 2010
    Assignee: Alcatel-Lucent USA Inc.
    Inventors: Ronald Jay Canniff, Michael R. Kosek, Alan Howard Matten, Harvey P. Siy, Peng Zhang
  • Patent number: 7657427
    Abstract: Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps each of them discriminating a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal) then the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminate unvoiced frames. If the classifier classifies the frame as unvoiced speech signal, the classification chain ends, and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the “stable voiced” classification module. If the frame is classified as stable voiced frame, then the frame is encoded using a coding method optimized for stable voiced signals.
    Type: Grant
    Filed: January 19, 2005
    Date of Patent: February 2, 2010
    Assignee: Nokia Corporation
    Inventor: Milan Jelinek
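
The three-step classification chain in the abstract above amounts to an early-exit cascade, sketched below. The three classifier callables, the dictionary frame representation, and the mode labels are hypothetical stand-ins. The abstract also does not say what happens to a frame that is active, not unvoiced, and not stable voiced, so the final "generic" fallback is an assumption.

```python
def choose_coding_mode(frame, is_active, is_unvoiced, is_stable_voiced):
    """Walk the classification chain and stop at the first class that matches."""
    if not is_active(frame):
        return "comfort_noise"   # inactive frame: comfort noise generation (CNG)
    if is_unvoiced(frame):
        return "unvoiced"        # coding method optimized for unvoiced signals
    if is_stable_voiced(frame):
        return "stable_voiced"   # coding method optimized for stable voiced signals
    return "generic"             # assumption: fallback not stated in the abstract


# Hypothetical frames carrying precomputed decisions, for illustration only.
frames = [
    {"active": False, "unvoiced": False, "stable": False},
    {"active": True, "unvoiced": True, "stable": False},
    {"active": True, "unvoiced": False, "stable": True},
    {"active": True, "unvoiced": False, "stable": False},
]

modes = [
    choose_coding_mode(
        f,
        is_active=lambda fr: fr["active"],
        is_unvoiced=lambda fr: fr["unvoiced"],
        is_stable_voiced=lambda fr: fr["stable"],
    )
    for f in frames
]
print(modes)
```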
  • Patent number: 7653537
    Abstract: A system and method is provided for determining whether a data frame of a coded speech signal corresponds to voice or to noise. In one embodiment, a voice activity detector determines a cross-correlation of data. If the cross-correlation is lower than a predetermined cross-correlation value, then the data frame corresponds to noise. If not, then the voice activity detector determines a periodicity of the cross-correlation and a variance of the periodicity. If the variance is less than a predetermined variance value, then the data frame corresponds to voice. In another embodiment, a method determines energy of the data frame and an average energy of the coded speech signal. If the data frame is one of a predetermined number of initial data frames, then a comparison between the average energy to the energy of the data frame is used to determine whether the data frame is noise or voice.
    Type: Grant
    Filed: September 28, 2004
    Date of Patent: January 26, 2010
    Assignee: STMicroelectronics Asia Pacific Pte. Ltd.
    Inventors: Kabi Prakash Padhi, Sapna George
  • Patent number: 7653536
    Abstract: A signal processing system which discriminates between voice signals and data signals modulated by a voiceband carrier. The signal processing system includes a voice exchange, a data exchange and a call discriminator. The voice exchange is capable of exchanging voice signals between a switched circuit network and a packet based network. The signal processing system also includes a data exchange capable of exchanging data signals modulated by a voiceband carrier on the switched circuit network with unmodulated data signal packets on the packet based network. The data exchange is performed by demodulating data signals from the switched circuit network for transmission on the packet based network, and modulating data signal packets from the packet based network for transmission on the switched circuit network. The call discriminator is used to selectively enable the voice exchange and data exchange.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: January 26, 2010
    Assignee: Broadcom Corporation
    Inventors: Onur Tackin, Scott Branden
  • Patent number: 7643990
    Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
    Type: Grant
    Filed: October 23, 2003
    Date of Patent: January 5, 2010
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
  • Publication number: 20090306975
    Abstract: A system is provided for transmitting information through a speech codec (in-band) such as found in a wireless communication network. A modulator transforms the data into a spectrally noise-like signal based on the mapping of a shaped pulse to predetermined positions within a modulation frame, and the signal is efficiently encoded by a speech codec. A synchronization sequence provides modulation frame timing at the receiver and is detected based on analysis of a correlation peak pattern. A request/response protocol provides reliable transfer of data using message redundancy, retransmission, and/or robust modulation modes dependent on the communication channel conditions.
    Type: Application
    Filed: June 3, 2009
    Publication date: December 10, 2009
    Applicant: QUALCOMM Incorporated
    Inventors: CHRISTIAN PIETSCH, GEORG FRANK, CHRISTIAN SGRAJA, PENGJUN HUANG, CHRISTOPH A. JOETTEN, MARC W. WERNER, WOLFGANG GRANZOW
  • Publication number: 20090306976
    Abstract: A system is provided for transmitting information through a speech codec (in-band) such as found in a wireless communication network. A modulator transforms the data into a spectrally noise-like signal based on the mapping of a shaped pulse to predetermined positions within a modulation frame, and the signal is efficiently encoded by a speech codec. A synchronization sequence provides modulation frame timing at the receiver and is detected based on analysis of a correlation peak pattern. A request/response protocol provides reliable transfer of data using message redundancy, retransmission, and/or robust modulation modes dependent on the communication channel conditions.
    Type: Application
    Filed: June 3, 2009
    Publication date: December 10, 2009
    Applicant: QUALCOMM Incorporated
    Inventors: Christoph A. Joetten, Christian Sgraja, Georg Frank, Pengjun Huang, Christian Pietsch, Marc W. Werner, Ethan R. Duni, Eugene J. Baik
  • Patent number: 7627468
    Abstract: An apparatus enabling automatic determination of a portion that reliably represents a feature of a speech waveform includes: an acoustic/prosodic analysis unit calculating, from data, distribution of an energy of a prescribed frequency range of the speech waveform on a time axis, and for extracting, among various syllables of the speech waveform, a range that is generated stably, based on the distribution and the pitch of the speech waveform; cepstral analysis unit estimating, based on the spectral distribution of the speech waveform on the time axis, a range of the speech waveform of which change is well controlled by a speaker; and a pseudo-syllabic center extracting unit extracting, as a portion of high reliability of the speech waveform, that range which has been estimated to be the stably generated range and of which change is estimated to be well controlled by the speaker.
    Type: Grant
    Filed: February 21, 2003
    Date of Patent: December 1, 2009
    Assignees: Japan Science and Technology Agency, Advanced Telecommunication Research Institute International
    Inventors: Nick Campbell, Parham Mokhtari
  • Publication number: 20090292533
    Abstract: Streaming voice signals, such as might be received at a contact center or similar operation, are analyzed to detect the occurrence of one or more unprompted, predetermined utterances. The predetermined utterances preferably constitute a vocabulary of words and/or phrases having particular meaning within the context in which they are uttered. Detection of one or more of the predetermined utterances during a call causes a determination of response-determinative significance of the detected utterance(s). Based on the response-determinative significance of the detected utterance(s), a responsive action may be further determined. Additionally, long term storage of the call corresponding to the detected utterance may also be initiated. Conversely, calls in which no predetermined utterances are detected may be deleted from short term storage.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 26, 2009
    Applicant: Accenture Global Services GmbH
    Inventors: Thomas J. Ryan, Biji K. Janan
  • Publication number: 20090292532
    Abstract: Streaming voice signals, such as might be received at a contact center or similar operation, are analyzed to detect the occurrence of one or more unprompted, predetermined utterances. The predetermined utterances preferably constitute a vocabulary of words and/or phrases having particular meaning within the context in which they are uttered. Detection of one or more of the predetermined utterances during a call causes a determination of response-determinative significance of the detected utterance(s). Based on the response-determinative significance of the detected utterance(s), a responsive action may be further determined. Additionally, long term storage of the call corresponding to the detected utterance may also be initiated. Conversely, calls in which no predetermined utterances are detected may be deleted from short term storage.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 26, 2009
    Applicant: ACCENTURE GLOBAL SERVICES GMBH
    Inventors: Thomas J. Ryan, Biji K. Janan
  • Patent number: 7617094
    Abstract: One aspect of the invention is a method of using a computer to identify a conversation. Another aspect is a method for an audio processing system that identifies conversations and enhances each conversation for each user in the conversation.
    Type: Grant
    Filed: April 16, 2003
    Date of Patent: November 10, 2009
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Paul M. Aoki, Margaret H. Szymanski, James D. Thornton, Daniel H. Wilson, Allison G. Woodruff
  • Patent number: 7577564
    Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production—whispered speech and normally phonated speech. Speech classified in this manner will yield increased performance of automated speech processing systems because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech such as whispered speech, will be avoided.
    Type: Grant
    Filed: March 3, 2003
    Date of Patent: August 18, 2009
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventors: Stanley J. Wenndt, Edward J. Cupples
  • Patent number: 7577565
    Abstract: Packetized CELP-encoded speech playout with frame truncation during silence and frame expansion method dependent upon voicing classification with voiced frame expansion maintaining phase alignment.
    Type: Grant
    Filed: June 10, 2008
    Date of Patent: August 18, 2009
    Assignee: Texas Instruments Incorporated
    Inventors: Krishnasamy Anandakumar, Alan McCree, Erdal Paksoy
  • Patent number: 7567908
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: January 13, 2004
    Date of Patent: July 28, 2009
    Assignee: International Business Machines Corporation
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Daniel Mark Schumacher, Thomas J. Watson
  • Patent number: 7555430
    Abstract: Method and apparatus for multi-pass speech recognition. An input device receives spoken input. A processor performs a first pass speech recognition technique on the spoken input and forms first pass results. The first pass results include a number of alternative speech expressions, each having an assigned score related to the certainty that the corresponding expression correctly matches the spoken input. The processor selectively performs a second pass speech recognition technique on the spoken input according to the first pass results. Preferably, the second pass attempts to correctly match the spoken input to only those expressions which were identified during the first pass. Otherwise, if one of the expressions identified by the first pass is assigned a score higher than a predetermined threshold (e.g., 95%), the second pass is not performed.
    Type: Grant
    Filed: April 4, 2006
    Date of Patent: June 30, 2009
    Assignee: Nuance Communications
    Inventors: Hy Murveit, Ashvin Kannan, Ben Shahshahani, Chris Leggetter, Katherine Knill
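
The gating logic in the abstract above (skip the second pass when a first-pass score clears a threshold, otherwise rerun recognition restricted to the first-pass candidates) might look roughly like the following. Both recognizer functions are hypothetical stubs, not any real speech API, and the 0.95 default simply mirrors the "e.g., 95%" figure in the abstract.

```python
def recognize(spoken_input, first_pass, second_pass, threshold=0.95):
    """Return (expression, second_pass_was_run)."""
    candidates = first_pass(spoken_input)  # list of (expression, score) pairs
    best_expression, best_score = max(candidates, key=lambda pair: pair[1])
    if best_score > threshold:
        # Confident enough: the second pass is not performed.
        return best_expression, False
    # Otherwise restrict the second pass to the first-pass alternatives.
    expressions = [expression for expression, _ in candidates]
    return second_pass(spoken_input, expressions), True


def stub_first_pass(audio):
    """Hypothetical first pass: returns canned alternatives with scores."""
    if audio == "clear":
        return [("call home", 0.97), ("call Rome", 0.40)]
    return [("call home", 0.60), ("call Rome", 0.55)]


def stub_second_pass(audio, expressions):
    """Hypothetical second pass: stands in for a slower, more accurate matcher."""
    return expressions[-1]


clear_result = recognize("clear", stub_first_pass, stub_second_pass)
noisy_result = recognize("noisy", stub_first_pass, stub_second_pass)
print(clear_result, noisy_result)
```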
  • Patent number: 7546238
    Abstract: A digital line transmission unit can carry out switching between speech codecs during the same call to achieve balance between making effective use of a line and a high sound quality without bringing about a feeling of discomfort in a user by the switching. It includes in an encoder a first speech codec 7 with a high sound quality and a high bit rate, and a second speech codec 8 with a reasonable sound quality but a low bit rate. It carries out switching between these speech codecs in response to the control information an operation monitoring controller 4 obtains by making a decision as to the traffic volume of the bearer line 111. The switching between the speech codecs is made during a speech pause a speech burst detector 31 in a signal detector 3 detects in an input speech signal.
    Type: Grant
    Filed: February 4, 2002
    Date of Patent: June 9, 2009
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Yoshihisa Harada
  • Patent number: 7542897
    Abstract: This disclosure is directed to techniques for condensed voice buffering, transmission and playback. The techniques may involve identification of encoded voice frames as either speech or a pause, and selective exclusion of a portion of the frames for storage, transmission or playback based on the identification. In this manner, the techniques are capable of condensing a series of encoded voice frames. When variable rate coding is employed, a pause frame may be identified, for example, based on a threshold comparison for the rate of the encoded frame. In some cases, the techniques may involve excluding only a portion of the identified frames from a consecutive sequence of the identified frames, thereby preserving a minimum number of the identified frames needed for intelligible conversation.
    Type: Grant
    Filed: August 29, 2002
    Date of Patent: June 2, 2009
    Assignee: QUALCOMM Incorporated
    Inventors: James A. Hutchison, Sun Tam
  • Patent number: 7535859
    Abstract: The present invention relates to a method and apparatus for detecting voice activity in a communication signal, wherein filter means are provided for estimating or suppressing an offset component of the level of the communication signal. A filter parameter is controlled based on the output of the filter means. Furthermore, the estimation or suppression of the offset component is limited in response to the output of the filter means. The filter means may be based on a non-linear adaptive notch level filter or a noise floor tracking filter. Thereby, the tracking behavior of noise floor estimation to sudden rises in noise floor can be improved and the voice activity detection can work efficiently over a wide dynamic range.
    Type: Grant
    Filed: October 8, 2004
    Date of Patent: May 19, 2009
    Assignee: NXP B.V.
    Inventor: Wolfgang Brox
  • Publication number: 20090089053
    Abstract: Voice activity detection using multiple microphones can be based on a relationship between an energy at each of a speech reference microphone and a noise reference microphone. The energy output from each of the speech reference microphone and the noise reference microphone can be determined. A speech to noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute values of the autocorrelations of the speech and noise reference signals are determined and a ratio based on autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.
    Type: Application
    Filed: September 28, 2007
    Publication date: April 2, 2009
    Applicant: QUALCOMM INCORPORATED
    Inventors: Song Wang, Samir Kumar Gupta, Eddie L. T. Choy
  • Patent number: 7502734
    Abstract: The exemplary embodiments of this invention relate to a method and device for quantizing linear prediction parameters in variable bit-rate sound signal coding, in which an input linear prediction parameter vector is received, a sound signal frame corresponding to the input linear prediction parameter vector is classified, a prediction vector is computed, the computed prediction vector is removed from the input linear prediction parameter vector to produce a prediction error vector, and the prediction error vector is quantized. Computation of the prediction vector comprises selecting one of a plurality of prediction schemes in relation to the classification of the sound signal frame, and processing the prediction error vector through the selected prediction scheme. The exemplary embodiments of this invention further relate to a method and device for dequantizing linear prediction parameters in variable bit-rate sound signal decoding.
    Type: Grant
    Filed: November 22, 2006
    Date of Patent: March 10, 2009
    Assignee: Nokia Corporation
    Inventor: Milan Jelinek
  • Patent number: 7496505
    Abstract: A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. The input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode.
    Type: Grant
    Filed: November 13, 2006
    Date of Patent: February 24, 2009
    Assignee: QUALCOMM Incorporated
    Inventors: Sharath Manjunath, William Gardner
  • Patent number: 7493256
    Abstract: A low-bit-rate coding technique for unvoiced segments of speech, without loss of quality compared to the conventional Code Excited Linear Prediction (CELP) method operating at a much higher bit rate. A set of gains is derived from a residual signal after whitening the speech signal by a linear prediction filter. These gains are then quantized and applied to a randomly generated sparse excitation. The excitation is filtered, and its spectral characteristics are analyzed and compared to the spectral characteristics of the original residual signal. Based on this analysis, a filter is chosen to shape the spectral characteristics of the excitation to achieve optimal performance.
    Type: Grant
    Filed: March 13, 2007
    Date of Patent: February 17, 2009
    Assignee: QUALCOMM Incorporated
    Inventor: Pengjun Huang
  • Patent number: 7487083
    Abstract: A method and an apparatus accurately discriminate between speech and voice-band data (VBD) in a communication network by calculating self similarity ratio (SSR) values, which indicate periodicity characteristics of an input signal segment, and/or autocorrelation coefficients, which indicate spectral characteristics of an input signal segment, to generate a speech/VBD discrimination result. In one implementation, the speech-VBD discriminating apparatus calculates both short-term delay and long-term delay SSR values to analyze the repetition rate of an input signal frame, thereby indicating whether the input signal frame has the periodicity characteristics of a typical speech signal or a VBD signal. The speech-VBD discriminating apparatus further calculates a plurality of short-term autocorrelation coefficients to determine the spectral envelope of an input frame, thereby facilitating accurate speech/VBD discrimination.
    Type: Grant
    Filed: July 13, 2000
    Date of Patent: February 3, 2009
    Assignee: Alcatel-Lucent USA Inc.
    Inventor: Peng Jie Zhang
  • Patent number: 7472059
    Abstract: A speech classification technique for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions.
    Type: Grant
    Filed: December 8, 2000
    Date of Patent: December 30, 2008
    Assignee: QUALCOMM Incorporated
    Inventor: Pengjun Huang
  • Patent number: 7457744
    Abstract: A device and a method for estimating an open-loop pitch in a general speech CODEC are disclosed. The open-loop pitch estimation device includes an autocorrelation function calculation unit which calculates a normalized autocorrelation function from a perceptual weighting filtered speech signal, a maximum autocorrelation function and lag estimation unit which estimates a maximum autocorrelation function and candidates for the maximum autocorrelation function, a pitch candidate decision unit which decides candidates for a pitch by using the ratio of the estimated maximum autocorrelation function to the candidates for the estimated maximum autocorrelation function, and lags of which values are smaller than a predetermined threshold value, and a pitch estimation unit which estimates a pitch between the candidates for a pitch and the lags corresponding to the estimated maximum autocorrelation function by using a pitch of a previous frame of the speech signal.
    Type: Grant
    Filed: July 25, 2003
    Date of Patent: November 25, 2008
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Mi-suk Lee, Dae-hwan Hwang
  • Publication number: 20080288247
    Abstract: A method for detecting the presence or absence of an audio signal in a communications system in which an audio signal is encoded by a delta modulation encoding algorithm, and in which a step size parameter is adapted according to characteristics of the encoded signal, the method comprising determining based on the magnitude of the step size parameter whether the encoded signal represents audio activity, and adapting the operation of the communication system based on that determination.
    Type: Application
    Filed: June 24, 2005
    Publication date: November 20, 2008
    Applicant: CAMBRIDGE SILICON RADIO LIMITED
    Inventor: Robert Young
  • Publication number: 20080281586
    Abstract: A “speech onset detector” provides a variable length frame buffer in combination with either variable transmission rate or temporal speech compression for buffered signal frames. The variable length buffer buffers frames that are not clearly identified as either speech or non-speech frames during an initial analysis. Buffering of signal frames continues until a current frame is identified as either speech or non-speech. If the current frame is identified as non-speech, buffered frames are encoded as non-speech frames. However, if the current frame is identified as a speech frame, buffered frames are searched for the actual onset point of the speech. Once that onset point is identified, the signal is either transmitted in a burst, or a time-scale modification of the buffered signal is applied for compressing buffered frames beginning with the frame in which the onset point is detected. The compressed frames are then encoded as one or more speech frames.
    Type: Application
    Filed: July 28, 2008
    Publication date: November 13, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei A. Florencio, Philip A. Chou
  • Patent number: 7433358
    Abstract: An embodiment may include an apparatus comprising a dejitter buffer to receive packets containing audio data, a codec coupled with the dejitter buffer, the codec to receive coded audio frames from the dejitter buffer and decode them, and a concealed seconds meter coupled with the dejitter buffer, the concealed seconds meter to record concealment events by the decoder to provide an objective measure of media impairment. Another exemplary embodiment may be a method comprising receiving packets containing audio information at a dejitter buffer, decomposing the packets to coded audio frames, sending the coded audio frames to a decoder and decoding the frames, generating a concealment output stream if the decoder does not receive a valid frame from the dejitter buffer, and recording concealment events to provide an objective measure of media impairment.
    Type: Grant
    Filed: July 8, 2005
    Date of Patent: October 7, 2008
    Assignee: Cisco Technology, Inc.
    Inventors: Paul Volkaerts, Kevin Joseph Connor, James C. Frauenthal, Rajesh Kumar
  • Publication number: 20080243494
    Abstract: A speech receiving unit receives a user ID, a speech obtained at a terminal, and an utterance duration, from the terminal. A proximity determining unit calculates a correlation value expressing a correlation between speeches received from plural terminals, compares the correlation value with a first threshold value, and determines that the plural terminals that receive the speeches whose correlation value is calculated are close to each other, when the correlation value is larger than the first threshold value. A dialog detecting unit determines whether a relationship between the utterance durations received from the plural terminals that are determined to be close to each other within an arbitrary target period during which a dialog is to be detected fits a rule. When the relationship is determined to fit the rule, the dialog detecting unit detects dialog information containing the target period and the user ID.
    Type: Application
    Filed: March 11, 2008
    Publication date: October 2, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masayuki Okamoto, Naoki Iketani, Hideo Umeki, Sogo Tsuboi, Kenta Cho, Keisuke Nishimura, Masanori Hattori
  • Publication number: 20080243495
    Abstract: Packetized CELP-encoded speech playout with frame truncation during silence and frame expansion method dependent upon voicing classification with voiced frame expansion maintaining phase alignment.
    Type: Application
    Filed: June 10, 2008
    Publication date: October 2, 2008
    Inventors: Krishnasamy Anandakumar, Alan V. McCree, Erdal Paksoy
  • Patent number: 7415416
    Abstract: A voice activated camera is described which allows users to take remote photographs by speaking one or more keywords. In a preferred embodiment, a speech processing unit is provided which is arranged to detect extended periodic signals from a microphone of the camera. A control unit is also provided to control the taking of a photograph when such an extended periodic component is detected by the speech processing unit.
    Type: Grant
    Filed: September 10, 2004
    Date of Patent: August 19, 2008
    Assignee: Canon Kabushiki Kaisha
    Inventor: David Llewellyn Rees
  • Patent number: 7412379
    Abstract: Techniques utilising Time Scale Modification (TSM) of signals are described. The signal is analysed and divided into frames of similar signal types. Techniques specific to the signal type are then applied to the frames thereby optimising the modification process. The method of the present invention enables TSM of different audio signal parts to be realized using different methods, and a system for effecting said method is also described.
    Type: Grant
    Filed: April 2, 2002
    Date of Patent: August 12, 2008
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Rakesh Taori, Andreas Johannes Gerrits, Dzevdet Burazerovic
  • Patent number: 7411985
    Abstract: A low complexity packet loss concealment method for use in voice-over-IP speech transmission calculates a cross-correlation of previous speech data to estimate the pitch period of the previous speech when speech frames have been lost. A tap interval used to calculate the cross-correlation is dynamically adapted, thereby reducing the computational complexity of the process. In addition, the pitch period estimation is bypassed completely when it is determined not to be necessary, as a result of the speech being unvoiced or silence. A waveform “bending” operation is performed into the current frame without inserting any algorithmic delay into each frame.
    Type: Grant
    Filed: March 21, 2003
    Date of Patent: August 12, 2008
    Assignee: Lucent Technologies Inc.
    Inventors: Minkyu Lee, James William McGowan
  • Patent number: 7406096
    Abstract: Techniques are presented herein to provide tandem-free operation between two wireless terminals through two otherwise incompatible wireless networks. Specifically, embodiments provide tandem-free operation between a wireless terminal communicating through a continuous transmission (CTX) wireless channel to a wireless terminal communicating through a discontinuous transmission (DTX) wireless channel. In a first aspect, inactive speech frames are translated between DTX and CTX formats. In a second aspect, each wireless terminal includes an active speech decoder that is compatible with the active speech encoder on the opposite end of the mobile-to-mobile connection.
    Type: Grant
    Filed: December 6, 2002
    Date of Patent: July 29, 2008
    Assignee: QUALCOMM Incorporated
    Inventors: Khaled Helmi El-Maleh, Ananthapadmanabhan Arasanipalai Kandhadai, Sharath Manjunath