Specialized Information Patents (Class 704/206)
  • Patent number: 8788275
    Abstract: A decoding apparatus decodes a first encoded data that is encoded from a low-frequency component of an audio signal, and a second encoded data that is used when creating a high-frequency component of an audio signal from a low-frequency component and encoded in accordance with a certain bandwidth, into the audio signal. In the decoding apparatus, a high-frequency component detecting unit divides the high-frequency component into bands with a certain interval range correspondingly to the certain bandwidth, and detects magnitude of the high-frequency components corresponding to each of the bands. A high-frequency component compensating unit compensates the high-frequency components based on the magnitude of the high-frequency components corresponding to each of the bands detected by the high-frequency component detecting unit.
    Type: Grant
    Filed: September 20, 2007
    Date of Patent: July 22, 2014
    Assignee: Fujitsu Limited
    Inventors: Miyuki Shirakawa, Masanao Suzuki, Takashi Makiuchi, Yoshiteru Tsuchinaga
  • Patent number: 8788264
    Abstract: An audio encoding device (1A) corrects initial gain information calculated for an arbitrary frame, based on gain information of a stored past frame, thereby calculating gain information to be used in the frame. The audio encoding device (1A) encodes the calculated gain information as a difference from the gain information of the past frame. An audio decoding device (3A) receives the differential gain, and calculates the gain of the arbitrary frame based on the gain used in the past frame, thereby generating a decoded audio signal.
    Type: Grant
    Filed: June 25, 2008
    Date of Patent: July 22, 2014
    Assignee: NEC Corporation
    Inventor: Osamu Shimada
  • Publication number: 20140200884
    Abstract: Systems and methods for applying user specific acoustic adjustment parameters are provided. The intelligibility of speech for a particular user is determined and a set of acoustic adjustment parameters is determined. The set or template of acoustic adjustment parameters for the user is placed in a central store, for example provided as or in association with a server. The template can be obtained from the server for application in connection with a communication involving the user by providing an identification of the template.
    Type: Application
    Filed: January 17, 2013
    Publication date: July 17, 2014
    Applicant: Avaya Inc.
    Inventors: Chris McArthur, Paul Haig, John C. Lynch, Paul Roller Michaelis
  • Publication number: 20140200885
    Abstract: The invention relates to the analysis of characteristics of audio and/or video signals for the generation of audio-visual content signatures. To determine an audio signature, a region of interest, for example of high entropy, is identified in audio signature data. This region of interest is then provided as an audio signature with offset information. A video signature is also provided.
    Type: Application
    Filed: February 26, 2014
    Publication date: July 17, 2014
    Applicant: Snell Limited
    Inventor: Jonathan Diggins
  • Patent number: 8781844
    Abstract: A method for encoding an audio signal including: processing a selected subset of a lower series of samples forming a lower frequency spectral band of the audio signal and a higher series of samples forming a higher frequency spectral band of the audio signal to parametrically encode the higher series of samples forming the higher frequency spectral band by identifying a sub-series of the lower series of samples.
    Type: Grant
    Filed: September 25, 2009
    Date of Patent: July 15, 2014
    Assignee: Nokia Corporation
    Inventors: Lasse Juhani Laaksonen, Mikko Tapio Tammi, Adriana Vasilache, Anssi Sakari Ramo
  • Publication number: 20140188464
    Abstract: An apparatus for generating a bandwidth extended signal includes an anti-sparseness processing unit to perform anti-sparseness processing on a low-frequency spectrum; and a frequency domain high-frequency extension decoding unit to perform high-frequency extension encoding in the frequency domain on the low-frequency spectrum on which the anti-sparseness processing is performed.
    Type: Application
    Filed: July 2, 2012
    Publication date: July 3, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Ki-hyun Choo
  • Patent number: 8762139
    Abstract: A noise suppression device includes: a power spectrum calculator converting an input signal of time domain into power spectra of frequency domain; a voice/noise determination unit determining whether the power spectra indicate voice or noise; a noise spectrum estimation unit estimating noise spectra of the power spectra; a period component estimation unit analyzing a harmonic structure constituting the power spectra and estimating periodical information about the power spectra; a weighting coefficient calculator calculating a weighting coefficient for weighting the power spectra; a suppression coefficient calculator calculating a suppression coefficient for suppressing noise included in the power spectra; a spectrum suppression unit suppressing amplitude of the power spectra in accordance with the suppression coefficient; and an inverse Fourier transformer converting the power spectra output by the spectrum suppression unit into a signal of time domain to generate a noise-suppressed signal.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: June 24, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Patent number: 8762158
    Abstract: A method and apparatus for generating synthesis audio signals are provided. The method includes decoding a bitstream; splitting the decoded bitstream into n sub-band signals; generating n transformed sub-band signals by transforming the n sub-band signals in a frequency domain; and generating synthesis audio signals by respectively multiplying the n transformed sub-band signals by values corresponding to synthesis filter bank coefficients.
    Type: Grant
    Filed: August 5, 2011
    Date of Patent: June 24, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyun-wook Kim, Han-gil Moon, Sang-hoon Lee
  • Patent number: 8744842
    Abstract: A robust method and apparatus to detect voice activity based on the power level of an audio frame. The method may include performing primary active/non-active voice period determination of an input audio frame according to a power level of the audio frame, extracting a noise power prediction value and a signal power prediction value by referring to power levels of current and previous audio frames according to a primary active/non-active voice period determination value, and performing secondary active/non-active voice period determination for the input audio frame by comparing the extracted signal power prediction value with the extracted noise power prediction value.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: June 3, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Jae-youn Cho
  • Patent number: 8744863
    Abstract: A multi-mode audio signal decoder has a spectral value determinator to obtain sets of decoded spectral coefficients for a plurality of portions of an audio content and a spectrum processor configured to apply a spectral shaping to a set of spectral coefficients in dependence on a set of linear-prediction-domain parameters for a portion of the audio content encoded in a linear-prediction mode, and in dependence on a set of scale factor parameters for a portion of the audio content encoded in a frequency-domain mode. The audio signal decoder has a frequency-domain-to-time-domain converter configured to obtain a time-domain audio representation on the basis of a spectrally-shaped set of decoded spectral coefficients for a portion of the audio content encoded in the linear-prediction mode and for a portion of the audio content encoded in the frequency-domain mode. An audio signal encoder is also described.
    Type: Grant
    Filed: April 6, 2012
    Date of Patent: June 3, 2014
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.
    Inventors: Max Neuendorf, Guillaume Fuchs, Nikolaus Rettelbach, Tom Baeckstroem, Jeremie Lecomte, Juergen Herre
  • Publication number: 20140149111
    Abstract: A speech enhancement apparatus includes: a noise estimating unit which estimates a noise component contained in a speech signal for each frequency band; a signal-to-noise ratio computing unit which computes, for each frequency band, a signal-to-noise ratio; a gain computing unit which selects a frequency band whose computed signal-to-noise ratio indicates that the signal component contained in the speech signal for the frequency band is recognizable, and which determines a gain indicating the degree of enhancement to be applied to the speech signal in accordance with the signal-to-noise ratio of the selected frequency band; and an enhancing unit which amplifies an amplitude component of a frequency domain signal in each frequency band in accordance with the gain, and which corrects the amplitude component of the frequency domain signal by subtracting the noise component from the amplitude component in each frequency band.
    Type: Application
    Filed: November 6, 2013
    Publication date: May 29, 2014
    Applicant: Fujitsu Limited
    Inventor: Naoshi Matsuo
  • Patent number: 8731923
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: May 20, 2014
    Assignee: Adacel Systems, Inc.
    Inventor: Chang-Qing Shu
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
  • Patent number: 8712767
    Abstract: A scalable encoding apparatus, a scalable decoding apparatus and the like are disclosed which can achieve a band scalable LSP encoding that exhibits both a high quantization efficiency and a high performance. In these apparatuses, a narrow band-to-wide band converter receives and converts a quantized narrow band LSP to a wide band, and then outputs the quantized narrow band LSP as converted (i.e., a converted wide band LSP parameter) to an LSP-to-LPC converter. The LSP-to-LPC converter converts the quantized narrow band LSP as converted to a linear prediction coefficient and then outputs it to a pre-emphasizer. The pre-emphasizer calculates and outputs the pre-emphasized linear prediction coefficient to an LPC-to-LSP converter. The LPC-to-LSP converter converts the pre-emphasized linear prediction coefficient to a pre-emphasized quantized narrow band LSP as wide band converted, and then outputs it to a prediction quantizer.
    Type: Grant
    Filed: October 28, 2010
    Date of Patent: April 29, 2014
    Assignee: Panasonic Corporation
    Inventor: Hiroyuki Ehara
  • Patent number: 8694325
    Abstract: A hierarchical audio coding and decoding method and system are provided. The method includes dividing frequency domain coefficients of an audio signal after MDCT into a plurality of coding sub-bands, quantizing and coding amplitude envelope values of coding sub-bands; allocating bits to each coding sub-band of the core layer, quantizing and coding core layer frequency domain coefficients to obtain coded bits of core layer frequency domain coefficients; calculating the amplitude envelope value of each coding sub-band of the core layer residual signal; allocating bits to each coding sub-band of the extended layer, quantizing and coding the extended layer coding signal to obtain coded bits of the extended layer coding signal; multiplexing and packing amplitude envelope value coded bits of each coding sub-band composed by core layer and extended layer frequency domain coefficients, core layer frequency domain coefficient coded bits, and extended layer coding signal coded bits, then transmitting to the decoding end.
    Type: Grant
    Filed: October 26, 2010
    Date of Patent: April 8, 2014
    Assignee: ZTE Corporation
    Inventors: Zhibin Lin, Zheng Deng, Hao Yuan, Jing Lu, Xiaojun Qiu, Jiali Li, Guoming Chen, Ke Peng, Kaiwen Liu
  • Patent number: 8694306
    Abstract: A method of processing a signal, including taking a signal formed from a plurality of source signal emitters and expressed in an original domain, decomposing the signal into a mathematical representation of a plurality of constituent elements in an alternate domain, analyzing the plurality of constituent elements to associate at least a subset of the constituent elements with at least one of the plurality of source signal emitters, separating at least a subset of the constituent elements based on the association and reconstituting at least a subset of constituent elements to produce an output signal in at least one of the original domain, the alternate domain and another domain.
    Type: Grant
    Filed: May 3, 2013
    Date of Patent: April 8, 2014
    Assignee: Kaonyx Labs LLC
    Inventors: Kevin M. Short, Brian T. Hone
  • Patent number: 8682664
    Abstract: The present invention discloses a method and a device for audio signal classification, and relates to the field of communications technologies, which solve a problem of high complexity of type classification of audio signals in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the audio signal to be classified, where the tonal characteristic parameter of the audio signal to be classified is in at least one sub-band, is obtained, and a type of the audio signal to be classified is determined according to the obtained characteristic parameter. The present invention is mainly applied to an audio signal classification scenario, and implements audio signal classification through a relatively simple method.
    Type: Grant
    Filed: September 27, 2011
    Date of Patent: March 25, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Lijing Xu, Shunmei Wu, Liwei Chen, Qing Zhang
  • Patent number: 8682651
    Abstract: The invention relates to the analysis of characteristics of audio and/or video signals for the generation of audio-visual content signatures. To determine an audio signature, a region of interest, for example of high entropy, is identified in audio signature data. This region of interest is then provided as an audio signature with offset information. A video signature is also provided.
    Type: Grant
    Filed: February 20, 2009
    Date of Patent: March 25, 2014
    Assignee: Snell Limited
    Inventor: Jonathan Diggins
  • Patent number: 8676572
    Abstract: A computer-implemented system and method for enhancing audio to individuals participating in a conversation is provided. Audio data for individuals participating in one or more conversations is analyzed. Possible conversational configurations of the individuals are generated based on the audio data, and each possible conversational configuration includes one or more subconfigurations of at least two of the individuals. A probability weight is assigned to each of the subconfigurations and includes a likelihood that the individuals of that subconfiguration are participating in one of the conversations. A probability of each possible conversational configuration is determined by combining the probability weights for the subconfigurations of that possible conversational configuration. The possible conversational configuration with the highest probability is selected as a most probable configuration. The individuals participating in the conversations are determined based on the most probable configuration.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: March 18, 2014
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Paul M. Aoki, Margaret H. Szymanski, James D. Thornton, Daniel H. Wilson, Allison G. Woodruff
  • Patent number: 8676574
    Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: March 18, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ozlem Kalinli
  • Patent number: 8676172
    Abstract: A system for generating a relational indicator based on analysis of at least one telecommunications event between a first party and a second party comprises: a relation management engine which is configured to process first content characteristics extracted from a plurality of telecommunications events to produce a first relation parameter and to process second content characteristics extracted from the plurality of telecommunications events to produce a second relation parameter; and a terminal device configured to use the first and second relation parameters to generate the relational indicator.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: March 18, 2014
    Assignee: Nokia Solutions and Networks Oy
    Inventors: Lorant Farkas, Gyorgy Csefan, Jozsef Dombi
  • Patent number: 8666732
    Abstract: A high frequency signal interpolation apparatus provides, with a simple structure, a high-quality digital audio signal through interpolation of high frequency signals missing due to compression. The high frequency signal interpolation apparatus includes a peak value detection and holding circuit configured to detect a peak value of a digital audio signal provided to an input terminal by sampling the digital audio signal and generate a square wave signal by holding the detected peak value; a high-pass filter configured to extract a higher harmonic component from the generated square wave signal; and an adder configured to add the extracted higher harmonic component to the digital audio signal provided to the input terminal.
    Type: Grant
    Filed: October 16, 2007
    Date of Patent: March 4, 2014
    Assignee: Kyushu Institute of Technology
    Inventors: Yasushi Sato, Atsuko Ryu
  • Patent number: 8666733
    Abstract: When encoding an audio signal, it is possible to encode the audio signal efficiently while maintaining high-register signal components, and to prevent deterioration of the sound quality of the decoded signal. A digital audio signal is divided into a plurality of frequency bands. The divided digital audio signal is function-approximated for each band. Further, the parameters of the approximating function are encoded. When performing the decoding process, the parameters of the function of each band are used to perform function interpolation, the function-interpolated signals of the bands are synthesized, and the signal is decoded. Thus, by suitably setting the function equation when function-approximating each band, it is possible to perform an encoding process while maintaining the high-register components and to perform a compression-coding process that enables reproduction with very good sound quality.
    Type: Grant
    Filed: June 3, 2009
    Date of Patent: March 4, 2014
    Assignee: Japan Science and Technology Agency
    Inventors: Kazuo Toraichi, Mitsuteru Nakamura, Yasuo Morooka
  • Patent number: 8645127
    Abstract: Traditional audio encoders may conserve coding bit-rate by encoding fewer than all spectral coefficients, which can produce a blurry low-pass sound in the reconstruction. An audio encoder using wide-sense perceptual similarity improves the quality by encoding a perceptually similar version of the omitted spectral coefficients, represented as a scaled version of already coded spectrum. The omitted spectral coefficients are divided into a number of sub-bands. The sub-bands are encoded as two parameters: a scale factor, which may represent the energy in the band; and a shape parameter, which may represent a shape of the band. The shape parameter may be in the form of a motion vector pointing to a portion of the already coded spectrum, an index to a spectral shape in a fixed code-book, or a random noise vector. The encoding thus efficiently represents a scaled version of a similarly shaped portion of spectrum to be copied at decoding.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: February 4, 2014
    Assignee: Microsoft Corporation
    Inventors: Sanjeev Mehrotra, Wei-Ge Chen
  • Patent number: 8631295
    Abstract: A method and apparatus for selectively replacing damaged portions of a data stream. The method comprises analyzing the data stream to identify damaged portions therein; selecting a damaged portion for replacement; and replacing the selected damaged portion. The selected damaged portion is selected for replacement in dependence on a rate of replacement, the rate of replacement being that at which previous portions of the data stream have been replaced.
    Type: Grant
    Filed: November 20, 2012
    Date of Patent: January 14, 2014
    Assignee: Cambridge Silicon Radio Limited
    Inventors: Xuejing Sun, Sameer Gadre, Scott Plude
  • Patent number: 8626495
    Abstract: The invention relates to a method of identifying and correcting errors in a noisy binary mask. An object of the present invention is to provide a scheme for improving a binary mask representing speech. The problem is solved in that the method comprises a) providing a noisy binary mask comprising a binary representation of the power density of an acoustic signal comprising a target signal and a noise signal at a predefined number of discrete frequencies and a number of discrete time instances; b) providing a statistical model of a clean binary mask representing the power density of the target signal; and c) using the statistical model to detect and correct errors in the noisy binary mask. This has the advantage of providing an alternative and relatively simple way of improving an estimate of a binary mask representing a speech signal. The invention may e.g. be used for speech processing, e.g. in a hearing instrument.
    Type: Grant
    Filed: August 4, 2010
    Date of Patent: January 7, 2014
    Assignee: Oticon A/S
    Inventors: Jesper Bünsow Boldt, Ulrik Kjems, Michael Syskind Pedersen, Mads Graesbøll Christensen, Søren Holdt Jensen
  • Patent number: 8620643
    Abstract: A computer numerical processing method for representing audio information for use in conjunction with human hearing is described. The method comprises approximating an eigenfunction equation representing a model of human hearing, calculating the approximation to each of a plurality of eigenfunctions from at least one aspect of the eigenfunction equation, and storing the approximation to each of a plurality of eigenfunctions for use at a later time. The approximation to each of a plurality of eigenfunctions represents audio information. The model of human hearing includes a bandpass operation with a bandwidth having the frequency range of human hearing and a time-limiting operation approximating the time duration correlation window of human hearing.
    Type: Grant
    Filed: August 2, 2010
    Date of Patent: December 31, 2013
    Inventor: Lester F. Ludwig
  • Patent number: 8620660
    Abstract: Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods and to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
    Type: Grant
    Filed: October 29, 2010
    Date of Patent: December 31, 2013
    Assignee: The United States of America, as Represented by the Secretary of the Navy
    Inventors: Anton Yen, Irina Gorodnitsky
  • Patent number: 8615392
    Abstract: The present technology provides a sophisticated level of control of the spatial pattern of an acoustic field which can overcome or substantially alleviate problems associated with transmitting an acoustic signal within the near-end acoustic environment. The spatial pattern is produced by utilizing an array of audio transducers which generate a plurality of acoustic waves forming an acoustic interference pattern, such that the resultant acoustic energy is constrained (e.g. limited to an acoustic energy level at or below a predetermined threshold level) in one or more regions of the spatial pattern. In doing so, listeners in these region(s) may not receive sufficient acoustic energy to hear the associated acoustic signal, while listeners in other regions can. Similarly, these techniques can suppress echo paths within those region(s).
    Type: Grant
    Filed: September 29, 2010
    Date of Patent: December 24, 2013
    Assignee: Audience, Inc.
    Inventor: Michael M. Goodwin
  • Patent number: 8615390
    Abstract: The invention relates to transform coding/decoding of a digital audio signal represented by a succession of frames, using windows of different lengths. For the coding within the meaning of the invention, it is sought to detect (51) a particular event, such as an attack, in a current frame (Ti); and, at least if said particular event is detected at the start of the current frame (53), a short window (54) is directly applied in order to code (56) the current frame (Ti) without applying a transition window. Thus, the coding has a reduced delay in relation to the prior art. In addition, an ad hoc processing is applied during decoding in order to compensate for the direct passage from a long window to a short window during coding.
    Type: Grant
    Filed: December 18, 2007
    Date of Patent: December 24, 2013
    Assignee: France Telecom
    Inventors: Balazs Kovesi, David Virette, Pierrick Philippe
  • Patent number: 8606567
    Abstract: Provided is a signal encoding apparatus including: an encoding unit which encodes a quantization value of a frequency spectrum in an input signal through a plurality of encoding algorithms; an amplitude change amount calculation unit which calculates, for each of a plurality of subbands of the frequency spectrum, an amplitude change amount with respect to the frequency spectrum based on a spectrum envelope of the frequency spectrum; and an encoding selection unit which selects, for each subband, the encoding algorithm according to a degree of deflection of an occurrence probability distribution of the quantization value in the amplitude change amount among the plurality of the encoding algorithms.
    Type: Grant
    Filed: July 8, 2010
    Date of Patent: December 10, 2013
    Assignee: Sony Corporation
    Inventors: Yuuji Maeda, Jun Matsumoto, Yasuhiro Toguri, Shiro Suzuki, Yuuki Matsumura
  • Patent number: 8606586
    Abstract: A bandwidth extension encoder for encoding an audio signal has a signal analyzer, a core encoder, a parameter calculator, and a window controller. The audio signal has a low frequency signal having a core frequency band and a high frequency signal having an upper frequency band. The signal analyzer is configured for analyzing the audio signal, the audio signal having a block of audio samples, the block having a specified length in time. The signal analyzer is furthermore configured for determining from a plurality of analysis windows an analysis window to be used for performing a bandwidth extension in a bandwidth extension decoder. The core encoder is configured for encoding the low frequency signal to acquire an encoded low frequency signal. The parameter calculator is configured for calculating bandwidth extension parameters from the high frequency signal. The window controller is configured to provide control information indicating analysis window functions.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: December 10, 2013
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.
    Inventors: Frederik Nagel, Markus Multrus, Sascha Disch, Jeremie Lecomte, Christian Ertel, Patrick Warmbold
  • Patent number: 8600758
    Abstract: Described herein are methods, systems, apparatuses and products for reconstruction of a smooth speech signal from a stuttered speech signal. One aspect provides for accessing a stored speech signal having stuttering; identifying at least one stuttered region in the stored speech signal; modifying the at least one stuttered region in the stored speech signal; and responsive to modifying the at least one stuttered region, reconstructing a smooth speech signal corresponding to the stored speech signal. Other embodiments are disclosed.
    Type: Grant
    Filed: August 28, 2012
    Date of Patent: December 3, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Om Dadaji Deshmukh, Suraj Satishkumar Sheth, Ashish Verma
  • Patent number: 8600765
    Abstract: Embodiments of the present invention provide a signal classification method and device, and encoding and decoding methods and devices. The encoding method includes: dividing a current frame into a low-frequency band signal and a high-frequency band signal; attenuating the high-frequency band signal or a to-be-encoded characteristic parameter of the high-frequency band signal according to an energy attenuation value of the low-frequency band signal, where the energy attenuation value indicates energy attenuation of the low-frequency band signal caused by encoding of the low-frequency band signal; and encoding the attenuated high-frequency band signal or the attenuated to-be-encoded characteristic parameter of the high-frequency band signal. The technical solutions according to the embodiments of the present invention can improve the effect of combining the low-frequency band signal and the high-frequency band signal at the decoder.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Lei Miao, Anisse Taleb
  • Patent number: 8583426
    Abstract: A method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of the audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain so as to reduce gain in a subband as the level of noise components increases with respect to the level of speech components in the subband and increase gain in a subband when speech components are present in subbands of the audio signal, the processes each responding to subbands of the audio signal and controlling gain independently of each other to provide a processed subband audio signal.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: November 12, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Rongshan Yu, Charles Phillip Brown
  • Patent number: 8583425
    Abstract: Methods, systems, and computer readable media for fricatives and high frequencies detection are disclosed. According to one method, the method includes receiving a narrowband signal. The method also includes detecting, using one or more autocorrelation coefficients, a high frequency speech component associated with the narrowband signal.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: November 12, 2013
    Assignee: Genband US LLC
    Inventors: Emmanuel Rossignol Thepie Fapi, Eric Poulin
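As a rough illustration of the idea in the abstract of patent 8583425 above, and not the patented method itself: the normalized lag-1 autocorrelation coefficient of a frame is close to +1 when low frequencies dominate and moves toward -1 as energy concentrates near half the sampling rate, so a low value hints at fricative-like, high-frequency content. The function names, the 8 kHz sampling rate, the 160-sample frame, and the 0.3 threshold are all hypothetical choices made for this sketch.

```python
import math

def lag1_autocorr(frame):
    """Normalized autocorrelation at lag 1, a value in [-1, 1]."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no evidence either way
    r1 = sum(frame[i] * frame[i + 1] for i in range(len(frame) - 1))
    return r1 / energy

def looks_high_frequency(frame, threshold=0.3):
    """Assumed decision rule: flag frames whose lag-1 coefficient is low."""
    return lag1_autocorr(frame) < threshold

def tone(freq_hz, rate_hz=8000, n=160):
    """Test signal: a pure sine tone sampled at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

low = tone(200)    # vowel-like low-frequency energy
high = tone(3500)  # energy close to the 4 kHz Nyquist limit
```

For a pure tone the coefficient is approximately the cosine of the normalized angular frequency, which is why the 200 Hz tone scores near +1 and the 3500 Hz tone scores strongly negative.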
  • Patent number: 8571853
    Abstract: A method and apparatus for laughter detection. Laughter is detected through the presence of a sequence of at least a predetermined number, such as three, of consecutive bursts, each burst comprising a voiced portion and an unvoiced portion. After detecting bursts, n-tuples such as triplets are detected, and a likelihood of each burst n-tuple to represent laughter is provided by comparison to predetermined thresholds. Finally, a total score is assigned to the signal based on the grades associated with the triplets and parameters such as the distance between the n-tuples, the total score representing the probability that the audio signal comprises a laughter episode. The method and apparatus preferably comprise a training step and module for determining the thresholds according to manually marked audio signals.
    Type: Grant
    Filed: February 11, 2007
    Date of Patent: October 29, 2013
    Assignee: Nice Systems Ltd.
    Inventors: Oren Peleg, Moshe Wasserblat
  • Patent number: 8566107
    Abstract: Disclosed is a method of processing a signal, which includes receiving at least one of a first signal and a second signal, receiving mode information, and decoding the at least one of the first signal and the second signal using at least one of a first coding scheme and a second coding scheme according to the mode information. The mode information is information for indicating that a prescribed mode corresponds to one of at least three modes. The method includes detecting when a restricted mode change occurs and changing at least one mode when detecting a restricted mode change.
    Type: Grant
    Filed: October 15, 2008
    Date of Patent: October 22, 2013
    Assignees: LG Electronics Inc., Intellectual Discovery Co., Ltd.
    Inventors: Hyen-O Oh, Hong Goo Kang, Chang Heon Lee, Sang Wook Shin, Yang Won Jung
  • Patent number: 8566092
    Abstract: The present invention discloses a method and an apparatus for extracting a prosodic feature of a speech signal, the method including: dividing the speech signal into speech frames; transforming the speech frames from time domain to frequency domain; and extracting respective prosodic features for different frequency ranges. According to the above technical solution of the present invention, it is possible to effectively extract the prosodic feature, which can be combined with a traditional acoustic feature without any obstacle.
    Type: Grant
    Filed: August 16, 2010
    Date of Patent: October 22, 2013
    Assignee: Sony Corporation
    Inventors: Kun Liu, Weiguo Wu
  • Patent number: 8554546
    Abstract: A logarithmic frequency spectrum within a predetermined time range is calculated from a speech signal. The logarithmic frequency spectrum has a frequency element at equal intervals along a logarithmic frequency axis. A logarithmic frequency spectrogram is calculated by connecting a plurality of logarithmic frequency spectrums. A value of the frequency element along a straight line on the logarithmic frequency spectrogram is voted onto a Hough plane. The Hough plane has a voted value in correspondence with a gradient of the straight line. The voted value above a threshold and the gradient corresponding to the voted value are extracted from the Hough plane. A fundamental frequency change is calculated using the voted value and the gradient extracted.
    Type: Grant
    Filed: September 9, 2009
    Date of Patent: October 8, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Kida, Takashi Masuko
  • Patent number: 8548803
    Abstract: A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: October 1, 2013
    Assignee: The Intellisis Corporation
    Inventors: David C. Bradley, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher, Rodney Gateau, Derrick R. Roos, Eric Wiewiora
  • Patent number: 8538755
    Abstract: An automated emotional recognition system is adapted to determine emotional states of a speaker based on the analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract therefrom a set of speech features indicative of the emotional state of the speaker. The emotional state recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: September 17, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Gianmario Bollano, Donato Ettorre, Antonio Esiliato
  • Patent number: 8538763
    Abstract: Enhancing speech components of an audio signal composed of speech and noise components includes controlling the gain of the audio signal in ones of its subbands, wherein the gain in a subband is reduced as the level of estimated noise components increases with respect to the level of speech components, wherein the level of estimated noise components is determined at least in part by (1) comparing an estimated noise components level with the level of the audio signal in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the input signal level in the subband exceeds the estimated noise components level in the subband by a limit for more than a defined time, or (2) obtaining and monitoring the signal-to-noise ratio in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the signal-to-noise ratio in the subband exceeds a limit for more than a defined time.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: September 17, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Rongshan Yu
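Option (1) in the abstract of patent 8538763 above can be sketched as a small per-subband tracker: the noise estimate is raised by a fixed step once the signal level has exceeded it by more than a limit for longer than a hold time. This is a minimal sketch under stated assumptions, not the patented implementation. The function name, the dB units, the frame-based hold time, and the default values are hypothetical, as is the choice to keep raising the estimate on every further frame while the condition persists.

```python
def track_noise(levels_db, initial_noise_db, limit_db=6.0,
                hold_frames=3, step_db=1.0):
    """Return the noise estimate (dB) after each frame of one subband."""
    noise = initial_noise_db
    run = 0          # consecutive frames with level above noise + limit
    estimates = []
    for level in levels_db:
        if level - noise > limit_db:
            run += 1
            if run > hold_frames:
                noise += step_db   # condition held long enough: raise estimate
        else:
            run = 0                # condition broken: restart the timer
        estimates.append(noise)
    return estimates

sustained = track_noise([10.0] * 6, initial_noise_db=0.0)
bursty = track_noise([10.0, 10.0, 0.0, 10.0, 10.0, 0.0], initial_noise_db=0.0)
```

A sustained rise in level moves the estimate upward after the hold time, while short bursts, as speech onsets might produce, leave it unchanged.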
  • Patent number: 8538750
    Abstract: This invention realizes a speech communication system and method, and a robot apparatus capable of significantly improving entertainment property. A speech communication system with a function to make conversation with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracing the existence of the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls conversation so as to continue depending on tracking of the tracking control means.
    Type: Grant
    Filed: November 2, 2012
    Date of Patent: September 17, 2013
    Assignee: Sony Corporation
    Inventors: Kazumi Aoyama, Hideki Shimomura
  • Patent number: 8494842
    Abstract: The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.
    Type: Grant
    Filed: November 3, 2008
    Date of Patent: July 23, 2013
    Assignee: Soundhound, Inc.
    Inventors: Aaron Master, Seyed Majid Emami
  • Patent number: 8489391
    Abstract: A system and method of reusing information in a low power scalable hybrid audio encoder are disclosed. The method includes determining a state of an advanced audio coding (AAC) transient flag, performing spectral band replication (SBR) transient detection on at least two possible locations upon a determination that the AAC transient flag is equal to a first value, performing SBR transient detection on a high frequency upon a determination that the AAC transient flag is equal to a second value, and determining whether a transient exists. The system includes a spectral band replication (SBR) coding module configured to determine a state of an advanced audio coding (AAC) transient flag and perform SBR transient detection on at least one location based upon an energy in a signal upon a determination that the AAC transient flag is equal to a first value.
    Type: Grant
    Filed: August 5, 2010
    Date of Patent: July 16, 2013
    Assignee: STMicroelectronics Asia Pacific Pte., Ltd.
    Inventors: Evelyn Kurniawati, Sapna George
  • Patent number: 8484018
    Abstract: An input frame data producing unit produces, from data stored in an input buffer, input frames each including a predetermined number of sub-frames of a first hopsize determined based on the first frame size and the overlapping rate. A frame processing unit executes a window function on the input frames, shifts the windowed input frames by the first hopsize, and overlaps the shifted input frames, storing the overlapped frames in an output frame. An output buffer data producing frame unit stores data from the output frame to an output buffer including a predetermined number of sub-frames of a second hopsize. A CPU sets the first hopsize and overlapping rate differently in slow-speed reproduction, when the reproducing speed ratio is set lower than 1, than in high-speed reproduction, when the reproducing speed ratio is set larger than 1.
    Type: Grant
    Filed: July 15, 2010
    Date of Patent: July 9, 2013
    Assignee: Casio Computer Co., Ltd
    Inventor: Masaru Setoguchi
  • Patent number: 8473283
    Abstract: The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.
    Type: Grant
    Filed: November 3, 2008
    Date of Patent: June 25, 2013
    Assignee: Soundhound, Inc.
    Inventors: Aaron Master, Seyed Majid Emami
  • Patent number: 8473282
    Abstract: In a sound processing device, a modulation spectrum specifier specifies a modulation spectrum of an input sound for each of a plurality of unit intervals. An index calculator calculates an index value corresponding to a magnitude of components of modulation frequencies belonging to a predetermined range of the modulation spectrum. A determinator determines whether the input sound of each of the unit intervals is a vocal sound or a non-vocal sound based on the index value.
    Type: Grant
    Filed: January 23, 2009
    Date of Patent: June 25, 2013
    Assignee: Yamaha Corporation
    Inventor: Yasuo Yoshioka
  • Patent number: 8468024
    Abstract: A method of generating a frame of audio data for an audio signal from preceding audio data for the audio signal that precede the frame of audio data, the method comprising the steps of: predicting a predetermined number of data samples for the frame of audio data based on the preceding audio data, to form predicted data samples; identifying a section of the preceding audio data for use in generating the frame of audio data; and forming the audio data of the frame of audio data as a repetition (602) of at least part of the identified section to span the frame of audio data, wherein the beginning of the frame of audio data comprises a combination of a subset of the repetition (602) of the at least part of the identified section and the predicted data samples.
    Type: Grant
    Filed: May 14, 2007
    Date of Patent: June 18, 2013
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Adrian Susan, Mihai Neghina
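The frame-generation steps in the abstract of patent 8468024 above can be sketched as follows: a section of the preceding audio is repeated to span the new frame, and the start of the frame blends that repetition with predicted samples. This is a simplified sketch, not the patented method. The linear-extrapolation predictor is a hypothetical stand-in for whatever predictor an implementation would use, and the function names, the caller-supplied `period`, and the linear cross-fade weights are assumptions.

```python
def predict_samples(history, count):
    """Hypothetical stand-in predictor: linear extrapolation from the last two samples."""
    last = history[-1]
    slope = history[-1] - history[-2]
    return [last + slope * (i + 1) for i in range(count)]

def conceal_frame(history, frame_len, period, overlap):
    """Build a replacement frame from preceding audio data."""
    if period < 1 or len(history) < max(period, 2) or overlap > frame_len:
        raise ValueError("history too short or overlap longer than the frame")
    section = history[-period:]                       # identified section
    repetition = [section[i % period] for i in range(frame_len)]
    predicted = predict_samples(history, overlap)     # predicted data samples
    frame = list(repetition)
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)                   # weight ramps toward the repetition
        frame[i] = (1.0 - w) * predicted[i] + w * repetition[i]
    return frame

history = [0.0, 1.0, 0.0, -1.0] * 4
frame = conceal_frame(history, frame_len=8, period=4, overlap=2)
```

Only the first `overlap` samples are blended; the rest of the frame is the plain repetition, which keeps the sketch close to the wording of the abstract.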