Speech Recognition Techniques For Robustness In Adverse Environments, E.g., In Noise, Of Stress Induced Speech, Etc. (EPO) Patents (Class 704/E15.039)
  • Publication number: 20110166856
    Abstract: Systems, methods, and devices for noise profile determination for a voice-related feature of an electronic device are provided. In one example, an electronic device capable of such noise profile determination may include a microphone and data processing circuitry. When a voice-related feature of the electronic device is not in use, the microphone may obtain ambient sounds. The data processing circuitry may determine a noise profile based at least in part on the obtained ambient sounds. The noise profile may enable the data processing circuitry to at least partially filter other ambient sounds obtained when the voice-related feature of the electronic device is in use.
    Type: Application
    Filed: January 6, 2010
    Publication date: July 7, 2011
    Applicant: APPLE INC.
    Inventors: Aram Lindahl, Joseph M. Williams, Gints Valdis Klimanis
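    The approach in 20110166856 above amounts to estimating a noise spectrum while the voice feature is idle and using it later to filter audio captured while the feature is active. A minimal illustrative sketch of that idea, using ordinary magnitude spectral subtraction (the abstract does not specify the filtering method, so that choice is an assumption here):
    ```python
    import numpy as np

    def estimate_noise_profile(ambient_frames):
        """Average magnitude spectrum of ambient audio captured while the
        voice-related feature is idle (one row per frame of samples)."""
        spectra = np.abs(np.fft.rfft(ambient_frames, axis=1))
        return spectra.mean(axis=0)

    def apply_noise_profile(frame, noise_profile, floor=0.05):
        """Suppress the stored noise profile from a frame captured while the
        feature is in use (simple magnitude spectral subtraction)."""
        spectrum = np.fft.rfft(frame)
        magnitude = np.abs(spectrum)
        phase = np.angle(spectrum)
        cleaned = np.maximum(magnitude - noise_profile, floor * magnitude)
        return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))

    # Example: 20 ambient frames of 512 samples, then one noisy speech frame.
    ambient = np.random.randn(20, 512) * 0.1
    profile = estimate_noise_profile(ambient)
    noisy_frame = np.random.randn(512) * 0.1 + np.sin(np.arange(512) * 0.3)
    enhanced = apply_noise_profile(noisy_frame, profile)
    ```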
  • Publication number: 20110161078
    Abstract: Pitch is tracked for individual samples, which occur much more frequently than analysis frames. Speech is identified based on the tracked pitch, and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech-related systems.
    Type: Application
    Filed: March 7, 2011
    Publication date: June 30, 2011
    Applicant: Microsoft Corporation
    Inventors: James G. Droppo, Alejandro Acero, Luis Buera
  • Publication number: 20110144987
    Abstract: A method of automated speech recognition in a vehicle. The method includes receiving audio in the vehicle, pre-processing the received audio to generate acoustic feature vectors, decoding the generated acoustic feature vectors to produce at least one speech hypothesis, and post-processing the at least one speech hypothesis using pitch to improve speech recognition accuracy. The speech hypothesis can be accepted as recognized speech during post-processing if pitch is present in the received audio. Alternatively, a pitch count for the received audio can be determined, N-best speech hypotheses can be post-processed by comparing the pitch count to syllable counts associated with the speech hypotheses, and the speech hypothesis having a syllable count equal to the pitch count can be accepted as recognized speech.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: GENERAL MOTORS LLC
    Inventors: Xufang Zhao, Uma Arun
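    The post-processing step described in 20110144987 above can be pictured as picking, from an N-best list, the hypothesis whose syllable count matches the number of pitch pulses counted in the audio. A hypothetical sketch of that selection; the crude vowel-group syllable counter is a stand-in, not the patent's method:
    ```python
    def count_syllables(word):
        """Crude vowel-group syllable counter used only for illustration."""
        vowels = "aeiouy"
        count, prev_is_vowel = 0, False
        for ch in word.lower():
            is_vowel = ch in vowels
            if is_vowel and not prev_is_vowel:
                count += 1
            prev_is_vowel = is_vowel
        return max(count, 1)

    def select_hypothesis(n_best, pitch_count):
        """Return the first hypothesis whose total syllable count equals
        the pitch count, or None if no hypothesis matches."""
        for hypothesis in n_best:
            syllables = sum(count_syllables(w) for w in hypothesis.split())
            if syllables == pitch_count:
                return hypothesis
        return None

    n_best = ["turn it up", "turn up", "turnip cup"]
    print(select_hypothesis(n_best, pitch_count=2))  # -> "turn up"
    ```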
  • Publication number: 20110142256
    Abstract: A method of removing a noise signal from an input signal, the method including receiving a pure noise signal and an input signal including the noise signal; determining whether the pure noise signal is a static noise signal or a non-static noise signal; and removing the noise signal from the input signal according to whether the noise signal is determined to be the static noise signal or the non-static noise signal.
    Type: Application
    Filed: December 2, 2010
    Publication date: June 16, 2011
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Seung-yeol LEE, Sungyub Daniel YOO, Gang-youl KIM, Sang-yoon KIM, Jung-eun PARK
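    Publication 20110142256 above hinges on first deciding whether the pure noise signal is static (stationary) or non-static before choosing a removal strategy. One plausible, purely illustrative way to make that decision, assuming frame-to-frame energy variation is a usable stationarity cue (the abstract does not say how the decision is made):
    ```python
    import numpy as np

    def is_static_noise(noise_frames, threshold=0.1):
        """Label the pure noise signal as static if its per-frame energy
        varies little relative to its mean energy."""
        energies = np.mean(noise_frames ** 2, axis=1)
        variation = np.std(energies) / (np.mean(energies) + 1e-12)
        return variation < threshold

    # Steady hiss vs. a bursty, non-static noise.
    hiss = np.random.randn(50, 512) * 0.05
    bursty = hiss.copy()
    bursty[::10] *= 8.0
    print(is_static_noise(hiss), is_static_noise(bursty))  # True False
    ```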
  • Publication number: 20110144988
    Abstract: An embedded auditory system includes a voice detecting unit for receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; a noise removing unit for removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and a keyword spotting unit for extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector. A method for processing a voice signal includes receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector.
    Type: Application
    Filed: August 16, 2010
    Publication date: June 16, 2011
    Inventors: JongSuk Choi, Munsang Kim, Byung-Gi Lee, Hyung Soon Kim, Nam Ik Cho
  • Publication number: 20110135107
    Abstract: A clear, high-quality voice signal with a high signal-to-noise ratio is achieved by use of an adaptive noise reduction scheme with two microphones in close proximity. The method includes using two omnidirectional microphones in a highly directional mode and then applying an adaptive noise cancellation algorithm to reduce the noise.
    Type: Application
    Filed: June 14, 2010
    Publication date: June 9, 2011
    Inventor: Alon Konchitsky
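    Publication 20110135107 above pairs two closely spaced microphones with an adaptive noise cancellation algorithm. A compact illustrative version using the classic NLMS adaptive filter, which is one common choice for such a canceller, though the abstract does not name a specific algorithm:
    ```python
    import numpy as np

    def nlms_cancel(primary, reference, taps=32, mu=0.5, eps=1e-8):
        """Adaptively filter the reference (noise) microphone and subtract
        the result from the primary microphone, leaving an enhanced signal."""
        w = np.zeros(taps)
        out = np.zeros_like(primary)
        for n in range(taps, len(primary)):
            x = reference[n - taps:n][::-1]         # most recent samples first
            y = np.dot(w, x)                        # noise estimate at sample n
            e = primary[n] - y                      # error = enhanced output
            w += mu * e * x / (np.dot(x, x) + eps)  # NLMS weight update
            out[n] = e
        return out

    # Toy example: a tone plus noise that reaches the primary mic with a delay.
    n = np.arange(8000)
    speech = 0.5 * np.sin(2 * np.pi * 440 * n / 8000)
    noise = np.random.randn(len(n))
    primary = speech + 0.8 * np.roll(noise, 3)
    enhanced = nlms_cancel(primary, noise)
    ```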
  • Publication number: 20110119059
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Application
    Filed: November 13, 2009
    Publication date: May 19, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Bernard S. RENGER, Steven Neil TISCHER
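    The selection logic in 20110119059 above is a straightforward fallback chain: a user-specific supervised model, then an unsupervised model, then a generic model. A sketch of that chain, with hypothetical dictionary lookups standing in for the standardized infrastructure described in the abstract:
    ```python
    def select_speech_model(user_id, supervised_models, unsupervised_models,
                            generic_model):
        """Return the best available model for a user: supervised first,
        then unsupervised, then the generic fallback."""
        if user_id in supervised_models:
            return supervised_models[user_id]
        if user_id in unsupervised_models:
            return unsupervised_models[user_id]
        return generic_model

    supervised = {"alice": "alice-supervised-v2"}
    unsupervised = {"bob": "bob-unsupervised-v1"}
    print(select_speech_model("bob", supervised, unsupervised, "generic-v5"))
    # -> "bob-unsupervised-v1"
    ```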
  • Publication number: 20110106533
    Abstract: A dual-microphone voice activity detector system is presented. The voice activity detector system estimates the signal level and noise level at each microphone. The level differential between the two microphones is greater for nearby sounds, such as the desired signal, than for more distant sounds, such as the noise. The voice activity detector thus detects the presence of nearby sounds.
    Type: Application
    Filed: June 25, 2009
    Publication date: May 5, 2011
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventor: Rongshan Yu
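    The detector in 20110106533 above relies on a nearby talker producing a larger level difference between the two microphones than distant noise does. A minimal illustrative per-frame detector along those lines; the decibel threshold is an arbitrary choice, not taken from the patent:
    ```python
    import numpy as np

    def near_field_voice_active(mic1_frame, mic2_frame, diff_threshold_db=3.0):
        """Flag voice activity when the level difference between the two
        microphones exceeds a threshold, indicating a nearby source."""
        level1 = 10 * np.log10(np.mean(mic1_frame ** 2) + 1e-12)
        level2 = 10 * np.log10(np.mean(mic2_frame ** 2) + 1e-12)
        return (level1 - level2) > diff_threshold_db

    # Nearby speech reaches mic 1 much louder than mic 2; far noise does not.
    near_speech = np.random.randn(256)
    print(near_field_voice_active(2.0 * near_speech, 0.5 * near_speech))  # True
    far_noise = np.random.randn(256)
    print(near_field_voice_active(far_noise, 0.98 * far_noise))           # False
    ```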
  • Publication number: 20110099596
    Abstract: A personalized television or internet video viewing environment, where the user can respond to messages. Messages are received over the internet and overlaid onto the video program. A light and vibrator on the remote control alert the viewer to respond by speaking into a microphone in the remote control unit. Voice recognition techniques are used to interpret the user's response, and biometric voice analysis can be used to identify the user. Successive interactions can be related and tailored to the particular user.
    Type: Application
    Filed: October 26, 2009
    Publication date: April 28, 2011
    Inventor: Michael J. Ure
  • Publication number: 20110071825
    Abstract: A voice detection device includes a band-based power calculation unit that calculates, for each preset frequency width (sub-band), a total of the signal power values (sub-band power) of the signals received from the microphones. The voice detection device also includes a band-based noise estimation unit that estimates the sub-band noise power, and a sub-band SNR calculation unit. The sub-band SNR calculation unit calculates an SNR for each sub-band and outputs the largest of the sub-band SNRs as the SNR for the microphone of interest. The voice detection device further includes a voice/non-voice decision unit that makes the voice/non-voice decision using the SNR for the microphone of interest.
    Type: Application
    Filed: May 26, 2009
    Publication date: March 24, 2011
    Inventors: Tadashi Emori, Masanori Tsujikawa
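    Publication 20110071825 above computes per-sub-band SNRs and keeps the largest one as the SNR for a microphone before the voice/non-voice decision. A small sketch under simplified assumptions; the equal-width sub-band split, the fixed noise estimate, and the threshold are illustrative, not the patent's:
    ```python
    import numpy as np

    def max_subband_snr(frame, noise_power_per_bin, n_bands=8):
        """Split the frame's power spectrum into equal-width sub-bands,
        compute a per-band SNR, and return the largest one (in dB)."""
        power = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(power, n_bands)
        noise = np.array_split(noise_power_per_bin, n_bands)
        snrs = [10 * np.log10(b.sum() / (nb.sum() + 1e-12) + 1e-12)
                for b, nb in zip(bands, noise)]
        return max(snrs)

    def is_voice(frame, noise_power_per_bin, snr_threshold_db=10.0):
        return max_subband_snr(frame, noise_power_per_bin) > snr_threshold_db

    # Noise estimate from a silent stretch, then a frame with a strong tone.
    noise_frame = np.random.randn(512) * 0.1
    noise_power = np.abs(np.fft.rfft(noise_frame)) ** 2
    tone_frame = noise_frame + np.sin(2 * np.pi * 50 * np.arange(512) / 512)
    print(is_voice(tone_frame, noise_power))  # True: the tone's band has high SNR
    ```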
  • Publication number: 20110071824
    Abstract: An apparatus includes a function module, a strength module, and a filter module. The function module compares an input signal, which has a component, to a first delayed version of the input signal and a second delayed version of the input signal to produce a multi-dimensional model. The strength module calculates a strength of each extremum from a plurality of extrema of the multi-dimensional model based on a value of at least one opposite extremum of the multi-dimensional model. The strength module then identifies a first extremum from the plurality of extrema, which is associated with a pitch of the component of the input signal, that has the strength greater than the strength of the remaining extrema. The filter module extracts the pitch of the component from the input signal based on the strength of the first extremum.
    Type: Application
    Filed: September 23, 2010
    Publication date: March 24, 2011
    Inventors: Carol Espy-Wilson, Srikanth Vishnubhotla
  • Publication number: 20110054891
    Abstract: The method comprises the following steps in the frequency domain: a) combining signals into a noisy combined signal; b) estimating a pseudo-steady noise component; c) calculating a probability of transients being present in the noisy combined signal; d) estimating a main arrival direction of transients; e) calculating a probability of speech being present on the basis of a three-dimensional spatial criterion suitable for discriminating amongst the transients between useful speech and lateral noise; and f) selectively reducing noise by applying a variable gain specific to each frequency band and to each time frame.
    Type: Application
    Filed: July 1, 2010
    Publication date: March 3, 2011
    Applicant: PARROT
    Inventors: Guillaume Vitte, Julie Seris, Guillaume Pinto
  • Publication number: 20110035216
    Abstract: The invention can recognize several languages at the same time without using samples. The key technique is that the features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without a filter and without overlap, to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as the feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize languages without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Application
    Filed: August 5, 2009
    Publication date: February 10, 2011
    Inventors: Tze Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Publication number: 20110029309
    Abstract: Provided are a signal separating apparatus and a signal separating method capable of solving the permutation problem and separating user speech to be extracted. The signal separating apparatus separates a specific speech signal and a noise signal from a received sound signal. First, a joint probability density distribution estimation unit of a permutation solving unit calculates joint probability density distributions of the respective separated signals. Then, a classifying determination unit of the permutation solving unit determines the classification based on the shapes of the calculated joint probability density distributions.
    Type: Application
    Filed: September 2, 2008
    Publication date: February 3, 2011
    Applicants: TOYOTA JIDOSHA KABUSHIKI KAISHA, NATIONAL UNIVERSITY CORPORATION NARA INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Tomoya Takatani, Jani Even
  • Publication number: 20110015925
    Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; and adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model; and outputting a sequence of words identified from said speech input.
    Type: Application
    Filed: March 26, 2010
    Publication date: January 20, 2011
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales
  • Publication number: 20110010172
    Abstract: Speech detection is a technique for determining and classifying periods of speech. In a normal conversation, each speaker speaks less than half the time; the remaining time is devoted to listening to the other end and to pauses between speech and silence. The classification is usually done by comparing the signal energy to a threshold. Classifying speech as noise, or noise as speech, may degrade the performance of the communication device. The current invention overcomes such problems by utilizing an alternate sensor signal indicating the presence or absence of speech. In the current invention, the communication device receives an audio signal via one or more microphones. The speech sensor may generate a unique signal based on facial, bone, lip, and/or throat movements. The system then combines the information received by the microphones and the speech sensor to decide the presence or absence of speech.
    Type: Application
    Filed: July 9, 2010
    Publication date: January 13, 2011
    Inventor: Alon Konchitsky
  • Publication number: 20110010171
    Abstract: A system and method for providing speech recognition functionality offers improved accuracy and robustness in noisy environments having multiple speakers. The described technique includes receiving speech energy and converting the received speech energy to a digitized form. The digitized speech energy is decomposed into features that are then projected into a feature space having multiple speaker subspaces. The projected features fall either into one of the multiple speaker subspaces or outside of all speaker subspaces. A speech recognition operation is performed on a selected one of the multiple speaker subspaces to resolve the utterance to a command or data.
    Type: Application
    Filed: July 7, 2009
    Publication date: January 13, 2011
    Applicant: General Motors Corporation
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Publication number: 20110004472
    Abstract: A method for automatic speech recognition includes determining, for an input signal, a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.
    Type: Application
    Filed: September 15, 2010
    Publication date: January 6, 2011
    Inventor: Igor Zlokarnik
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
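    The Vector Taylor Series adaptation mentioned in 20100318354 above is conventionally built on the standard cepstral-domain distortion model relating clean speech x, channel h, and noise n to the observed noisy cepstra y, with C the DCT matrix. The abstract does not spell out the patent's exact formulation, but the customary starting point that VTS then linearizes is:
    ```latex
    % Standard cepstral-domain distortion model (textbook form, not
    % necessarily the patent's exact notation); VTS expands it in a
    % first- or higher-order Taylor series around the current estimates
    % of the clean-speech, channel, and noise means.
    y = x + h + C \log\!\left( 1 + \exp\!\left( C^{-1} (n - x - h) \right) \right)
    ```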
  • Publication number: 20100312554
    Abstract: A system and method is provided for sending a message type identifier through a speech codec (in-band) such as found in a wireless communication network. A first predetermined sequence with noise-like characteristics identifies a first message type. A second predetermined sequence with noise-like characteristics identifies a second message type.
    Type: Application
    Filed: June 28, 2010
    Publication date: December 9, 2010
    Applicant: QUALCOMM Incorporated
    Inventors: CHRISTIAN PIETSCH, Marc W. Werner, Christoph A. Joetten, Christian Sgraja
  • Publication number: 20100286490
    Abstract: Systems, methods and techniques are described for monitoring a subject. The subject's safety, health and wellbeing can be monitored using a system that receives input indicating the subject's status. The system can verbally interact with the subject to obtain information on the subject's status. The words used by the subject, or the quality of the subject's response, can be used to decide whether to contact emergency services to assist the subject.
    Type: Application
    Filed: April 20, 2007
    Publication date: November 11, 2010
    Applicant: IQ LIFE, INC.
    Inventor: Dennis A. Koverzin
  • Publication number: 20100280826
    Abstract: An apparatus and a method that achieve physical separation of sound sources by pointing a beam of coherent electromagnetic waves (i.e., a laser) directly at a sound source. Analyzing the physical properties of the beam reflected from the vibrating sound source enables reconstruction of the sound signal generated by that source, eliminating the noise component added to the original sound signal. In addition, the use of multiple electromagnetic wave beams, or of a beam that rapidly skips from one sound source to another, allows the physical separation of these sound sources. Aiming each beam at a different sound source ensures the independence of the sound signal sources and therefore provides full source separation.
    Type: Application
    Filed: July 15, 2010
    Publication date: November 4, 2010
    Applicant: AudioZoom LTD
    Inventor: Tal Bakish
  • Publication number: 20100268533
    Abstract: A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.
    Type: Application
    Filed: April 16, 2010
    Publication date: October 21, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chi-youn PARK, Nam-hoon Kim, Jeong-mi Cho
  • Publication number: 20100262424
    Abstract: The present invention provides a method of eliminating background noise and a device using the same. The method of eliminating background noise comprises the steps of: detecting an effective value of a received audio signal and generating an average power signal of the received audio signal; generating a noise-eliminating control signal by comparing the average power signal with a first threshold; and eliminating the noise and amplifying the voice signal using the noise-eliminating control signal. A device for eliminating background noise comprises a detecting unit configured to detect the effective value and generate the average power signal of the received audio signal; a first signal generating unit configured to generate the noise-eliminating control signal; and an amplifying unit configured to eliminate the noise and amplify the voice signal.
    Type: Application
    Filed: November 5, 2009
    Publication date: October 14, 2010
    Inventors: Hai Li, Kunping Xu, Lizhen Zhang, Yun Yang, Wei Feng
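    The control path in 20100262424 above is essentially a power-based noise gate: compute the average power of the received audio, compare it with a threshold, and use the result to suppress noise and amplify voice. A minimal illustrative gate along those lines; the threshold and gain values are arbitrary choices, not taken from the patent:
    ```python
    import numpy as np

    def noise_gate(frame, power_threshold=0.01, gain=2.0):
        """Mute frames whose average power falls below the threshold
        (treated as background noise) and amplify the rest."""
        avg_power = np.mean(frame ** 2)
        if avg_power < power_threshold:
            return np.zeros_like(frame)   # noise: suppress
        return gain * frame               # voice: amplify

    quiet = np.random.randn(256) * 0.02   # avg power ~ 0.0004 -> gated
    loud = np.random.randn(256) * 0.5     # avg power ~ 0.25   -> amplified
    print(np.all(noise_gate(quiet) == 0), np.any(noise_gate(loud) != 0))  # True True
    ```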
  • Publication number: 20100262425
    Abstract: Disclosed is a noise suppression device capable of better noise suppression by means of a simpler structure and with a lighter computational load. A noise suppression device (100) has a noise suppression processor (150) that estimates the required information from only the observed information, which is the required information corrupted by noise. A correlator (154) calculates the correlation of the estimation error obtained when the state quantity of the system at time n+1, which contains the required information, is estimated from the information up to time n or time n+1, given the observed information at time n only.
    Type: Application
    Filed: March 18, 2009
    Publication date: October 14, 2010
    Applicant: Tokyo University of Science Educational Foundation Administrative Organization
    Inventors: Nari Tanabe, Toshihiro Furukawa
  • Publication number: 20100248786
    Abstract: Audio input to a user device is captured in a buffer and played back to the user while being sent to and recognized by an automatic speech recognition (ASR) system. Overlapping the playback with the speech recognition processing masks a portion of the true latency of the ASR system thus improving the user's perception of the ASR system's responsiveness. Further, upon hearing the playback, the user is intuitively guided to self-correct for any defects in the captured audio.
    Type: Application
    Filed: March 30, 2010
    Publication date: September 30, 2010
    Inventor: Laurent Charriere
  • Publication number: 20100217590
    Abstract: A system and method for performing speaker localization is described. The system and method utilizes speaker recognition to provide an estimate of the direction of arrival (DOA) of speech sound waves emanating from a desired speaker with respect to a microphone array included in the system. Candidate DOA estimates may be preselected or generated by one or more other DOA estimation techniques. The system and method is suited to support steerable beamforming as well as other applications that utilize or benefit from DOA estimation. The system and method provides robust performance even in systems and devices having small microphone arrays and thus may advantageously be implemented to steer a beamformer in a cellular telephone or other mobile telephony terminal featuring a speakerphone mode.
    Type: Application
    Filed: February 24, 2009
    Publication date: August 26, 2010
    Applicant: BROADCOM CORPORATION
    Inventors: Elias Nemer, Jes Thyssen
  • Publication number: 20100211388
    Abstract: A method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of the audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain so as to reduce gain in a subband as the level of noise components increases with respect to the level of speech components in the subband and increase gain in a subband when speech components are present in subbands of the audio signal, the processes each responding to subbands of the audio signal and controlling gain independently of each other to provide a processed subband audio signal.
    Type: Application
    Filed: September 10, 2008
    Publication date: August 19, 2010
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: Rongshan Yu, Charles Phillip Brown
  • Publication number: 20100204988
    Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.
    Type: Application
    Filed: April 20, 2010
    Publication date: August 12, 2010
    Inventors: Haitian XU, Kean Kheong Chin
  • Publication number: 20100204985
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency axis direction using the probability model representing voice and non-voice occurrence probabilities, the voice and non-voice labels, and a cepstrum.
    Type: Application
    Filed: September 22, 2008
    Publication date: August 12, 2010
    Inventor: Tadashi Emori
  • Publication number: 20100191529
    Abstract: Systems and methods are described for a speech system that manages multiple grammars from one or more speech-enabled applications. The speech system includes a speech server that supports different grammars and different types of grammars by exposing several methods to the speech-enabled applications. The speech server supports static grammars that do not change and dynamic grammars that may change after a commit. The speech server provides persistence by supporting persistent grammars that enable a user to issue a command to an application even when the application is not loaded. In such a circumstance, the application is automatically launched and the command is processed. The speech server may enable or disable a grammar in order to limit confusion between grammars. Global and yielding grammars are also supported by the speech server. Global grammars are always active (e.g., “call 9-1-1”) while yielding grammars may be deactivated when an interaction whose grammar requires priority is active.
    Type: Application
    Filed: March 31, 2010
    Publication date: July 29, 2010
    Applicant: Microsoft Corporation
    Inventors: Stephen Russell Falcon, Clement Chun Pong Yip, David Michael Miller, Dan Banay
  • Publication number: 20100191524
    Abstract: A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
    Type: Application
    Filed: April 5, 2010
    Publication date: July 29, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Nobuyuki Washio, Shoji Hayakawa
  • Publication number: 20100169089
    Abstract: A voice recognizing apparatus includes a microphone 12 which inputs an input voice including speech voice uttered by a user speaker and interference voice uttered by an interference speaker other than the user speaker; a superimposition amount determining unit 14 which determines a noise superimposition amount for the input voice on the basis of the speech voice and the interference voice separately input as the input voice; a noise superimposing unit 16 which superimposes noise according to the noise superimposition amount onto the input voice and outputs the resultant voice as noise-superimposed voice; and a voice recognizing unit 18 which recognizes the noise-superimposed voice.
    Type: Application
    Filed: January 10, 2007
    Publication date: July 1, 2010
    Inventor: Toru Iwasawa
  • Publication number: 20100169090
    Abstract: A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.
    Type: Application
    Filed: December 31, 2008
    Publication date: July 1, 2010
    Inventors: Xiaodong Cui, Kaisheng Yao
  • Publication number: 20100161326
    Abstract: A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit for enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, and an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.
    Type: Application
    Filed: July 21, 2009
    Publication date: June 24, 2010
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Sung Joo Lee, Ho-Young Jung, Jeon Gue Park, Hoon Chung, Yunkeun Lee, Byung Ok Kang, Hyung-Bae Jeon, Jong Jin Kim, Ki-young Park, Euisok Chung, Ji Hyun Wang, Jeom Ja Kang
  • Publication number: 20100153104
    Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for each frame is determined based upon a noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power-dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
    Type: Application
    Filed: December 16, 2008
    Publication date: June 17, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
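    The gain rule in 20100153104 above can be pictured as a piecewise function of the frame's noise power: unity gain below a low-noise threshold, a fixed low gain above a high-noise threshold, and log-linear interpolation in between. A hedged sketch of just that gain schedule; the thresholds and the floor gain are illustrative values, not the patent's:
    ```python
    import numpy as np

    def noise_dependent_gain(noise_power, low_thresh=1e-4, high_thresh=1e-1,
                             min_gain=0.1):
        """Gain as a function of frame noise power: 1.0 below the low
        threshold, min_gain above the high threshold, and log-linear
        interpolation between the two."""
        if noise_power <= low_thresh:
            return 1.0
        if noise_power >= high_thresh:
            return min_gain
        # Position of the noise level between the thresholds on a log scale.
        t = (np.log10(noise_power) - np.log10(low_thresh)) / \
            (np.log10(high_thresh) - np.log10(low_thresh))
        return 1.0 + t * (min_gain - 1.0)

    for p in (1e-5, 1e-3, 1e-2, 1.0):
        print(p, round(noise_dependent_gain(p), 3))
    # prints 1e-05 1.0, 0.001 0.7, 0.01 0.4, 1.0 0.1 (one pair per line)
    ```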
  • Publication number: 20100145687
    Abstract: Method for removing noise from a digital speech waveform, including receiving the digital speech waveform having the noise contained therein, segmenting the digital speech waveform into one or more frames, each frame having a clean portion and a noisy portion, extracting a feature component from each frame, creating a nonlinear speech distortion model from the feature components, creating a statistical noise model by making a Piecewise Linear Approximation (PLA) of the nonlinear speech distortion model, determining the clean portion of each frame using the statistical noise model, a log power spectrum of each frame, and a model of a digital speech waveform recorded in a noise-controlled environment, and constructing a clean digital speech waveform from each clean portion of each frame.
    Type: Application
    Filed: December 4, 2008
    Publication date: June 10, 2010
    Applicant: Microsoft Corporation
    Inventors: Qiang Huo, Jun Du
  • Publication number: 20100131268
    Abstract: An apparatus having a voice-estimation (VE) interface that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. In one embodiment, the VE interface is integrated into a cell phone that directs an estimated-voice signal over a network to a remote party to enable (i) the user to have a conversation with the remote party without disturbing other people, e.g., at a meeting, conference, movie, or performance, and (ii) the remote party to more-clearly hear the user whose voice would otherwise be overwhelmed by a relatively loud ambient noise due to the user being, e.g., in a nightclub, disco, or flying aircraft.
    Type: Application
    Filed: November 26, 2008
    Publication date: May 27, 2010
    Applicant: ALCATEL-LUCENT USA INC.
    Inventor: Lothar Benedikt Moeller
  • Publication number: 20100131269
    Abstract: Uses of an enhanced sidetone signal in an active noise cancellation operation are disclosed.
    Type: Application
    Filed: November 18, 2009
    Publication date: May 27, 2010
    Applicant: QUALCOMM Incorporated
    Inventors: Hyun Jin Park, Kwokleung Chan
  • Publication number: 20100121636
    Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
    Type: Application
    Filed: November 10, 2009
    Publication date: May 13, 2010
    Inventors: Dave Burke, Michael J. Lebeau, Konrad Gianno, Trausti Kristjansson, John Nicholas Jitkoff, Andrew W. Senior
  • Publication number: 20100114570
    Abstract: An apparatus and a method for restoring voice are provided. The apparatus reduces noise included in a voice signal input to a microphone and outputs a voice signal having reduced noise, detects harmonic frequencies from the voice signal having reduced noise, and restores the voice signal having reduced noise approximate to its original state before being input to the microphone according to detected harmonic frequencies of the voice signal having reduced noise.
    Type: Application
    Filed: October 30, 2009
    Publication date: May 6, 2010
    Inventors: Jae-hoon Jeong, Kwang-cheol Oh
  • Publication number: 20100106495
    Abstract: A voice recognition system comprises: a voice input unit that receives an input signal from a voice input element and outputs it; a voice detection unit that detects an utterance segment in the input signal; a voice recognition unit that performs voice recognition for the utterance segment; and a control unit that outputs a control signal to at least one of the voice input unit and the voice detection unit and suppresses a detection frequency if the detection frequency satisfies a predetermined condition.
    Type: Application
    Filed: February 27, 2008
    Publication date: April 29, 2010
    Applicant: NEC Corporation
    Inventor: Toru Iwasawa
  • Publication number: 20100088093
    Abstract: A voice command acquisition method and system for motor vehicles is improved in that noise source information is obtained directly from the vehicle system bus. Upon receiving an input signal with a voice command, the system bus is queried for one or more possible sources of a noise component in the input signal. In addition to vehicle-internal information (e.g., window status, fan blower speed, vehicle speed), the system may acquire external information (e.g., weather status) in order to better classify the noise component in the input signal. If the noise source is found to be a window, for example, the driver may be prompted to close the window. In addition, if the fan blower is at a high speed level, it may be slowed down automatically.
    Type: Application
    Filed: October 3, 2008
    Publication date: April 8, 2010
    Applicant: VOLKSWAGEN AKTIENGESELLSCHAFT
    Inventors: Chu Hee Lee, Jonathan Lee, Daniel Rosario, Edward Kim, Thomas Chan
  • Publication number: 20100082341
    Abstract: A device includes a speaker recognition device operable to perform a method that identifies a speaker using voice signal analysis. The speaker recognition device and method identify the speaker by analyzing a voice signal and comparing the signal with the voice signal characteristics of speakers, which are statistically classified. The device and method are applicable to cases where a voice signal is a voiced sound or a voiceless sound, or where no information on a voice signal is present. Since voice/non-voice determination is performed, the speaker can be reliably identified from the voice signal. The device and method are adaptable to applications that require real-time processing, owing to the small amount of data to be calculated and fast processing. Furthermore, the device and method can be variously applied to portable devices due to low power consumption.
    Type: Application
    Filed: September 29, 2009
    Publication date: April 1, 2010
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
  • Publication number: 20100082340
    Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Application
    Filed: August 19, 2009
    Publication date: April 1, 2010
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
  • Publication number: 20100076757
    Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
    Type: Application
    Filed: September 23, 2008
    Publication date: March 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
  • Publication number: 20100076758
    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates, based on a phase-sensitive model, of distortions in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
  • Publication number: 20100049514
    Abstract: An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.
    Type: Application
    Filed: October 29, 2009
    Publication date: February 25, 2010
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Publication number: 20100017206
    Abstract: A system and method for sound source separation. The system and method use a beamforming technique. The sound source separation system includes a windowing processor; a DFT transformer; a transfer function estimator; and a noise estimator. The system also includes a voice signal extractor that cancels individual voice signals, except an individual voice signal that is desired to be extracted among individual voice signals, from the integrated voice signals. The system further includes a voice signal detector that cancels a noise part provided through the noise estimator from a transfer function of an individual voice signal which is desired to be detected and extracts a noise-canceled individual voice signal. Even when two or more sound sources are simultaneously input, the sound sources can be separated from each other and separately stored and managed, or an initial sound source can be stored and managed.
    Type: Application
    Filed: July 20, 2009
    Publication date: January 21, 2010
    Applicants: Samsung Electronics Co., Ltd., Korea University Research and Business Foundation
    Inventors: Hyun-Soo Kim, Hanseok Ko, Jounghoon Beh, Taekjin Lee
  • Publication number: 20100017207
    Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.
    Type: Application
    Filed: September 24, 2009
    Publication date: January 21, 2010
    Applicant: Infineon Technologies AG
    Inventors: Werner Hemmert, Marcus Holmberg