Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120128176
    Abstract: A noise reduction system and a method of noise reduction include utilizing an array of microphones to receive sound signals from stationary sound sources and a user that is speaking. Positions of the stationary sound sources relative to the array of microphones are estimated using sound signals emitted from the sound sources at an earlier time. Noise is suppressed in an audio signal based at least in part on the estimated positions of the stationary sound sources.
    Type: Application
    Filed: January 27, 2012
    Publication date: May 24, 2012
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Ivan J. Tashev, Michael L. Seltzer
  • Patent number: 8185389
    Abstract: Described is noise reduction technology generally for speech input in which a noise-suppression related gain value for each frame is determined based upon a noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression when the noise level is above a high threshold. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: May 22, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
  • Patent number: 8180636
    Abstract: Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.
    Type: Grant
    Filed: March 7, 2011
    Date of Patent: May 15, 2012
    Assignee: Microsoft Corporation
    Inventors: James G. Droppo, Alejandro Acero, Luis Buera
  • Patent number: 8180640
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Grant
    Filed: June 20, 2011
    Date of Patent: May 15, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Patent number: 8180637
    Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: May 15, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
  • Patent number: 8165870
    Abstract: The method and apparatus use a filter to remove a variety of non-dictated words from data based on probability, improving the effectiveness of creating a language model.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: April 24, 2012
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Dong Yu, Julian J. Odell, Milind V. Mahajan, Peter K. L. Mau
  • Patent number: 8160878
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: April 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Patent number: 8145488
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: March 27, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Patent number: 8107642
    Abstract: A noise reduction system and a method of noise reduction include a microphone array comprising a first microphone, a second microphone, and a third microphone. Each microphone has a known position and a known directivity pattern. An instantaneous direction-of-arrival (IDOA) module determines a first phase difference quantity and a second phase difference quantity. The first phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the second microphone, while the second phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the third microphone. A spatial noise reduction module computes an estimate of a desired signal based on an a priori spatial signal-to-noise ratio and an a posteriori spatial signal-to-noise ratio based on the first and second phase difference quantities.
    Type: Grant
    Filed: May 12, 2009
    Date of Patent: January 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Ivan J. Tashev, Michael L. Seltzer
  • Patent number: 8099279
    Abstract: A method of aiding a speech recognition program developer by grouping calls passing through an identified question-answer (QA) state or transition into clusters based on causes of problems associated with the calls is provided. The method includes determining a number of clusters into which a plurality of calls will be grouped. Then, the plurality of calls is at least partially randomly assigned to the different clusters. Model parameters are estimated using clustering information based upon the assignment of the plurality of calls to the different clusters. Individual probabilities are calculated for each of the plurality of calls using the estimated model parameters. The individual probabilities are indicative of a likelihood that the corresponding call belongs to a particular cluster. The plurality of calls is then re-assigned to the different clusters based upon the calculated probabilities. These steps are then repeated until the grouping of the plurality of calls achieves a desired stability.
    Type: Grant
    Filed: February 9, 2005
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Dong Yu
  • Publication number: 20110307251
    Abstract: Described is a multiple phase process/system that combines spatial filtering with regularization to separate sound from different sources such as the speech of two different speakers. In a first phase, frequency domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals, including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse transform.
    Type: Application
    Filed: June 15, 2010
    Publication date: December 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Ivan Tashev, Lae-Hoon Kim, Alejandro Acero, Jason Scott Flaks
  • Patent number: 8065146
    Abstract: An answering machine detection module is used to determine whether a call recipient is an actual person or an answering machine. The answering machine detection module includes a speech recognizer and a call analysis module. The speech recognizer receives an audible response of the call recipient to a call. The speech recognizer processes the audible response and provides an output indicative of recognized speech. The call analysis module processes the output of the speech recognizer to generate an output indicative of whether the call recipient is a person or an answering machine.
    Type: Grant
    Filed: July 12, 2006
    Date of Patent: November 22, 2011
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Craig M. Fisher, Dong Yu, Ye-Yi Wang, Yun-Cheng Ju
  • Publication number: 20110274289
    Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.
    Type: Application
    Filed: July 20, 2011
    Publication date: November 10, 2011
    Applicant: Microsoft Corporation
    Inventors: Ivan Tashev, Alejandro Acero
  • Publication number: 20110274291
    Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise.
    Type: Application
    Filed: July 21, 2011
    Publication date: November 10, 2011
    Applicant: Microsoft Corporation
    Inventors: Ivan Tashev, Alejandro Acero, Byung-Jun Yoon
  • Publication number: 20110270610
    Abstract: Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as the objective function for optimization. The estimation uses only acoustic data and not any intermediate estimates of hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Application
    Filed: July 14, 2011
    Publication date: November 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Publication number: 20110251844
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: June 20, 2011
    Publication date: October 13, 2011
    Applicant: Microsoft Corporation
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20110238416
    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
    Type: Application
    Filed: March 24, 2010
    Publication date: September 29, 2011
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
  • Publication number: 20110224982
    Abstract: Described is a technology in which information retrieval (IR) techniques are used in an automatic speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are derived from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
    Type: Application
    Filed: March 12, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
  • Patent number: 8019602
    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
    Type: Grant
    Filed: January 20, 2004
    Date of Patent: September 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
  • Patent number: 8019089
    Abstract: A noisy audio signal, with user input device noise, is received. Particular frames in the audio signal that are corrupted by user input device noise are identified and removed. The removed audio data is then reconstructed to obtain a clean audio signal.
    Type: Grant
    Filed: November 20, 2006
    Date of Patent: September 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Michael Seltzer, Alejandro Acero, Amarnag Subramanya
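The abstract of patent 8185389 describes a gain schedule driven by each frame's noise level: a gain of one below a low threshold, a low gain above a high threshold, a noise-power dependent (e.g., log-linear) interpolation between them, and optional smoothing against the previous frame's gain. The sketch below illustrates only that schedule, under stated assumptions: the noise level is measured in dB; a fixed, hypothetical `low_gain` stands in for the SNR-derived MMSE gain the patent combines it with; "log-linear" is read as linear interpolation of log-gain; and smoothing is assumed to be first-order recursive with a hypothetical `alpha`. The function names and threshold values are illustrative placeholders, not taken from the patent.

```python
import math


def noise_dependent_gain(noise_db, low_gain, low_db, high_db):
    """Map a frame's noise level (dB) to a suppression gain.

    Assumption: "log-linear interpolation" is read here as interpolating
    log(gain) linearly in the noise level between the two thresholds.
    """
    if not 0.0 < low_gain <= 1.0:
        raise ValueError("low_gain must be in (0, 1]")
    if high_db <= low_db:
        raise ValueError("high_db must exceed low_db")
    if noise_db <= low_db:
        return 1.0  # quiet frame: little or no suppression
    if noise_db >= high_db:
        return low_gain  # noisy frame: strong suppression
    t = (noise_db - low_db) / (high_db - low_db)
    # log(g) = (1 - t) * log(1) + t * log(low_gain), so g = low_gain ** t
    return low_gain ** t


def smooth_gains(gains, alpha=0.7):
    """Blend each gain with the previous frame's smoothed gain (assumed first-order form)."""
    smoothed = []
    prev = None
    for g in gains:
        prev = g if prev is None else alpha * prev + (1.0 - alpha) * g
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    levels = [30.0, 45.0, 50.0, 55.0, 70.0]  # hypothetical per-frame noise levels in dB
    raw = [noise_dependent_gain(n, low_gain=0.1, low_db=40.0, high_db=60.0) for n in levels]
    print([round(g, 3) for g in raw])
    print([round(g, 3) for g in smooth_gains(raw)])
```

Interpolating in the log domain makes the gain fall off geometrically with noise level, so each extra dB of noise removes a constant fraction of the remaining gain rather than a constant amount.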
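The abstract of publication 20110224982 describes decoding acoustic units, deriving features from them, and then applying IR techniques such as TF-IDF based retrieval to obtain the output word. A minimal sketch of that idea, assuming phone bigrams as the features, a hypothetical toy lexicon whose pronunciations serve as the indexed "documents", a smoothed IDF of `log(N / df) + 1`, and cosine similarity as the ranking score. The names `unit_ngrams`, `tfidf_vector`, `build_index`, and `retrieve` are illustrative and do not come from the application.

```python
import math
from collections import Counter


def unit_ngrams(units, n=2):
    """Overlapping n-grams of decoded acoustic units, used as retrieval features."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def tfidf_vector(counts, idf):
    """L2-normalised TF-IDF vector; features unseen in the index get zero weight."""
    vec = {f: c * idf.get(f, 0.0) for f, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()} if norm > 0 else {}


def build_index(lexicon, n=2):
    """Treat each word's unit sequence as a document and compute IDF and TF-IDF vectors."""
    docs = {word: Counter(unit_ngrams(units, n)) for word, units in lexicon.items()}
    df = Counter()
    for feats in docs.values():
        df.update(feats.keys())  # document frequency: count each feature once per word
    num_docs = len(docs)
    idf = {f: math.log(num_docs / df[f]) + 1.0 for f in df}
    vectors = {word: tfidf_vector(feats, idf) for word, feats in docs.items()}
    return idf, vectors


def retrieve(decoded_units, idf, vectors, n=2):
    """Return the indexed word whose TF-IDF vector has the highest cosine with the query."""
    query = tfidf_vector(Counter(unit_ngrams(decoded_units, n)), idf)

    def cosine(doc):
        # Both vectors are unit length, so the dot product is the cosine similarity.
        return sum(w * doc.get(f, 0.0) for f, w in query.items())

    return max(vectors, key=lambda word: cosine(vectors[word]))


if __name__ == "__main__":
    # Hypothetical phone-level lexicon (ARPAbet-style symbols, chosen for illustration).
    lexicon = {
        "seattle": ["s", "iy", "ae", "t", "ax", "l"],
        "settle": ["s", "eh", "t", "ax", "l"],
        "boston": ["b", "ao", "s", "t", "ax", "n"],
    }
    idf, vectors = build_index(lexicon)
    # Decoded phones containing one substitution error (t -> d).
    print(retrieve(["s", "iy", "ae", "d", "ax", "l"], idf, vectors))  # prints: seattle
```

Because the bigram features overlap, a single recognition error only corrupts the few features that touch it, and the remaining matches are usually enough to retrieve the intended word.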