Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Voice aware demographic personalization

Publication number: 20080298562

Abstract: A voice interaction system is configured to analyze an utterance and identify inherent attributes that are indicative of a demographic characteristic of the system user that spoke the utterance. The system then selects and presents a personalized response to the user, the response being selected based at least in part on the identified demographic characteristic. In one embodiment, the demographic characteristic is one or more of the caller's age, gender, ethnicity, education level, emotional state, health status and geographic group. In another embodiment, the selection of the response is further based on consideration of corroborative caller data.

Type: Application

Filed: June 4, 2007

Publication date: December 4, 2008

Applicant: Microsoft Corporation

Inventors: Yun-Cheng Ju, Alejandro Acero, Neal Alan Bernstein, Geoffrey Gerson Zweig

Method of pattern recognition using noise reduction uncertainty

Patent number: 7460992

Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.

Type: Grant

Filed: May 16, 2006

Date of Patent: December 2, 2008

Assignee: Microsoft Corporation

Inventors: James G. Droppo, Alejandro Acero, Li Deng
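
The variance adjustment described in the abstract above can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and per-dimension uncertainty estimates; the function names and the numeric values are hypothetical and are not taken from the patent.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def uncertainty_log_likelihood(cleaned, cleaned_var, mean, var):
    """Score a cleaned feature vector against one Gaussian.

    The model variance is widened by the estimated variance of the
    cleaned signal (the noise-removal uncertainty) before scoring.
    """
    widened = [v + u for v, u in zip(var, cleaned_var)]
    return log_gaussian(cleaned, mean, widened)


# Hypothetical one-dimensional example: the cleaned value lies far from the mean.
mean, var = [0.0], [1.0]
confident = uncertainty_log_likelihood([2.0], [0.0], mean, var)
uncertain = uncertainty_log_likelihood([2.0], [3.0], mean, var)
```

With zero uncertainty the score reduces to the ordinary Gaussian log likelihood. When the noise removal is unreliable, the widened variance penalizes a distant observation less, so an uncertain frame has less influence on the decoded state sequence.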

SENSOR ARRAY BEAMFORMER POST-PROCESSOR

Publication number: 20080288219

Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beam forming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction resulting in minimal artifacts and musical noise.

Type: Application

Filed: May 17, 2007

Publication date: November 20, 2008

Applicant: Microsoft Corporation

Inventors: Ivan Tashev, Alejandro Acero

Training wideband acoustic models in the cepstral domain using mixed-bandwidth training data and extended vectors for speech recognition

Patent number: 7454338

Abstract: A method and apparatus are provided that generate values for a first set of dimensions of a feature vector from a speech signal. The values of the first set of dimensions are used to estimate values for a second set of dimensions of the feature vector to form an extended feature vector. The extended feature vector is then used to train an acoustic model.

Type: Grant

Filed: February 8, 2005

Date of Patent: November 18, 2008

Assignee: Microsoft Corporation

Inventors: Michael L. Seltzer, Alejandro Acero

METHOD OF PATTERN RECOGNITION USING NOISE REDUCTION UNCERTAINTY

Publication number: 20080281591

Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.

Type: Application

Filed: July 25, 2008

Publication date: November 13, 2008

Applicant: MICROSOFT CORPORATION

Inventors: James G. Droppo, Alejandro Acero, Li Deng

USING STRUCTURED DATABASE FOR WEBPAGE INFORMATION EXTRACTION

Publication number: 20080281827

Abstract: A structured database is used for webpage information extraction, and in particular, to obtain training data from the webpage for training a statistical model. The structured database has a plurality of entries, wherein each entry comprises a plurality of fields. One of the fields comprises a URL (uniform resource locator), while another field comprises information at least similar to other information to be located in a webpage associated with the URL. For at least some of the entries in the structured database, a webpage associated with the URL is retrieved. The webpage is analyzed and if information is found in the webpage similar to the information in the structured database, the webpage is identified as being suitable to be considered as a training sample.

Type: Application

Filed: May 10, 2007

Publication date: November 13, 2008

Applicant: Microsoft Corporation

Inventors: Ye-Yi Wang, Alejandro Acero, Mandar A. Rahurkar

SEARCHING A DATABASE OF LISTINGS

Publication number: 20080281806

Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.

Type: Application

Filed: May 10, 2007

Publication date: November 13, 2008

Applicant: Microsoft Corporation

Inventors: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
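
As a rough illustration of the Tf/Idf idea named in the abstract above, the sketch below ranks short listings against a query using cosine similarity between Tf/Idf-weighted term vectors. The tokenization, the `log(N/df) + 1` weighting, the function names, and the sample listings are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_index(listings):
    """Return per-listing term counts and an inverse document frequency table."""
    docs = [Counter(tokenize(text)) for text in listings]
    doc_freq = Counter(term for doc in docs for term in doc)
    n = len(docs)
    # Assumed smoothing: add 1 so terms present in every listing keep some weight.
    idf = {term: math.log(n / df) + 1.0 for term, df in doc_freq.items()}
    return docs, idf


def weight_vector(counts, idf):
    """Tf/Idf weights for the terms known to the index; unknown terms are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, listings, docs, idf):
    """Return (score, listing) pairs sorted from best to worst match."""
    q = weight_vector(Counter(tokenize(query)), idf)
    scored = [
        (cosine(q, weight_vector(doc, idf)), text)
        for doc, text in zip(docs, listings)
    ]
    return sorted(scored, reverse=True)


# Hypothetical listings, not from the patent.
listings = [
    "Kung Ho Chinese Restaurant",
    "Pizza Hut",
    "Hut Sunglasses Store",
    "Pizza Palace Restaurant",
]
docs, idf = build_index(listings)
results = search("pizza hut", listings, docs, idf)
```

Because listings contain only a few words each, almost every term occurs once per listing, so the inverse document frequency and the length normalization do most of the ranking work in this sketch.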

Removing noise from feature vectors

Patent number: 7451083

Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. One aspect of the invention includes using an iterative approach to identify the clean signal feature vector. Another aspect of the invention includes using the variance of a set of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.

Type: Grant

Filed: July 20, 2005

Date of Patent: November 11, 2008

Assignee: Microsoft Corporation

Inventors: Brendan J. Frey, Alejandro Acero, Li Deng

Method and apparatus for multi-sensory speech enhancement

Patent number: 7447630

Abstract: A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone, or in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conductive microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.

Type: Grant

Filed: November 26, 2003

Date of Patent: November 4, 2008

Assignee: Microsoft Corporation

Inventors: Zicheng Liu, Michael J. Sinclair, Alejandro Acero, Xuedong D. Huang, James G. Droppo, Li Deng, Zhengyou Zhang, Yanli Zheng

Multimodal rating system

Publication number: 20080262995

Abstract: A method of communicating information about a product evaluation between a system having a data store and a wireless client device is discussed. The method includes receiving a signal representative of an audible indication from the client device via a wireless communication link identifying the product about which evaluation information is to be communicated. The method further includes comparing an indication of the signal to data in the data store in response to match the indication with a portion of the data and communicating evaluation information between the wireless client device and the system.

Type: Application

Filed: April 19, 2007

Publication date: October 23, 2008

Applicant: Microsoft Corporation

Inventors: Geoffrey Gerson Zweig, Yun-Cheng Ju, Patrick Nguyen, Alejandro Acero

ROBUST ADAPTIVE BEAMFORMING WITH ENHANCED NOISE SUPPRESSION

Publication number: 20080232607

Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals as well as isotropic ambient noise.

Type: Application

Filed: March 22, 2007

Publication date: September 25, 2008

Applicant: Microsoft Corporation

Inventors: Ivan Tashev, Alejandro Acero, Byung-Jun Yoon

Method and apparatus for formant tracking using a residual model

Patent number: 7424423

Abstract: A method of tracking formants defines a formant search space comprising sets of formants to be searched. Formants are identified for a first frame in the speech utterance by searching the entirety of the formant search space using the codebook, and for the remaining frames by searching the same space using both the codebook and the continuity constraint across adjacent frames. Under one embodiment, the formants are identified by mapping sets of formants into feature vectors and applying the feature vectors to a model. Formants are also identified by applying dynamic programming to search for the best sequence that optimally satisfies the continuity constraint required by the model.

Type: Grant

Filed: April 1, 2003

Date of Patent: September 9, 2008

Assignee: Microsoft Corporation

Inventors: Issam Bazzi, Li Deng, Alejandro Acero

DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR TEXT AND SPEECH CLASSIFICATION

Publication number: 20080215311

Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.

Type: Application

Filed: April 15, 2008

Publication date: September 4, 2008

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan

Pitch model for noise estimation

Publication number: 20080215321

Abstract: Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.

Type: Application

Filed: April 19, 2007

Publication date: September 4, 2008

Applicant: Microsoft Corporation

Inventors: James G. Droppo, Alejandro Acero, Luis Buera

Noise robust speech recognition with a switching linear dynamic model

Patent number: 7418383

Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.

Type: Grant

Filed: September 3, 2004

Date of Patent: August 26, 2008

Assignee: Microsoft Corporation

Inventors: James Droppo, Alejandro Acero

Generic framework for large-margin MCE training in speech recognition

Publication number: 20080201139

Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.

Type: Application

Filed: February 20, 2007

Publication date: August 21, 2008

Applicant: Microsoft Corporation

Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He

Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction

Patent number: 7409346

Abstract: A structured generative model of a speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonance (VTR) are generated using prior information of resonance targets in the phone sequence. Bi-directional temporal filtering with finite impulse response (FIR) is applied to the segmental target sequence as the FIR filter's input. At the second stage the dynamics of speech cepstra are predicted analytically based on the FIR filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked in terms of their respective probability values.

Type: Grant

Filed: March 1, 2005

Date of Patent: August 5, 2008

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Dong Yu, Li Deng

Representation of a deleted interpolation N-gram language model in ARPA standard format

Patent number: 7406416

Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.

Type: Grant

Filed: March 26, 2004

Date of Patent: July 29, 2008

Assignee: Microsoft Corporation

Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
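
One way to see why a deleted interpolation model can be stored as backoff parameters, as the abstract above describes, is the bigram sketch below. It assumes a single interpolation weight `lam` shared by all histories and a maximum-likelihood unigram model; real deleted interpolation typically estimates history-dependent weights on held-out data, and the patent's actual procedure (including its use of fractional counts) is not reproduced here. All names are hypothetical.

```python
from collections import Counter


def train_counts(tokens):
    """Unigram counts, bigram counts, and per-history bigram totals."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    history = Counter()
    for (v, _), c in bigrams.items():
        history[v] += c
    return unigrams, bigrams, history


def interpolated_prob(v, w, counts, lam):
    """Deleted interpolation: lam * P_ML(w | v) + (1 - lam) * P(w)."""
    unigrams, bigrams, history = counts
    p_uni = unigrams[w] / sum(unigrams.values())
    if history[v] == 0:
        return p_uni  # unseen history: fall back to the unigram alone
    return lam * bigrams[(v, w)] / history[v] + (1.0 - lam) * p_uni


def to_backoff(counts, lam):
    """Re-express the interpolated model as backoff-style parameters.

    Seen bigrams store their full interpolated probability; each seen
    history stores backoff weight (1 - lam). An ARPA file would hold
    the log10 of these values.
    """
    unigrams, bigrams, _ = counts
    total = sum(unigrams.values())
    unigram_probs = {w: c / total for w, c in unigrams.items()}
    bigram_probs = {
        (v, w): interpolated_prob(v, w, counts, lam) for (v, w) in bigrams
    }
    backoff_weights = {v: 1.0 - lam for (v, _) in bigrams}
    return unigram_probs, bigram_probs, backoff_weights


def backoff_prob(v, w, model):
    """Standard backoff lookup over the stored parameters."""
    unigram_probs, bigram_probs, backoff_weights = model
    if (v, w) in bigram_probs:
        return bigram_probs[(v, w)]
    return backoff_weights.get(v, 1.0) * unigram_probs.get(w, 0.0)


# Hypothetical toy corpus.
counts = train_counts("a b a c a b b c".split())
model = to_backoff(counts, lam=0.7)
```

For an unseen bigram the maximum-likelihood term is zero, so the interpolated probability is exactly `(1 - lam)` times the unigram probability, which is what the backoff lookup returns. Both forms therefore assign the same probability to every word pair in this sketch.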

Integrated speech recognition and semantic classification

Publication number: 20080177547

Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.

Type: Application

Filed: January 19, 2007

Publication date: July 24, 2008

Applicant: Microsoft Corporation

Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero

Automatic reading tutoring with parallel polarized language modeling

Publication number: 20080177545

Abstract: A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.

Type: Application

Filed: January 19, 2007

Publication date: July 24, 2008

Applicant: Microsoft Corporation

Inventors: Xiaolong Li, Yun-Cheng Ju, Li Deng, Alejandro Acero
