Patents by Inventor Alejandro Acero
Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8407041
Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation) as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion.
Type: Grant
Filed: December 1, 2010
Date of Patent: March 26, 2013
Assignee: Microsoft Corporation
Inventors: Li Deng, Yaodong Zhang, Alejandro Acero, Xiaodong He
-
Patent number: 8401852
Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.
Type: Grant
Filed: November 30, 2009
Date of Patent: March 19, 2013
Assignee: Microsoft Corporation
Inventors: Geoffrey Gerson Zweig, Patrick An-Phu Nguyen, James Garnet Droppo, III, Alejandro Acero
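As an illustration of the "edit distance feature" named in the abstract above, the following is a minimal sketch, not the patented method. It assumes the feature compares the detected units in a time-span against the unit sequence expected for a candidate word, and it uses plain Levenshtein distance; the function name `edit_distance` and the phone-like unit labels are hypothetical.

```python
def edit_distance(detected, expected):
    """Levenshtein distance between two sequences of units.

    Assumption: units are hashable labels (e.g. phone symbols) and every
    insertion, deletion, or substitution costs 1. The listing does not
    state the cost scheme actually used.
    """
    # prev[j] holds the distance between the detected units consumed so far
    # and the first j expected units.
    prev = list(range(len(expected) + 1))
    for i, d in enumerate(detected, start=1):
        curr = [i]
        for j, e in enumerate(expected, start=1):
            substitution = prev[j - 1] + (0 if d == e else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical detected units versus the expected units for one word.
distance = edit_distance(["k", "ae", "t"], ["k", "ah", "t"])
```

A recognizer could feed such a distance to its statistical model as one feature among several; how the patent combines it with the existence and expectation features is not described in this listing.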
-
Patent number: 8385557
Abstract: A multichannel acoustic echo reduction system is described herein. The system includes an acoustic echo canceller (AEC) component having a fixed filter for each respective combination of loudspeaker and microphone signals and having an adaptive filter for each microphone signal. For each microphone signal, the AEC component modifies the microphone signal to reduce contributions from the outputs of the loudspeakers based at least in part on the respective adaptive filter associated with the microphone signal and the set of fixed filters associated with the respective microphone signal.
Type: Grant
Filed: June 19, 2008
Date of Patent: February 26, 2013
Assignee: Microsoft Corporation
Inventors: Ivan Jelev Tashev, Alejandro Acero, Nilesh Madhu
-
Patent number: 8379891
Abstract: Sound signals to be output from a loudspeaker array are modified by a plurality of filters designed according to an unconstrained optimization procedure to improve overall performance (e.g., power, directivity) of the loudspeaker array. More particularly, respective filters are configured to receive a signal to be output to a plurality of loudspeakers. Upon receiving the signal, the respective filters individually modify the received signal according to the results of the unconstrained optimization procedure and then output the individually modified signals to respective loudspeakers. The unconstrained optimization procedure takes into account manufacturing tolerances and individually enhances the signal output to each of a plurality of individual loudspeakers within an array to achieve an overall improvement in performance.
Type: Grant
Filed: June 4, 2008
Date of Patent: February 19, 2013
Assignee: Microsoft Corporation
Inventors: Ivan J. Tashev, James G. Droppo, Michael L. Seltzer, Alejandro Acero
-
Patent number: 8351589
Abstract: Spatialized audio is generated for voice data received at a telecommunications device based on spatial audio information received with the voice data and based on a determined virtual position of the source of the voice data for producing spatialized audio signals.
Type: Grant
Filed: June 16, 2009
Date of Patent: January 8, 2013
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, Christian Huitema
-
Patent number: 8340267
Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.
Type: Grant
Filed: February 5, 2009
Date of Patent: December 25, 2012
Assignee: Microsoft Corporation
Inventors: Dinei A. Florencio, Alejandro Acero, William Buxton, Phillip A. Chou, Ross G. Cutler, Jason Garms, Christian Huitema, Kori M. Quinn, Daniel Allen Rosenfeld, Zhengyou Zhang
-
Patent number: 8335683
Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.
Type: Grant
Filed: January 23, 2003
Date of Patent: December 18, 2012
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, Ciprian Chelba, Ye-Yi Wang, Leon Wong, Brendan Frey
-
Patent number: 8325909
Abstract: Sound signals captured by a microphone are adjusted to provide improved sound quality. More particularly, an Acoustic Echo Reduction system which performs a first stage of echo reduction (e.g., acoustic echo cancellation) on a received signal is configured to perform a second stage of echo reduction (e.g., acoustic echo suppression) by segmenting the received signal into a plurality of frequency bins respectively comprised within a number of frames (e.g., 0.3 s to 0.5 s sound signal segments) for a given block. Data comprised within respective frequency bins is modeled according to a probability density function (e.g., Gaussian distribution). The probability of whether respective frequency bins comprise predominantly near-end signal or predominantly residual echo is calculated.
Type: Grant
Filed: June 25, 2008
Date of Patent: December 4, 2012
Assignee: Microsoft Corporation
Inventors: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu
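The last two sentences of the abstract above describe a per-bin probability computed under a density model. A minimal sketch of one way to do this follows, assuming each hypothesis (near-end signal, residual echo) is a one-dimensional Gaussian over some scalar bin statistic and that the two are combined with Bayes' rule. The function names, the scalar statistic, and the prior are assumptions for illustration; the listing does not give the patent's actual formulation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def near_end_posterior(x, near_mean, near_var, echo_mean, echo_var, prior_near=0.5):
    """Posterior probability that a bin is predominantly near-end signal.

    Assumption: x is a scalar statistic of the frequency bin, and the two
    hypotheses are modeled by separate Gaussians with known parameters.
    """
    p_near = prior_near * gaussian_pdf(x, near_mean, near_var)
    p_echo = (1.0 - prior_near) * gaussian_pdf(x, echo_mean, echo_var)
    total = p_near + p_echo
    # Fall back to the prior if both densities underflow to zero.
    return p_near / total if total > 0.0 else prior_near


# Hypothetical parameters: the bin statistic sits on the near-end mean.
probability = near_end_posterior(0.0, near_mean=0.0, near_var=1.0,
                                 echo_mean=5.0, echo_var=1.0)
```

A suppression stage could then attenuate bins whose posterior is low; that step is outside what the abstract describes.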
-
Patent number: 8306817
Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.
Type: Grant
Filed: January 8, 2008
Date of Patent: November 6, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Alejandro Acero, James G. Droppo, Li Deng
-
Patent number: 8306818
Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
Type: Grant
Filed: April 15, 2008
Date of Patent: November 6, 2012
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Publication number: 20120271632
Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
Type: Application
Filed: April 25, 2011
Publication date: October 25, 2012
Applicant: MICROSOFT CORPORATION
Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
-
Patent number: 8285542
Abstract: A statistical language model is trained for use in a directory assistance system using the data in a directory assistance listing corpus. Calculations are made to determine how important words in the corpus are in distinguishing a listing from other listings, and how likely words are to be omitted or added by a user. The language model is trained using these calculations.
Type: Grant
Filed: February 15, 2011
Date of Patent: October 9, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Alejandro Acero, Yun-Cheng Ju
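The abstract above refers to calculating how important a word is in distinguishing one listing from the others. The listing does not say which calculation the patent uses, so the sketch below substitutes a common stand-in, inverse document frequency (IDF), purely as an illustration. The function name `word_importance` and the sample listings are hypothetical.

```python
import math
from collections import Counter


def word_importance(listings):
    """Score each word by inverse document frequency over the listings.

    Assumption: IDF is used here only as a stand-in for the patent's
    unspecified importance calculation. A word that appears in every
    listing scores 0.0; rarer words score higher.
    """
    doc_freq = Counter()
    for listing in listings:
        # Count each word once per listing, ignoring case.
        doc_freq.update(set(listing.lower().split()))
    total = len(listings)
    return {word: math.log(total / count) for word, count in doc_freq.items()}


# Hypothetical directory listings.
weights = word_importance([
    "Luigi Pizza Restaurant",
    "Golden Dragon Restaurant",
    "Main Street Restaurant",
])
```

In this toy corpus "restaurant" carries no distinguishing power while "pizza" does, which matches the intuition in the abstract. Modeling how likely a caller is to omit or add a word would need a separate estimate that this sketch does not attempt.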
-
Publication number: 20120254086
Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training.
Type: Application
Filed: March 31, 2011
Publication date: October 4, 2012
Applicant: Microsoft Corporation
Inventors: Li Deng, Dong Yu, Alejandro Acero
-
Patent number: 8280733
Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
Type: Grant
Filed: September 17, 2010
Date of Patent: October 2, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
-
Patent number: 8239195
Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
Type: Grant
Filed: September 23, 2008
Date of Patent: August 7, 2012
Assignee: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
-
Patent number: 8214215
Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
Type: Grant
Filed: September 24, 2008
Date of Patent: July 3, 2012
Assignee: Microsoft Corporation
Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
-
Publication number: 20120166186
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type: Application
Filed: December 23, 2010
Publication date: June 28, 2012
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
-
Publication number: 20120158703
Abstract: One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example.
Type: Application
Filed: December 16, 2010
Publication date: June 21, 2012
Applicant: Microsoft Corporation
Inventors: Xiao Li, Jingjing Liu, Alejandro Acero, Ye-Yi Wang
-
Publication number: 20120143599
Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
Type: Application
Filed: December 3, 2010
Publication date: June 7, 2012
Applicant: Microsoft Corporation
Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan
-
Publication number: 20120143591
Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation) as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion.
Type: Application
Filed: December 1, 2010
Publication date: June 7, 2012
Applicant: Microsoft Corporation
Inventors: Li Deng, Yaodong Zhang, Alejandro Acero, Xiaodong He