Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Integrative and discriminative technique for spoken utterance translation

Patent number: 8407041

Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation) as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion.

Type: Grant

Filed: December 1, 2010

Date of Patent: March 26, 2013

Assignee: Microsoft Corporation

Inventors: Li Deng, Yaodong Zhang, Alejandro Acero, Xiaodong He

Utilizing features generated from phonic units in speech recognition

Patent number: 8401852

Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.

Type: Grant

Filed: November 30, 2009

Date of Patent: March 19, 2013

Assignee: Microsoft Corporation

Inventors: Geoffrey Gerson Zweig, Patrick An-Phu Nguyen, James Garnet Droppo, III, Alejandro Acero

Multichannel acoustic echo reduction

Patent number: 8385557

Abstract: A multichannel acoustic echo reduction system is described herein. The system includes an acoustic echo canceller (AEC) component having a fixed filter for each respective combination of loudspeaker and microphone signals and having an adaptive filter for each microphone signal. For each microphone signal, the AEC component modifies the microphone signal to reduce contributions from the outputs of the loudspeakers based at least in part on the respective adaptive filter associated with the microphone signal and the set of fixed filters associated with the respective microphone signal.

Type: Grant

Filed: June 19, 2008

Date of Patent: February 26, 2013

Assignee: Microsoft Corporation

Inventors: Ivan Jelev Tashev, Alejandro Acero, Nilesh Madhu

Loudspeaker array design

Patent number: 8379891

Abstract: Sound signals to be output from a loudspeaker array are modified by a plurality of filters designed according to an unconstrained optimization procedure to improve overall performance (e.g., power, directivity) of the loudspeaker array. More particularly, respective filters are configured to receive a signal to be output to a plurality of loudspeakers. Upon receiving the signal, the respective filters individually modify the received signal according to the results of the unconstrained optimization procedure and then output the individually modified signals to respective loudspeakers. The unconstrained optimization procedure takes into account manufacturing tolerances and individually enhances the signal output to each of a plurality of individual loudspeakers within an array to achieve an overall improvement in performance.

Type: Grant

Filed: June 4, 2008

Date of Patent: February 19, 2013

Assignee: Microsoft Corporation

Inventors: Ivan J. Tashev, James G. Droppo, Michael L. Seltzer, Alejandro Acero

Spatial audio for audio conferencing

Patent number: 8351589

Abstract: Spatialized audio is generated for voice data received at a telecommunications device based on spatial audio information received with the voice data and based on a determined virtual position of the source of the voice data for producing spatialized audio signals.

Type: Grant

Filed: June 16, 2009

Date of Patent: January 8, 2013

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Christian Huitema

Audio transforms in connection with multiparty communication

Patent number: 8340267

Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.

Type: Grant

Filed: February 5, 2009

Date of Patent: December 25, 2012

Assignee: Microsoft Corporation

Inventors: Dinei A. Florencio, Alejandro Acero, William Buxton, Phillip A. Chou, Ross G. Cutler, Jason Garms, Christian Huitema, Kori M. Quinn, Daniel Allen Rosenfeld, Zhengyou Zhang

System for using statistical classifiers for spoken language understanding

Patent number: 8335683

Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.

Type: Grant

Filed: January 23, 2003

Date of Patent: December 18, 2012

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Ciprian Chelba, Ye-Yi Wang, Leon Wong, Brendan Frey

Acoustic echo suppression

Patent number: 8325909

Abstract: Sound signals captured by a microphone are adjusted to provide improved sound quality. More particularly, an Acoustic Echo Reduction system which performs a first stage of echo reduction (e.g., acoustic echo cancellation) on a received signal is configured to perform a second stage of echo reduction (e.g., acoustic echo suppression) by segmenting the received signal into a plurality of frequency bins respectively comprised within a number of frames (e.g., 0.3 s to 0.5 s sound signal segments) for a given block. Data comprised within respective frequency bins is modeled according to a probability density function (e.g., Gaussian distribution). The probability of whether respective frequency bins comprise predominantly near-end signal or predominantly residual echo is calculated.

Type: Grant

Filed: June 25, 2008

Date of Patent: December 4, 2012

Assignee: Microsoft Corporation

Inventors: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu

Speech recognition with non-linear noise reduction on Mel-frequency cepstra

Patent number: 8306817

Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.

Type: Grant

Filed: January 8, 2008

Date of Patent: November 6, 2012

Assignee: Microsoft Corporation

Inventors: Dong Yu, Alejandro Acero, James G. Droppo, Li Deng

Discriminative training of language models for text and speech classification

Patent number: 8306818

Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.

Type: Grant

Filed: April 15, 2008

Date of Patent: November 6, 2012

Assignee: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan

Speaker Identification

Publication number: 20120271632

Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.

Type: Application

Filed: April 25, 2011

Publication date: October 25, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver

Adapting a language model to accommodate inputs not found in a directory assistance listing

Patent number: 8285542

Abstract: A statistical language model is trained for use in a directory assistance system using the data in a directory assistance listing corpus. Calculations are made to determine how important words in the corpus are in distinguishing a listing from other listings, and how likely words are to be omitted or added by a user. The language model is trained using these calculations.

Type: Grant

Filed: February 15, 2011

Date of Patent: October 9, 2012

Assignee: Microsoft Corporation

Inventors: Dong Yu, Alejandro Acero, Yun-Cheng Ju

DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION

Publication number: 20120254086

Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training.

Type: Application

Filed: March 31, 2011

Publication date: October 4, 2012

Applicant: Microsoft Corporation

Inventors: Li Deng, Dong Yu, Alejandro Acero

Automatic speech recognition learning using categorization and selective incorporation of user-initiated corrections

Patent number: 8280733

Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.

Type: Grant

Filed: September 17, 2010

Date of Patent: October 2, 2012

Assignee: Microsoft Corporation

Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero

Adapting a compressed model for use in speech recognition

Patent number: 8239195

Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.

Type: Grant

Filed: September 23, 2008

Date of Patent: August 7, 2012

Assignee: Microsoft Corporation

Inventors: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero

Phase sensitive model adaptation for noisy speech recognition

Patent number: 8214215

Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.

Type: Grant

Filed: September 24, 2008

Date of Patent: July 3, 2012

Assignee: Microsoft Corporation

Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero

Dual-Band Speech Encoding

Publication number: 20120166186

Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.

Type: Application

Filed: December 23, 2010

Publication date: June 28, 2012

Applicant: Microsoft Corporation

Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer

SEARCH LEXICON EXPANSION

Publication number: 20120158703

Abstract: One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example.

Type: Application

Filed: December 16, 2010

Publication date: June 21, 2012

Applicant: Microsoft Corporation

Inventors: Xiao Li, Jingjing Liu, Alejandro Acero, Ye-Yi Wang

WARPED SPECTRAL AND FINE ESTIMATE AUDIO ENCODING

Publication number: 20120143599

Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.

Type: Application

Filed: December 3, 2010

Publication date: June 7, 2012

Applicant: Microsoft Corporation

Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan

INTEGRATIVE AND DISCRIMINATIVE TECHNIQUE FOR SPOKEN UTTERANCE TRANSLATION

Publication number: 20120143591

Abstract: Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation) as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using the minimum classification error (MCE) criterion.

Type: Application

Filed: December 1, 2010

Publication date: June 7, 2012

Applicant: Microsoft Corporation

Inventors: Li Deng, Yaodong Zhang, Alejandro Acero, Xiaodong He
