Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Deep Structured Semantic Model Produced Using Click-Through Data

Publication number: 20150074027

Abstract: A deep structured semantic module (DSSM) is described herein which uses a model that is discriminatively trained based on click-through data, e.g., such that a conditional likelihood of clicked documents, given respective queries, is maximized, and a conditional likelihood of non-clicked documents, given the queries, is reduced. In operation, after training is complete, the DSSM maps an input item into an output item expressed in a semantic space, using the trained model. To facilitate training and runtime operation, a dimensionality-reduction module (DRM) can reduce the dimensionality of the input item that is fed to the DSSM. A search engine may use the above-summarized functionality to convert a query and a plurality of documents into the common semantic space, and then determine the similarity between the query and documents in the semantic space. The search engine may then rank the documents based, at least in part, on the similarity measures.
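The ranking and training objective summarized in this abstract can be illustrated with a minimal Python sketch. It assumes the query and documents have already been mapped into the semantic space, and it assumes cosine similarity as the similarity measure and a softmax over scaled similarities as the conditional likelihood; the abstract does not specify either choice. The function names and the `gamma` scaling factor are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by decreasing similarity, plus the scores."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order, scores

def clicked_log_likelihood(scores, clicked_index, gamma=10.0):
    """Log of a softmax likelihood for the clicked document.

    gamma is a hypothetical scaling factor. Raising this value during
    training raises the likelihood of the clicked document and lowers
    that of the non-clicked ones.
    """
    logits = [gamma * s for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[clicked_index] - log_norm

# Toy semantic-space vectors; a trained model would produce these.
order, scores = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

In this toy case the second document points in the same direction as the query, so it is ranked first.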

Type: Application

Filed: September 6, 2013

Publication date: March 12, 2015

Applicant: Microsoft Corporation

Inventors: Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alejandro Acero, Larry P. Heck

Structured models of repetition for speech recognition

Patent number: 8965765

Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

Type: Grant

Filed: September 19, 2008

Date of Patent: February 24, 2015

Assignee: Microsoft Corporation

Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

Parameter learning in a hidden trajectory model

Patent number: 8942978

Abstract: Parameters for distributions of a hidden trajectory model including means and variances are estimated using an acoustic likelihood function for observation vectors as an objective function for optimization. The estimation includes only acoustic data and not any intermediate estimate on hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.

Type: Grant

Filed: July 14, 2011

Date of Patent: January 27, 2015

Assignee: Microsoft Corporation

Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero

Dual-Band Speech Encoding

Publication number: 20140358525

Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.

Type: Application

Filed: August 14, 2014

Publication date: December 4, 2014

Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer

Robust adaptive beamforming with enhanced noise suppression

Patent number: 8818002

Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals as well as isotropic ambient noise.

Type: Grant

Filed: July 21, 2011

Date of Patent: August 26, 2014

Assignee: Microsoft Corp.

Inventors: Ivan Tashev, Alejandro Acero, Byung-Jun Yoon

Dual-band speech encoding

Patent number: 8818797

Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.

Type: Grant

Filed: December 23, 2010

Date of Patent: August 26, 2014

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer

Feature-Augmented Neural Networks and Applications of Same

Publication number: 20140229158

Abstract: A system is described herein which uses a neural network having an input layer that accepts an input vector and a feature vector. The input vector represents at least part of input information, such as, but not limited to, a word or phrase in a sequence of input words. The feature vector provides supplemental information pertaining to the input information. The neural network produces an output vector based on the input vector and the feature vector. In one implementation, the neural network is a recurrent neural network. Also described herein are various applications of the system, including a machine translation application.

Type: Application

Filed: February 10, 2013

Publication date: August 14, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Geoffrey G. Zweig, Tomas Mikolov, Alejandro Acero

Speaker identification

Patent number: 8719019

Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.

Type: Grant

Filed: April 25, 2011

Date of Patent: May 6, 2014

Assignee: Microsoft Corporation

Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver

Acoustic model adaptation using splines

Patent number: 8700394

Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.

Type: Grant

Filed: March 24, 2010

Date of Patent: April 15, 2014

Assignee: Microsoft Corporation

Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero

Noise suppressor for speech recognition

Patent number: 8615393

Abstract: A noise suppressor for altering a speech signal is trained based on a speech recognition system. An objective function can be utilized to adjust parameters of the noise suppressor. The noise suppressor can be used to alter speech signals for the speech recognition system.

Type: Grant

Filed: November 15, 2006

Date of Patent: December 24, 2013

Assignee: Microsoft Corporation

Inventors: Ivan J. Tashev, Alejandro Acero, James G. Droppo

Sound source separation using spatial filtering and regularization phases

Patent number: 8583428

Abstract: Described is a multiple phase process/system that combines spatial filtering with regularization to separate sound from different sources such as the speech of two different speakers. In a first phase, frequency domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse-transform.

Type: Grant

Filed: June 15, 2010

Date of Patent: November 12, 2013

Assignee: Microsoft Corporation

Inventors: Ivan Tashev, Lae-Hoon Kim, Alejandro Acero, Jason Scott Flaks

DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION

Publication number: 20130282634

Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.

Type: Application

Filed: June 17, 2013

Publication date: October 24, 2013

Inventors: Li Deng, Dong Yu, Alejandro Acero

FACTORED TRANSFORMS FOR SEPARABLE ADAPTATION OF ACOUSTIC MODELS

Publication number: 20130253930

Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
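The two-stage adaptation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each transform is modeled as a matrix plus a bias vector, the two variability sources are taken to be an environment label and a speaker label, and the names `apply_transform`, `adapt_features`, `first_set`, and `second_set` are hypothetical.

```python
def apply_transform(transform, x):
    """Apply y = A x + b, where transform is the pair (A, b)."""
    A, b = transform
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def adapt_features(x, first_value, second_value, first_set, second_set):
    """Select one transform per variability source and apply them in sequence."""
    # The first transform compensates for the first variability source.
    intermediate = apply_transform(first_set[first_value], x)
    # The second transform is applied to the intermediate result.
    return apply_transform(second_set[second_value], intermediate)

# Hypothetical transform sets keyed by the value of each variability source.
first_set = {"car": ([[2, 0], [0, 2]], [0, 0])}       # e.g. environment
second_set = {"alice": ([[1, 0], [0, 1]], [1, -1])}   # e.g. speaker

adapted = adapt_features([1, 2], "car", "alice", first_set, second_set)
```

Keeping the two sets separate means a transform estimated for one source can be reused with any transform from the other set, which appears to be the point of the factored design.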

Type: Application

Filed: March 23, 2012

Publication date: September 26, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Michael Lewis Seltzer, Alejandro Acero

Warped spectral and fine estimate audio encoding

Patent number: 8532985

Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.

Type: Grant

Filed: December 3, 2010

Date of Patent: September 10, 2013

Assignee: Microsoft Corporation

Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan

Incorporating prior knowledge into independent component analysis

Patent number: 8515096

Abstract: The quality of sound recorded from a plurality of people speaking at the same time is improved by incorporating prior knowledge into an independent component analysis (ICA) separating algorithm. More particularly, prior knowledge is defined as a probability distribution according to some prior situation (e.g., prior distribution of people in a room). A mixture of sounds (e.g., mixture of voices) from a plurality of sources (e.g., people) captured by one or more recording devices (e.g., microphones) is separated into individual components (e.g., individual voices from respective people) by applying a maximum a posteriori (MAP) ICA algorithm which incorporates prior knowledge of the respective sources (e.g., location of sources) directly into the MAP ICA algorithm thereby allowing recovery of independent underlying sounds associated with individual sources from the mixture.

Type: Grant

Filed: June 18, 2008

Date of Patent: August 20, 2013

Assignee: Microsoft Corporation

Inventors: Michael L. Seltzer, Graham Taylor, Alejandro Acero

Deep convex network with joint use of nonlinear random projection, Restricted Boltzmann Machine and batch-based parallelizable optimization

Patent number: 8489529

Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training.

Type: Grant

Filed: March 31, 2011

Date of Patent: July 16, 2013

Assignee: Microsoft Corporation

Inventors: Li Deng, Dong Yu, Alejandro Acero

Conditional model for natural language understanding

Patent number: 8442828

Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.

Type: Grant

Filed: March 17, 2006

Date of Patent: May 14, 2013

Assignee: Microsoft Corporation

Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan

Automatic reading tutoring with parallel polarized language modeling

Patent number: 8433576

Abstract: A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.

Type: Grant

Filed: January 19, 2007

Date of Patent: April 30, 2013

Assignee: Microsoft Corporation

Inventors: Xiaolong Li, Yun-Cheng Ju, Li Deng, Alejandro Acero

Generic framework for large-margin MCE training in speech recognition

Patent number: 8423364

Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.

Type: Grant

Filed: February 20, 2007

Date of Patent: April 16, 2013

Assignee: Microsoft Corporation

Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He

Visual feedback for natural head positioning

Patent number: 8405706

Abstract: A videoconferencing conferee may be provided with feedback on his or her location relative to a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance with the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly.

Type: Grant

Filed: December 17, 2008

Date of Patent: March 26, 2013

Assignee: Microsoft Corporation

Inventors: Zhengyou Zhang, Christian Huitema, Alejandro Acero
