Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150074027
    Abstract: A deep structured semantic module (DSSM) is described herein which uses a model that is discriminatively trained based on click-through data, e.g., such that a conditional likelihood of clicked documents, given respective queries, is maximized, and a conditional likelihood of non-clicked documents, given the queries, is reduced. In operation, after training is complete, the DSSM maps an input item into an output item expressed in a semantic space, using the trained model. To facilitate training and runtime operation, a dimensionality-reduction module (DRM) can reduce the dimensionality of the input item that is fed to the DSSM. A search engine may use the above-summarized functionality to convert a query and a plurality of documents into the common semantic space, and then determine the similarity between the query and documents in the semantic space. The search engine may then rank the documents based, at least in part, on the similarity measures.
    Type: Application
    Filed: September 6, 2013
    Publication date: March 12, 2015
    Applicant: Microsoft Corporation
    Inventors: Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alejandro Acero, Larry P. Heck
  • Patent number: 8965765
    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences, as recognized by one or more recognizers, and the associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
    Type: Grant
    Filed: September 19, 2008
    Date of Patent: February 24, 2015
    Assignee: Microsoft Corporation
    Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
  • Patent number: 8942978
    Abstract: Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as an objective function for optimization. The estimation uses only acoustic data and not any intermediate estimate of hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Grant
    Filed: July 14, 2011
    Date of Patent: January 27, 2015
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Publication number: 20140358525
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Application
    Filed: August 14, 2014
    Publication date: December 4, 2014
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Patent number: 8818002
    Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise.
    Type: Grant
    Filed: July 21, 2011
    Date of Patent: August 26, 2014
    Assignee: Microsoft Corporation
    Inventors: Ivan Tashev, Alejandro Acero, Byung-Jun Yoon
  • Patent number: 8818797
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: August 26, 2014
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Publication number: 20140229158
    Abstract: A system is described herein which uses a neural network having an input layer that accepts an input vector and a feature vector. The input vector represents at least part of input information, such as, but not limited to, a word or phrase in a sequence of input words. The feature vector provides supplemental information pertaining to the input information. The neural network produces an output vector based on the input vector and the feature vector. In one implementation, the neural network is a recurrent neural network. Also described herein are various applications of the system, including a machine translation application.
    Type: Application
    Filed: February 10, 2013
    Publication date: August 14, 2014
    Applicant: Microsoft Corporation
    Inventors: Geoffrey G. Zweig, Tomas Mikolov, Alejandro Acero
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data of one or more user utterances captured using a microphone is received at a computing device. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of the vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
  • Patent number: 8700394
    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
    Type: Grant
    Filed: March 24, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
  • Patent number: 8615393
    Abstract: A noise suppressor for altering a speech signal is trained based on a speech recognition system. An objective function can be utilized to adjust parameters of the noise suppressor. The noise suppressor can be used to alter speech signals for the speech recognition system.
    Type: Grant
    Filed: November 15, 2006
    Date of Patent: December 24, 2013
    Assignee: Microsoft Corporation
    Inventors: Ivan J. Tashev, Alejandro Acero, James G. Droppo
  • Patent number: 8583428
    Abstract: Described is a multiple-phase process/system that combines spatial filtering with regularization to separate sound from different sources, such as the speech of two different speakers. In a first phase, frequency-domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse transform.
    Type: Grant
    Filed: June 15, 2010
    Date of Patent: November 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Ivan Tashev, Lae-Hoon Kim, Alejandro Acero, Jason Scott Flaks
  • Publication number: 20130282634
    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.
    Type: Application
    Filed: June 17, 2013
    Publication date: October 24, 2013
    Inventors: Li Deng, Dong Yu, Alejandro Acero
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Alejandro Acero
  • Patent number: 8532985
    Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
    Type: Grant
    Filed: December 3, 2010
    Date of Patent: September 10, 2013
    Assignee: Microsoft Corporation
    Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan
  • Patent number: 8515096
    Abstract: The quality of sound recorded from a plurality of people speaking at the same time is improved by incorporating prior knowledge into an independent component analysis (ICA) separating algorithm. More particularly, prior knowledge is defined as a probability distribution according to some prior situation (e.g., prior distribution of people in a room). A mixture of sounds (e.g., mixture of voices) from a plurality of sources (e.g., people) captured by one or more recording devices (e.g., microphones) is separated into individual components (e.g., individual voices from respective people) by applying a maximum a posteriori (MAP) ICA algorithm which incorporates prior knowledge of the respective sources (e.g., location of sources) directly into the MAP ICA algorithm, thereby allowing recovery of the independent underlying sounds associated with individual sources from the mixture.
    Type: Grant
    Filed: June 18, 2008
    Date of Patent: August 20, 2013
    Assignee: Microsoft Corporation
    Inventors: Michael L. Seltzer, Graham Taylor, Alejandro Acero
  • Patent number: 8489529
    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and restricted Boltzmann machine (RBM) weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training.
    Type: Grant
    Filed: March 31, 2011
    Date of Patent: July 16, 2013
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, Alejandro Acero
  • Patent number: 8442828
    Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
  • Patent number: 8433576
    Abstract: A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: April 30, 2013
    Assignee: Microsoft Corporation
    Inventors: Xiaolong Li, Yun-Cheng Ju, Li Deng, Alejandro Acero
  • Patent number: 8423364
    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: April 16, 2013
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
  • Patent number: 8405706
    Abstract: A videoconferencing conferee may be provided with feedback on his or her location relative to a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance with the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset from and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly.
    Type: Grant
    Filed: December 17, 2008
    Date of Patent: March 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Christian Huitema, Alejandro Acero
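The ranking step in publication 20150074027 (compare a query and documents in a common semantic space, then rank by similarity) can be illustrated with a minimal sketch. It assumes the semantic vectors have already been produced by a trained model; here they are hand-picked toy values. Cosine similarity as the measure and the softmax smoothing factor `gamma` are assumptions for illustration, not details taken from the abstract. The hypothetical `clicked_posterior` shows the kind of conditional likelihood of a clicked document that discriminative training would push up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs sorted from most to least similar."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def clicked_posterior(query_vec, doc_vecs, clicked_id, gamma=10.0):
    """Softmax over smoothed similarities (gamma is an assumed smoothing factor).

    Training would maximize this value for clicked documents, which
    implicitly lowers it for the non-clicked documents in the same set.
    """
    exps = {d: math.exp(gamma * cosine(query_vec, v)) for d, v in doc_vecs.items()}
    return exps[clicked_id] / sum(exps.values())


if __name__ == "__main__":
    query = [1.0, 0.0, 1.0]
    docs = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
    print(rank_documents(query, docs))
    print(clicked_posterior(query, docs, "a"))
```

With these toy vectors, document "a" points in the same direction as the query and ranks first, "c" is partially aligned, and "b" is orthogonal. The negative log of `clicked_posterior` would serve as a per-example training loss in this sketch.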
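The two-stage adaptation described in publication 20130253930 (select one linear transform per variability source, apply the first, then apply the second to the intermediate result) can be sketched as follows. The keying of the transform sets by plain string values such as "spk1" (speaker) and "car" (environment), and the helper names `apply` and `adapt_features`, are hypothetical. Transforms are plain matrices with no bias term, matching the abstract's "linear transform" wording; how the transforms are estimated is out of scope here.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def apply(transform: Matrix, x: Vector) -> Vector:
    """Multiply a feature vector by a transform matrix (row by row)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in transform]


def adapt_features(
    frames: List[Vector],
    first_set: Dict[str, Matrix],
    first_value: str,
    second_set: Dict[str, Matrix],
    second_value: str,
) -> List[Vector]:
    """Cascade two selected linear transforms over every feature frame."""
    # Select one transform per variability source based on its observed value.
    first = first_set[first_value]
    second = second_set[second_value]
    # First transform produces intermediate transformed speech data ...
    intermediate = [apply(first, f) for f in frames]
    # ... and the second transform is applied on top of that.
    return [apply(second, f) for f in intermediate]


if __name__ == "__main__":
    speaker_transforms = {"spk1": [[2.0, 0.0], [0.0, 1.0]]}
    env_transforms = {"car": [[1.0, 1.0], [0.0, 1.0]]}
    print(adapt_features([[1.0, 2.0]], speaker_transforms, "spk1", env_transforms, "car"))
```

Keeping the two sets separate means each source is compensated by its own small table of transforms, so new speaker and environment combinations can be covered by pairing existing entries rather than storing one transform per combination.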
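Patent 8700394 approximates the nonlinear relationship between clean speech, noise, and noisy speech with linear spline interpolation. A minimal sketch follows, assuming the commonly used log-energy model that ignores phase, in which noisy = clean + log(1 + exp(noise - clean)). In the patent, the spline parameters are learned from training data by minimizing prediction error. Here, as a stand-in, the knots are simply placed on the analytic curve. The knot positions and the function names are hypothetical, and channel distortion and variance modeling are omitted.

```python
import bisect
import math


def exact_mismatch(r):
    """log(1 + exp(r)): noisy-minus-clean log energy when r = noise - clean (phase ignored)."""
    return math.log1p(math.exp(r))


def make_spline(knots):
    """Piecewise-linear interpolant through the assumed curve at sorted knot positions.

    Inputs outside the knot range are linearly extrapolated from the end segments.
    """
    values = [exact_mismatch(k) for k in knots]

    def spline(r):
        i = bisect.bisect_right(knots, r)
        i = min(max(i, 1), len(knots) - 1)  # clamp to a valid segment
        x0, x1 = knots[i - 1], knots[i]
        y0, y1 = values[i - 1], values[i]
        return y0 + (y1 - y0) * (r - x0) / (x1 - x0)

    return spline


def predict_noisy(clean, noise, spline):
    """Predict a noisy log-energy feature from clean speech and noise estimates."""
    return clean + spline(noise - clean)


if __name__ == "__main__":
    s = make_spline([-10.0, -5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0, 10.0])
    print(predict_noisy(10.0, 10.0, s))  # equal clean and noise energy
```

Denser knots near r = 0, where the curve bends most, keep the approximation error small. With the knots above, the error stays under about 0.04 across [-10, 10].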