Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7856351
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: December 21, 2010
    Assignee: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Publication number: 20100316232
    Abstract: Spatialized audio is generated for voice data received at a telecommunications device, based on spatial audio information received with the voice data and on a determined virtual position of the source of the voice data. (A minimal spatialization sketch follows this listing.)
    Type: Application
    Filed: June 16, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Christian Huitema
  • Publication number: 20100311030
    Abstract: Described is a technology for learning a foreign language or other subject. Answers (e.g., translations) to questions (e.g., sentences to translate) received from learners are combined into a combined answer that serves as a representative model answer for those learners. The questions also may be provided to machine subsystems to generate machine answers, e.g., machine translators, with those machine answers used in the combined answer. The combined answer is used to evaluate each learner's individual answer. The evaluation may be used to compute profile information that is then fed back for use in selecting further questions, e.g., more difficult sentences as the learners progress. Also described is integrating the platform/technology into a web service.
    Type: Application
    Filed: June 3, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Xiaodong He, Alejandro Acero, Sebastian de la Chica
  • Publication number: 20100312782
    Abstract: A query may be applied against search engines that respectively return a set of search results relating to various items discovered in the searched data sets. However, presenting numerous and varied search results may be difficult on mobile devices with small displays and limited computational resources. Instead, search results may be associated with search domains representing various information types (e.g., contacts, public figures, places, projects, movies, music, and books) and presented by grouping search results with associated query domains, e.g., in a tabbed user interface. The query may be received through an input device associated with a particular input domain, and may be transitioned to the query domain of a particular search engine (e.g., by recognizing phonemes of a voice query using an acoustic model; matching phonemes with query terms according to a pronunciation model; and generating a recognition result according to a vocabulary of an n-gram language model). (A minimal result-grouping sketch follows this listing.)
    Type: Application
    Filed: June 5, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Xiao Li, Patrick Nguyen, Geoffrey Zweig, Alejandro Acero
  • Patent number: 7831425
    Abstract: A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word which have time ranges satisfying a criterion. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word. (A minimal hypothesis-merging sketch follows this listing.)
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Asela J. Gunawardana, Ciprian I. Chelba, Erik W. Selberg, Frank Torsten B. Seide, Patrick Nguyen, Roger Peng Yu
  • Patent number: 7831428
    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero, Jorge F. Silva Sanchez
  • Patent number: 7813923
    Abstract: A first set of signals from an array of one or more microphones and a second signal from a reference microphone are used to calibrate a set of filter parameters that minimize the difference between the second signal and a beamformer output signal based on the first set of signals. Once calibrated, the filter parameters are used to form a beamformer output signal that is filtered using a non-linear adaptive filter, which is adapted based on portions of the signal that do not contain speech, as determined by a speech detection sensor. (A minimal calibration sketch follows this listing.)
    Type: Grant
    Filed: October 14, 2005
    Date of Patent: October 12, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Michael L. Seltzer, Zhengyou Zhang, Zicheng Liu
  • Publication number: 20100256977
    Abstract: Described is a technology by which a maximum entropy (MaxEnt) model, such as one used as a classifier or embedded in a conditional random field or hidden conditional random field, uses continuous features with continuous weights that are continuous functions of the feature values (instead of single-valued weights). The continuous weights may be approximated by a spline-based solution. In general, this converts the optimization problem into a standard log-linear optimization problem without continuous weights in a higher-dimensional space. (A minimal spline-expansion sketch follows this listing.)
    Type: Application
    Filed: April 1, 2009
    Publication date: October 7, 2010
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero
  • Patent number: 7809568
    Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words in the speech data and combining them with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same, differing only in category. (A minimal unified-index sketch follows this listing.)
    Type: Grant
    Filed: November 8, 2005
    Date of Patent: October 5, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Ciprian I. Chelba, Jorge F. Silva Sanchez
  • Publication number: 20100195812
    Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.
    Type: Application
    Filed: February 5, 2009
    Publication date: August 5, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei A. Florencio, Alejandro Acero, William Buxton, Phillip A. Chou, Ross G. Cutler, Jason Garms, Christian Huitema, Kori M. Quinn, Daniel Allen Rosenfeld, Zhengyou Zhang
  • Patent number: 7769582
    Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. At the same time, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution by increasing the variance of each Gaussian distribution by an amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task. (A minimal variance-inflation sketch follows this listing.)
    Type: Grant
    Filed: July 25, 2008
    Date of Patent: August 3, 2010
    Assignee: Microsoft Corporation
    Inventors: James G. Droppo, Alejandro Acero, Li Deng
  • Publication number: 20100161332
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Application
    Filed: March 8, 2010
    Publication date: June 24, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Publication number: 20100153104
    Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for a frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, the noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm. (A minimal gain-interpolation sketch follows this listing.)
    Type: Application
    Filed: December 16, 2008
    Publication date: June 17, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
  • Publication number: 20100149310
    Abstract: A videoconferencing conferee may be provided with feedback on his or her location relative a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance to the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly.
    Type: Application
    Filed: December 17, 2008
    Publication date: June 17, 2010
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Christian Huitema, Alejandro Acero
  • Patent number: 7734460
    Abstract: A time-asynchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics used by the search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: June 8, 2010
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero
  • Patent number: 7725314
    Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define the gain of a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. In some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator guaranteed to be positive. (A minimal gain sketch follows this listing.)
    Type: Grant
    Filed: February 16, 2004
    Date of Patent: May 25, 2010
    Assignee: Microsoft Corporation
    Inventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
  • Patent number: 7707029
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Grant
    Filed: November 23, 2005
    Date of Patent: April 27, 2010
    Assignee: Microsoft Corporation
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Patent number: 7689419
    Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.
    Type: Grant
    Filed: September 22, 2005
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Milind V. Mahajan, Alejandro Acero, Asela J. Gunawardana, John C. Platt
  • Publication number: 20100076758
    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates are based on a phase-sensitive model of the distortions in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including those from other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
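
For publication 20100316232 above, the abstract does not specify a rendering method. Below is a minimal sketch, assuming a simple constant-power stereo panner driven by an azimuth derived from the talker's virtual position; the function names and azimuth convention are illustrative assumptions, not the publication's method.

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a source at the given azimuth.

    azimuth_deg: -90 (full left) .. +90 (full right), an assumed convention.
    Returns (left_gain, right_gain).
    """
    # Map azimuth to a pan angle in [0, pi/2] and use sine/cosine panning.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

def spatialize(mono_samples, azimuth_deg):
    """Turn a mono voice stream into a stereo stream placed at a virtual position."""
    left, right = pan_gains(azimuth_deg)
    return [(s * left, s * right) for s in mono_samples]

# Example: a talker assigned to a virtual position 45 degrees to the right.
stereo = spatialize([0.1, 0.2, -0.1], azimuth_deg=45.0)
```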
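
For publication 20100312782, a minimal sketch of grouping search results by their associated domain, e.g., to back a tabbed user interface on a small display. The record layout and domain names are assumptions.

```python
from collections import defaultdict

# Hypothetical search result records: (title, domain) pairs returned by several engines.
results = [
    ("Jane Smith", "contacts"),
    ("Seattle", "places"),
    ("Casablanca", "movies"),
    ("John Doe", "contacts"),
]

def group_by_domain(results):
    """Group search results by their associated search domain, one group per tab."""
    tabs = defaultdict(list)
    for title, domain in results:
        tabs[domain].append(title)
    return dict(tabs)

print(group_by_domain(results))
# {'contacts': ['Jane Smith', 'John Doe'], 'places': ['Seattle'], 'movies': ['Casablanca']}
```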
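
For patent 7831425, a minimal sketch of merging recognition hypotheses for the same word when their time ranges overlap sufficiently. The overlap criterion and the rule of summing posteriors are assumptions; the patent defines its own criteria.

```python
def merge_hypotheses(hyps, min_overlap=0.5):
    """Merge recognition hypotheses for the same word whose time ranges overlap.

    hyps: list of (start, end, posterior) tuples for one word.
    Two hypotheses are merged when the overlap exceeds min_overlap of the shorter
    range; posteriors are summed and the time range widened (assumed merge rule).
    """
    merged = []
    for start, end, post in sorted(hyps):
        if merged:
            m_start, m_end, m_post = merged[-1]
            overlap = min(end, m_end) - max(start, m_start)
            shorter = min(end - start, m_end - m_start)
            if shorter > 0 and overlap / shorter >= min_overlap:
                merged[-1] = (min(start, m_start), max(end, m_end), m_post + post)
                continue
        merged.append((start, end, post))
    return merged

# Two hypotheses for the same word at nearly the same time collapse into one entry.
print(merge_hypotheses([(1.00, 1.40, 0.4), (1.05, 1.45, 0.3), (3.0, 3.5, 0.8)]))
```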
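
For patent 7813923, a minimal sketch of the calibration step, assuming per-channel scalar weights fitted by least squares so the weighted sum of the array channels approximates the reference-microphone signal. The patent describes general filter parameters and a subsequent non-linear adaptive filtering stage, which this sketch omits.

```python
import numpy as np

def calibrate_weights(array_signals, reference_signal):
    """Least-squares calibration of per-channel weights so that the weighted sum
    of the array signals approximates the reference microphone signal.

    array_signals: (num_samples, num_mics) matrix.
    reference_signal: (num_samples,) vector from the reference microphone.
    One scalar weight per channel is an assumed simplification of the patent's
    filter parameters.
    """
    weights, *_ = np.linalg.lstsq(array_signals, reference_signal, rcond=None)
    return weights

def beamform(array_signals, weights):
    """Form the beamformer output as the calibrated weighted sum of the channels."""
    return array_signals @ weights

# Toy example: two channels that are scaled, noisy copies of the reference.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
x = np.column_stack([0.8 * ref, 0.5 * ref]) + 0.01 * rng.standard_normal((1000, 2))
w = calibrate_weights(x, ref)
out = beamform(x, w)
```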
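
For publication 20100256977, a minimal sketch of expanding a continuous feature value into piecewise-linear (hat-function) basis activations, so that learning one weight per knot yields an effective weight that varies continuously with the feature value while the model stays log-linear in the expanded space. The knot grid and basis form are assumptions; the publication describes a spline-based approximation generally.

```python
def spline_basis(x, knots):
    """Expand a continuous feature value into piecewise-linear (hat-function)
    basis activations over a fixed knot grid (knot placement is an assumption)."""
    phi = [0.0] * len(knots)
    if x <= knots[0]:
        phi[0] = 1.0
        return phi
    if x >= knots[-1]:
        phi[-1] = 1.0
        return phi
    for i in range(len(knots) - 1):
        lo, hi = knots[i], knots[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            phi[i], phi[i + 1] = 1.0 - t, t
            break
    return phi

# A feature value of 0.25 on a knot grid [0, 0.5, 1] activates the first two bases.
print(spline_basis(0.25, [0.0, 0.5, 1.0]))  # [0.5, 0.5, 0.0]
```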
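
For patent 7809568, a minimal sketch of a single inverted index covering both speech data and text meta-data, distinguished only by a category label, with text meta-data entries carrying probability 1.0. The posting-field names are assumptions.

```python
from collections import defaultdict

# One inverted index keyed by word; each posting records the document, the
# category ("speech" or "meta"), the position, and a probability of occurrence.
index = defaultdict(list)

def add_text_metadata(doc_id, words):
    """Index text meta-data words with positional information and probability 1.0."""
    for pos, word in enumerate(words):
        index[word].append({"doc": doc_id, "cat": "meta", "pos": pos, "prob": 1.0})

def add_speech_words(doc_id, recognized):
    """Index recognized speech words; recognized holds (word, position, posterior)."""
    for word, pos, prob in recognized:
        index[word].append({"doc": doc_id, "cat": "speech", "pos": pos, "prob": prob})

add_text_metadata("talk42", ["acoustic", "model", "training"])
add_speech_words("talk42", [("training", 17, 0.83), ("data", 18, 0.91)])
print(index["training"])  # one "meta" posting and one "speech" posting
```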
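
For patent 7769582, a minimal sketch of scoring a cleaned feature against a phonetic-state Gaussian whose variance is increased by the estimated variance of the cleaned value, shown for a scalar feature.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def uncertain_loglik(x_clean_est, enhancement_var, mean, var):
    """Score a cleaned feature against a state Gaussian with its variance
    increased by the estimated variance of the enhanced (cleaned) feature."""
    return gaussian_loglik(x_clean_est, mean, var + enhancement_var)

# The same cleaned value is scored less sharply when the enhancement is uncertain.
print(uncertain_loglik(1.2, enhancement_var=0.0, mean=1.0, var=0.5))
print(uncertain_loglik(1.2, enhancement_var=0.4, mean=1.0, var=0.5))
```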
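
For publication 20100153104, a minimal sketch of a noise-power dependent gain: full gain below a low noise threshold, strong suppression above a high threshold, interpolation in the log of the noise power in between, and smoothing against the prior frame's gain. The threshold and smoothing constants are assumptions.

```python
import math

def noise_dependent_gain(noise_power, low_thresh, high_thresh, g_high, g_low):
    """Gain as a function of frame noise power: g_high (e.g., 1.0) below
    low_thresh, g_low above high_thresh, interpolated in log noise power between."""
    if noise_power <= low_thresh:
        return g_high
    if noise_power >= high_thresh:
        return g_low
    t = (math.log(noise_power) - math.log(low_thresh)) / (
        math.log(high_thresh) - math.log(low_thresh)
    )
    return g_high + t * (g_low - g_high)

def smooth(gain, prev_gain, alpha=0.8):
    """Smooth the current gain using the previous frame's gain."""
    return alpha * prev_gain + (1.0 - alpha) * gain

g = noise_dependent_gain(noise_power=3e-3, low_thresh=1e-3, high_thresh=1e-1,
                         g_high=1.0, g_low=0.1)
print(smooth(g, prev_gain=1.0))
```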
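
For patent 7725314, a minimal Wiener-style gain sketch built from clean-speech and noise estimates, with the numerator floored so it stays positive. The patent's claimed gain uses both estimates in the numerator as well as the denominator, so this illustrates the general idea rather than the claimed form; the flooring constant is an assumption.

```python
def filter_gain(clean_power_est, noise_power_est, floor=1e-10):
    """Wiener-style gain from clean-speech and noise power estimates.

    Only the clean estimate appears in this numerator (the patent's gain also
    uses the noise value there); flooring keeps the numerator positive.
    """
    numerator = max(clean_power_est, floor)
    return numerator / (numerator + noise_power_est)

def apply_filter(noisy_power_spectrum, clean_est, noise_est):
    """Scale each frequency bin of a noisy frame by its gain."""
    return [
        y * filter_gain(s, n)
        for y, s, n in zip(noisy_power_spectrum, clean_est, noise_est)
    ]

# A speech-dominated bin keeps most of its energy; a noise-dominated bin is attenuated.
print(filter_gain(clean_power_est=0.9, noise_power_est=0.1))   # ~0.90
print(filter_gain(clean_power_est=0.05, noise_power_est=0.5))  # ~0.09
```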