Patents by Inventor Alejandro Acero

Alejandro Acero has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7856351
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: December 21, 2010
    Assignee: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Publication number: 20100316232
    Abstract: Spatialized audio is generated for voice data received at a telecommunications device, based on spatial audio information received with the voice data and on a determined virtual position of the source of the voice data. (A minimal spatialization sketch follows this listing.)
    Type: Application
    Filed: June 16, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Christian Huitema
  • Publication number: 20100311030
    Abstract: Described is a technology for learning a foreign language or other subject. Answers (e.g., translations) to questions (e.g., sentences to translate) received from learners are combined into a combined answer that serves as a representative model answer for those learners. The questions also may be provided to machine subsystems to generate machine answers, e.g., machine translators, with those machine answers used in the combined answer. The combined answer is used to evaluate each learner's individual answer. The evaluation may be used to compute profile information that is then fed back for use in selecting further questions, e.g., more difficult sentences as the learners progress. Also described is integrating the platform/technology into a web service.
    Type: Application
    Filed: June 3, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Xiaodong He, Alejandro Acero, Sebastian de la Chica
  • Publication number: 20100312782
    Abstract: A query may be applied against search engines that respectively return a set of search results relating to various items discovered in the searched data sets. However, presenting numerous and varied search results may be difficult on mobile devices with small displays and limited computational resources. Instead, search results may be associated with search domains representing various information types (e.g., contacts, public figures, places, projects, movies, music, and books) and presented by grouping search results with associated query domains, e.g., in a tabbed user interface. The query may be received through an input device associated with a particular input domain, and may be transitioned to the query domain of a particular search engine (e.g., by recognizing phonemes of a voice query using an acoustic model; matching phonemes with query terms according to a pronunciation model; and generating a recognition result according to a vocabulary of an n-gram language model). (A minimal result-grouping sketch follows this listing.)
    Type: Application
    Filed: June 5, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Xiao Li, Patrick Nguyen, Geoffrey Zweig, Alejandro Acero
  • Patent number: 7831425
    Abstract: A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word which have time ranges satisfying a criterion. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word. (A minimal hypothesis-merging sketch follows this listing.)
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Asela J. Gunawardana, Ciprian I. Chelba, Erik W. Selberg, Frank Torsten B. Seide, Patrick Nguyen, Roger Peng Yu
  • Patent number: 7831428
    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero, Jorge F. Silva Sanchez
  • Patent number: 7813923
    Abstract: A first set of signals from an array of one or more microphones and a second signal from a reference microphone are used to calibrate a set of filter parameters that minimize the difference between the second signal and a beamformer output signal based on the first set of signals. Once calibrated, the filter parameters are used to form a beamformer output signal that is filtered using a non-linear adaptive filter, which is adapted based on portions of the signal that do not contain speech, as determined by a speech detection sensor. (A minimal calibration sketch follows this listing.)
    Type: Grant
    Filed: October 14, 2005
    Date of Patent: October 12, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Michael L. Seltzer, Zhengyou Zhang, Zicheng Liu
  • Publication number: 20100256977
    Abstract: Described is a technology by which a maximum entropy (MaxEnt) model, such as one used as a classifier or embedded in a conditional random field or hidden conditional random field, uses continuous features with continuous weights that are continuous functions of the feature values (instead of single-valued weights). The continuous weights may be approximated by a spline-based solution. In general, this converts the optimization problem into a standard log-linear optimization problem without continuous weights in a higher-dimensional space. (A minimal spline-expansion sketch follows this listing.)
    Type: Application
    Filed: April 1, 2009
    Publication date: October 7, 2010
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero
  • Patent number: 7809568
    Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words in the speech data and combining them with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same, differing only in category. (A minimal unified-index sketch follows this listing.)
    Type: Grant
    Filed: November 8, 2005
    Date of Patent: October 5, 2010
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Ciprian I. Chelba, Jorge F. Silva Sanchez
  • Publication number: 20100195812
    Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.
    Type: Application
    Filed: February 5, 2009
    Publication date: August 5, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dinei A. Florencio, Alejandro Acero, William Buxton, Phillip A. Chou, Ross G. Cutler, Jason Garms, Christian Huitema, Kori M. Quinn, Daniel Allen Rosenfeld, Zhengyou Zhang
  • Patent number: 7769582
    Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. At the same time, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution by increasing the variance of each Gaussian distribution by an amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task. (A minimal variance-inflation sketch follows this listing.)
    Type: Grant
    Filed: July 25, 2008
    Date of Patent: August 3, 2010
    Assignee: Microsoft Corporation
    Inventors: James G. Droppo, Alejandro Acero, Li Deng
  • Publication number: 20100161332
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Application
    Filed: March 8, 2010
    Publication date: June 24, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Publication number: 20100153104
    Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for a frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, the noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm. (A minimal gain-interpolation sketch follows this listing.)
    Type: Application
    Filed: December 16, 2008
    Publication date: June 17, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
  • Publication number: 20100149310
    Abstract: A videoconferencing conferee may be provided with feedback on his or her location relative a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance to the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly.
    Type: Application
    Filed: December 17, 2008
    Publication date: June 17, 2010
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Christian Huitema, Alejandro Acero
  • Patent number: 7734460
    Abstract: A time-asynchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics used by the search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: June 8, 2010
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero
  • Patent number: 7725314
    Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define the gain of a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. In some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator guaranteed to be positive. (A minimal gain sketch follows this listing.)
    Type: Grant
    Filed: February 16, 2004
    Date of Patent: May 25, 2010
    Assignee: Microsoft Corporation
    Inventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
  • Patent number: 7707029
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Grant
    Filed: November 23, 2005
    Date of Patent: April 27, 2010
    Assignee: Microsoft Corporation
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Patent number: 7689419
    Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.
    Type: Grant
    Filed: September 22, 2005
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Milind V. Mahajan, Alejandro Acero, Asela J. Gunawardana, John C. Platt
  • Publication number: 20100076758
    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates are based on a phase-sensitive model of the distortions in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including those from other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
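
For publication 20100316232 above, the abstract does not specify a rendering method. Below is a minimal sketch, assuming a simple constant-power stereo panner driven by an azimuth derived from the talker's virtual position; the function names and azimuth convention are illustrative assumptions, not the publication's method.

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a source at the given azimuth.

    azimuth_deg: -90 (full left) .. +90 (full right), an assumed convention.
    Returns (left_gain, right_gain).
    """
    # Map azimuth to a pan angle in [0, pi/2] and use sine/cosine panning.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

def spatialize(mono_samples, azimuth_deg):
    """Turn a mono voice stream into a stereo stream placed at a virtual position."""
    left, right = pan_gains(azimuth_deg)
    return [(s * left, s * right) for s in mono_samples]

# Example: a talker assigned to a virtual position 45 degrees to the right.
stereo = spatialize([0.1, 0.2, -0.1], azimuth_deg=45.0)
```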
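
For publication 20100312782, a minimal sketch of grouping search results by their associated domain, e.g., to back a tabbed user interface on a small display. The record layout and domain names are assumptions.

```python
from collections import defaultdict

# Hypothetical search result records: (title, domain) pairs returned by several engines.
results = [
    ("Jane Smith", "contacts"),
    ("Seattle", "places"),
    ("Casablanca", "movies"),
    ("John Doe", "contacts"),
]

def group_by_domain(results):
    """Group search results by their associated search domain, one group per tab."""
    tabs = defaultdict(list)
    for title, domain in results:
        tabs[domain].append(title)
    return dict(tabs)

print(group_by_domain(results))
# {'contacts': ['Jane Smith', 'John Doe'], 'places': ['Seattle'], 'movies': ['Casablanca']}
```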
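
For patent 7831425, a minimal sketch of merging recognition hypotheses for the same word when their time ranges overlap sufficiently. The overlap criterion and the rule of summing posteriors are assumptions; the patent defines its own criteria.

```python
def merge_hypotheses(hyps, min_overlap=0.5):
    """Merge recognition hypotheses for the same word whose time ranges overlap.

    hyps: list of (start, end, posterior) tuples for one word.
    Two hypotheses are merged when the overlap exceeds min_overlap of the shorter
    range; posteriors are summed and the time range widened (assumed merge rule).
    """
    merged = []
    for start, end, post in sorted(hyps):
        if merged:
            m_start, m_end, m_post = merged[-1]
            overlap = min(end, m_end) - max(start, m_start)
            shorter = min(end - start, m_end - m_start)
            if shorter > 0 and overlap / shorter >= min_overlap:
                merged[-1] = (min(start, m_start), max(end, m_end), m_post + post)
                continue
        merged.append((start, end, post))
    return merged

# Two hypotheses for the same word at nearly the same time collapse into one entry.
print(merge_hypotheses([(1.00, 1.40, 0.4), (1.05, 1.45, 0.3), (3.0, 3.5, 0.8)]))
```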
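
For patent 7813923, a minimal sketch of the calibration step, assuming per-channel scalar weights fitted by least squares so the weighted sum of the array channels approximates the reference-microphone signal. The patent describes general filter parameters and a subsequent non-linear adaptive filtering stage, which this sketch omits.

```python
import numpy as np

def calibrate_weights(array_signals, reference_signal):
    """Least-squares calibration of per-channel weights so that the weighted sum
    of the array signals approximates the reference microphone signal.

    array_signals: (num_samples, num_mics) matrix.
    reference_signal: (num_samples,) vector from the reference microphone.
    One scalar weight per channel is an assumed simplification of the patent's
    filter parameters.
    """
    weights, *_ = np.linalg.lstsq(array_signals, reference_signal, rcond=None)
    return weights

def beamform(array_signals, weights):
    """Form the beamformer output as the calibrated weighted sum of the channels."""
    return array_signals @ weights

# Toy example: two channels that are scaled, noisy copies of the reference.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
x = np.column_stack([0.8 * ref, 0.5 * ref]) + 0.01 * rng.standard_normal((1000, 2))
w = calibrate_weights(x, ref)
out = beamform(x, w)
```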
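
For publication 20100256977, a minimal sketch of expanding a continuous feature value into piecewise-linear (hat-function) basis activations, so that learning one weight per knot yields an effective weight that varies continuously with the feature value while the model stays log-linear in the expanded space. The knot grid and basis form are assumptions; the publication describes a spline-based approximation generally.

```python
def spline_basis(x, knots):
    """Expand a continuous feature value into piecewise-linear (hat-function)
    basis activations over a fixed knot grid (knot placement is an assumption)."""
    phi = [0.0] * len(knots)
    if x <= knots[0]:
        phi[0] = 1.0
        return phi
    if x >= knots[-1]:
        phi[-1] = 1.0
        return phi
    for i in range(len(knots) - 1):
        lo, hi = knots[i], knots[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            phi[i], phi[i + 1] = 1.0 - t, t
            break
    return phi

# A feature value of 0.25 on a knot grid [0, 0.5, 1] activates the first two bases.
print(spline_basis(0.25, [0.0, 0.5, 1.0]))  # [0.5, 0.5, 0.0]
```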
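
For patent 7809568, a minimal sketch of a single inverted index covering both speech data and text meta-data, distinguished only by a category label, with text meta-data entries carrying probability 1.0. The posting-field names are assumptions.

```python
from collections import defaultdict

# One inverted index keyed by word; each posting records the document, the
# category ("speech" or "meta"), the position, and a probability of occurrence.
index = defaultdict(list)

def add_text_metadata(doc_id, words):
    """Index text meta-data words with positional information and probability 1.0."""
    for pos, word in enumerate(words):
        index[word].append({"doc": doc_id, "cat": "meta", "pos": pos, "prob": 1.0})

def add_speech_words(doc_id, recognized):
    """Index recognized speech words; recognized holds (word, position, posterior)."""
    for word, pos, prob in recognized:
        index[word].append({"doc": doc_id, "cat": "speech", "pos": pos, "prob": prob})

add_text_metadata("talk42", ["acoustic", "model", "training"])
add_speech_words("talk42", [("training", 17, 0.83), ("data", 18, 0.91)])
print(index["training"])  # one "meta" posting and one "speech" posting
```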
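
For patent 7769582, a minimal sketch of scoring a cleaned feature against a phonetic-state Gaussian whose variance is increased by the estimated variance of the cleaned value, shown for a scalar feature.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def uncertain_loglik(x_clean_est, enhancement_var, mean, var):
    """Score a cleaned feature against a state Gaussian with its variance
    increased by the estimated variance of the enhanced (cleaned) feature."""
    return gaussian_loglik(x_clean_est, mean, var + enhancement_var)

# The same cleaned value is scored less sharply when the enhancement is uncertain.
print(uncertain_loglik(1.2, enhancement_var=0.0, mean=1.0, var=0.5))
print(uncertain_loglik(1.2, enhancement_var=0.4, mean=1.0, var=0.5))
```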
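
For publication 20100153104, a minimal sketch of a noise-power dependent gain: full gain below a low noise threshold, strong suppression above a high threshold, interpolation in the log of the noise power in between, and smoothing against the prior frame's gain. The threshold and smoothing constants are assumptions.

```python
import math

def noise_dependent_gain(noise_power, low_thresh, high_thresh, g_high, g_low):
    """Gain as a function of frame noise power: g_high (e.g., 1.0) below
    low_thresh, g_low above high_thresh, interpolated in log noise power between."""
    if noise_power <= low_thresh:
        return g_high
    if noise_power >= high_thresh:
        return g_low
    t = (math.log(noise_power) - math.log(low_thresh)) / (
        math.log(high_thresh) - math.log(low_thresh)
    )
    return g_high + t * (g_low - g_high)

def smooth(gain, prev_gain, alpha=0.8):
    """Smooth the current gain using the previous frame's gain."""
    return alpha * prev_gain + (1.0 - alpha) * gain

g = noise_dependent_gain(noise_power=3e-3, low_thresh=1e-3, high_thresh=1e-1,
                         g_high=1.0, g_low=0.1)
print(smooth(g, prev_gain=1.0))
```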
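
For patent 7725314, a minimal Wiener-style gain sketch built from clean-speech and noise estimates, with the numerator floored so it stays positive. The patent's claimed gain uses both estimates in the numerator as well as the denominator, so this illustrates the general idea rather than the claimed form; the flooring constant is an assumption.

```python
def filter_gain(clean_power_est, noise_power_est, floor=1e-10):
    """Wiener-style gain from clean-speech and noise power estimates.

    Only the clean estimate appears in this numerator (the patent's gain also
    uses the noise value there); flooring keeps the numerator positive.
    """
    numerator = max(clean_power_est, floor)
    return numerator / (numerator + noise_power_est)

def apply_filter(noisy_power_spectrum, clean_est, noise_est):
    """Scale each frequency bin of a noisy frame by its gain."""
    return [
        y * filter_gain(s, n)
        for y, s, n in zip(noisy_power_spectrum, clean_est, noise_est)
    ]

# A speech-dominated bin keeps most of its energy; a noise-dominated bin is attenuated.
print(filter_gain(clean_power_est=0.9, noise_power_est=0.1))   # ~0.90
print(filter_gain(clean_power_est=0.05, noise_power_est=0.5))  # ~0.09
```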