Training of HMM (EPO) Patents (Class 704/256.2)
-
Patent number: 7856351
Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
Type: Grant
Filed: January 19, 2007
Date of Patent: December 21, 2010
Assignee: Microsoft Corporation
Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
-
Publication number: 20100318354
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
Type: Application
Filed: June 12, 2009
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Publication number: 20100312562
Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.
Type: Application
Filed: June 4, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
-
Patent number: 7818172
Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal and using the acoustic models for recognition by a comparison of the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of the states independent of adjacent voice units and by means of a second portion of the states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing the same phonemes.
Type: Grant
Filed: April 20, 2004
Date of Patent: October 19, 2010
Assignee: France Telecom
Inventors: Ronaldo Messina, Denis Jouvet
-
Patent number: 7805301
Abstract: A reliable full covariance matrix estimation algorithm for a pattern unit's state output distribution in a pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of a pattern unit's state output distribution are estimated based on all the related nodes in the tree.
Type: Grant
Filed: July 1, 2005
Date of Patent: September 28, 2010
Assignee: Microsoft Corporation
Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
-
Patent number: 7778831
Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
Type: Grant
Filed: February 21, 2006
Date of Patent: August 17, 2010
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
-
Publication number: 20100204988
Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.
Type: Application
Filed: April 20, 2010
Publication date: August 12, 2010
Inventors: Haitian XU, Kean Kheong Chin
-
Publication number: 20100191532
Abstract: An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure.
Type: Application
Filed: January 28, 2009
Publication date: July 29, 2010
Applicant: Xerox Corporation
Inventors: Jose A. Rodriguez Serrano, Florent C. Perronnin
-
Patent number: 7729909
Abstract: Model compression is combined with model compensation. Model compression is needed in embedded ASR to reduce the size and the computational complexity of compressed models. Model-compensation is used to adapt in real-time to changing noise environments. The present invention allows for the design of smaller ASR engines (memory consumption reduced to up to one-sixth) with reduced impact on recognition accuracy and/or robustness to noises.
Type: Grant
Filed: March 6, 2006
Date of Patent: June 1, 2010
Assignee: Panasonic Corporation
Inventors: Luca Rigazio, David Kryze, Keiko Morii, Nobuyuki Kunieda, Jean-Claude Junqua
-
Publication number: 20100128985
Abstract: Method for online character recognition of Arabic text, the method including receiving handwritten Arabic text from a user in the form of handwriting strokes, sampling the handwriting strokes to acquire a sequence of two-dimensional point representations thereof, with associated temporal data, geometrically pre-processing and extracting features on the point representations, detecting delayed strokes and word parts in the pre-processed point representations, projecting the delayed strokes onto the body of the word parts, constructing feature vector representations for each word part, thereby generating an observation sequence, and determining the word with maximum probability given the observation sequence, resulting in a list of word probabilities.
Type: Application
Filed: July 26, 2007
Publication date: May 27, 2010
Applicant: BGN TECHNOLOGIES LTD.
Inventors: Jihad El-Sana, Fadi Biadsy
-
Patent number: 7707027
Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
Type: Grant
Filed: April 13, 2006
Date of Patent: April 27, 2010
Assignee: Nuance Communications, Inc.
Inventors: Rajesh Balchandran, Linda Boyer
-
Patent number: 7689419
Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.
Type: Grant
Filed: September 22, 2005
Date of Patent: March 30, 2010
Assignee: Microsoft Corporation
Inventors: Milind V. Mahajan, Alejandro Acero, Asela J. Gunawardana, John C. Platt
-
Publication number: 20100070280
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
Type: Application
Filed: September 16, 2008
Publication date: March 18, 2010
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
-
Patent number: 7680664
Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.
Type: Grant
Filed: August 16, 2006
Date of Patent: March 16, 2010
Assignee: Microsoft Corporation
Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
-
Patent number: 7672847
Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
Type: Grant
Filed: September 30, 2008
Date of Patent: March 2, 2010
Assignee: Nuance Communications, Inc.
Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
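Illustrative note: the thresholding described in this abstract can be read as clamping the computed adjustment to a fixed range before it is applied. The following is a minimal sketch of that reading only. The function names, the default threshold values, and the additive form of the update are assumptions made for illustration and are not taken from the patent.

```python
def clamp_adjustment(gradient_adjustment, first_threshold, second_threshold):
    """Return the adjustment that is actually applied.

    first_threshold is the upper bound and second_threshold the lower bound
    (hypothetical naming that follows the order used in the abstract).
    """
    if gradient_adjustment > first_threshold:
        return first_threshold
    if gradient_adjustment < second_threshold:
        return second_threshold
    return gradient_adjustment


def update_std_dev(std_dev, gradient_adjustment,
                   first_threshold=0.25, second_threshold=-0.25):
    """Apply the clamped adjustment additively (an assumption of this sketch)."""
    return std_dev + clamp_adjustment(
        gradient_adjustment, first_threshold, second_threshold
    )
```

With these assumed thresholds, a computed adjustment of 0.9 is applied as 0.25, while an adjustment of 0.125 is applied unchanged.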
-
Patent number: 7664643
Abstract: A method, and a system to execute the method, are presented for the identification and separation of sources of an acoustic signal, which signal contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.
Type: Grant
Filed: August 25, 2006
Date of Patent: February 16, 2010
Assignee: Nuance Communications, Inc.
Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
-
Patent number: 7660717
Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
Type: Grant
Filed: January 9, 2008
Date of Patent: February 9, 2010
Assignee: Nuance Communications, Inc.
Inventors: Tetsuya Takiguchi, Masafumi Nishimura
-
Patent number: 7627473
Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
Type: Grant
Filed: October 15, 2004
Date of Patent: December 1, 2009
Assignee: Microsoft Corporation
Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
-
Patent number: 7624020
Abstract: An adapter for a text to text training. A main corpus is used for training, and a domain specific corpus is used to adapt the main corpus according to the training information in the domain specific corpus. The adaptation is carried out using a technique that may be faster than the main training. The parameter set from the main training is adapted using the domain specific part.
Type: Grant
Filed: September 9, 2005
Date of Patent: November 24, 2009
Assignee: Language Weaver, Inc.
Inventors: Kenji Yamada, Kevin Knight, Greg Langmead
-
Patent number: 7603276
Abstract: A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference storing unit.
Type: Grant
Filed: November 18, 2003
Date of Patent: October 13, 2009
Assignee: Panasonic Corporation
Inventor: Shinichi Yoshizawa
-
Patent number: 7574411
Abstract: Management of a low memory treelike data structure is shown. The method according to the invention comprises steps for creating a decision tree including a parent node and at least one leaf node, and steps for searching data from said nodes. The nodes of the decision tree are stored sequentially in such a manner that nodes follow the parent node in storage order, wherein the nodes refining the context of the searchable data can be reached without a link from their parent node. The method can preferably be utilized in speech-recognition systems, in text-to-phoneme mapping.
Type: Grant
Filed: April 29, 2004
Date of Patent: August 11, 2009
Assignee: Nokia Corporation
Inventors: Janne Suontausta, Jilei Tian
-
Publication number: 20090112595
Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.
Type: Application
Filed: October 31, 2007
Publication date: April 30, 2009
Applicant: AT&T Labs
Inventor: Andrej Ljolje
-
Patent number: 7509259
Abstract: A device (800) performs statistical pattern recognition using model parameters that are refined by optimizing an objective function that includes a term for many items of training data for which recognition errors occur wherein each term depends on a relative magnitude of a first score for a recognition result for an item of training data and a second score calculated by evaluating a statistical pattern recognition model identified by a transcribed identity of the training data item with feature vectors extracted from the item of training data. The objective function does not include terms for items of training data for which there is a gross discrepancy between a transcribed identity and a recognized identity. Gross discrepancies can be detected by probability score or pattern identity comparisons. Terms of the objective function are weighted based on the type of recognition error and weights can be increased for high priority patterns.
Type: Grant
Filed: December 21, 2004
Date of Patent: March 24, 2009
Assignee: Motorola, Inc.
Inventor: Jianming J. Song
-
Patent number: 7499857
Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.
Type: Grant
Filed: May 15, 2003
Date of Patent: March 3, 2009
Assignee: Microsoft Corporation
Inventor: Asela J. Gunawardana
-
Publication number: 20090055182
Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
Type: Application
Filed: September 30, 2008
Publication date: February 26, 2009
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
-
Patent number: 7496509
Abstract: In large-scale deployments of speaker recognition systems the potential for legacy problems increases as the evolving technology may require configuration changes in the system, thus invalidating already existing user voice accounts. Unless the entire database of original speech waveforms were stored, users need to reenroll to keep their accounts functional, which, however, may be expensive and commercially not acceptable. Model migration is defined as a conversion of obsolete models to new-configuration models without additional data and waveform requirements. The present disclosure investigates ways to achieve such a migration with minimum loss of system accuracy.
Type: Grant
Filed: May 28, 2004
Date of Patent: February 24, 2009
Assignee: International Business Machines Corporation
Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
-
Patent number: 7475014
Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
Type: Grant
Filed: July 25, 2005
Date of Patent: January 6, 2009
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Paris Smaragdis, Petros Boufounos
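Illustrative note: the phase-difference step in this abstract can be sketched as follows, assuming each sensor contributes a single phase value (in radians) for a given frame and frequency. The function names and the choice of the interval [-pi, pi) for wrapping are assumptions for illustration; the abstract does not specify them, and the hidden Markov model itself is not shown.

```python
import math
from itertools import combinations


def wrap_phase(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def pairwise_phase_differences(sensor_phases):
    """Return wrapped phase differences for every unique pair of sensors.

    sensor_phases holds one phase value per sensor; the result has
    n * (n - 1) / 2 entries for n sensors.
    """
    return [wrap_phase(a - b) for a, b in combinations(sensor_phases, 2)]
```

For an array of four sensors this yields six differences per observation, which would form one feature vector for the model.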
-
Patent number: 7472064
Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can be adapted for the given task using task specific data. The general model can be based on a hidden Markov model (HMM). By allowing a decision tree-based acoustic model (“general model”) to be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can be used to improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted/adjusted according to task specific data, e.g., task vocabulary, model size, or other like task specific data.
Type: Grant
Filed: September 30, 2000
Date of Patent: December 30, 2008
Assignee: Intel Corporation
Inventors: Qing Guo, Yonghong Yan, Baosheng Yuan
-
Patent number: 7464033
Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
Type: Grant
Filed: February 4, 2005
Date of Patent: December 9, 2008
Assignee: Texas Instruments Incorporated
Inventor: Yifan Gong
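Illustrative note: the (M-1)/M figure follows from storing one sub-network instead of M equally sized ones. A short sketch of that arithmetic, with a hypothetical function name and assuming all sub-networks are the same size:

```python
from fractions import Fraction


def network_storage_saving(num_environments):
    """Fraction of network storage saved when a single sub-network
    replaces M equally sized sub-networks: (M - 1) / M."""
    if num_environments < 1:
        raise ValueError("at least one acoustic environment is required")
    return Fraction(num_environments - 1, num_environments)
```

For example, four environments give a saving of 3/4, and a single environment gives no saving.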
-
Patent number: 7454341
Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
Type: Grant
Filed: September 30, 2000
Date of Patent: November 18, 2008
Assignee: Intel Corporation
Inventors: Jielin Pan, Baosheng Yuan
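Illustrative note: the sub-vector split and per-sub-vector codebook idea can be sketched as below. This sketch uses plain K-means with a deterministic initialization, not the modified K-means with dynamic merging and splitting that the abstract describes. All function names, the consecutive-dimension split, and the iteration count are assumptions for illustration.

```python
def split_into_subvectors(vectors, sub_dim):
    """Split each vector into consecutive sub-vectors of sub_dim dimensions.

    Returns one list of sub-vectors (a sub-vector set) per block of dimensions.
    """
    n_dims = len(vectors[0])
    return [
        [tuple(v[start:start + sub_dim]) for v in vectors]
        for start in range(0, n_dims, sub_dim)
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(points, k, iterations=10):
    """Plain K-means; the first k distinct points seed the centroids."""
    centroids = list(dict.fromkeys(points))[:k]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def build_codebooks(vectors, sub_dim, k):
    """Build one k-entry codebook per sub-vector set."""
    return [kmeans_codebook(sub_set, k)
            for sub_set in split_into_subvectors(vectors, sub_dim)]
```

The same routine could be applied to the variance vectors; each Gaussian would then be stored as a list of codebook indices rather than full vectors.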
-
Patent number: 7437288
Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
Type: Grant
Filed: March 11, 2002
Date of Patent: October 14, 2008
Assignee: NEC Corporation
Inventor: Koichi Shinoda
-
Patent number: 7424427
Abstract: An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.
Type: Grant
Filed: October 16, 2003
Date of Patent: September 9, 2008
Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
Inventors: Daben Liu, Francis G. Kubala
-
Patent number: 7403896
Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
Type: Grant
Filed: March 14, 2003
Date of Patent: July 22, 2008
Assignee: International Business Machines Corporation
Inventors: Tetsuya Takiguchi, Masafumi Nishimura
-
Publication number: 20080147404
Abstract: Speech is processed that may be colored by speech accent. A method for recognizing speech includes maintaining a model of speech accent that is established based on training speech data, wherein the training speech data includes at least a first set of training speech data, and wherein establishing the model of speech accent includes not using any phone or phone-class transcription of the first set of training speech data. Related systems are also presented. A system for recognizing speech includes an accent identification module that is configured to identify the accent of the speech to be recognized; and a recognizer that is configured to use models to recognize the speech to be recognized, wherein the models include at least an acoustic model that has been adapted for the identified accent using training speech data of a language, other than the primary language of the speech to be recognized, that is associated with the identified accent. Related methods are also presented.
Type: Application
Filed: May 15, 2001
Publication date: June 19, 2008
Applicant: NuSuara Technologies SDN BHD
Inventors: Wai Kat Liu, Pascale Fung
-
Patent number: 7353172
Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
Type: Grant
Filed: March 24, 2003
Date of Patent: April 1, 2008
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
-
Patent number: 7353173
Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.
Type: Grant
Filed: March 31, 2003
Date of Patent: April 1, 2008
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
-
Patent number: 7353174
Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
Type: Grant
Filed: March 31, 2003
Date of Patent: April 1, 2008
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
-
Patent number: 7346507
Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.
Type: Grant
Filed: June 4, 2003
Date of Patent: March 18, 2008
Assignee: BBN Technologies Corp.
Inventors: Premkumar Natarajan, Rohit Prasad
-
Patent number: 7313269
Abstract: A method learns a structure of a video, in an unsupervised setting, to detect events in the video consistent with the structure. Sets of features are selected from the video. Based on the selected features, a hierarchical statistical model is updated, and an information gain of the hierarchical statistical model is evaluated. Redundant features are then filtered, and the hierarchical statistical model is updated, based on the filtered features. A Bayesian information criterion is applied to each model and feature set pair, which can then be rank ordered according to the criterion to detect the events in the video.
Type: Grant
Filed: December 12, 2003
Date of Patent: December 25, 2007
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Lexing Xie, Ajay Divakaran, Shih-Fu Chang
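Illustrative note: the ranking step can be sketched with the textbook form of the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where lower values are preferred. Whether the patent uses this exact form or sign convention is not stated in the abstract, so the formula choice, the function names, and the tuple layout below are assumptions for illustration.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Textbook BIC: k * ln(n) - 2 * ln(L); lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def rank_by_bic(candidates, num_samples):
    """Sort (label, log_likelihood, num_params) tuples, best BIC first.

    Each tuple stands for one hypothetical model and feature set pair.
    """
    return sorted(candidates, key=lambda c: bic(c[1], c[2], num_samples))
```

The penalty term k ln(n) means a pair with a slightly lower likelihood but far fewer parameters can still rank above a larger model.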
-
Patent number: 7308030 Abstract: An object activity modeling method which can efficiently model complex objects such as a human body is provided. The object activity modeling method includes the steps of (a) obtaining an optical flow vector from a video sequence; (b) obtaining the probability distribution of the feature vector for a plurality of video frames, using the optical flow vector; (c) modeling states, using the probability distribution of the feature vector; and (d) expressing the activity of the object in the video sequence based on state transition. According to the modeling method, in video indexing and recognition field, complex activities such as human activities can be efficiently modeled and recognized without segmenting objects. Type: Grant Filed: April 12, 2005 Date of Patent: December 11, 2007 Assignees: Samsung Electronics Co., Ltd., The Regents of the University of California Inventors: Yang-lim Choi, Yun-ju Yu, Bangalore S. Manjunath, Xinding Sun, Ching-wei Chen
-
Patent number: 7269558 Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M. Type: Grant Filed: July 26, 2001 Date of Patent: September 11, 2007 Assignee: Texas Instruments Incorporated Inventor: Yifan Gong
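As a quick check of the storage figure in the abstract above, the following sketch computes the fraction of network memory saved when a single sub-network replaces M of them. It assumes all M sub-networks are the same size; the function name and the example value of M are illustrative and do not come from the patent.

```python
from fractions import Fraction


def network_memory_saving(num_environments: int) -> Fraction:
    """Fraction of network storage saved when one sub-network replaces M.

    Assumes every sub-network has the same size (an illustrative simplification).
    """
    if num_environments < 1:
        raise ValueError("need at least one acoustic environment")
    m = num_environments
    # A conventional search stores M sub-networks; the single-network search stores 1,
    # so the saving relative to the conventional network is (M - 1) / M.
    return Fraction(m - 1, m)


saving = network_memory_saving(4)
print(saving)  # 3/4
```

With four acoustic environments, three quarters of the network storage would be saved under this equal-size assumption.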
-
Patent number: 7225125 Abstract: A speech recognition system uses speech recognition models which are specifically trained and optimized for users residing in a particular geographic area or region. The speech models are trained with samples of word variants expected to be used in a natural language by representative members of a population associated with the geographic region or community of users. The speech recognition system is configured to have a real-time response that imitates a dialogue with a human operator. Type: Grant Filed: January 7, 2005 Date of Patent: May 29, 2007 Assignee: Phoenix Solutions, Inc. Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
-
Patent number: 7209883 Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments. Type: Grant Filed: May 9, 2002 Date of Patent: April 24, 2007 Assignee: Intel Corporation Inventor: Ara V. Nefian
-
Patent number: 7209881 Abstract: Noise-superimposed speech data is grouped according to acoustic similarity, and sufficient statistics are prepared using the speech data in each of the groups. A group acoustically similar to voice data of a user of the speech recognition is selected, and sufficient statistics acoustically similar to the user's voice data are selected from the sufficient statistics in the selected group. Using the selected sufficient statistics, an acoustic model is prepared. Type: Grant Filed: December 18, 2002 Date of Patent: April 24, 2007 Assignee: Matsushita Electric Industrial Co., Ltd. Inventors: Shinichi Yoshizawa, Kiyohiro Shikano
-
Patent number: 7188064 Abstract: A system and method for coding text data wherein a first group of text data is coded using a Viterbi algorithm using a Hidden Markov model. The Hidden Markov Model computes a probable coding responsive to the first group of text data. A second group of text data is coded using the Viterbi algorithm using a corrected Hidden Markov Model. The Hidden Markov Model is based upon the coding of the first group of text data. Coding the first group of text data includes assigning word concepts to groups of at least one word in the first group of text data and assigning propositions to groups of the assigned word concepts. Type: Grant Filed: April 12, 2002 Date of Patent: March 6, 2007 Assignee: University of Texas System Board of Regents Inventors: Richard M. Golden, Michael Arthur Durbin, Jason Warner Earwood
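The decoding step named in the abstract above can be illustrated with a minimal, generic Viterbi sketch. It covers only the most-probable-path search over a hidden Markov model, not the patent's correction step or its proposition assignment. The state labels, words, and probabilities below are made-up toy values, and the function assumes a non-empty observation sequence.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""

    def log(p):
        # Work in log space to avoid underflow; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two "word concept" labels and two words.
states = ["ACTOR", "ACTION"]
start_p = {"ACTOR": 0.8, "ACTION": 0.2}
trans_p = {"ACTOR": {"ACTOR": 0.3, "ACTION": 0.7},
           "ACTION": {"ACTOR": 0.6, "ACTION": 0.4}}
emit_p = {"ACTOR": {"dog": 0.9, "runs": 0.1},
          "ACTION": {"dog": 0.2, "runs": 0.8}}

coding = viterbi(["dog", "runs"], states, start_p, trans_p, emit_p)
print(coding)  # ['ACTOR', 'ACTION']
```

In this toy setup each word receives one label; re-estimating the model from accepted codings, as the abstract describes for the second group of text, would be a separate step.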
-
Patent number: 7165029 Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments. Type: Grant Filed: May 9, 2002 Date of Patent: January 16, 2007 Assignee: Intel Corporation Inventor: Ara V. Nefian
-
Patent number: 7103544 Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text. Type: Grant Filed: June 6, 2005 Date of Patent: September 5, 2006 Assignee: Microsoft Corporation Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
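The idea of pairing actual and predicted speech units can be sketched with a simple count-based confusion table. This is a simplified stand-in rather than the patent's model: it assumes the two sequences are already aligned one-to-one (no insertions or deletions), and the function names and sample units are hypothetical.

```python
from collections import Counter, defaultdict


def build_confusion_model(actual_units, predicted_units):
    """Estimate P(predicted unit | actual unit) from two aligned sequences."""
    if len(actual_units) != len(predicted_units):
        raise ValueError("this sketch assumes one-to-one aligned sequences")
    counts = defaultdict(Counter)
    for actual, predicted in zip(actual_units, predicted_units):
        counts[actual][predicted] += 1
    # Normalise each row of counts into conditional probabilities.
    return {
        actual: {pred: n / sum(row.values()) for pred, n in row.items()}
        for actual, row in counts.items()
    }


def expected_error_rate(model, text_units):
    """Average probability that a unit of the text is decoded as something else."""
    # Units never seen in training are skipped, since the model says nothing about them.
    errors = [1.0 - model[u].get(u, 0.0) for u in text_units if u in model]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical aligned training data: unit "a" is misrecognised as "b" half the time.
model = build_confusion_model(["a", "b", "a", "b"], ["a", "b", "b", "b"])
rate = expected_error_rate(model, ["a", "b"])
print(rate)  # 0.25
```

A text made up mostly of easily confused units yields a higher expected error rate, which is the kind of prediction the abstract describes making without decoding any new speech.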
-
Patent number: 7089183 Abstract: A new iterative hierarchical linear regression method for generating a set of linear transforms to adapt HMM speech models to a new environment for improved speech recognition is disclosed. The method determines a new set of linear transforms at an iterative step by Estimate-Maximize (EM) estimation, and then combines the new set of linear transforms with the prior set of linear transforms to form a new merged set of linear transforms. An iterative step may include realignment of adaptation speech data to the adapted HMM models to further improve speech recognition performance. Type: Grant Filed: June 22, 2001 Date of Patent: August 8, 2006 Assignee: Texas Instruments Incorporated Inventor: Yifan Gong