Training of HMM (EPO) Patents (Class 704/256.2)
  • Patent number: 7856351
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: December 21, 2010
    Assignee: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Patent number: 7818172
    Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal and using the acoustic models for recognition by a comparison of the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of the states independent of adjacent voice units and by means of a second portion of the states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing same phonemes.
    Type: Grant
    Filed: April 20, 2004
    Date of Patent: October 19, 2010
    Assignee: France Telecom
    Inventors: Ronaldo Messina, Denis Jouvet
  • Patent number: 7805301
    Abstract: A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: September 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Publication number: 20100204988
    Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.
    Type: Application
    Filed: April 20, 2010
    Publication date: August 12, 2010
    Inventors: Haitian XU, Kean Kheong Chin
  • Publication number: 20100191532
    Abstract: An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure.
    Type: Application
    Filed: January 28, 2009
    Publication date: July 29, 2010
    Applicant: Xerox Corporation
    Inventors: Jose A. Rodriguez Serrano, Florent C. Perronnin
  • Patent number: 7729909
    Abstract: Model compression is combined with model compensation. Model compression is needed in embedded ASR to reduce the size and the computational complexity of compressed models. Model-compensation is used to adapt in real-time to changing noise environments. The present invention allows for the design of smaller ASR engines (memory consumption reduced to up to one-sixth) with reduced impact on recognition accuracy and/or robustness to noises.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: June 1, 2010
    Assignee: Panasonic Corporation
    Inventors: Luca Rigazio, David Kryze, Keiko Morii, Nobuyuki Kunieda, Jean-Claude Junqua
  • Publication number: 20100128985
    Abstract: Method for online character recognition of Arabic text, the method including receiving handwritten Arabic text from a user in the form of handwriting strokes, sampling the handwriting strokes to acquire a sequence of two dimensional point representations thereof, with associated temporal data, geometrically pre processing and extracting features on the point representations, detecting delayed strokes and word parts in the pre processed point representations, projecting the delayed strokes onto the body of the word parts, constructing feature vector representations for each word part, thereby generating an observation sequence, and determining the word with maximum probability given the observation sequence, resulting in a list of word probabilities.
    Type: Application
    Filed: July 26, 2007
    Publication date: May 27, 2010
    Applicant: BGN TECHNOLOGIES LTD.
    Inventors: Jihad El-Sana, Fadi Biadsy
  • Patent number: 7707027
    Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
    Type: Grant
    Filed: April 13, 2006
    Date of Patent: April 27, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Rajesh Balchandran, Linda Boyer
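    Illustration: the bigram step in the abstract above can be sketched as follows. This is a minimal sketch, not the patented method: the function name, the tokenized-sentence input format, and the sample data are assumptions, and the later test of whether each candidate bigram is itself meaningless is left out because the abstract does not specify it.

```python
def candidate_meaningless_bigrams(sentences, meaningless_unigrams):
    """Collect bigrams whose two words are both in the meaningless-unigram set.

    sentences: iterable of token lists (hypothetical training-data format).
    meaningless_unigrams: set of words already judged individually meaningless.
    """
    candidates = set()
    for tokens in sentences:
        # Pair each token with its successor to enumerate the bigrams.
        for first, second in zip(tokens, tokens[1:]):
            if first in meaningless_unigrams and second in meaningless_unigrams:
                candidates.add((first, second))
    return candidates


# Hypothetical training data, for illustration only.
training = [["uh", "um", "pay", "bill"], ["um", "uh"]]
fillers = {"uh", "um"}
print(sorted(candidate_meaningless_bigrams(training, fillers)))
```

    Only the candidates returned here would go on to the "individually meaningless" check before being assigned to the first n-gram class.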
  • Patent number: 7689419
    Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.
    Type: Grant
    Filed: September 22, 2005
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Milind V. Mahajan, Alejandro Acero, Asela J. Gunawardana, John C. Platt
  • Publication number: 20100070280
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
    Type: Application
    Filed: September 16, 2008
    Publication date: March 18, 2010
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Patent number: 7680664
    Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.
    Type: Grant
    Filed: August 16, 2006
    Date of Patent: March 16, 2010
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
  • Patent number: 7672847
    Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: March 2, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
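    Illustration: the three-way rule in the abstract above amounts to clamping the proposed adjustment between two thresholds. The sketch below assumes an additive update and arbitrary threshold values; the function names and defaults are hypothetical and do not come from the patent.

```python
def clamp_adjustment(gradient_adjustment, first_threshold, second_threshold):
    """Limit a proposed standard-deviation adjustment to the two thresholds.

    first_threshold is the upper limit, second_threshold the lower limit.
    """
    if gradient_adjustment > first_threshold:
        return first_threshold
    if gradient_adjustment < second_threshold:
        return second_threshold
    return gradient_adjustment


def update_sigma(sigma, gradient_adjustment,
                 first_threshold=0.1, second_threshold=-0.1):
    # Assumption: the adjustment is added to the current standard deviation.
    return sigma + clamp_adjustment(
        gradient_adjustment, first_threshold, second_threshold
    )
```

    For example, with the assumed defaults, `update_sigma(1.0, 0.5)` applies only the upper threshold of 0.1 rather than the full proposed step.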
  • Patent number: 7664643
    Abstract: A method, and a system to execute this method, are presented for the identification and separation of sources of an acoustic signal, which signal contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: February 16, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
  • Patent number: 7660717
    Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: February 9, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7627473
    Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: December 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
  • Patent number: 7624020
    Abstract: An adapter for a text to text training. A main corpus is used for training, and a domain specific corpus is used to adapt the main corpus according to the training information in the domain specific corpus. The adaptation is carried out using a technique that may be faster than the main training. The parameter set from the main training is adapted using the domain specific part.
    Type: Grant
    Filed: September 9, 2005
    Date of Patent: November 24, 2009
    Assignee: Language Weaver, Inc.
    Inventors: Kenji Yamada, Kevin Knight, Greg Langmead
  • Patent number: 7603276
    Abstract: A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference storing unit.
    Type: Grant
    Filed: November 18, 2003
    Date of Patent: October 13, 2009
    Assignee: Panasonic Corporation
    Inventor: Shinichi Yoshizawa
  • Patent number: 7574411
    Abstract: Management of a low memory treelike data structure is shown. The method according to the invention comprises steps for creating a decision tree including a parent node and at least one leaf node, and steps for searching data from said nodes. The nodes of the decision tree are stored sequentially in such a manner that nodes follow the parent node in storage order, wherein the nodes refining the context of the searchable data can be reached without a link from their parent node. The method can preferably be utilized in speech-recognition systems, in text-to-phoneme mapping.
    Type: Grant
    Filed: April 29, 2004
    Date of Patent: August 11, 2009
    Assignee: Nokia Corporation
    Inventors: Janne Suontausta, Jilei Tian
  • Publication number: 20090112595
    Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generates at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.
    Type: Application
    Filed: October 31, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Labs
    Inventor: Andrej Ljolje
  • Patent number: 7509259
    Abstract: A device (800) performs statistical pattern recognition using model parameters that are refined by optimizing an objective function that includes a term for many items of training data for which recognition errors occur wherein each term depends on a relative magnitude of a first score for a recognition result for an item of training data and a second score calculated by evaluating a statistical pattern recognition model identified by a transcribed identity of the training data item with feature vectors extracted from the item of training data. The objective function does not include terms for items of training data for which there is a gross discrepancy between a transcribed identity and a recognized identity. Gross discrepancies can be detected by probability score or pattern identity comparisons. Terms of the objective function are weighted based on the type of recognition error and weights can be increased for high priority patterns.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: March 24, 2009
    Assignee: Motorola, Inc.
    Inventor: Jianming J. Song
  • Patent number: 7499857
    Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.
    Type: Grant
    Filed: May 15, 2003
    Date of Patent: March 3, 2009
    Assignee: Microsoft Corporation
    Inventor: Asela J. Gunawardana
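    Illustration: the idea of transforming codewords instead of individual means can be sketched as below. Everything here is an assumption made for illustration: the affine form of the transform, one transform per subspace, the list-based codebook layout, and the function names. The abstract does not say how the transform is estimated, so that step is omitted.

```python
def adapt_codebook(codebook, matrix, bias):
    """Apply an affine transform (matrix * codeword + bias) to every codeword."""
    adapted = []
    for codeword in codebook:
        adapted.append([
            sum(m * x for m, x in zip(row, codeword)) + b
            for row, b in zip(matrix, bias)
        ])
    return adapted


def decode_mean(indices, codebooks):
    """Rebuild a full mean vector from one codeword index per subspace."""
    mean = []
    for subspace, index in enumerate(indices):
        mean.extend(codebooks[subspace][index])
    return mean


# Two subspaces: a 2-dimensional one and a 1-dimensional one (made-up values).
codebooks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0], [6.0]],
]
adapted = [
    adapt_codebook(codebooks[0], [[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),
    adapt_codebook(codebooks[1], [[1.0]], [0.5]),
]
# A Gaussian mean stored as indices picks up the adaptation automatically.
print(decode_mean([1, 0], adapted))
```

    Because every mean is stored as indices into the codebooks, transforming the few codewords adapts all means that reference them, which is the saving the abstract points to.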
  • Publication number: 20090055182
    Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
    Type: Application
    Filed: September 30, 2008
    Publication date: February 26, 2009
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
  • Patent number: 7496509
    Abstract: In large-scale deployments of speaker recognition systems the potential for legacy problems increases as the evolving technology may require configuration changes in the system thus invalidating already existing user voice accounts. Unless the entire database of original speech waveform were stored, users need to reenroll to keep their accounts functional, which, however, may be expensive and commercially not acceptable. Model migration is defined as a conversion of obsolete models to new-configuration models without additional data and waveform requirements. The present disclosure investigates ways to achieve such a migration with minimum loss of system accuracy.
    Type: Grant
    Filed: May 28, 2004
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
  • Patent number: 7475014
    Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
    Type: Grant
    Filed: July 25, 2005
    Date of Patent: January 6, 2009
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Paris Smaragdis, Petros Boufounos
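    Illustration: the phase-difference step in the abstract above can be sketched for a single time frame and frequency bin. This is a sketch under assumptions: one phase value per sensor in radians, and wrapping into the interval (-pi, pi]; the function name is hypothetical, and the construction of the wrapped-phase hidden Markov model itself is not shown.

```python
import cmath
import itertools


def wrapped_phase_differences(phases):
    """Return wrapped phase differences for all unique sensor pairs.

    phases: list with one phase value (radians) per sensor.
    Result: list of ((i, j), difference) with each difference in (-pi, pi].
    """
    differences = []
    for i, j in itertools.combinations(range(len(phases)), 2):
        # The angle of exp(1j * delta) is delta wrapped into (-pi, pi].
        delta = cmath.phase(cmath.exp(1j * (phases[i] - phases[j])))
        differences.append(((i, j), delta))
    return differences
```

    With three sensors this yields three pairs, (0, 1), (0, 2) and (1, 2); an array of N sensors gives N(N-1)/2 differences per frame as the feature vector.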
  • Patent number: 7472064
    Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can be adapted for the given task using task specific data. The general model can be based on a hidden Markov model (HMM). By allowing a decision tree-based acoustic model (“general model”) to be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can be used to improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted/adjusted according to task specific data, e.g., task vocabulary, model size, or other like task specific data.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: December 30, 2008
    Assignee: Intel Corporation
    Inventors: Qing Guo, Yonghong Yan, Baosheng Yuan
  • Patent number: 7464033
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
    Type: Grant
    Filed: February 4, 2005
    Date of Patent: December 9, 2008
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
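    Illustration: the (M-1)/M figure in the abstract above follows from comparing M sub-networks with a single one. The sketch below assumes all sub-networks are the same size and measures storage in units of one sub-network; the function name is hypothetical.

```python
from fractions import Fraction


def network_memory_saving(num_environments):
    """Fraction of network storage saved by keeping one sub-network instead of M."""
    conventional = num_environments  # M sub-networks, one per environment
    proposed = 1                     # a single sub-network
    return Fraction(conventional - proposed, conventional)


for m in (2, 4, 10):
    print(m, network_memory_saving(m))
```

    For M = 4 acoustic environments the saving is 3/4 of the network storage, and it approaches the full conventional size as M grows.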
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
  • Patent number: 7437288
    Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: October 14, 2008
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Patent number: 7424427
    Abstract: An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.
    Type: Grant
    Filed: October 16, 2003
    Date of Patent: September 9, 2008
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Daben Liu, Francis G. Kubala
  • Patent number: 7403896
    Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
    Type: Grant
    Filed: March 14, 2003
    Date of Patent: July 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Publication number: 20080147404
    Abstract: Speech is processed that may be colored by speech accent. A method for recognizing speech includes maintaining a model of speech accent that is established based on training speech data, wherein the training speech data includes at least a first set of training speech data, and wherein establishing the model of speech accent includes not using any phone or phone-class transcription of the first set of training speech data. Related systems are also presented. A system for recognizing speech includes an accent identification module that is configured to identify accent of the speech to be recognized; and a recognizer that is configured to use models to recognize the speech to be recognized, wherein the models include at least an acoustic model that has been adapted for the identified accent using training speech data of a language, other than primary language of the speech to be recognized, that is associated with the identified accent. Related methods are also presented.
    Type: Application
    Filed: May 15, 2001
    Publication date: June 19, 2008
    Applicant: NuSuara Technologies SDN BHD
    Inventors: Wai Kat Liu, Pascale Fung
  • Patent number: 7353172
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7353173
    Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7353174
    Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7346507
    Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.
    Type: Grant
    Filed: June 4, 2003
    Date of Patent: March 18, 2008
    Assignee: BBN Technologies Corp.
    Inventors: Premkumar Natarajan, Rohit Prasad
  • Patent number: 7313269
    Abstract: A method learns a structure of a video, in an unsupervised setting, to detect events in the video consistent with the structure. Sets of features are selected from the video. Based on the selected features, a hierarchical statistical model is updated, and an information gain of the hierarchical statistical model is evaluated. Redundant features are then filtered, and the hierarchical statistical model is updated, based on the filtered features. A Bayesian information criterion is applied to each model and feature set pair, which can then be rank ordered according to the criterion to detect the events in the video.
    Type: Grant
    Filed: December 12, 2003
    Date of Patent: December 25, 2007
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Lexing Xie, Ajay Divakaran, Shih-Fu Chang
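    Illustrative sketch (not taken from the patent): the ranking step described in the abstract above can be pictured as scoring each model and feature set pair with a Bayesian information criterion and sorting. The function names, the candidate labels, and the numbers are hypothetical, and the sketch assumes the common lower-is-better form k·ln(n) − 2·ln(L); the patent may use a different sign convention.

```python
import math

def bic(log_likelihood, num_params, num_samples):
    # Common form of the criterion: k * ln(n) - 2 * ln(L); lower is better.
    return num_params * math.log(num_samples) - 2.0 * log_likelihood

def rank_by_bic(candidates, num_samples):
    """Rank (label, log_likelihood, num_params) tuples, best first."""
    scored = [(bic(ll, k, num_samples), label) for label, ll, k in candidates]
    return [label for _, label in sorted(scored)]

# Hypothetical model / feature-set pairs with made-up fit statistics.
candidates = [
    ("model A + features {f1}", -120.0, 5),
    ("model B + features {f1, f2, f3}", -110.0, 12),
    ("model C + features {f1, f2}", -115.0, 6),
]
ranking = rank_by_bic(candidates, num_samples=100)
print(ranking)
```

    The penalty term grows with the number of parameters, so a pair that fits slightly better with many more parameters can still rank below a simpler one.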
  • Patent number: 7308030
    Abstract: An object activity modeling method which can efficiently model complex objects such as a human body is provided. The object activity modeling method includes the steps of (a) obtaining an optical flow vector from a video sequence; (b) obtaining the probability distribution of the feature vector for a plurality of video frames, using the optical flow vector; (c) modeling states, using the probability distribution of the feature vector; and (d) expressing the activity of the object in the video sequence based on state transition. According to the modeling method, in the field of video indexing and recognition, complex activities such as human activities can be efficiently modeled and recognized without segmenting objects.
    Type: Grant
    Filed: April 12, 2005
    Date of Patent: December 11, 2007
    Assignees: Samsung Electronics Co., Ltd., The Regents of the University of California
    Inventors: Yang-lim Choi, Yun-ju Yu, Bangalore S. Manjunath, Xinding Sun, Ching-wei Chen
  • Patent number: 7269558
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: September 11, 2007
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
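    Illustrative arithmetic (not taken from the patent): the storage reduction stated in the abstract above, read as (M - 1)/M, follows from replacing M equally sized sub-networks with one. The function name below is hypothetical, and the sketch assumes all sub-networks have the same size.

```python
from fractions import Fraction

def network_storage_saving(num_environments: int) -> Fraction:
    """Fraction of network storage saved when one sub-network replaces M."""
    if num_environments < 1:
        raise ValueError("need at least one acoustic environment")
    m = num_environments
    # M equally sized sub-networks shrink to 1, so (M - 1) of M are saved.
    return Fraction(m - 1, m)

saving = network_storage_saving(4)
print(saving)  # 3/4
```

    With a single environment there is nothing to save, and the saving approaches 1 as the number of environments grows.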
  • Patent number: 7225125
    Abstract: A speech recognition system uses speech recognition models which are specifically trained and optimized for users residing in a particular geographic area or region. The speech models are trained with samples of word variants expected to be used in a natural language by representative members of a population associated with the geographic region or community of users. The speech recognition system is configured to have a real-time response that imitates a dialogue with a human operator.
    Type: Grant
    Filed: January 7, 2005
    Date of Patent: May 29, 2007
    Assignee: Phoenix Solutions, Inc.
    Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
  • Patent number: 7209883
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: April 24, 2007
    Assignee: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 7209881
    Abstract: Noise-superimposed speech data is grouped according to acoustic similarity, and sufficient statistics are prepared using the speech data in each of the groups. A group acoustically similar to voice data of a user of the speech recognition is selected, and sufficient statistics acoustically similar to the user's voice data are selected from the sufficient statistics in the selected group. Using the selected sufficient statistics, an acoustic model is prepared.
    Type: Grant
    Filed: December 18, 2002
    Date of Patent: April 24, 2007
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Shinichi Yoshizawa, Kiyohiro Shikano
  • Patent number: 7188064
    Abstract: A system and method for coding text data wherein a first group of text data is coded using a Viterbi algorithm using a Hidden Markov Model. The Hidden Markov Model computes a probable coding responsive to the first group of text data. A second group of text data is coded using the Viterbi algorithm using a corrected Hidden Markov Model. The Hidden Markov Model is based upon the coding of the first group of text data. Coding the first group of text data includes assigning word concepts to groups of at least one word in the first group of text data and assigning propositions to groups of the assigned word concepts.
    Type: Grant
    Filed: April 12, 2002
    Date of Patent: March 6, 2007
    Assignee: University of Texas System Board of Regents
    Inventors: Richard M. Golden, Michael Arthur Durbin, Jason Warner Earwood
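    Illustrative sketch (not taken from the patent): the abstract above relies on the Viterbi algorithm to find the most probable coding under a Hidden Markov Model. The following is a generic textbook Viterbi decoder; the concept labels, words, and probabilities are hypothetical toy values, and nothing here reproduces the patent's model or its correction step.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []

    # Work in log space to avoid underflow on long sequences.
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: two word-concept labels emitting two words.
states = ("AGENT", "ACTION")
start_p = {"AGENT": 0.8, "ACTION": 0.2}
trans_p = {"AGENT": {"AGENT": 0.3, "ACTION": 0.7},
           "ACTION": {"AGENT": 0.6, "ACTION": 0.4}}
emit_p = {"AGENT": {"dog": 0.9, "runs": 0.1},
          "ACTION": {"dog": 0.2, "runs": 0.8}}
coding = viterbi(["dog", "runs"], states, start_p, trans_p, emit_p)
print(coding)  # ['AGENT', 'ACTION']
```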
  • Patent number: 7165029
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 16, 2007
    Assignee: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 7103544
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: September 5, 2006
    Assignee: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
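    Illustrative sketch (not taken from the patent): one simple way to picture the confusion model in the abstract above is a table of how often each actual speech unit was decoded as each predicted unit, which can then be applied to new text to estimate an error rate. The function names and phone sequences are hypothetical, and the sketch assumes the two sequences are already aligned one-to-one with no insertions or deletions, which the abstract does not state.

```python
from collections import Counter, defaultdict

def build_confusion_model(actual_units, predicted_units):
    """Estimate P(predicted | actual) from aligned unit sequences."""
    if len(actual_units) != len(predicted_units):
        raise ValueError("sequences must be aligned one-to-one")
    counts = defaultdict(Counter)
    for actual, predicted in zip(actual_units, predicted_units):
        counts[actual][predicted] += 1
    model = {}
    for actual, row in counts.items():
        total = sum(row.values())
        model[actual] = {pred: n / total for pred, n in row.items()}
    return model

def expected_error_rate(model, text_units):
    """Average probability that a unit is decoded as something else."""
    # Units never seen in training are skipped rather than guessed at.
    errors = [1.0 - model[u].get(u, 0.0) for u in text_units if u in model]
    return sum(errors) / len(errors) if errors else 0.0

# Hypothetical training data: actual units and what the decoder produced.
actual = ["k", "ae", "t", "k", "ae", "t"]
predicted = ["k", "eh", "t", "g", "ae", "t"]
model = build_confusion_model(actual, predicted)
rate = expected_error_rate(model, ["k", "ae", "t"])
print(round(rate, 3))
```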
  • Patent number: 7089183
    Abstract: A new iterative hierarchical linear regression method for generating a set of linear transforms to adapt HMM speech models to a new environment for improved speech recognition is disclosed. The method determines a new set of linear transforms at an iterative step by Estimate-Maximize (EM) estimation, and then combines the new set of linear transforms with the prior set of linear transforms to form a new merged set of linear transforms. An iterative step may include realignment of adaptation speech data to the adapted HMM models to further improve speech recognition performance.
    Type: Grant
    Filed: June 22, 2001
    Date of Patent: August 8, 2006
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong