Training of HMM (epo) Patents (Class 704/256.2)

With insufficient amount of training data, e.g., state sharing, tying, deleted interpolation (epo) (Class 704/256.3)

Integrated speech recognition and semantic classification

Patent number: 7856351

Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.

Type: Grant

Filed: January 19, 2007

Date of Patent: December 21, 2010

Assignee: Microsoft Corporation

Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
NOISE ADAPTIVE TRAINING FOR SPEECH RECOGNITION

Publication number: 20100318354

Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.

Type: Application

Filed: June 12, 2009

Publication date: December 16, 2010

Applicant: Microsoft Corporation

Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
HIDDEN MARKOV MODEL BASED TEXT TO SPEECH SYSTEMS EMPLOYING ROPE-JUMPING ALGORITHM

Publication number: 20100312562

Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.

Type: Application

Filed: June 4, 2009

Publication date: December 9, 2010

Applicant: Microsoft Corporation

Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
Voice recognition method and system based on the contextual modeling of voice units

Patent number: 7818172

Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal and using the acoustic models for recognition by a comparison of the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of the states independent of adjacent voice units and by means of a second portion of the states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing same phonemes.

Type: Grant

Filed: April 20, 2004

Date of Patent: October 19, 2010

Assignee: France Telecom

Inventors: Ronaldo Messina, Denis Jouvet
Covariance estimation for pattern recognition

Patent number: 7805301

Abstract: A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.

Type: Grant

Filed: July 1, 2005

Date of Patent: September 28, 2010

Assignee: Microsoft Corporation

Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch

Patent number: 7778831

Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.

Type: Grant

Filed: February 21, 2006

Date of Patent: August 17, 2010

Assignee: Sony Computer Entertainment Inc.

Inventor: Ruxin Chen
SPEECH RECOGNITION METHOD

Publication number: 20100204988

Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.

Type: Application

Filed: April 20, 2010

Publication date: August 12, 2010

Inventors: Haitian XU, Kean Kheong Chin
Model-based comparative measure for vector sequences and word spotting using same

Publication number: 20100191532

Abstract: An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure.

Type: Application

Filed: January 28, 2009

Publication date: July 29, 2010

Applicant: Xerox Corporation

Inventors: Jose A. Rodriguez Serrano, Florent C. Perronnin
Block-diagonal covariance joint subspace tying and model compensation for noise robust automatic speech recognition

Patent number: 7729909

Abstract: Model compression is combined with model compensation. Model compression is needed in embedded ASR to reduce the size and the computational complexity of compressed models. Model-compensation is used to adapt in real-time to changing noise environments. The present invention allows for the design of smaller ASR engines (memory consumption reduced to up to one-sixth) with reduced impact on recognition accuracy and/or robustness to noises.

Type: Grant

Filed: March 6, 2006

Date of Patent: June 1, 2010

Assignee: Panasonic Corporation

Inventors: Luca Rigazio, David Kryze, Keiko Morii, Nobuyuki Kunieda, Jean-Claude Junqua
ONLINE ARABIC HANDWRITING RECOGNITION

Publication number: 20100128985

Abstract: Method for online character recognition of Arabic text, the method including receiving handwritten Arabic text from a user in the form of handwriting strokes, sampling the handwriting strokes to acquire a sequence of two dimensional point representations thereof, with associated temporal data, geometrically pre processing and extracting features on the point representations, detecting delayed strokes and word parts in the pre processed point representations, projecting the delayed strokes onto the body of the word parts, constructing feature vector representations for each word part, thereby generating an observation sequence, and determining the word with maximum probability given the observation sequence, resulting in a list of word probabilities.

Type: Application

Filed: July 26, 2007

Publication date: May 27, 2010

Applicant: BGN TECHNOLOGIES LTD.

Inventors: Jihad El-Sana, Fadi Biadsy
Identification and rejection of meaningless input during natural language classification

Patent number: 7707027

Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
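
The bigram step in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the function name, the sample sentences, and the filler words are all hypothetical, and the sketch only collects bigrams composed entirely of meaningless unigrams. How each candidate is then judged individually meaningless is not stated in the abstract and is left out.

```python
from collections import Counter


def candidate_meaningless_bigrams(sentences, meaningless_unigrams):
    """Count bigrams whose two words are both flagged as meaningless unigrams."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair each word with its successor to enumerate the bigrams.
        for first, second in zip(words, words[1:]):
            if first in meaningless_unigrams and second in meaningless_unigrams:
                counts[(first, second)] += 1
    return counts


# Hypothetical training data and filler set, for illustration only.
training = [
    "um uh I want to check my balance",
    "uh um hello",
    "please um uh transfer funds",
]
fillers = {"um", "uh"}
candidates = candidate_meaningless_bigrams(training, fillers)
```

Each candidate would still need a separate check before being assigned to the first n-gram class.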

Type: Grant

Filed: April 13, 2006

Date of Patent: April 27, 2010

Assignee: Nuance Communications, Inc.

Inventors: Rajesh Balchandran, Linda Boyer
Updating hidden conditional random field model parameters after processing individual training samples

Patent number: 7689419

Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.

Type: Grant

Filed: September 22, 2005

Date of Patent: March 30, 2010

Assignee: Microsoft Corporation

Inventors: Milind V. Mahajan, Alejandro Acero, Asela J. Gunawardana, John C. Platt
PARAMETER CLUSTERING AND SHARING FOR VARIABLE-PARAMETER HIDDEN MARKOV MODELS

Publication number: 20100070280

Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.

Type: Application

Filed: September 16, 2008

Publication date: March 18, 2010

Applicant: Microsoft Corporation

Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
Parsimonious modeling by non-uniform kernel allocation

Patent number: 7680664

Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.

Type: Grant

Filed: August 16, 2006

Date of Patent: March 16, 2010

Assignee: Microsoft Corporation

Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
Discriminative training of hidden Markov models for continuous speech recognition

Patent number: 7672847

Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
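
The three-way rule in the abstract above amounts to clipping the gradient adjustment between two thresholds. A minimal sketch follows, assuming the adjustment is added to the standard deviation and that the first threshold is the upper bound and the second the lower bound. The function name and the numeric values are hypothetical, and the abstract does not say how the adjustment is combined with the current value.

```python
def clipped_std_update(std, gradient_adjustment, first_threshold, second_threshold):
    """Update a mixture component's standard deviation with a clipped adjustment.

    Assumes first_threshold is the upper bound, second_threshold the lower
    bound, and that the adjustment is applied additively.
    """
    if gradient_adjustment > first_threshold:
        applied = first_threshold        # too large: use the first threshold
    elif gradient_adjustment < second_threshold:
        applied = second_threshold       # too small: use the second threshold
    else:
        applied = gradient_adjustment    # in range: use the calculated value
    return std + applied


# Hypothetical values: thresholds of +0.2 and -0.2 around a standard deviation of 1.0.
too_large = clipped_std_update(1.0, 0.5, 0.2, -0.2)
too_small = clipped_std_update(1.0, -0.5, 0.2, -0.2)
in_range = clipped_std_update(1.0, 0.1, 0.2, -0.2)
```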

Type: Grant

Filed: September 30, 2008

Date of Patent: March 2, 2010

Assignee: Nuance Communications, Inc.

Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
System and method for speech separation and multi-talker speech recognition

Patent number: 7664643

Abstract: A method, and a system to execute it, are presented for the identification and separation of sources of an acoustic signal that contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.

Type: Grant

Filed: August 25, 2006

Date of Patent: February 16, 2010

Assignee: Nuance Communications, Inc.

Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
Speech recognition system and program thereof

Patent number: 7660717

Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.

Type: Grant

Filed: January 9, 2008

Date of Patent: February 9, 2010

Assignee: Nuance Communications, Inc.

Inventors: Tetsuya Takiguchi, Masafumi Nishimura
Hidden conditional random field models for phonetic classification and speech recognition

Patent number: 7627473

Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.

Type: Grant

Filed: October 15, 2004

Date of Patent: December 1, 2009

Assignee: Microsoft Corporation

Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
Adapter for allowing both online and offline training of a text to text system

Patent number: 7624020

Abstract: An adapter for a text to text training. A main corpus is used for training, and a domain specific corpus is used to adapt the main corpus according to the training information in the domain specific corpus. The adaptation is carried out using a technique that may be faster than the main training. The parameter set from the main training is adapted using the domain specific part.

Type: Grant

Filed: September 9, 2005

Date of Patent: November 24, 2009

Assignee: Language Weaver, Inc.

Inventors: Kenji Yamada, Kevin Knight, Greg Langmead
Standard-model generation for speech recognition using a reference model

Patent number: 7603276

Abstract: A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference storing unit.

Type: Grant

Filed: November 18, 2003

Date of Patent: October 13, 2009

Assignee: Panasonic Corporation

Inventor: Shinichi Yoshizawa
Low memory decision tree

Patent number: 7574411

Abstract: Management of a low memory treelike data structure is shown. The method according to the invention comprises steps for creating a decision tree including a parent node and at least one leaf node, and steps for searching data from said nodes. The nodes of the decision tree are stored sequentially in such a manner that nodes follow the parent node in storage order, wherein the nodes refining the context of the searchable data can be reached without a link from their parent node. The method can preferably be utilized in speech-recognition systems, in text-to-phoneme mapping.

Type: Grant

Filed: April 29, 2004

Date of Patent: August 11, 2009

Assignee: Nokia Corporation

Inventors: Janne Suontausta, Jilei Tian
DISCRIMINATIVE TRAINING OF MULTI-STATE BARGE-IN MODELS FOR SPEECH PROCESSING

Publication number: 20090112595

Abstract: Disclosed are systems and methods for training a barge-in-model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in-model for speech processing.

Type: Application

Filed: October 31, 2007

Publication date: April 30, 2009

Applicant: AT&T Labs

Inventor: Andrej Ljolje
Method of refining statistical pattern recognition models and statistical pattern recognizers

Patent number: 7509259

Abstract: A device (800) performs statistical pattern recognition using model parameters that are refined by optimizing an objective function that includes a term for many items of training data for which recognition errors occur, wherein each term depends on a relative magnitude of a first score for a recognition result for an item of training data and a second score calculated by evaluating a statistical pattern recognition model identified by a transcribed identity of the training data item with feature vectors extracted from the item of training data. The objective function does not include terms for items of training data for which there is a gross discrepancy between a transcribed identity and a recognized identity. Gross discrepancies can be detected by probability score or pattern identity comparisons. Terms of the objective function are weighted based on the type of recognition error and weights can be increased for high priority patterns.

Type: Grant

Filed: December 21, 2004

Date of Patent: March 24, 2009

Assignee: Motorola, Inc.

Inventor: Jianming J. Song
Adaptation of compressed acoustic models

Patent number: 7499857

Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.
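
The idea of transforming codewords rather than means can be sketched as follows. This is a minimal illustration under stated assumptions: the adaptation transform is taken to be a single scalar affine map (`scale`, `offset`), the codebook layout and all names are hypothetical, and the abstract does not say how the transform is estimated.

```python
def adapt_codebooks(codebooks, scale, offset):
    """Apply y = scale * x + offset to every codeword in every subspace codebook."""
    return [
        [[scale * value + offset for value in codeword] for codeword in codebook]
        for codebook in codebooks
    ]


def decode_mean(indices, codebooks):
    """Rebuild a full mean vector by concatenating one codeword per subspace."""
    mean = []
    for subspace, index in enumerate(indices):
        mean.extend(codebooks[subspace][index])
    return mean


# Two hypothetical subspaces: a 2-dimensional one and a 1-dimensional one.
codebooks = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[10.0], [20.0]],
]
mean_indices = (1, 0)  # a quantized Gaussian mean stores only codeword indices

adapted = adapt_codebooks(codebooks, scale=2.0, offset=1.0)
original_mean = decode_mean(mean_indices, codebooks)
adapted_mean = decode_mean(mean_indices, adapted)
```

Because every quantized mean refers to codewords by index, adapting the codebooks adapts all means at once, and the work grows with the number of codewords rather than the number of Gaussians.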

Type: Grant

Filed: May 15, 2003

Date of Patent: March 3, 2009

Assignee: Microsoft Corporation

Inventor: Asela J. Gunawardana
Discriminative Training of Hidden Markov Models for Continuous Speech Recognition

Publication number: 20090055182

Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.

Type: Application

Filed: September 30, 2008

Publication date: February 26, 2009

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
Methods and apparatus for statistical biometric model migration

Patent number: 7496509

Abstract: In large-scale deployments of speaker recognition systems the potential for legacy problems increases as the evolving technology may require configuration changes in the system thus invalidating already existing user voice accounts. Unless the entire database of original speech waveform were stored, users need to reenroll to keep their accounts functional, which, however, may be expensive and commercially not acceptable. Model migration is defined as a conversion of obsolete models to new-configuration models without additional data and waveform requirements. The present disclosure investigates ways to achieve such a migration with minimum loss of system accuracy.

Type: Grant

Filed: May 28, 2004

Date of Patent: February 24, 2009

Assignee: International Business Machines Corporation

Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
Method and system for tracking signal sources with wrapped-phase hidden Markov models

Patent number: 7475014

Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
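
The pairwise phase-difference step in the abstract above can be sketched as follows. This is a minimal illustration that assumes one phase value per sensor (for example, at a single frequency and time frame) and wraps differences into [-pi, pi). The function names and sample phases are hypothetical, and building the wrapped-phase hidden Markov model itself is not shown.

```python
import math
from itertools import combinations


def wrap_phase(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def pairwise_phase_differences(phases):
    """Return the wrapped phase difference for every unique pair of sensors."""
    return {
        (i, j): wrap_phase(phases[i] - phases[j])
        for i, j in combinations(range(len(phases)), 2)
    }


# Hypothetical phases (radians) measured at three sensors.
sensor_phases = [0.1, 3.0, -3.0]
differences = pairwise_phase_differences(sensor_phases)
```

An array of N sensors yields N(N-1)/2 unique pairs, so three sensors give three differences.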

Type: Grant

Filed: July 25, 2005

Date of Patent: January 6, 2009

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Paris Smaragdis, Petros Boufounos
Method and system to scale down a decision tree-based hidden Markov model (HMM) for speech recognition

Patent number: 7472064

Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can be adapted for the given task using task specific data. The general model can be based on a hidden Markov model (HMM). By allowing a decision tree-based acoustic model (“general model”) to be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can be used to improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted/adjusted according to task specific data, e.g., task vocabulary, model size, or other like task specific data.

Type: Grant

Filed: September 30, 2000

Date of Patent: December 30, 2008

Assignee: Intel Corporation

Inventors: Qing Guo, Yonghong Yan, Baosheng Yuan
Decoding multiple HMM sets using a single sentence grammar

Patent number: 7464033

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
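
The (M-1)/M figure follows directly from replacing M sub-networks with one. A small worked sketch, with a hypothetical function name:

```python
from fractions import Fraction


def network_storage_saving(num_hmm_sets):
    """Fraction of network storage saved when one sub-network replaces M of them."""
    if num_hmm_sets < 1:
        raise ValueError("at least one HMM set is required")
    # Storage drops from M sub-networks to 1, so the saving is (M - 1) / M.
    return Fraction(num_hmm_sets - 1, num_hmm_sets)


saving_for_four = network_storage_saving(4)  # four acoustic environments
saving_for_one = network_storage_saving(1)   # a single environment saves nothing
```

With M = 4 environments the saving is 3/4, so the network occupies a quarter of the original storage.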

Type: Grant

Filed: February 4, 2005

Date of Patent: December 9, 2008

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system

Patent number: 7454341

Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
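
The sub-vector split and per-subspace codebook construction can be sketched as follows. This is a minimal illustration that uses plain K-means with a fixed seed choice, not the modified K-means of the abstract: the dynamic merging and splitting of clusters is omitted, and all names and sample vectors are hypothetical.

```python
def split_into_subvectors(vectors, boundaries):
    """Cut every vector into sub-vectors over the given (start, stop) dimension ranges."""
    return [[tuple(v[start:stop]) for v in vectors] for start, stop in boundaries]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain K-means; seeds the centroids with the first k points for determinism."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            clusters[nearest].append(point)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


# Four hypothetical 4-dimensional mean vectors, split into two 2-dimensional subspaces.
means = [
    (0.0, 0.0, 5.0, 5.0),
    (0.2, 0.0, 5.0, 5.2),
    (4.0, 4.0, 1.0, 1.0),
    (4.2, 4.0, 1.0, 0.8),
]
subvector_sets = split_into_subvectors(means, [(0, 2), (2, 4)])
codebooks = [kmeans(subset, k=2) for subset in subvector_sets]
```

The same procedure would apply to the variance vectors, giving one codebook per variance sub-vector set.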

Type: Grant

Filed: September 30, 2000

Date of Patent: November 18, 2008

Assignee: Intel Corporation

Inventors: Jielin Pan, Baosheng Yuan
Speech recognition apparatus

Patent number: 7437288

Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.

Type: Grant

Filed: March 11, 2002

Date of Patent: October 14, 2008

Assignee: NEC Corporation

Inventor: Koichi Shinoda
Systems and methods for classifying audio into broad phoneme classes

Patent number: 7424427

Abstract: An audio classification system classifies sounds in an audio stream as belonging to one of a relatively small number of classes. The audio classification system includes a signal analysis component [301] and a decoder [302]. The decoder [302] includes a number of models [310-316] for performing the audio classifications. In one implementation, the possible classifications include: vowels, fricatives, narrowband, wideband, coughing, gender, and silence. The classified audio may be used to enhance speech recognition of the audio stream.

Type: Grant

Filed: October 16, 2003

Date of Patent: September 9, 2008

Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.

Inventors: Daben Liu, Francis G. Kubala
Speech recognition system and program thereof

Patent number: 7403896

Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.

Type: Grant

Filed: March 14, 2003

Date of Patent: July 22, 2008

Assignee: International Business Machines Corporation

Inventors: Tetsuya Takiguchi, Masafumi Nishimura
System and methods for accent classification and adaptation

Publication number: 20080147404

Abstract: Speech is processed that may be colored by speech accent. A method for recognizing speech includes maintaining a model of speech accent that is established based on training speech data, wherein the training speech data includes at least a first set of training speech data, and wherein establishing the model of speech accent includes not using any phone or phone-class transcription of the first set of training speech data. Related systems are also presented. A system for recognizing speech includes an accent identification module that is configured to identify accent of the speech to be recognized; and a recognizer that is configured to use models to recognize the speech to be recognized, wherein the models include at least an acoustic model that has been adapted for the identified accent using training speech data of a language, other than primary language of the speech to be recognized, that is associated with the identified accent. Related methods are also presented.

Type: Application

Filed: May 15, 2001

Publication date: June 19, 2008

Applicant: NuSuara Technologies SDN BHD

Inventors: Wai Kat Liu, Pascale Fung
System and method for Cantonese speech recognition using an optimized phone set

Patent number: 7353172

Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.

Type: Grant

Filed: March 24, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
System and method for Mandarin Chinese speech recognition using an optimized phone set

Patent number: 7353173

Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.

Type: Grant

Filed: March 31, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
System and method for effectively implementing a Mandarin Chinese speech recognition dictionary

Patent number: 7353174

Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.

Type: Grant

Filed: March 31, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
Method and apparatus for training an automated speech recognition-based system

Patent number: 7346507

Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.

Type: Grant

Filed: June 4, 2003

Date of Patent: March 18, 2008

Assignee: BBN Technologies Corp.

Inventors: Premkumar Natarajan, Rohit Prasad
Unsupervised learning of video structures in videos using hierarchical statistical models to detect events

Patent number: 7313269

Abstract: A method learns a structure of a video, in an unsupervised setting, to detect events in the video consistent with the structure. Sets of features are selected from the video. Based on the selected features, a hierarchical statistical model is updated, and an information gain of the hierarchical statistical model is evaluated. Redundant features are then filtered, and the hierarchical statistical model is updated, based on the filtered features. A Bayesian information criterion is applied to each model and feature set pair, which can then be rank ordered according to the criterion to detect the events in the video.

Type: Grant

Filed: December 12, 2003

Date of Patent: December 25, 2007

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Lexing Xie, Ajay Divakaran, Shih-Fu Chang
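
The ranking step named in the abstract for patent 7313269 (applying a Bayesian information criterion to each model and feature set pair, then rank ordering) can be sketched roughly as follows. This is only an illustration: it assumes the common form BIC = k ln(n) - 2 ln(L), where lower is better, and the candidate names, log-likelihoods and parameter counts are invented for the example rather than taken from the patent.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Common BIC form: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def rank_by_bic(candidates, num_samples):
    """Sort (model, feature set) candidates from best to worst BIC."""
    return sorted(
        candidates,
        key=lambda c: bic(c["log_likelihood"], c["num_params"], num_samples),
    )


if __name__ == "__main__":
    # Hypothetical model/feature-set pairs; all numbers are made up.
    candidates = [
        {"name": "model A + features 1", "log_likelihood": -100.0, "num_params": 10},
        {"name": "model B + features 2", "log_likelihood": -105.0, "num_params": 3},
    ]
    for c in rank_by_bic(candidates, num_samples=100):
        print(c["name"], round(bic(c["log_likelihood"], c["num_params"], 100), 2))
```

In this toy case the smaller model ranks first even though its likelihood is slightly lower, because the penalty term k ln(n) outweighs the difference in fit.
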
Object activity modeling method

Patent number: 7308030

Abstract: An object activity modeling method which can efficiently model complex objects such as a human body is provided. The object activity modeling method includes the steps of (a) obtaining an optical flow vector from a video sequence; (b) obtaining the probability distribution of the feature vector for a plurality of video frames, using the optical flow vector; (c) modeling states, using the probability distribution of the feature vector; and (d) expressing the activity of the object in the video sequence based on state transition. According to the modeling method, in video indexing and recognition field, complex activities such as human activities can be efficiently modeled and recognized without segmenting objects.

Type: Grant

Filed: April 12, 2005

Date of Patent: December 11, 2007

Assignees: Samsung Electronics Co., Ltd., The Regents of the University of California

Inventors: Yang-lim Choi, Yun-ju Yu, Bangalore S. Manjunath, Xinding Sun, Ching-wei Chen
Decoding multiple HMM sets using a single sentence grammar

Patent number: 7269558

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.

Type: Grant

Filed: July 26, 2001

Date of Patent: September 11, 2007

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
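
The storage saving quoted in the abstract for patent 7269558, taken here as (M-1)/M, follows from keeping one sub-network instead of M equally sized ones. A short worked calculation, assuming all sub-networks have the same size (the function name is hypothetical):

```python
def network_memory_saving(num_hmm_sets: int) -> float:
    """Fraction of network storage saved when a single sub-network
    replaces M equally sized sub-networks: (M - 1) / M."""
    if num_hmm_sets < 1:
        raise ValueError("num_hmm_sets must be at least 1")
    return (num_hmm_sets - 1) / num_hmm_sets


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        print(f"M = {m}: saving = {network_memory_saving(m):.3f}")
```

With a single HMM set there is nothing to save, and the saving approaches 100% as M grows.
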
Speech recognition system trained with regional speech characteristics

Patent number: 7225125

Abstract: A speech recognition system uses speech recognition models which are specifically trained and optimized for users residing in a particular geographic area or region. The speech models are trained with samples of word variants expected to be used in a natural language by representative members of a population associated with the geographic region or community of users. The speech recognition system is configured to have a real-time response that imitates a dialogue with a human operator.

Type: Grant

Filed: January 7, 2005

Date of Patent: May 29, 2007

Assignee: Phoenix Solutions, Inc.

Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
Factorial hidden Markov model for audiovisual speech recognition

Patent number: 7209883

Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.

Type: Grant

Filed: May 9, 2002

Date of Patent: April 24, 2007

Assignee: Intel Corporation

Inventor: Ara V. Nefian
Preparing acoustic models by sufficient statistics and noise-superimposed speech data

Patent number: 7209881

Abstract: Noise-superimposed speech data is grouped according to acoustic similarity, and sufficient statistics are prepared using the speech data in each of the groups. A group acoustically similar to voice data of a user of the speech recognition is selected, and sufficient statistics acoustically similar to the user's voice data are selected from the sufficient statistics in the selected group. Using the selected sufficient statistics, an acoustic model is prepared.

Type: Grant

Filed: December 18, 2002

Date of Patent: April 24, 2007

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Shinichi Yoshizawa, Kiyohiro Shikano
System and method for automatic semantic coding of free response data using Hidden Markov Model methodology

Patent number: 7188064

Abstract: A system and method for coding text data wherein a first group of text data is coded using a Viterbi algorithm using a Hidden Markov model. The Hidden Markov Model computes a probable coding responsive to the first group of text data. A second group of text data is coded using the Viterbi algorithm using a corrected Hidden Markov Model. The Hidden Markov Model is based upon the coding of the first group of text data. Coding the first group of text data includes assigning word concepts to groups of at least one word in the first group of text data and assigning propositions to groups of the assigned word concepts.

Type: Grant

Filed: April 12, 2002

Date of Patent: March 6, 2007

Assignee: University of Texas System Board of Regents

Inventors: Richard M. Golden, Michael Arthur Durbin, Jason Warner Earwood
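
The abstract for patent 7188064 describes coding text with a Viterbi algorithm over a Hidden Markov Model. As a rough illustration of that decoding step only, here is a minimal log-domain Viterbi sketch. The concept labels, vocabulary and probabilities are invented for the example and are not taken from the patent; the patent's correction step and proposition assignment are not modeled.

```python
import math


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log probability."""
    # best[t][s]: best log probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            scores[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical word-concept labels and toy probabilities.
    states = ("ACTION", "OBJECT")
    log_start = {"ACTION": math.log(0.8), "OBJECT": math.log(0.2)}
    log_trans = to_log({
        "ACTION": {"ACTION": 0.2, "OBJECT": 0.8},
        "OBJECT": {"ACTION": 0.5, "OBJECT": 0.5},
    })
    log_emit = to_log({
        "ACTION": {"open": 0.9, "door": 0.1},
        "OBJECT": {"open": 0.1, "door": 0.9},
    })
    path, score = viterbi(["open", "door"], states, log_start, log_trans, log_emit)
    print(path, math.exp(score))
```

Working in log probabilities avoids numerical underflow on longer word sequences.
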
Coupled hidden Markov model for audiovisual speech recognition

Patent number: 7165029

Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.

Type: Grant

Filed: May 9, 2002

Date of Patent: January 16, 2007

Assignee: Intel Corporation

Inventor: Ara V. Nefian
Method and apparatus for predicting word error rates from text

Patent number: 7103544

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

Type: Grant

Filed: June 6, 2005

Date of Patent: September 5, 2006

Assignee: Microsoft Corporation

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Accumulating transformations for hierarchical linear regression HMM adaptation

Patent number: 7089183

Abstract: A new iterative hierarchical linear regression method for generating a set of linear transforms to adapt HMM speech models to a new environment for improved speech recognition is disclosed. The method determines a new set of linear transforms at an iterative step by Estimate-Maximize (EM) estimation, and then combines the new set of linear transforms with the prior set of linear transforms to form a new merged set of linear transforms. An iterative step may include realignment of adaptation speech data to the adapted HMM models to further improve speech recognition performance.

Type: Grant

Filed: June 22, 2001

Date of Patent: August 8, 2006

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
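
The abstract for patent 7089183 speaks of combining a new set of linear transforms with the prior set to form a merged set. The abstract does not give the merging rule, so the sketch below makes an assumption: each transform is an affine map x -> A x + b applied to a mean vector, and merging means composing the new transform after the prior one. All function names and numbers are hypothetical.

```python
def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_transform(transform, x):
    """Apply the affine map x -> A x + b."""
    a, b = transform
    return [u + v for u, v in zip(matvec(a, x), b)]


def merge_transforms(prior, new):
    """Compose 'new' after 'prior':
    A2 (A1 x + b1) + b2 = (A2 A1) x + (A2 b1 + b2)."""
    a1, b1 = prior
    a2, b2 = new
    merged_a = matmul(a2, a1)
    merged_b = [u + v for u, v in zip(matvec(a2, b1), b2)]
    return merged_a, merged_b


if __name__ == "__main__":
    prior = ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0])
    new = ([[1.0, 1.0], [0.0, 1.0]], [0.0, -1.0])
    mean = [1.0, 2.0]
    merged = merge_transforms(prior, new)
    # Applying the merged transform once matches applying both in sequence.
    print(apply_transform(merged, mean))
    print(apply_transform(new, apply_transform(prior, mean)))
```

Keeping a single merged transform per iteration means the original model means only need to be transformed once, however many iterations have been accumulated.
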
