Hidden Markov Model (HMM) (EPO) Patents (Class 704/256.1)

Training of HMM (EPO) (Class 704/256.2)

With insufficient amount of training data, e.g., state sharing, tying, deleted interpolation (EPO) (Class 704/256.3)

Duration modeling in HMM, e.g., semi HMM, segmental models, transition probabilities (EPO) (Class 704/256.4)

Hidden Markov (HM) network (EPO) (Class 704/256.5)

State emission probability (EPO) (Class 704/256.6)

Continuous density, e.g., Gaussian distribution, Laplace (EPO) (Class 704/256.7)
Discrete density, e.g., Vector Quantization preprocessor, look up tables (EPO) (Class 704/256.8)

SPEECH RECOGNITION METHOD FOR ROBOT

Publication number: 20120130716

Abstract: A speech recognition method for a robot. The speech recognition method for the robot includes one fundamental acoustic model. Whenever the noisy environment and the speaker are changed, the speech recognition method generates a plurality of parallel acoustic models in which the characteristic for each noisy environment and the characteristic for each speaker are reflected. As a result, the speech recognition method for the robot can freely recognize one of several acoustic models according to individual environments and speakers, such that it can basically remove mismatch between the model training environment and the test environment, thereby improving speech recognition capabilities.

Type: Application

Filed: November 17, 2011

Publication date: May 24, 2012

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Ki Beom KIM
Piecewise-based variable-parameter Hidden Markov Models and the training thereof

Patent number: 8160878

Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.

Type: Grant

Filed: September 16, 2008

Date of Patent: April 17, 2012

Assignee: Microsoft Corporation

Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
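
As a rough illustration of the variable-parameter idea in the abstract above, the sketch below lets a one-dimensional Gaussian mean depend on SNR through a piecewise function. It uses piecewise-linear interpolation as a simpler stand-in for the cubic spline the abstract mentions, and it keeps the variance fixed. The knot positions, mean values, and function names are hypothetical and are not taken from the patent.

```python
import bisect
import math

# Hypothetical knots: SNR in dB and the Gaussian mean at each knot.
SNR_KNOTS = [0.0, 10.0, 20.0, 30.0]
MEAN_AT_KNOTS = [1.0, 2.0, 2.5, 2.6]


def mean_for_snr(snr_db):
    """Piecewise-linear mean as a function of SNR (clamped outside the knots)."""
    if snr_db <= SNR_KNOTS[0]:
        return MEAN_AT_KNOTS[0]
    if snr_db >= SNR_KNOTS[-1]:
        return MEAN_AT_KNOTS[-1]
    # Index of the segment [knot i, knot i+1] that contains snr_db.
    i = bisect.bisect_right(SNR_KNOTS, snr_db) - 1
    x0, x1 = SNR_KNOTS[i], SNR_KNOTS[i + 1]
    y0, y1 = MEAN_AT_KNOTS[i], MEAN_AT_KNOTS[i + 1]
    t = (snr_db - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def log_likelihood(x, snr_db, variance=1.0):
    """Log density of x under a Gaussian whose mean depends on SNR."""
    mu = mean_for_snr(snr_db)
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mu) ** 2 / variance)
```

The same observation scores differently depending on the SNR at which it was seen, which is the effect the conditioning parameter is meant to capture.
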
User intention based on N-best list of recognition hypotheses for utterances in a dialog

Patent number: 8140328

Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.

Type: Grant

Filed: December 1, 2008

Date of Patent: March 20, 2012

Assignee: AT&T Intellectual Property I, L.P.

Inventor: Jason Williams
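
One simple way to picture selecting an intention from several N-best lists is to pool the confidence scores per intention and normalize them into a belief distribution. The sketch below does only that. Summing confidences is an illustrative assumption and not the belief update defined in the patent, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def select_intention(nbest_lists):
    """Pool confidence per intention across N-best lists and normalize.

    Each list holds (intention, confidence) pairs. Summing confidences is an
    illustrative assumption, not the belief update defined in the patent.
    """
    totals = defaultdict(float)
    for nbest in nbest_lists:
        for intention, confidence in nbest:
            totals[intention] += confidence
    norm = sum(totals.values())
    if norm == 0:
        raise ValueError("no confidence mass to normalize")
    belief = {intent: score / norm for intent, score in totals.items()}
    best = max(belief, key=belief.get)
    return best, belief


# Hypothetical data: the current turn and an earlier, contextually similar turn.
current_turn = [("city=Austin", 0.5), ("city=Boston", 0.4)]
earlier_turn = [("city=Boston", 0.5), ("city=Houston", 0.25)]
best, belief = select_intention([current_turn, earlier_turn])
```

Here the top hypothesis of the current turn is not the selected intention, because the second hypothesis is supported by both lists.
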
Method and apparatus for automatically recognizing audio data

Patent number: 8140329

Abstract: A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created of audio features extracted from the observed audio data and the observed audio data is recognized from the observation vector. The audio features include features selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing a logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to results of a mel scale filter bank.

Type: Grant

Filed: April 5, 2004

Date of Patent: March 20, 2012

Assignee: Sony Corporation

Inventors: Jian Zhang, Wei Lu, Xiaobing Sun
DEEP BELIEF NETWORK FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Publication number: 20120065976

Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.

Type: Application

Filed: September 15, 2010

Publication date: March 15, 2012

Applicant: Microsoft Corporation

Inventors: Li Deng, Dong Yu, George Edward Dahl
Radar Microphone Speech Recognition

Publication number: 20120059657

Abstract: A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier.

Type: Application

Filed: June 7, 2011

Publication date: March 8, 2012

Inventors: Jefferson M. Willey, Todd Stephenson, Hugh Faust, James P. Hansen, George J. Linde, Carol Chang, Justin Nevitt, James A. Ballas, Thomas Herne Crystal, Vincent Michael Stanford, Jean W. de Graaf
SPEECH PROCESSING SYSTEM AND METHOD

Publication number: 20120041764

Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic

Type: Application

Filed: August 10, 2011

Publication date: February 16, 2012

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Haitian XU, Kean Kheong Chin, Mark John Francis Gales
System and method for text tagging and segmentation using a generative/discriminative hybrid hidden Markov model

Patent number: 8086443

Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\theta$ and discriminative parameters $\tilde{\theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\theta, \tilde{\theta}) + \sum_{l=1}^{M} \left[ \log p(X_l, Y_l \mid \tilde{\theta}) - \log p(X_l \mid \tilde{\theta}) \right] + \sum_{l=1}^{M} \log p(X_l \mid \theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} \left( em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2 \right)$, where $em_y = 1 - \sum_{x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\theta_k^{*}$, ...

Type: Grant

Filed: August 21, 2008

Date of Patent: December 27, 2011

Assignee: Siemens Medical Solutions USA, Inc.

Inventors: Oksana Yakhnenko, Romer E. Rosales, Radu Stefan Niculescu, Lucian Vlad Lita
ROBUSTNESS TO ENVIRONMENTAL CHANGES OF A CONTEXT DEPENDENT SPEECH RECOGNIZER

Publication number: 20110288869

Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.

Type: Application

Filed: May 21, 2010

Publication date: November 24, 2011

Inventors: Xavier Menendez-Pidal, Ruxin Chen
Robust Speech Recognition

Publication number: 20110257976

Abstract: Speech recognition includes structured modeling, irrelevant variability normalization and unsupervised online adaptation of speech recognition parameters.

Type: Application

Filed: April 14, 2010

Publication date: October 20, 2011

Applicant: Microsoft Corporation

Inventor: Qiang Huo
Method of speaker adaptation for a hidden Markov model based voice recognition system

Patent number: 8041567

Abstract: Commercially available voice recognition systems are generally speaker-dependent, with the voice recognition system first being trained to the voice of the speaker before it can be used. A disadvantage with this method is that modified reference data has to be buffered and permanently saved in several steps when the speaker adaptation algorithm is executed, and thus requires a lot of memory space. This primarily negatively affects applications on devices with restricted processor power and limited memory space, such as mobile radio terminals for example. A method of speaker adaptation for a Hidden Markov Model based voice recognition system may address these issues. In the method, the memory space requirement and thus also the processor power required can be considerably reduced. This is achieved by using modified reference data in a speaker adaptation algorithm to adapt a new speaker to a reference speaker. The modified reference data is processed in compressed form.

Type: Grant

Filed: September 22, 2005

Date of Patent: October 18, 2011

Assignee: Siemens Aktiengesellschaft

Inventors: Sergey Astrov, Josef Bauer
Adding prototype information into probabilistic models

Patent number: 8010341

Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.

Type: Grant

Filed: September 13, 2007

Date of Patent: August 30, 2011

Assignee: Microsoft Corporation

Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
CONCISE DYNAMIC GRAMMARS USING N-BEST SELECTION

Publication number: 20110202343

Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.

Type: Application

Filed: April 28, 2011

Publication date: August 18, 2011

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
Method for uncovering hidden Markov models

Patent number: 7912717

Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.

Type: Grant

Filed: November 18, 2005

Date of Patent: March 22, 2011

Inventor: Albert Galick
Device and method of modeling acoustic characteristics with HMM and collating the same with a voice characteristic vector sequence

Patent number: 7895040

Abstract: According to an embodiment, a voice recognition apparatus includes units for acoustic processing, voice interval detection, a dictionary, collating, search target selection, storing, and determining, and a voice recognition method includes the processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in the interval from when the standard frame is set to when the standard frame is renewed, and storing and using the calculated value as an approximate value of the output probability in subsequent frames.

Type: Grant

Filed: March 30, 2007

Date of Patent: February 22, 2011

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masaru Sakai, Shinichi Tanaka
Device and method for assessing a quality class of an object to be tested

Patent number: 7873518

Abstract: A device for assessing a quality class of an object to be tested includes a unit for detecting a test signal from the object to be tested. Furthermore, the device for assessing includes a unit for providing a stochastic Markov model including states and transitions between states on the basis of reference measurements of objects of known quality classes, and a unit for evaluating the test signal using the stochastic Markov model. In addition, the device for assessing includes a unit for associating the object to be tested with a quality class based on the evaluation of the test signal. Such a device has the advantage of associating an object to be tested with a quality class more precisely than the prior art.

Type: Grant

Filed: November 10, 2006

Date of Patent: January 18, 2011

Assignees: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Technische Universitaet Dresden

Inventors: Dieter Hentschel, Constanze Tschoepe, Ruediger Hoffmann, Matthias Eichner, Matthias Wolff
Model-based comparative measure for vector sequences and word spotting using same

Publication number: 20100191532

Abstract: An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure.

Type: Application

Filed: January 28, 2009

Publication date: July 29, 2010

Applicant: Xerox Corporation

Inventors: Jose A. Rodriguez Serrano, Florent C. Perronnin
DEALING WITH SWITCH LATENCY IN SPEECH RECOGNITION

Publication number: 20100185448

Abstract: In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility.

Type: Application

Filed: January 21, 2010

Publication date: July 22, 2010

Inventor: William S. Meisel
SPEAKER ADAPTATION APPARATUS AND PROGRAM THEREOF

Publication number: 20100169094

Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.

Type: Application

Filed: September 17, 2009

Publication date: July 1, 2010

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
Systems and Methods for Assessment of Non-Native Spontaneous Speech

Publication number: 20100145698

Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.

Type: Application

Filed: December 1, 2009

Publication date: June 10, 2010

Applicant: Educational Testing Service

Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
SPEECH CLASSIFICATION APPARATUS, SPEECH CLASSIFICATION METHOD, AND SPEECH CLASSIFICATION PROGRAM

Publication number: 20100138223

Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy based on similarity of speakers and environments by using a realistic memory use amount, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model. A parameter updating means 107 successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).

Type: Application

Filed: March 13, 2008

Publication date: June 3, 2010

Inventor: Takafumi Koshinaka
Gene expression programming based on Hidden Markov Models

Patent number: 7725409

Abstract: Computer programs (600, 700, 800, 900, 1000) and a programmed computer (1100) for automatically generating computer programs (i.e. sequences of instructions) are provided. The computer programs (600, 700, 800, 900, 1000) use Hidden Markov Models (400, 500) to generate sequences of program tokens, e.g., Gene Expression Programming chromosomes (100). Parameters of the Hidden Markov Models (400, 500) are numerically optimized, for example, by Differential Evolution with a goal of increasing the fitness of automatically generated programs.

Type: Grant

Filed: June 5, 2007

Date of Patent: May 25, 2010

Assignee: Motorola, Inc.

Inventors: Chi Zhou, Magdi A. Mohamed, Weimin Xiao
Speaker identification in the presence of packet losses

Patent number: 7720012

Abstract: A system, method, and apparatus for identifying a speaker of an utterance, particularly when the utterance has portions of it missing due to packet losses. Different packet loss models are applied to each speaker's training data in order to improve accuracy, especially for small packet sizes.

Type: Grant

Filed: July 11, 2005

Date of Patent: May 18, 2010

Assignee: Arrowhead Center, Inc.

Inventors: Deva K. Borah, Phillip De Leon
MELODIS CRYSTAL DECODER METHOD AND DEVICE

Publication number: 20100121643

Abstract: The technology disclosed relates to a system and method for fast, accurate and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increases to maintain a fast response time.

Type: Application

Filed: November 2, 2009

Publication date: May 13, 2010

Applicant: Melodis Corporation

Inventors: Keyvan Mohajer, Seyed Majid Emami, Jon Grossman, Joe Kyaw Soe Aung, Sina Sohangir
Identification and rejection of meaningless input during natural language classification

Patent number: 7707027

Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.

Type: Grant

Filed: April 13, 2006

Date of Patent: April 27, 2010

Assignee: Nuance Communications, Inc.

Inventors: Rajesh Balchandran, Linda Boyer
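
The bigram step in the abstract above can be pictured as a filter over adjacent word pairs: only bigrams whose two words are both already marked meaningless become candidates. The sketch below covers just that filtering step. How a unigram or bigram is judged meaningless is not stated in the abstract, so the word set, the training sentences, and the function name here are hypothetical.

```python
def candidate_bigrams(sentences, meaningless_unigrams):
    """Return bigrams whose two words are both in the meaningless set."""
    found = set()
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair each word with its successor to enumerate the bigrams.
        for first, second in zip(words, words[1:]):
            if first in meaningless_unigrams and second in meaningless_unigrams:
                found.add((first, second))
    return found


# Hypothetical inputs; the abstract does not specify these words.
meaningless = {"um", "uh", "well", "so"}
training = ["um well I want to check my balance", "so uh transfer funds please"]
candidates = candidate_bigrams(training, meaningless)
```

Each candidate would still need the separate check for whether the bigram is individually meaningless before being assigned to the n-gram class.
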
METHOD AND APPARATUS FOR LOCATING SPEECH KEYWORD AND SPEECH RECOGNITION SYSTEM

Publication number: 20100094626

Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence with use of a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This causes resampling to be unnecessary in performing linear movement matching of speech wave frames having similar phonological feature structures.

Type: Application

Filed: September 27, 2007

Publication date: April 15, 2010

Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
PHASE SENSITIVE MODEL ADAPTATION FOR NOISY SPEECH RECOGNITION

Publication number: 20100076758

Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.

Type: Application

Filed: September 24, 2008

Publication date: March 25, 2010

Applicant: Microsoft Corporation

Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models

Patent number: 7684988

Abstract: A system and method of testing and tuning a speech recognition system by providing pronunciations to the speech recognizer. First a text document is provided to the system and converted into a sequence of phonemes representative of the words in the text. The phonemes are then converted to model units, such as Hidden Markov Models. From the models a probability is obtained for each model or state, and feature vectors are determined. The feature vector matching the most probable vector for each state is selected for each model. These ideal feature vectors are provided to the speech recognizer, and processed. The end result is compared with the original text, and modifications to the system can be made based on the output text.

Type: Grant

Filed: October 15, 2004

Date of Patent: March 23, 2010

Assignee: Microsoft Corporation

Inventor: Ricardo Lopez Barquilla
PIECEWISE-BASED VARIABLE-PARAMETER HIDDEN MARKOV MODELS AND THE TRAINING THEREOF

Publication number: 20100070279

Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.

Type: Application

Filed: September 16, 2008

Publication date: March 18, 2010

Applicant: Microsoft Corporation

Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
APPARATUS AND METHOD FOR SPEECH RECOGNITION BASED ON SOUND SOURCE SEPARATION AND SOUND SOURCE IDENTIFICATION

Publication number: 20100070274

Abstract: An apparatus for speech recognition based on source separation and identification includes: a sound source separator for separating mixed signals, which are input to two or more microphones, into sound source signals by using independent component analysis (ICA), and estimating direction information of the separated sound source signals; and a speech recognizer for calculating normalized log likelihood probabilities of the separated sound source signals. The apparatus further includes a speech signal identifier identifying a sound source corresponding to a user's speech signal by using both the estimated direction information and the reliability information based on the normalized log likelihood probabilities.

Type: Application

Filed: July 7, 2009

Publication date: March 18, 2010

Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventors: Hoon-Young CHO, Sang Kyu Park, Jun Park, Seung Hi Kim, Ilbin Lee, Kyuwoong Hwang, Hyung-Bae Jeon, Yunkeun Lee
Method for Creating a Speech Model

Publication number: 20100070278

Abstract: A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponential p which has a value greater than zero and less than 1.

Type: Application

Filed: September 12, 2008

Publication date: March 18, 2010

Inventors: Andreas Hagen, Bryan Pellom, Kadri Hacioglu
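The idea in the abstract of "Method for Creating a Speech Model" (derive a male-to-female transformation, raise it to a power p between 0 and 1, and apply the result to the female model) can be illustrated with a minimal sketch. It assumes, purely for simplicity, that the transformation is a diagonal matrix with positive entries, so that raising it to the power p reduces to an element-wise power. The function names and the sample values are hypothetical and are not taken from the application.

```python
def derive_diagonal_transform(male, female):
    """Per-dimension scale factors t_i with t_i * male_i == female_i.

    Assumption: the male-to-female matrix is diagonal and all values are positive.
    """
    if len(male) != len(female):
        raise ValueError("model vectors must have the same length")
    return [f / m for m, f in zip(male, female)]


def synthetic_child_model(male, female, p):
    """Apply the transform raised to the power p (0 < p < 1) to the female vector."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be greater than zero and less than 1")
    transform = derive_diagonal_transform(male, female)
    # For a diagonal matrix, the matrix power is the element-wise power.
    modified = [t ** p for t in transform]
    return [s * f for s, f in zip(modified, female)]


# Hypothetical key values for each model (illustrative numbers only).
male_model = [100.0, 400.0]
female_model = [200.0, 900.0]
child_model = synthetic_child_model(male_model, female_model, p=0.5)
```

With a full (non-diagonal) matrix, a fractional matrix power routine would be needed in place of the element-wise power.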
Speech Recognition

Publication number: 20100057462

Abstract: The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.

Type: Application

Filed: September 2, 2009

Publication date: March 4, 2010

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Tobias Herbig, Martin Raab, Raymond Brueckner, Rainer Gruhn
Word spotting score normalization

Patent number: 7650282

Abstract: An approach to scoring acoustically-based events, such as hypothesized instances of keywords, in a speech processing system makes use of scores of individual components of the event. Data characterizing an instance of an event are first accepted. This data includes a score for the event. The event is associated with a number of component events from a set of component events, such as a set of phonemes. Probability models are also accepted for component scores associated with each of the set of component events in each of two or more possible classes of the event, such as a class of true occurrences of the event and a class of false detections of the event. The event is then scored. This scoring includes computing a probability of one of the two or more possible classes for the event using the accepted probability models.

Type: Grant

Filed: July 22, 2004

Date of Patent: January 19, 2010

Assignee: Nexidia Inc.

Inventor: Robert W. Morris
Frame Erasure Concealment Technique for a Bitstream-Based Feature Extractor

Publication number: 20090326946

Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.

Type: Application

Filed: August 19, 2009

Publication date: December 31, 2009

Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.

Inventors: Richard Vandervoort Cox, Hong Kook Kim
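The deletion approach described in the abstract of "Frame Erasure Concealment Technique for a Bitstream-Based Feature Extractor" amounts to dropping every frame flagged as erased before recognition, which shortens the observation sequence. A minimal sketch, assuming the erasure flags arrive as a list of booleans aligned with the feature frames (the function name and data layout are hypothetical):

```python
def drop_erased_frames(frames, erased):
    """Return the observation sequence with erased frames removed.

    frames: list of feature vectors, one per frame.
    erased: list of booleans, True where an erasure was declared.
    """
    if len(frames) != len(erased):
        raise ValueError("frames and erasure flags must be aligned")
    return [frame for frame, is_erased in zip(frames, erased) if not is_erased]


# Four frames, with erasures declared on the second and fourth.
features = [[1.0], [2.0], [3.0], [4.0]]
flags = [False, True, False, True]
observations = drop_erased_frames(features, flags)
```

The recognizer then decodes the shorter sequence as if the erased frames had never been transmitted.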
Automatic Segmentation in Speech Synthesis

Publication number: 20090313025

Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.

Type: Application

Filed: August 20, 2009

Publication date: December 17, 2009

Applicant: AT&T Corp.

Inventors: Alistair D. CONKIE, Yeon-Jun KIM
Hidden conditional random field models for phonetic classification and speech recognition

Patent number: 7627473

Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.

Type: Grant

Filed: October 15, 2004

Date of Patent: December 1, 2009

Assignee: Microsoft Corporation

Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
Low memory decision tree

Patent number: 7574411

Abstract: Management of a low memory treelike data structure is shown. The method according to the invention comprises steps for creating a decision tree including a parent node and at least one leaf node, and steps for searching data from said nodes. The nodes of the decision tree are stored sequentially in such a manner that nodes follow the parent node in storage order, wherein the nodes refining the context of the searchable data can be reached without a link from their parent node. The method can preferably be utilized in speech-recognition systems, in text-to-phoneme mapping.

Type: Grant

Filed: April 29, 2004

Date of Patent: August 11, 2009

Assignee: Nokia Corporation

Inventors: Janne Suontausta, Jilei Tian
HIGH PERFORMANCE HMM ADAPTATION WITH JOINT COMPENSATION OF ADDITIVE AND CONVOLUTIVE DISTORTIONS

Publication number: 20090144059

Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.

Type: Application

Filed: December 3, 2007

Publication date: June 4, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
Block synchronous decoding

Patent number: 7529671

Abstract: A pattern recognition system and method are provided. Aspects of the invention are particularly useful in combination with multi-state Hidden Markov Models. Pattern recognition is effected by processing Hidden Markov Model Blocks. This block-processing allows the processor to perform more operations upon data while such data is in cache memory. By so increasing cache locality, aspects of the invention provide significantly improved pattern recognition speed.

Type: Grant

Filed: March 4, 2003

Date of Patent: May 5, 2009

Assignee: Microsoft Corporation

Inventors: William H. Rockenbeck, Julian J. Odell
Adaptation of Compound Gaussian Mixture models

Patent number: 7523034

Abstract: Methods and arrangements for enhancing speech recognition in noisy environments, via providing at least one initial Compound Gaussian Mixture model, applying an adaptation algorithm to at least one item associated with speech enrollment data and to the at least one initial Compound Gaussian Mixture model to yield an intermediate output, and mathematically combining the at least one initial Compound Gaussian Mixture model with the intermediate output to yield an adapted Compound Gaussian Mixture model.

Type: Grant

Filed: December 13, 2002

Date of Patent: April 21, 2009

Assignee: International Business Machines Corporation

Inventors: Sabine V. Deligne, Satyanarayana Dharanipragada
ADDING PROTOTYPE INFORMATION INTO PROBABILISTIC MODELS

Publication number: 20090076794

Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge into such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.

Type: Application

Filed: September 13, 2007

Publication date: March 19, 2009

Applicant: Microsoft Corporation

Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
Speech recognition device for recognizing a word sequence using a switching speech model network

Patent number: 7487091

Abstract: A speech recognition device which can preferably be used for reducing the memory capacity required for speaker-independent speech recognition is provided. A matching unit loads speech models belonging to a first speech model network and a garbage model in a RAM, and gives a speech parameter extracted by a speech parameter extraction unit to the speech model in the RAM, and when an occurrence probability output from the garbage model is equal to or greater than a predetermined value, the matching unit loads speech models belonging to any of speech model groups in the RAM based on the occurrence probability output from the speech model belonging to the first speech model network.

Type: Grant

Filed: May 7, 2003

Date of Patent: February 3, 2009

Assignee: Asahi Kasei Kabushiki Kaisha

Inventor: Toshiyuki Miyazaki
Audio-visual feature fusion and support vector machine useful for continuous speech recognition

Patent number: 7472063

Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

Type: Grant

Filed: December 19, 2002

Date of Patent: December 30, 2008

Assignee: Intel Corporation

Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
SPEECH RECOGNITION FOR IDENTIFYING ADVERTISEMENTS AND/OR WEB PAGES

Publication number: 20080294436

Abstract: A device may identify terms in a speech signal using speech recognition. The device may further retain one or more of the identified terms by comparing them to a set of words and send the retained terms and information associated with the retained terms to a remote device. The device may also receive messages that are related to the retained terms and to the information associated with the retained terms from the remote device.

Type: Application

Filed: May 21, 2007

Publication date: November 27, 2008

Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB

Inventors: Mans Folke Markus Andreasson, Per Emil Astrand, Erik Johan Vendel Backlund
System and method for quantifying, representing, and identifying similarities in data streams

Publication number: 20080288255

Abstract: A method of quantifying similarities between sequential data streams typically includes providing a pair of sequential data streams, designing a Hidden Markov Model (HMM) of at least a portion of each stream, and computing a quantitative measure of similarity between the streams using the HMMs. For a plurality of sequential data streams, a matrix of quantitative measures of similarity may be created. A spectral analysis may be performed on the matrix of quantitative measures of similarity to define a multi-dimensional diffusion space, and the plurality of sequential data streams may be graphically represented and/or sorted according to the similarities therebetween. In addition, semi-supervised and active learning algorithms may be utilized to learn a user's preferences for data streams and recommend additional data streams that are similar to those preferred by the user. Multi-task learning algorithms may also be applied.

Type: Application

Filed: May 16, 2008

Publication date: November 20, 2008

Inventors: Lawrence Carin, John Paisley, Yuting Qi, Xuejun Liao, Qiuhua Liu
Variational inference and learning for segmental switching state space models of hidden speech dynamics

Patent number: 7454336

Abstract: A system and method are provided that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model that employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussian (MOG) posterior and/or based upon an approximate hidden Markov model (HMM) posterior using a variational technique.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 18, 2008

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng, Leo J. Lee
Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system

Patent number: 7454341

Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.

Type: Grant

Filed: September 30, 2000

Date of Patent: November 18, 2008

Assignee: Intel Corporation

Inventors: Jielin Pan, Baosheng Yuan
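The sub-vector codebook construction in the abstract of patent 7454341 can be sketched roughly as follows: each mean (or variance) vector is split into sub-vectors over disjoint dimension ranges, and each resulting sub-vector set is clustered into its own codebook. The sketch below uses plain K-means with a deterministic initialization. It does not implement the modified K-means of the patent, which also merges and splits clusters based on size and average distortion. All function names, the split boundary, and the sample data are hypothetical.

```python
def split_dims(vectors, boundaries):
    """Split every vector at the given dimension boundaries.

    Returns one list of sub-vectors per dimension range.
    """
    edges = [0] + list(boundaries) + [len(vectors[0])]
    return [[v[a:b] for v in vectors] for a, b in zip(edges, edges[1:])]


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, iterations=10):
    """Plain K-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Four hypothetical 4-dimensional mean vectors, split into two 2-dimensional sets.
means = [
    [0.0, 0.0, 10.0, 10.0],
    [0.2, 0.0, 10.0, 10.2],
    [5.0, 5.0, 20.0, 20.0],
    [5.2, 5.0, 20.0, 20.2],
]
subsets = split_dims(means, boundaries=[2])
codebooks = [kmeans(subset, k=2) for subset in subsets]
```

Each Gaussian can then be stored as a short list of codeword indices, one per sub-vector set, instead of its full mean and variance vectors.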
Speech recognition apparatus

Patent number: 7437288

Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.

Type: Grant

Filed: March 11, 2002

Date of Patent: October 14, 2008

Assignee: NEC Corporation

Inventor: Koichi Shinoda
Methods and apparatus for the systematic adaptation of classification systems from sparse adaptation data

Patent number: 7437289

Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of different classification schemes, including, e.g., speaker identification and speaker verification.

Type: Grant

Filed: August 16, 2001

Date of Patent: October 14, 2008

Assignee: International Business Machines Corporation

Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
METHOD AND APPARATUS FOR TRAINING A TEXT INDEPENDENT SPEAKER RECOGNITION SYSTEM USING SPEECH DATA WITH TEXT LABELS

Publication number: 20080235020

Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.

Type: Application

Filed: June 4, 2008

Publication date: September 25, 2008

Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
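The GMM generator and Gaussian weight normalizer in the abstract of publication 20080235020 can be illustrated with a short sketch: Gaussians from several HMM states are pooled into one mixture, and the weights are rescaled so that the pooled mixture sums to one. Dividing by the total pooled weight is an assumption made here for illustration; the abstract does not spell out the exact normalization. The function name and the tuple layout (weight, mean, variance) are hypothetical.

```python
def pool_state_gaussians(states):
    """Pool Gaussians from several HMM states into a single GMM.

    states: list of states, each a list of (weight, mean, variance) tuples.
    Returns a flat list of tuples whose weights sum to 1.
    """
    pooled = [gaussian for state in states for gaussian in state]
    total = sum(weight for weight, _, _ in pooled)
    if total <= 0.0:
        raise ValueError("pooled weights must sum to a positive value")
    # Assumed normalization: rescale by the total weight across all states.
    return [(weight / total, mean, var) for weight, mean, var in pooled]


# Two hypothetical states with one-dimensional Gaussians.
state_a = [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)]
state_b = [(1.0, 2.0, 0.5)]
gmm = pool_state_gaussians([state_a, state_b])
```

Because the pooled mixture no longer depends on the state sequence, it can score speech without reference to the text that was spoken.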
