Hidden Markov Model (HMM) (EPO) Patents (Class 704/256.1)
  • Publication number: 20120130716
    Abstract: A speech recognition method for a robot. The speech recognition method for the robot includes one fundamental acoustic model. Whenever the noisy environment and the speaker are changed, the speech recognition method generates a plurality of parallel acoustic models in which the characteristic for each noisy environment and the characteristic for each speaker are reflected. As a result, the speech recognition method for the robot can freely recognize one of several acoustic models according to individual environments and speakers, such that it can basically remove mismatch between the model training environment and the test environment, thereby improving speech recognition capabilities.
    Type: Application
    Filed: November 17, 2011
    Publication date: May 24, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Ki Beom KIM
  • Patent number: 8160878
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: April 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Patent number: 8140328
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: March 20, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
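
The selection step in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each N-best entry is an (intention, confidence) pair and that the belief is the normalized sum of confidence scores across the current N-best list and any contextually similar lists. The abstract does not specify the belief update, so this pooling rule and the names `belief_over_intentions` and `select_intention` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical N-best format: a list of (intention, confidence) pairs.
NBest = List[Tuple[str, float]]


def belief_over_intentions(nbest_lists: List[NBest]) -> Dict[str, float]:
    """Pool confidence scores across N-best lists into a normalized belief."""
    totals: Dict[str, float] = defaultdict(float)
    for nbest in nbest_lists:
        for intention, confidence in nbest:
            totals[intention] += confidence
    norm = sum(totals.values())
    if norm <= 0:
        raise ValueError("confidence scores must sum to a positive value")
    return {intention: score / norm for intention, score in totals.items()}


def select_intention(nbest_lists: List[NBest]) -> Tuple[str, float]:
    """Return the intention with the highest pooled belief and that belief."""
    belief = belief_over_intentions(nbest_lists)
    best = max(belief, key=belief.get)
    return best, belief[best]


# The current turn slightly favors "austin", but a contextually similar
# turn strongly favors "boston", so the pooled belief selects "boston".
current_turn = [("boston", 0.4), ("austin", 0.5)]
similar_turn = [("boston", 0.6), ("austin", 0.1)]
intention, score = select_intention([current_turn, similar_turn])
```

Pooling over several lists is what lets a weak top hypothesis in one turn be overruled by evidence from another turn; a real system would presumably also weight each list by how similar its context is.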
  • Patent number: 8140329
    Abstract: A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created of audio features extracted from the observed audio data, and the observed audio data is recognized from the observation vector. The audio features are selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing a logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to results of a mel scale filter bank.
    Type: Grant
    Filed: April 5, 2004
    Date of Patent: March 20, 2012
    Assignee: Sony Corporation
    Inventors: Jian Zhang, Wei Lu, Xiaobing Sun
  • Publication number: 20120065976
    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
    Type: Application
    Filed: September 15, 2010
    Publication date: March 15, 2012
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, George Edward Dahl
  • Publication number: 20120059657
    Abstract: A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier.
    Type: Application
    Filed: June 7, 2011
    Publication date: March 8, 2012
    Inventors: Jefferson M. Willey, Todd Stephenson, Hugh Faust, James P. Hansen, George J. Linde, Carol Chang, Justin Nevitt, James A. Ballas, Thomas Herne Crystal, Vincent Michael Stanford, Jean W. de Graaf
  • Publication number: 20120041764
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
    Type: Application
    Filed: August 10, 2011
    Publication date: February 16, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Haitian XU, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8086443
    Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\theta$ and discriminative parameters $\tilde{\theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\theta, \tilde{\theta}) + \sum_{l=1}^{M} [\log p(X_l, Y_l \mid \tilde{\theta}) - \log p(X_l \mid \tilde{\theta})] + \sum_{l=1}^{M} \log p(X_l \mid \theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} (em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2)$, where $em_y = 1 - \sum_{x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\theta_k^*$, …
    Type: Grant
    Filed: August 21, 2008
    Date of Patent: December 27, 2011
    Assignee: Siemens Medical Solutions USA, Inc.
    Inventors: Oksana Yakhnenko, Romer E. Rosales, Radu Stefan Niculescu, Lucian Vlad Lita
  • Publication number: 20110288869
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Application
    Filed: May 21, 2010
    Publication date: November 24, 2011
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Publication number: 20110257976
    Abstract: Speech recognition includes structured modeling, irrelevant variability normalization and unsupervised online adaptation of speech recognition parameters.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 20, 2011
    Applicant: Microsoft Corporation
    Inventor: Qiang Huo
  • Patent number: 8041567
    Abstract: Commercially available voice recognition systems are generally speaker-dependent, with the voice recognition system first being trained to the voice of the speaker before it can be used. A disadvantage with this method is that modified reference data has to be buffered and permanently saved in several steps when the speaker adaptation algorithm is executed, and thus requires a lot of memory space. This primarily negatively affects applications on devices with restricted processor power and limited memory space, such as mobile radio terminals for example. A method of speaker adaptation for a Hidden Markov Model based voice recognition system may address these issues. In the method, the memory space requirement and thus also the processor power required can be considerably reduced. This is achieved by using modified reference data in a speaker adaptation algorithm to adapt a new speaker to a reference speaker. The modified reference data is processed in compressed form.
    Type: Grant
    Filed: September 22, 2005
    Date of Patent: October 18, 2011
    Assignee: Siemens Aktiengesellschaft
    Inventors: Sergey Astrov, Josef Bauer
  • Patent number: 8010341
    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: August 30, 2011
    Assignee: Microsoft Corporation
    Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
  • Publication number: 20110202343
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Application
    Filed: April 28, 2011
    Publication date: August 18, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
  • Patent number: 7912717
    Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: March 22, 2011
    Inventor: Albert Galick
  • Patent number: 7895040
    Abstract: According to an embodiment, a voice recognition apparatus includes units for: acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and a voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in the interval from when the standard frame is set to when the standard frame is renewed, and storing and using the calculated value as an approximate value of the output probability in subsequent frames.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Shinichi Tanaka
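
The caching idea in the abstract above can be sketched as follows. This is a simplified illustration and not the patented method: it assumes the standard frame is renewed every fixed number of frames (the abstract does not say when renewal happens), and the class name `CachedOutputProbability`, the `score_fn` callback and the `renew_every` parameter are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence


class CachedOutputProbability:
    """Reuse an output probability computed at the standard frame as an
    approximation for later frames, until the standard frame is renewed."""

    def __init__(self,
                 score_fn: Callable[[Hashable, Sequence[float]], float],
                 renew_every: int) -> None:
        if renew_every < 1:
            raise ValueError("renew_every must be at least 1")
        self.score_fn = score_fn        # exact (expensive) output probability
        self.renew_every = renew_every  # assumed fixed renewal interval
        self.cache: Dict[Hashable, float] = {}
        self.calls = 0                  # number of exact evaluations so far

    def start_frame(self, frame_index: int) -> None:
        # Renewing the standard frame invalidates every stored value.
        if frame_index % self.renew_every == 0:
            self.cache.clear()

    def score(self, path: Hashable, features: Sequence[float]) -> float:
        # Compute at most once per transition path per renewal interval.
        if path not in self.cache:
            self.cache[path] = self.score_fn(path, features)
            self.calls += 1
        return self.cache[path]


# Toy scoring function standing in for a real acoustic model.
cache = CachedOutputProbability(lambda path, feats: sum(feats), renew_every=3)
scores = []
for t in range(6):
    cache.start_frame(t)
    scores.append(cache.score("s1->s2", [float(t)]))
# Six frames are scored with only two exact evaluations (frames 0 and 3).
```

The trade-off is accuracy for speed: frames between renewals receive a stale value, so the renewal interval bounds how far the approximation can drift.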
  • Patent number: 7873518
    Abstract: A device for assessing a quality class of an object to be tested includes a unit for detecting a test signal from the object to be tested. Furthermore, the device for assessing includes a unit for providing a stochastic Markov model including states and transitions between states on the basis of reference measurements of objects of known quality classes, and a unit for evaluating the test signal using the stochastic Markov model. In addition, the device for assessing includes a unit for associating the object to be tested with a quality class based on the evaluation of the test signal. Such a device has the advantage of being able to associate an object to be tested with a quality class more precisely than the prior art.
    Type: Grant
    Filed: November 10, 2006
    Date of Patent: January 18, 2011
    Assignees: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Technische Universitaet Dresden
    Inventors: Dieter Hentschel, Constanze Tschoepe, Ruediger Hoffmann, Matthias Eichner, Matthias Wolff
  • Publication number: 20100191532
    Abstract: An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure.
    Type: Application
    Filed: January 28, 2009
    Publication date: July 29, 2010
    Applicant: Xerox Corporation
    Inventors: Jose A. Rodriguez Serrano, Florent C. Perronnin
  • Publication number: 20100185448
    Abstract: In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility.
    Type: Application
    Filed: January 21, 2010
    Publication date: July 22, 2010
    Inventor: William S. Meisel
  • Publication number: 20100169094
    Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.
    Type: Application
    Filed: September 17, 2009
    Publication date: July 1, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
  • Publication number: 20100145698
    Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
    Type: Application
    Filed: December 1, 2009
    Publication date: June 10, 2010
    Applicant: Educational Testing Service
    Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
  • Publication number: 20100138223
    Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy based on similarity of speakers and environments by using a realistic memory use amount, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model. A parameter updating means 107 successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).
    Type: Application
    Filed: March 13, 2008
    Publication date: June 3, 2010
    Inventor: Takafumi Koshinaka
  • Patent number: 7725409
    Abstract: Computer programs (600, 700, 800, 900, 1000) and a programmed computer (1100) for automatically generating computer programs (i.e. sequences of instructions) are provided. The computer programs (600, 700, 800, 900, 1000) use Hidden Markov Models (400, 500) to generate sequences of program tokens, e.g., Gene Expression Programming chromosomes (100). Parameters of the Hidden Markov Models (400, 500) are numerically optimized, for example, by Differential Evolution with a goal of increasing the fitness of automatically generated programs.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 25, 2010
    Assignee: Motorola, Inc.
    Inventors: Chi Zhou, Magdi A. Mohamed, Weimin Xiao
  • Patent number: 7720012
    Abstract: A system, method, and apparatus for identifying a speaker of an utterance, particularly when the utterance has portions of it missing due to packet losses. Different packet loss models are applied to each speaker's training data in order to improve accuracy, especially for small packet sizes.
    Type: Grant
    Filed: July 11, 2005
    Date of Patent: May 18, 2010
    Assignee: Arrowhead Center, Inc.
    Inventors: Deva K. Borah, Phillip De Leon
  • Publication number: 20100121643
    Abstract: The technology disclosed relates to a system and method for fast, accurate and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increases to maintain a fast response time.
    Type: Application
    Filed: November 2, 2009
    Publication date: May 13, 2010
    Applicant: Melodis Corporation
    Inventors: Keyvan Mohajer, Seyed Majid Emami, Jon Grossman, Joe Kyaw Soe Aung, Sina Sohangir
  • Patent number: 7707027
    Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
    Type: Grant
    Filed: April 13, 2006
    Date of Patent: April 27, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Rajesh Balchandran, Linda Boyer
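
The bigram step in the abstract above can be sketched in a few lines. The sketch covers only the candidate-finding part: it collects bigrams made up entirely of unigrams already judged meaningless. Deciding whether each candidate bigram is itself meaningless, and assigning it to the first n-gram class, is left out because the abstract does not say how that decision is made. The function name and the token-list input format are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def candidate_meaningless_bigrams(
        sentences: Iterable[List[str]],
        meaningless_unigrams: Set[str]) -> Set[Tuple[str, str]]:
    """Collect bigrams whose two tokens are both meaningless unigrams."""
    candidates: Set[Tuple[str, str]] = set()
    for tokens in sentences:
        # zip(tokens, tokens[1:]) yields each pair of adjacent tokens.
        for first, second in zip(tokens, tokens[1:]):
            if first in meaningless_unigrams and second in meaningless_unigrams:
                candidates.add((first, second))
    return candidates


training_data = [
    ["um", "uh", "book", "a", "flight"],
    ["uh", "um"],
]
fillers = {"um", "uh"}
candidates = candidate_meaningless_bigrams(training_data, fillers)
# candidates == {("um", "uh"), ("uh", "um")}
```

Bigrams such as ("uh", "book") are skipped because only one of their tokens is in the meaningless set.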
  • Publication number: 20100094626
    Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence with use of a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This makes resampling unnecessary when performing linear movement matching of speech wave frames having similar phonological feature structures.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 15, 2010
    Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
  • Publication number: 20100076758
    Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
  • Patent number: 7684988
    Abstract: A system and method of testing and tuning a speech recognition system by providing pronunciations to the speech recognizer. First a text document is provided to the system and converted into a sequence of phonemes representative of the words in the text. The phonemes are then converted to model units, such as Hidden Markov Models. From the models a probability is obtained for each model or state, and feature vectors are determined. The feature vector matching the most probable vector for each state is selected for each model. These ideal feature vectors are provided to the speech recognizer, and processed. The end result is compared with the original text, and modifications to the system can be made based on the output text.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: March 23, 2010
    Assignee: Microsoft Corporation
    Inventor: Ricardo Lopez Barquilla
  • Publication number: 20100070279
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
    Type: Application
    Filed: September 16, 2008
    Publication date: March 18, 2010
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Publication number: 20100070274
    Abstract: An apparatus for a speech recognition based on source separation and identification includes: a sound source separator for separating mixed signals, which are input to two or more microphones, into sound source signals by using independent component analysis (ICA), and estimating direction information of the separated sound source signals; and a speech recognizer for calculating normalized log likelihood probabilities of the separated sound source signals. The apparatus further includes a speech signal identifier identifying a sound source corresponding to a user's speech signal by using both of the estimated direction information and the reliability information based on the normalized log likelihood probabilities.
    Type: Application
    Filed: July 7, 2009
    Publication date: March 18, 2010
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hoon-Young CHO, Sang Kyu Park, Jun Park, Seung Hi Kim, Ilbin Lee, Kyuwoong Hwang, Hyung-Bae Jeon, Yunkeun Lee
  • Publication number: 20100070278
    Abstract: A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponential p which has a value greater than zero and less than 1.
    Type: Application
    Filed: September 12, 2008
    Publication date: March 18, 2010
    Inventors: Andreas Hagen, Bryan Peltom, Kadri Hacioglu
  • Publication number: 20100057462
    Abstract: The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
    Type: Application
    Filed: September 2, 2009
    Publication date: March 4, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Martin Raab, Raymond Brueckner, Rainer Gruhn
  • Patent number: 7650282
    Abstract: An approach to scoring acoustically-based events, such as hypothesized instances of keywords, in a speech processing system makes use of scores of individual components of the event. Data characterizing an instance of an event are first accepted. This data includes a score for the event. The event is associated with a number of component events from a set of component events, such as a set of phonemes. Probability models are also accepted for component scores associated with each of the set of component events in each of two or more possible classes of the event, such as a class of true occurrences of the event and a class of false detections of the event. The event is then scored. This scoring includes computing a probability of one of the two or more possible classes for the event using the accepted probability models.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: January 19, 2010
    Assignee: Nexidia Inc.
    Inventor: Robert W. Morris
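    Sketch: One way to read the scoring step is as a two-class Bayes computation over per-phoneme component scores. The code below assumes Gaussian probability models, independence between components, and a fixed class prior. All of these, along with the names log_gaussian and posterior_true, are illustrative assumptions rather than details from the patent.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def posterior_true(component_scores, models, prior_true=0.5):
    """Probability that an event is a true occurrence.

    component_scores: list of (phoneme, score) pairs for the event.
    models: models[phoneme][cls] = (mean, std), cls in {"true", "false"}.
    Components are treated as independent (an assumption of this sketch).
    """
    log_true = math.log(prior_true)
    log_false = math.log(1.0 - prior_true)
    for phoneme, score in component_scores:
        log_true += log_gaussian(score, *models[phoneme]["true"])
        log_false += log_gaussian(score, *models[phoneme]["false"])
    # Normalize in the log domain to avoid underflow.
    m = max(log_true, log_false)
    num = math.exp(log_true - m)
    return num / (num + math.exp(log_false - m))


# Made-up models for two phonemes (hypothetical values).
models = {
    "k": {"true": (0.0, 1.0), "false": (-2.0, 1.0)},
    "ae": {"true": (0.0, 1.0), "false": (-2.0, 1.0)},
}
p_likely = posterior_true([("k", 0.0), ("ae", -0.2)], models)
p_midway = posterior_true([("k", -1.0)], models)
```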
  • Publication number: 20090326946
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Application
    Filed: August 19, 2009
    Publication date: December 31, 2009
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
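    Sketch: The deletion technique described above amounts to removing flagged frames from the observation sequence before recognition. The function below is a minimal illustration that assumes erasures arrive as one boolean flag per frame. The function name and the data layout are hypothetical.

```python
def drop_erased_frames(frames, erased):
    """Return the observation sequence with erased frames deleted.

    frames: list of feature vectors, one per frame.
    erased: list of booleans, True where an erasure was declared.
    """
    if len(frames) != len(erased):
        raise ValueError("frames and erased must have the same length")
    return [frame for frame, bad in zip(frames, erased) if not bad]


# Made-up three-frame sequence with the middle frame erased.
frames = [[0.1, 0.2], [9.9, 9.9], [0.3, 0.4]]
erased = [False, True, False]
observations = drop_erased_frames(frames, erased)
```

    The resulting sequence is shorter than the input, which matches the abstract's remark that deletions reduce the length of the observation sequence.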
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
  • Patent number: 7627473
    Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: December 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
  • Patent number: 7574411
    Abstract: Management of a low memory treelike data structure is shown. The method according to the invention comprises steps for creating a decision tree including a parent node and at least one leaf node, and steps for searching data from said nodes. The nodes of the decision tree are stored sequentially in such a manner that nodes follow the parent node in storage order, wherein the nodes refining the context of the searchable data can be reached without a link from their parent node. The method can preferably be utilized in speech-recognition systems, in text-to-phoneme mapping.
    Type: Grant
    Filed: April 29, 2004
    Date of Patent: August 11, 2009
    Assignee: Nokia Corporation
    Inventors: Janne Suontausta, Jilei Tian
  • Publication number: 20090144059
    Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
    Type: Application
    Filed: December 3, 2007
    Publication date: June 4, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
  • Patent number: 7529671
    Abstract: A pattern recognition system and method are provided. Aspects of the invention are particularly useful in combination with multi-state Hidden Markov Models. Pattern recognition is effected by processing Hidden Markov Model Blocks. This block-processing allows the processor to perform more operations upon data while such data is in cache memory. By so increasing cache locality, aspects of the invention provide significantly improved pattern recognition speed.
    Type: Grant
    Filed: March 4, 2003
    Date of Patent: May 5, 2009
    Assignee: Microsoft Corporation
    Inventors: William H. Rockenbeck, Julian J. Odell
  • Patent number: 7523034
    Abstract: Methods and arrangements for enhancing speech recognition in noisy environments, via providing at least one initial Compound Gaussian Mixture model, applying an adaptation algorithm to at least one item associated with speech enrollment data and to the at least one initial Compound Gaussian Mixture model to yield an intermediate output, and mathematically combining the at least one initial Compound Gaussian Mixture model with the intermediate output to yield an adapted Compound Gaussian Mixture model.
    Type: Grant
    Filed: December 13, 2002
    Date of Patent: April 21, 2009
    Assignee: International Business Machines Corporation
    Inventors: Sabine V. Deligne, Satyanarayana Dharanipragada
  • Publication number: 20090076794
    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
    Type: Application
    Filed: September 13, 2007
    Publication date: March 19, 2009
    Applicant: Microsoft Corporation
    Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
  • Patent number: 7487091
    Abstract: A speech recognition device which can preferably be used for reducing the memory capacity required for speaker-independent speech recognition is provided. A matching unit loads speech models belonging to a first speech model network and a garbage model in a RAM, and gives a speech parameter extracted by a speech parameter extraction unit to the speech model in the RAM, and when an occurrence probability output from the garbage model is equal to or greater than a predetermined value, the matching unit loads speech models belonging to any of speech model groups in the RAM based on the occurrence probability output from the speech model belonging to the first speech model network.
    Type: Grant
    Filed: May 7, 2003
    Date of Patent: February 3, 2009
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventor: Toshiyuki Miyazaki
  • Patent number: 7472063
    Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: December 30, 2008
    Assignee: Intel Corporation
    Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
  • Publication number: 20080294436
    Abstract: A device may identify terms in a speech signal using speech recognition. The device may further retain one or more of the identified terms by comparing them to a set of words and send the retained terms and information associated with the retained terms to a remote device. The device may also receive messages that are related to the retained terms and to the information associated with the retained terms from the remote device.
    Type: Application
    Filed: May 21, 2007
    Publication date: November 27, 2008
    Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
    Inventors: Mans Folke Markus Andreasson, Per Emil Astrand, Erik Johan Vendel Backlund
  • Publication number: 20080288255
    Abstract: A method of quantifying similarities between sequential data streams typically includes providing a pair of sequential data streams, designing a Hidden Markov Model (HMM) of at least a portion of each stream, and computing a quantitative measure of similarity between the streams using the HMMs. For a plurality of sequential data streams, a matrix of quantitative measures of similarity may be created. A spectral analysis may be performed on the matrix of quantitative measures of similarity to define a multi-dimensional diffusion space, and the plurality of sequential data streams may be graphically represented and/or sorted according to the similarities therebetween. In addition, semi-supervised and active learning algorithms may be utilized to learn a user's preferences for data streams and recommend additional data streams that are similar to those preferred by the user. Multi-task learning algorithms may also be applied.
    Type: Application
    Filed: May 16, 2008
    Publication date: November 20, 2008
    Inventors: Lawrence Carin, John Paisley, Yuting Qi, Xuejun Liao, Qiuhua Liu
  • Patent number: 7454336
    Abstract: A system and method are provided that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model that employs model parameters, including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussian (MOG) posterior and/or based upon an approximate hidden Markov model (HMM) posterior using a variational technique.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 18, 2008
    Assignee: Microsoft Corporation
    Inventors: Hagai Attias, Li Deng, Leo J. Lee
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
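    Sketch: The first step in the abstract, dividing a vector set into sub-vector sets that each cover a subset of the dimensions, can be illustrated as below. The contiguous dimension boundaries and the function name are assumptions made for illustration. The modified K-means clustering with dynamic merging and splitting is not shown.

```python
def split_into_subvectors(vectors, boundaries):
    """Split each vector at the given dimension boundaries.

    Returns one sub-vector set per stream; set k holds dimensions
    boundaries[k] .. boundaries[k + 1] - 1 of every input vector.
    """
    streams = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        streams.append([vec[start:end] for vec in vectors])
    return streams


# Made-up mean vectors for N = 2 Gaussians with 4 dimensions each.
means = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
mean_streams = split_into_subvectors(means, [0, 2, 4])
```

    Each entry of mean_streams would then be clustered separately to build the codebook for that sub-vector set. The variance vector set can be split the same way.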
  • Patent number: 7437288
    Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: October 14, 2008
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Patent number: 7437289
    Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present in adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of different classification schemes, including, e.g., speaker identification and speaker verification.
    Type: Grant
    Filed: August 16, 2001
    Date of Patent: October 14, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Publication number: 20080235020
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Application
    Filed: June 4, 2008
    Publication date: September 25, 2008
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
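    Sketch: The pooling and weight normalization described above can be illustrated as follows. The code assumes each HMM state is a list of (weight, mean, variance) tuples and that normalization means rescaling the pooled weights to sum to 1. Both the data layout and that particular normalization are assumptions, not details from the application.

```python
def pool_states(states):
    """Pool the Gaussians of several HMM states into a single GMM.

    states: list of states, each a list of (weight, mean, variance).
    Returns the pooled components with weights rescaled to sum to 1.
    """
    pooled = [gaussian for state in states for gaussian in state]
    total = sum(weight for weight, _, _ in pooled)
    return [(weight / total, mean, var) for weight, mean, var in pooled]


# Made-up model: two states with scalar means and variances.
states = [
    [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
    [(1.0, 2.0, 0.5)],
]
gmm = pool_states(states)
```

    Because the pooled model no longer depends on the state sequence, it can score speech without reference to the text that was spoken, which is the sense in which it supports a text-independent mode.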