Markov Patents (Class 704/256)

Hidden Markov model (HMM) (EPO) (Class 704/256.1)

Speech recognition training

Patent number: 5963906

Abstract: A method and system perform speech recognition training using Hidden Markov Models. Initially, preprocessed speech signals that include a plurality of observations are stored by the system. Initial Hidden Markov Model (HMM) parameters are then assigned. Summations are then calculated using modified equations derived substantially from the following equations, wherein $u \le v < w$: $P(x_u^w) = P(x_u^v)\,P(x_{v+1}^w)$ and $\Omega_{ij}(x_u^w) = \Omega_{ij}(x_u^v)\,P(x_{v+1}^w) + P(x_u^v)\,\Omega_{ij}(x_{v+1}^w)$. The calculated summations are then used to perform HMM parameter reestimation. The system then determines whether the HMM parameters have converged. If they have, the HMM parameters are stored. If they have not, the system again calculates summations, performs HMM parameter reestimation using the summations, and determines whether the parameters have converged.

Type: Grant

Filed: May 20, 1997

Date of Patent: October 5, 1999

Assignee: AT&T Corp

Inventor: William Turin
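The two equations in the abstract of patent 5963906 can be read as a rule for merging adjacent observation segments. The following is a minimal sketch under stated assumptions, not the patented procedure: it treats P(x) as a matrix probability whose (i, j) entry is A[i][j] * B[j][x], and it assumes Omega_ij is the sum, over positions in the segment, of the segment product with only entry (i, j) retained at that position. The model values and the names `obs_matrix`, `seg_prob`, `seg_omega`, and `combine` are hypothetical.

```python
# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative only.
A = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j]: transition probability i -> j
B = [[0.9, 0.1], [0.2, 0.8]]   # B[j][x]: probability that state j emits symbol x
N = 2

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(N)] for i in range(N)]

def obs_matrix(x):
    """Matrix probability of one observation: entry (i, j) = A[i][j] * B[j][x]."""
    return [[A[i][j] * B[j][x] for j in range(N)] for i in range(N)]

def seg_prob(xs):
    """P(x_u^v): ordered product of the one-observation matrices."""
    P = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    for x in xs:
        P = matmul(P, obs_matrix(x))
    return P

def seg_omega(xs, i, j):
    """Assumed definition of Omega_ij: sum over positions t of
    P(prefix) * W_ij(x_t) * P(suffix), where W_ij keeps only entry (i, j)."""
    total = [[0.0] * N for _ in range(N)]
    for t, x in enumerate(xs):
        W = [[0.0] * N for _ in range(N)]
        W[i][j] = obs_matrix(x)[i][j]
        term = matmul(matmul(seg_prob(xs[:t]), W), seg_prob(xs[t + 1:]))
        total = matadd(total, term)
    return total

def combine(left, right):
    """Merge two adjacent segments using the two equations from the abstract."""
    (P1, O1), (P2, O2) = left, right
    return matmul(P1, P2), matadd(matmul(O1, P2), matmul(P1, O2))

xs = [0, 1, 1, 0, 1]
left, right = xs[:2], xs[2:]
P_comb, O_comb = combine((seg_prob(left), seg_omega(left, 0, 1)),
                         (seg_prob(right), seg_omega(right, 0, 1)))
```

Under these assumptions, merging the two halves gives the same matrices as processing the whole sequence directly, which is what allows summations for long sequences to be assembled from shorter pieces.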
System and method of recognizing an acoustic environment to adapt a set of based recognition models to the current acoustic environment for subsequent speech recognition

Patent number: 5960397

Abstract: A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The speech recognition system is trained by making a set of secondary models in a first stage of training, and integrating the set of secondary models with a base set of recognition models in a second stage of training.

Type: Grant

Filed: May 27, 1997

Date of Patent: September 28, 1999

Assignee: AT&T Corp

Inventor: Mazin G. Rahim
Pattern adapting apparatus using minimum description length criterion in pattern recognition processing and speech recognition system

Patent number: 5956676

Abstract: A pattern adapting apparatus including an input pattern forming unit, a tree structure standard pattern storing unit for storing a tree structure standard pattern including a tree structure indicative of inclusive relationships among categories and a parameter set at each node of the tree structure, a pattern matching unit for matching categories of the tree structure standard pattern with input samples of an input pattern, a tree structure standard pattern modifying unit for modifying a tree structure standard pattern based on the results of pattern matching, a node set selecting unit for calculating a description length with respect to a plurality of node sets in a tree structure pattern to select an appropriate node set, a modified standard pattern forming unit for forming a modified standard pattern by using a parameter set of a selected node set, and a standard pattern for recognition storing unit for storing a modified standard pattern.

Type: Grant

Filed: August 27, 1996

Date of Patent: September 21, 1999

Assignee: NEC Corporation

Inventor: Koichi Shinoda
Speech recognition apparatus and method using look-ahead scoring

Patent number: 5956678

Abstract: In the recognition of coherently spoken words, a plurality of hypotheses is usually built up which end in various words during the recognition process and are then to be continued with further words. To keep the number of words yet to be continued as small as possible, especially in the case of a large vocabulary, it is known to carry out a look-ahead in a limited time space. It is suggested according to the invention to use the same phonemes for the look-ahead as for the actual recognition and to add together the differential sums obtained in the look-ahead for the evaluation of the partial hypothesis which has just ended and which is to be continued, and to compare this sum with a threshold value which depends on the extrapolated minimum total evaluation at the end of the time space of the look-ahead. The searching space for hypotheses to be continued can be limited by this in a particularly favorable manner.

Type: Grant

Filed: April 17, 1995

Date of Patent: September 21, 1999

Assignee: U.S. Philips Corporation

Inventors: Reinhold Hab-Umbach, Hermann Ney
Speech processing apparatus and method using a noise-adaptive PMC model

Patent number: 5956679

Abstract: A speech processing apparatus includes a noise model production device for extracting a noise-speech interval from input speech data and producing a noise model by using the data of the extracted interval. The apparatus also includes a composite distribution production device for dividing the distributions of a speech model into a plurality of groups, producing a composite distribution of each group, and determining the positional relationship of each distribution within each group. In addition, the apparatus includes a memory for storing each composite distribution and the positional relationship of each distribution within the group, and a PMC conversion device for PMC-converting each produced composite distribution. Also provided is a noise-adaptive speech model production device for producing a noise-adaptive speech model on the basis of the composite distribution which is PMC-converted by the PMC conversion device and the positional relationship stored by the memory.

Type: Grant

Filed: December 2, 1997

Date of Patent: September 21, 1999

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Hiroki Yamamoto
Methods and apparatus for decreasing the size of pattern recognition models by pruning low-scoring models from generated sets of models

Patent number: 5950158

Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a plurality of reduced size models are generated using a LaGrange multiplier from an input model and input size constraints. The plurality of reduced size models are stored in a buffer and scored using a likelihood scoring technique.

Type: Grant

Filed: July 30, 1997

Date of Patent: September 7, 1999

Assignee: Nynex Science and Technology, Inc.

Inventor: Kuansan Wang
Speaker independent speech recognition system and method

Patent number: 5946653

Abstract: An improved method of training a SISRS uses less processing and memory resources by operating on vectors instead of matrices which represent spoken commands. Memory requirements are linearly proportional to the number of spoken commands for storing each command model. A spoken command is identified from the set of spoken commands by a command recognition procedure (200). The command recognition procedure (200) includes sampling the speaker's speech, deriving cepstral coefficients and delta-cepstral coefficients, and performing a polynomial expansion on cepstral coefficients. The identified spoken command is selected using the dot product of the command model data and the average command structure representing the unidentified spoken command.

Type: Grant

Filed: October 1, 1997

Date of Patent: August 31, 1999

Assignee: Motorola, Inc.

Inventors: William Michael Campbell, John Eric Kleider, Charles Conway Broun, Carl Steven Gifford, Khaled Assaleh
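The recognition steps named in the abstract of patent 5946653 (polynomial expansion, an averaged vector, and a dot product against each command model) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the expansion order (second order), the two-dimensional feature frames, and the model vectors are all hypothetical, and cepstral feature extraction is omitted.

```python
from itertools import combinations_with_replacement

def poly_expand(v):
    """Second-order polynomial expansion: 1, each term, then all pairwise products."""
    out = [1.0] + list(v)
    out += [a * b for a, b in combinations_with_replacement(v, 2)]
    return out

def average_structure(frames):
    """Average of the expanded frames: one vector for the whole utterance."""
    expanded = [poly_expand(f) for f in frames]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]

def identify(frames, command_models):
    """Score each command by a dot product and return the best-scoring name."""
    avg = average_structure(frames)
    scores = {name: sum(w * a for w, a in zip(model, avg))
              for name, model in command_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical feature frames and command model vectors (length 6 for 2-D input).
frames = [[1.0, 0.0], [1.0, 2.0]]
command_models = {
    "stop": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "go":   [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
}
best, scores = identify(frames, command_models)
```

Because each command is a single vector and the utterance is reduced to a single averaged vector, storage grows linearly with the number of commands, which matches the memory claim in the abstract.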
Method of recognizing a sequence of words and device for carrying out the method

Patent number: 5946655

Abstract: When a language model is to be used for the recognition of a speech signal and the vocabulary is composed as a tree, the language model value cannot be taken into account before the word end. Customarily, after each word end the comparison with a tree root is started anew, be it with a score which has been increased by the language model value so that the threshold value for the scores at which hypotheses are terminated must be high and hence many, even unattractive hypotheses remain active for a prolonged period of time. In order to avoid this, in accordance with the invention a correction value is added to the score for at least a part of the nodes of the vocabulary tree; the sum of the correction values on the path to a word then may not be greater than the language model value for the relevant word. As a result, for each test signal the scores of all hypotheses are of a comparable order of magnitude.

Type: Grant

Filed: March 29, 1995

Date of Patent: August 31, 1999

Assignee: U.S. Philips Corporation

Inventors: Volker Steinbiss, Bach-Hiep Tran, Hermann Ney
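One common way to distribute language model values over a vocabulary tree, consistent with the constraint stated in the abstract of patent 5946655, is to let the corrections accumulated along a path equal the smallest language model cost of any word still reachable from the current node. The sketch below assumes this scheme; it is not necessarily the method claimed in the patent. It uses letters in place of phonemes, and the costs in `lm_cost` are hypothetical.

```python
def build_tree(lm_cost):
    """Build a prefix tree; each node is identified by its prefix string."""
    children = {"": set()}
    for word in lm_cost:
        for k in range(1, len(word) + 1):
            children.setdefault(word[:k], set())
            children[word[:k - 1]].add(word[:k])
    return children

def best_cost_below(node, children, lm_cost, memo):
    """Smallest language model cost of any word at or below this node."""
    if node not in memo:
        costs = [best_cost_below(c, children, lm_cost, memo) for c in children[node]]
        if node in lm_cost:
            costs.append(lm_cost[node])
        memo[node] = min(costs)
    return memo[node]

def correction_values(lm_cost):
    """Correction at a node = best cost below it minus best cost below its parent."""
    children = build_tree(lm_cost)
    memo = {}
    best_cost_below("", children, lm_cost, memo)
    corr = {"": memo[""]}
    for node in children:
        if node:
            corr[node] = memo[node] - memo[node[:-1]]
    return corr

def path_sum(word, corr):
    """Sum of the correction values from the root down to the word's node."""
    return sum(corr[word[:k]] for k in range(len(word) + 1))

# Hypothetical language model costs (for example, negative log probabilities).
lm_cost = {"cat": 3, "car": 5, "dog": 4, "do": 6}
corr = correction_values(lm_cost)
```

With these values, `path_sum("do", corr)` is 4, which does not exceed the word's cost of 6; the remaining 2 would be added at the word end. Every correction below the root is non-negative, so scores rise gradually along the tree rather than jumping at the word end.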
Speech and speaker recognition using factor analysis to model covariance structure of mixture components

Patent number: 5946656

Abstract: Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.

Type: Grant

Filed: November 17, 1997

Date of Patent: August 31, 1999

Assignee: AT&T Corp

Inventors: Mazin G. Rahim, Lawrence K. Saul
Method and system for speech recognition using continuous density hidden Markov models

Patent number: 5937384

Abstract: A method and system for achieving an improved recognition accuracy in speech recognition systems which utilize continuous density hidden Markov models to represent phonetic units of speech present in spoken speech utterances is provided. An acoustic score which reflects the likelihood that a speech utterance matches a modeled linguistic expression is dependent on the output probability associated with the states of the hidden Markov model. Context-independent and context-dependent continuous density hidden Markov models are generated for each phonetic unit. The output probability associated with a state is determined by weighing the output probabilities of the context-dependent and context-independent states in accordance with a weighting factor. The weighting factor indicates the robustness of the output probability associated with each state of each model, especially in predicting unseen speech utterances.

Type: Grant

Filed: May 1, 1996

Date of Patent: August 10, 1999

Assignee: Microsoft Corporation

Inventors: Xuedong D. Huang, Milind V. Mahajan
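The weighting described in the abstract of patent 5937384 can be illustrated with a simple linear interpolation. This is a sketch under an assumption: the abstract says only that the two output probabilities are weighted according to a weighting factor, so the linear form and the function name `interpolated_output_prob` are hypothetical.

```python
def interpolated_output_prob(p_dependent, p_independent, weight):
    """Blend context-dependent and context-independent output probabilities.

    weight is assumed to lie in [0, 1]; a larger weight means the
    context-dependent estimate is trusted more.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * p_dependent + (1.0 - weight) * p_independent

# Hypothetical values: a sparsely trained context-dependent state gets weight 0.5.
p = interpolated_output_prob(0.5, 0.25, 0.5)
```

A state trained on little data would receive a smaller weight, so its output probability falls back toward the more robust context-independent estimate.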
Method and system for pattern recognition based on dynamically constructing a subset of reference vectors

Patent number: 5933806

Abstract: A system and method are used for recognising a time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech. The system has input means (30), which accesses the physical quantity and therefrom generates a plurality of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing a plurality of reference patterns. Each reference pattern includes a sequence of reference units, where each reference unit is represented by at least one associated reference vector $\mu_a$ in a set $\{\mu_a\}$ of reference vectors. A localizer (50) is used for locating among the reference patterns stored in the reference pattern database (40), a recognised reference pattern, which corresponds to the input pattern. The locating includes selecting a subset $\{\mu_s\}$ of reference vectors from said set ...

Type: Grant

Filed: August 28, 1996

Date of Patent: August 3, 1999

Assignee: U.S. Philips Corporation

Inventors: Peter Beyerlein, Meinhard D. Ullrich
Combining frequency warping and spectral shaping in HMM based speech recognition

Patent number: 5930753

Abstract: Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks. In all cases, frequency warping was found to significantly improve recognition performance by reducing the mismatch between test utterances presented to the recognizer and the speaker independent HMM model. This invention relates to a procedure which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30%.

Type: Grant

Filed: March 20, 1997

Date of Patent: July 27, 1999

Assignee: AT&T Corp

Inventors: Alexandros Potamianos, Richard Cameron Rose
Speech recognition method and apparatus, a computer-readable storage medium, and a computer-readable program for obtaining the mean of the time of speech and non-speech portions of input speech in the cepstrum dimension

Patent number: 5924067

Abstract: An apparatus and method for speech recognition includes a device and a step for obtaining a mean of the time of a speech portion in the Cepstrum dimension from the speech portion of the input speech, a device and step for obtaining a mean of a time of the non-speech portion in the Cepstrum dimension from the non-speech portion of the input speech, a device and step for converting each mean time from a Cepstrum region to a linear region, and after that, subtracting it on a linear spectrum dimension, converting the subtracted mean into a Cepstrum dimension, subtracting a mean of a time of a speech portion in a Cepstrum dimension in a speech database for learning from the converted result, and adding the subtracted result to a speech model expressed by Cepstrum. By this arrangement, even when noise is large, the presumed precision of a line fluctuation is raised and the recognition rate can be improved.

Type: Grant

Filed: March 20, 1997

Date of Patent: July 13, 1999

Assignee: Canon Kabushiki Kaisha

Inventors: Tetsuo Kosaka, Yasunori Ohora
System and method for classifying a speech signal

Patent number: 5924066

Abstract: A system and method for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes are provided. Stochastic models include a plurality of states having state transitions and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of a speech signal. The method includes extracting a frame sequence, and determining a state sequence for each stochastic model with each state sequence having full state segmentation. Representative frames are determined to provide speech signal time normalization. A likely speech signal class is determined from a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes. An output signal is generated based on the likely stochastic model.

Type: Grant

Filed: September 26, 1997

Date of Patent: July 13, 1999

Assignees: U S WEST, Inc., MediaOne, Inc.

Inventor: Amlan Kundu
Word recognition with HMM speech, model, using feature vector prediction from current feature vector and state control vector values

Patent number: 5920839

Abstract: A pattern recognition technology includes a set of a control signal vector and a covariance matrix for each state of a reference pattern of an objective word for recognition, which reference pattern is expressed by a plurality of states and transitions between the states, and transition probabilities between respective states. A prediction vector of the t-th feature vector is derived on the basis of the (t-1)-th feature vector and the control signal vector for the current (n-th) state, determined beforehand for each of the states. A feature vector output probability for outputting the t-th feature vector in the n-th state of the reference pattern of the objective word for recognition is derived from a multi-dimensional Gaussian distribution determined by the prediction vector and the covariance matrix, taking the prediction vector as the average vector.

Type: Grant

Filed: February 10, 1997

Date of Patent: July 6, 1999

Assignee: NEC Corporation

Inventor: Ken-Ichi Iso
Speaker identification with user-selected password phrases

Patent number: 5913192

Abstract: A speaker identification system includes a speaker-independent phrase recognizer. The speaker-independent phrase recognizer scores a password utterance against all the sets of phonetic transcriptions in a lexicon database to determine the N best speaker-independent scores, determines the N best sets of phonetic transcriptions based on the N best speaker-independent scores, and determines the N best possible identities. A speaker-dependent phrase recognizer retrieves the hidden Markov model corresponding to each of the N best possible identities, and scores the password utterance against each of the N hidden Markov models to generate a speaker-dependent score for each of the N best possible identities. A score processor coupled to the outputs of the speaker-independent phrase recognizer and the speaker-dependent phrase recognizer determines a putative identity. A verifier coupled to the score processor authenticates the determined putative identity.

Type: Grant

Filed: August 22, 1997

Date of Patent: June 15, 1999

Assignee: AT&T Corp

Inventors: Sarangarajan Parthasarathy, Aaron Edward Rosenberg
Method and system of runtime acoustic unit selection for speech synthesis

Patent number: 5913193

Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances, thereby producing more natural sounding speech.

Type: Grant

Filed: April 30, 1996

Date of Patent: June 15, 1999

Assignee: Microsoft Corporation

Inventors: Xuedong D. Huang, Michael D. Plumpe, Alejandro Acero, James L. Adcock
Location of pattern in signal

Patent number: 5907825

Abstract: A method for determining the location of a pattern, when input in isolation, within a representative input signal is provided. The method aligns the input signal with a signal representative of a plurality of connected patterns, one of which is the same as the pattern within the input signal. The method then determines the location from the results of the aligning step. The location determined using this apparatus can be used to determine an isolated reference model by extracting features of the input signal from the location found. This isolated reference model can then be used to generate a continuous reference model for the pattern, by aligning the isolated reference model with the signals representative of a plurality of connected patterns, one of which is the pattern to be modelled.

Type: Grant

Filed: February 6, 1997

Date of Patent: May 25, 1999

Assignee: Canon Kabushiki Kaisha

Inventor: Eli Tzirkel-Hancock
Speaker-independent speech recognition using vowel/consonant segmentation based on pitch intensity values

Patent number: 5907826

Abstract: A speech recognition apparatus includes a feature extraction section, and a recognition section. The feature extraction section extracts the feature vectors of input speech. The feature extraction section includes at least a pitch intensity extraction section. The pitch intensity extraction section extracts the intensities of the fundamental frequency components of the input speech. The recognition section performs speech recognition by using the feature vectors from the feature extraction section.

Type: Grant

Filed: October 28, 1997

Date of Patent: May 25, 1999

Assignee: NEC Corporation

Inventor: Keizaburo Takagi
Method of preparing speech model and speech recognition apparatus using this method

Patent number: 5903865

Abstract: A speech model preparing method capable of easily preparing a new Hidden Markov Model (HMM) of an input speech from very few utterances, such as one or two, and a speech recognition apparatus using this method. A speech recognition apparatus uses, as a speech model, a continuous distribution type HMM defined by three parameters: a state transition probability, an average vector and a variance. The apparatus computes an average vector of an input speech to be learned, selects an HMM approximate to the input to-be-learned speech as an initial model from a registration dictionary, replaces at least an average vector of the selected HMM with the computed average vector of the to-be-learned speech and adds the obtained HMM to the dictionary as an HMM for the input to-be-learned speech.

Type: Grant

Filed: August 29, 1996

Date of Patent: May 11, 1999

Assignee: Pioneer Electronic Corporation

Inventors: Shunsuke Ishimitsu, Ikuo Fujita
Method and apparatus for adapting the language model's size in a speech recognition system

Patent number: 5899973

Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.

Type: Grant

Filed: September 25, 1997

Date of Patent: May 4, 1999

Assignee: International Business Machines Corporation

Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose

Patent number: 5895448

Abstract: Methods and apparatus for generating and using both speaker dependent and speaker independent garbage models in speaker dependent speech recognition applications are described. The present invention recognizes that in some speech recognition systems, e.g., systems where multiple speech recognition operations are performed on the same signal, it may be desirable to recognize and treat words or phrases in one part of the speech recognition system as garbage or out of vocabulary utterances with the understanding that the very same words or phrases will be recognized and treated as in-vocabulary by another portion of the system. In accordance with the present invention, in systems where both speaker independent and speaker dependent speech recognition operations are performed independently, e.g.

Type: Grant

Filed: April 30, 1997

Date of Patent: April 20, 1999

Assignee: Nynex Science and Technology, Inc.

Inventors: George J. Vysotsky, Vijay R. Raman
Speech recognition methods and apparatus

Patent number: 5893059

Abstract: Methods and apparatus for transitioning from one speech recognition system to another and for reusing existing speech recognition data are described. In particular, various methods of converting speech recognition templates or models from a first format to a second format are described. Methods for improving the recognition rate achieved using converted templates or models are also described. These methods involve storing source and/or scoring information for templates or models so that converted models or templates can be scored differently than original models or templates to thereby reflect the effect the conversion process has on recognition scores. In order to enhance recognition results in one embodiment an available compressed voice recording is used in the conversion process. The methods and apparatus of the present invention can be applied to a wide variety of speech recognition template and model conversion applications. Methods and apparatus for generating garbage models are also described.

Type: Grant

Filed: April 17, 1997

Date of Patent: April 6, 1999

Assignee: Nynex Science and Technology, Inc.

Inventor: Vijay R. Raman
Enhancement of esophageal speech by injection noise rejection

Patent number: 5890111

Abstract: Injection noise and silence are detected in an input speech signal and an external amplifier is switched on or off based on the detected injection noise or silence. The input speech signal is digitized and a first copy of the digitized signal is preemphasized. After the input speech signal is preemphasized, a predetermined number of Mel-frequency cepstral coefficients (MFCCs) and difference cepstra are calculated for each window of the speech signal. A measure of signal energy and a measure of the rate of change of the signal energy is computed. A second copy of the digitized input speech signal is processed using amplitude summation or by differencing a center-clipped signal. The measures of signal energy, rate of change of the signal energy, the Mel coefficients, difference cepstra, and either the amplitude summation value or the differenced value are combined to form an observation vector.

Type: Grant

Filed: December 24, 1996

Date of Patent: March 30, 1999

Assignee: Technology Research Association of Medical Welfare Apparatus

Inventors: Hector Raul Javkin, Michael Galler, Nancy Niedzielski
Method and apparatus for training Hidden Markov Model

Patent number: 5890114

Abstract: An HMM training method comprising a first parameter predicting step, a centroid state set calculating step, a reconstructing step, a second parameter predicting step and a control step. In the first parameter predicting step, a parameter of an HMM (hidden Markov model) is predicted based on training data. In the centroid state set calculating step, a centroid state set is calculated by clustering the states of the HMM whose parameter is predicted in the first parameter predicting step. In the reconstructing step, an HMM is reconstructed using the centroid state set calculated in the centroid state set calculating step. In the second parameter predicting step, a parameter of the HMM reconstructed in the reconstructing step is predicted using the training data. The control step re-executes the centroid state set calculating step when the likelihood of the HMM whose parameter is predicted in the second parameter predicting step does not satisfy a predetermined condition.

Type: Grant

Filed: February 28, 1997

Date of Patent: March 30, 1999

Assignee: Oki Electric Industry Co., Ltd.

Inventor: Jie Yi
Method and apparatus for a time-synchronous tree-based search strategy

Patent number: 5884259

Abstract: A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.

Type: Grant

Filed: February 12, 1997

Date of Patent: March 16, 1999

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, Ellen Marie Eide
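The duration constraint in the abstract of patent 5884259 (transitions allowed only every third frame) can be sketched with a small dynamic-programming search over a left-to-right chain of nodes. This is an illustration under assumptions, not the patented search: the chain topology, the log-score inputs, and the function name `constrained_search` are hypothetical, and the tree structure and candidate lists are omitted.

```python
def constrained_search(frame_scores, stride=3):
    """Best path through a left-to-right chain of nodes.

    frame_scores[t][n] is the log-score of node n at frame t. Staying in a
    node is always allowed; advancing to the next node is allowed only when
    t is a multiple of stride. Returns (best score, node index per frame).
    """
    n_frames, n_nodes = len(frame_scores), len(frame_scores[0])
    neg_inf = float("-inf")
    best = [neg_inf] * n_nodes
    best[0] = frame_scores[0][0]          # paths must start in node 0
    back = [[0] * n_nodes]                # back[t][n]: node occupied at frame t-1
    for t in range(1, n_frames):
        new, ptr = [neg_inf] * n_nodes, [0] * n_nodes
        for n in range(n_nodes):
            cand, src = best[n], n        # stay in the same node
            if t % stride == 0 and n > 0 and best[n - 1] > cand:
                cand, src = best[n - 1], n - 1   # advance on a permitted frame
            if cand > neg_inf:
                new[n] = cand + frame_scores[t][n]
            ptr[n] = src
        best = new
        back.append(ptr)
    # Trace back from the last node at the final frame.
    path = [n_nodes - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1], path

# Hypothetical scores: node 1 looks better from the very first frame,
# but the constraint holds the path in node 0 for three frames.
scores = [[0.0, 10.0]] * 6
total, path = constrained_search(scores)
```

Because a node can only be left on a permitted frame, each node that is entered and later left is occupied for at least three frames, without adding extra states to the per-phoneme model.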
Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition

Patent number: 5878390

Abstract: A speech recognition apparatus which includes a speech recognition section for performing a speech recognition process on an uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability of the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to that calculated functional value.

Type: Grant

Filed: June 23, 1997

Date of Patent: March 2, 1999

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Jun Kawai, Yumi Wakita
Method and apparatus for an improved language recognition system

Patent number: 5870706

Abstract: Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses, and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.

Type: Grant

Filed: April 10, 1996

Date of Patent: February 9, 1999

Assignee: Lucent Technologies, Inc.

Inventor: Hiyan Alshawi
Method and apparatus for speech recognition adapted to an individual speaker

Patent number: 5864810

Abstract: A method and apparatus for automatic recognition of speech adapts to a particular speaker by using adaptation data to develop a transformation through which speaker independent models are transformed into speaker adapted models. The speaker adapted models are then used for speaker recognition and achieve better recognition accuracy than non-adapted models. In a further embodiment, the transformation-based adaptation technique is combined with a known Bayesian adaptation technique.

Type: Grant

Filed: January 20, 1995

Date of Patent: January 26, 1999

Assignee: SRI International

Inventors: Vassilios Digalakis, Leonardo Neumeyer, Dimitry Rtischev
Decision-directed frame-synchronous adaptive equalization filtering of a speech signal by implementing a hidden Markov model

Patent number: 5864806

Abstract: For equalizing a speech signal constituted by an observed sequence of successive input sound frames, which speech signal is liable to be affected by disturbances, the speech signal is modelled by means of a hidden Markov model and, at each instant t: equalization filters are constituted in association with the paths in the Markov sense at instant t; at least a plurality of the equalization filters are applied to the frames to obtain, at instant t, a plurality of filtered sound frame sequences and an utterance probability for each of the paths respectively associated with the equalization filters applied; the equalization filter corresponding to the most probable path in the Markov sense is selected; and the filtered frame supplied by the selected equalization filter is selected as the equalized frame.

Type: Grant

Filed: May 5, 1997

Date of Patent: January 26, 1999

Assignee: France Telecom

Inventors: Chafic Mokbel, Denis Jouvet, Jean Monne
Speech recognition apparatus and speech recognition method

Patent number: 5860062

Abstract: A speech recognition apparatus and method learns in advance a plurality of kinds of noises that can occur in the environment of use to determine a plurality of noise HMMs, synthesizes these noise HMMs into one noise HMM, generates a NOVO-HMM by executing NOVO (voice mixed with noise) conversion for a speech HMM of a reference pattern by using this composite noise HMM, and uses this NOVO-HMM for a speech recognition processing. Since a plurality of noises are incorporated in the NOVO-HMM generated in this manner, the speech can be recognized with high accuracy even when the noise changes.

Type: Grant

Filed: June 13, 1997

Date of Patent: January 12, 1999

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Kenichi Taniguchi, Nobuyuki Kono, Toshimichi Tokuda, Yoshio Ikura
Method and system for pattern recognition based on tree organized probability densities

Patent number: 5857169

Abstract: A time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech, is recognized. The system includes input means (30), which accesses the physical quantity and therefrom generates a sequence of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing reference patterns, which consist of a sequence of reference units. Each reference unit is represented by associated reference probability densities. A tree builder (60) represents for each reference unit the set of associated reference probability densities as a tree structure. Each leaf node of the tree corresponds to a reference probability density. Each non-leaf node corresponds to a cluster probability density, which is derived from all reference probability densities corresponding to leaf nodes in branches below the non-leaf node.

Type: Grant

Filed: August 28, 1996

Date of Patent: January 5, 1999

Assignee: U.S. Philips Corporation

Inventor: Frank Seide
Apparatuses and methods for training and operating speech recognition systems

Patent number: 5850627

Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary

Type: Grant

Filed: June 26, 1997

Date of Patent: December 15, 1998

Assignee: Dragon Systems, Inc.

Inventors: Joel M. Gould, Elizabeth E. Steele, Frank J. McGrath, Steven D. Squires, Peter S. Heitman, Joel W. Parke, Dean G. Sturtevant, Jed M. Roberts, James K. Baker
Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes

Patent number: 5842165

Abstract: Methods and apparatus for the generation of speaker dependent garbage models from the very same data used to generate speaker dependent speech recognition models, e.g., word models, are described. The technique involves processing the data included in the speaker dependent speech recognition models to create one or more speaker dependent garbage models. The speaker dependent garbage model generation technique involves what may be described as distorting or morphing of a speaker dependent speech recognition model to generate a speaker dependent garbage model therefrom. One or more speaker dependent speech recognition models may then be combined with the generated speaker dependent garbage model to produce an updated garbage model. The scoring of speaker dependent garbage models is varied in accordance with the present invention as a function of the number of speech recognition models from which the speaker dependent garbage model was created.

Type: Grant

Filed: April 30, 1997

Date of Patent: November 24, 1998

Assignee: Nynex Science & Technology, Inc.

Inventors: Vijay R. Raman, George J. Vysotsky
Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood

Patent number: 5839105

Abstract: There is provided a speaker-independent model generation apparatus and a speech recognition apparatus which require a processing unit to have less memory capacity and which allow its computation time to be reduced, as compared with a conventional counterpart. A single Gaussian HMM is generated with a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers. A state having a maximum increase in likelihood as a result of splitting one state in contextual or temporal domains is searched. Then, the state having a maximum increase in likelihood is split in a contextual or temporal domain corresponding to the maximum increase in likelihood. Thereafter, a single Gaussian HMM is generated with the Baum-Welch training algorithm, and these steps are iterated until the states within the single Gaussian HMM can no longer be split or until a predetermined number of splits is reached. Thus, a speaker-independent HMM is generated.

Type: Grant

Filed: November 29, 1996

Date of Patent: November 17, 1998

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Mari Ostendorf, Harald Singer
Method for speaker adaptation of speech models recognition scheme using the method and recording medium having the speech recognition method recorded thereon

Patent number: 5835890

Abstract: In a speaker adaptation method for speech models, input speech is transformed to a feature parameter sequence like a cepstral sequence, and N model sequences of maximum likelihood for the feature parameter sequence are extracted from speaker-independent speech HMMs by an N-best hypothesis extraction method. The extracted model sequences are each provisionally adapted to maximize its likelihood for the feature parameter sequence of the input speech while changing the HMM parameters of each sequence, and that one of the provisionally adapted model sequences which has the maximum likelihood for the feature parameter sequence of the input speech is selected and speech models of the selected sequence are provided as adapted HMMs of the speaker to be recognized.

Type: Grant

Filed: April 9, 1997

Date of Patent: November 10, 1998

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Tomoko Matsui, Sadaoki Furui
Class-based word clustering for speech recognition using a three-level balanced hierarchical similarity

Patent number: 5835893

Abstract: In a word clustering apparatus for clustering words, a plurality of words is clustered to obtain a total tree diagram of a word dictionary representing a word clustering result, where the total tree diagram includes tree diagrams of an upper layer, a middle layer and a lower layer. In a speech recognition apparatus, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal. Then, a speech recognition controller executes a speech recognition process on the extracted acoustic feature parameters with reference to a predetermined Hidden Markov Model and the obtained total tree diagram of the word dictionary, and outputs a result of the speech recognition.

Type: Grant

Filed: April 18, 1996

Date of Patent: November 10, 1998

Assignee: ATR Interpreting Telecommunications Research Labs

Inventor: Akira Ushioda
Devices and methods for speech recognition of vocabulary words with simultaneous detection and verification

Patent number: 5832430

Abstract: Devices and methods for speech recognition enable simultaneous word hypothesis detection and verification in a one-pass procedure that provides for different segmentations of the speech input. A confidence measure of a target hypothesis for a known word is determined according to a recursion formula that operates on parameters of target models and alternate models of known words, a language model and a lexicon, and feature vectors of the speech input in a likelihood ratio decoder. The confidence measure is processed to determine an accept/reject signal for the target hypothesis that is output with a target hypothesis signal. The recursion formula is based on hidden Markov models with a single optimum state sequence and may take the form of a modified Viterbi algorithm.

Type: Grant

Filed: December 8, 1995

Date of Patent: November 3, 1998

Assignee: Lucent Technologies, Inc.

Inventors: Eduardo Lleida, Richard Cameron Rose
Adjusting a hidden Markov model tagger for sentence fragments

Patent number: 5822731

Abstract: A system for parsing information representative of a sequence of words having parts of speech. The sequence of words forms a sentence or sentence fragment. A hidden Markov model is provided for determining the most likely part of speech of a selected word of the sequence of words. The hidden Markov model has an initial transition matrix and a subsequent transition matrix for storing probabilities of occurrence of the parts of speech. The initial transition matrix of the hidden Markov model is removed to provide a modified hidden Markov model. The modified hidden Markov model is applied to the sequence of words to determine the most likely part of speech of a selected word within a sentence fragment with increased accuracy.

Type: Grant

Filed: September 15, 1995

Date of Patent: October 13, 1998

Assignee: Infonautics Corporation

Inventor: John Michael Schultz
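
As a rough illustration of the idea in the abstract for patent 5822731, the sketch below runs Viterbi decoding over a toy two-tag tagger twice: once with a sentence-initial distribution and once with it removed. Treating "removed" as a uniform (zero log-score) start is an assumption of this sketch, and every tag, word, probability, and function name here is hypothetical rather than taken from the patent.

```python
import math

def _log(p):
    # Log of a probability, mapping zero to negative infinity.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words, tags, trans, emit, initial=None):
    """Return the most likely tag sequence for words.

    If initial is None, no sentence-start prior is applied, which is how
    this sketch models a tagger adjusted for sentence fragments.
    """
    scores = {}
    for t in tags:
        start = _log(initial[t]) if initial is not None else 0.0
        scores[t] = start + _log(emit[t].get(words[0], 0.0))
    back = []
    for w in words[1:]:
        new_scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: scores[p] + _log(trans[p][t]))
            new_scores[t] = scores[prev] + _log(trans[prev][t]) + _log(emit[t].get(w, 0.0))
            pointers[t] = prev
        scores = new_scores
        back.append(pointers)
    path = [max(tags, key=lambda t: scores[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical toy model: full sentences usually start with a noun.
TAGS = ["NOUN", "VERB"]
INITIAL = {"NOUN": 0.95, "VERB": 0.05}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.7, "VERB": 0.3}}
EMIT = {"NOUN": {"runs": 0.2, "home": 0.8}, "VERB": {"runs": 0.8, "home": 0.2}}

fragment = ["runs", "home"]
with_prior = viterbi(fragment, TAGS, TRANS, EMIT, initial=INITIAL)
without_prior = viterbi(fragment, TAGS, TRANS, EMIT)
```

In this toy model the sentence-start prior pulls the first word of the fragment toward NOUN, while the modified model tags it as VERB, which is the kind of fragment-initial correction the abstract describes.
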
Task-constrained connected speech recognition of propagation of tokens only if valid propagation path is present

Patent number: 5819222

Abstract: A speech recognition system recognizes connected speech using a plurality of vocabulary nodes, at least one of which has an associated signature. In use, partial recognition paths are examined at decision nodes intermediate the beginning and end of the recognition path, each decision node having an associated set of valid accumulated signatures. A token received by a decision node is only propagated if the accumulated signature of that token is one of those in the set of valid accumulated signatures associated with that decision node.

Type: Grant

Filed: October 11, 1995

Date of Patent: October 6, 1998

Assignee: British Telecommunications public limited company

Inventors: Samuel Gavin Smyth, Simon Patrick Alexander Ringland
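
The token-gating rule in the abstract for patent 5819222 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Token` class, the choice to accumulate signatures as a tuple, and the example vocabulary are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    score: float
    signature: tuple = ()  # signatures accumulated along the partial path

def pass_vocab_node(token, node_score, node_signature=None):
    """Add a vocabulary node's score and, if it has one, its signature."""
    extra = (node_signature,) if node_signature is not None else ()
    return Token(token.score + node_score, token.signature + extra)

def propagate(tokens, valid_signatures):
    """At a decision node, keep only tokens with a valid accumulated signature."""
    return [t for t in tokens if t.signature in valid_signatures]

# Hypothetical task: only "call home" and "dial number" are valid word pairs.
start = Token(0.0)
paths = [
    pass_vocab_node(pass_vocab_node(start, -1.0, "call"), -0.5, "home"),
    pass_vocab_node(pass_vocab_node(start, -0.8, "call"), -0.4, "number"),
    pass_vocab_node(pass_vocab_node(start, -1.2, "dial"), -0.6, "number"),
]
valid = {("call", "home"), ("dial", "number")}
survivors = propagate(paths, valid)
```

The second path has the best score of the three but is dropped, because its accumulated signature is not in the decision node's valid set.
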
Speech adaptation device suitable for speech recognition device and word spotting device

Patent number: 5819223

Abstract: A speech adaptation device comprises a vocabulary independent reference pattern memory for memorizing a plurality of vocabulary independent reference patterns having one or more categories. Each category has one or more acoustic units, and has such a connection relation of the acoustic units that allows reception of any sequence of the acoustic units appearing in the input speech. A preliminary matching unit is for use in making time-alignment between the time series of the feature vectors of the input speech obtained from the analysis unit and the vocabulary independent reference pattern to obtain mean vectors for individual categories of the input speech and the vocabulary independent reference pattern from the aligned portion for the individual categories of the feature vectors of the input speech and the vocabulary independent reference pattern.

Type: Grant

Filed: January 26, 1996

Date of Patent: October 6, 1998

Assignee: NEC Corporation

Inventor: Keizaburo Takagi
State transition model design method and voice recognition method and apparatus using same

Patent number: 5812975

Abstract: A method of designing a state transition model capable of high speed voice recognition and a voice recognition method and apparatus using the state transition model is provided. The methods provide a state transition model in which a state shared structure of the state transition model is designed. The method includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short distance cluster among clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.

Type: Grant

Filed: June 18, 1996

Date of Patent: September 22, 1998

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Yasunori Ohora
Speech recognition using middle-to-middle context hidden markov models

Patent number: 5812974

Abstract: This is a speech recognition method for modeling adjacent word context, comprising: dividing a first word or period of silence into two portions; dividing a second word or period of silence, adjacent to the first word, into two portions; and combining the last portion of the first word or period of silence and the first portion of the second word or period of silence to make an acoustic model. The method includes constructing a grammar to restrict the acoustic models to the middle-to-middle context.

Type: Grant

Filed: April 10, 1996

Date of Patent: September 22, 1998

Assignee: Texas Instruments Incorporated

Inventors: Charles T. Hemphill, Lorin P. Netsch, Christopher M. Kribs
Speaker independent speech recognition method utilizing multiple training iterations

Patent number: 5806034

Abstract: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data; providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM; and parameterizing speech utterances according to one of linear prediction parameters or Mel-scale filter bank parameters. The method further includes selecting a frame period for accommodating the parameters and generating HMMs and decoding to specified speech utterances by causing the user to utter predefined training speech utterances for each HMM. The method then statistically computes the generated HMMs with the prototype HMM to provide a set of fully trained HMMs for each utterance indicative of the speaker.

Type: Grant

Filed: August 2, 1995

Date of Patent: September 8, 1998

Assignee: ITT Corporation

Inventors: Joe A. Naylor, William Y. Huang, Lawrence G. Bahler
Speech recognition system and method using a hidden markov model adapted to recognize a number of words and trained to recognize a greater number of phonetically dissimilar words.

Patent number: 5799278

Abstract: A speech recognition system for discrete words uses a single Hidden Markov Model (HMM), which is nominally adapted to recognise N different isolated words, but which is trained to recognise M different words, where M>N. This is achieved by providing M sets of audio recordings, each set comprising multiple recordings of a respective one of said M words being spoken. Only N different labels are assigned to the M sets of audio recordings, so that at least one of the N labels has two or more sets of audio recordings assigned thereto. These two or more sets of audio recordings correspond to phonetically dissimilar words. The HMM is then trained by inputting each set of audio recordings and its assigned label. The HMM can effectively compensate for the phonetic variations between the different words assigned the same label, thereby avoiding the need to utilise a larger model (i.e., to use M labels).

Type: Grant

Filed: July 2, 1996

Date of Patent: August 25, 1998

Assignee: International Business Machines Corporation

Inventors: Michael Cobbett, John Brian Pickering
Acoustic model generating method for speech recognition

Patent number: 5799277

Abstract: The acoustic model generating method for speech recognition enables a high representation effect on the basis of the minimum possible model parameters. In an initial model having a smaller number of signal sources, the acoustic model for speech recognition is generated by selecting the splitting processing or the merging processing for the signal sources successively and repeatedly. The merging processing is executed prior to the splitting processing. In the merging processing, when the merged result is not appropriate, the splitting processing is executed for the model obtained before merging processing (without use of the merged result).

Type: Grant

Filed: October 25, 1995

Date of Patent: August 25, 1998

Assignee: Victor Company of Japan, Ltd.

Inventor: Junichi Takami
Method of key-phase detection and verification for flexible speech understanding

Patent number: 5797123

Abstract: A key-phrase detection and verification method that can be advantageously used to realize understanding of flexible (i.e., unconstrained) speech. A "multiple pass" procedure is applied to a spoken utterance comprising a sequence of words (i.e., a "sentence"). First, a plurality of key-phrases are detected (i.e., recognized) based on a set of phrase sub-grammars which may, for example, be specific to the state of the dialogue. These key-phrases are then verified by assigning confidence measures thereto and comparing these confidence measures to a threshold, resulting in a set of verified key-phrase candidates. Next, the verified key-phrase candidates are connected into sentence hypotheses based upon the confidence measures and predetermined (e.g., task-specific) semantic information. And, finally, one or more of these sentence hypotheses are verified to produce a verified sentence hypothesis and, from that, a resultant understanding of the spoken utterance.

Type: Grant

Filed: December 20, 1996

Date of Patent: August 18, 1998

Assignee: Lucent Technologies Inc.

Inventors: Wu Chou, Biing-Hwang Juang, Tatsuya Kawahara, Chin-Hui Lee
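
The middle stages of the multiple-pass procedure in the abstract for patent 5797123 (verifying detected key-phrases against a threshold, then connecting the survivors) might look like the sketch below. The data layout, the "slot" field, and the rule of keeping the most confident phrase per slot are assumptions standing in for the patent's semantic information; the final sentence-verification step is not modeled.

```python
def verify(candidates, threshold):
    """Keep key-phrase candidates whose confidence meets the threshold."""
    return [c for c in candidates if c["confidence"] >= threshold]

def connect(verified):
    """Pick the most confident verified phrase for each semantic slot."""
    best = {}
    for c in verified:
        slot = c["slot"]
        if slot not in best or c["confidence"] > best[slot]["confidence"]:
            best[slot] = c
    return {slot: c["phrase"] for slot, c in best.items()}

# Hypothetical detector output for one utterance.
candidates = [
    {"phrase": "to boston", "slot": "destination", "confidence": 0.9},
    {"phrase": "to austin", "slot": "destination", "confidence": 0.6},
    {"phrase": "on monday", "slot": "date", "confidence": 0.7},
    {"phrase": "at nine", "slot": "time", "confidence": 0.2},
]
hypothesis = connect(verify(candidates, threshold=0.5))
```

Here "at nine" fails verification, and "to austin" passes it but loses to the more confident phrase competing for the same slot.
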
Pattern recognition method

Patent number: 5794198

Abstract: For HMMs representing speech units, the mean and variance values of the one-dimensional normal distributions in the respective dimensions of each state's continuous multi-dimensional normal distribution are tied among similar one-dimensional distributions. As a result, the total number of normal distributions for representing the model is reduced without degrading recognition performance.

Type: Grant

Filed: October 24, 1995

Date of Patent: August 11, 1998

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Satoshi Takahashi, Shigeki Sagayama
Speech recognition method

Patent number: 5787396

Abstract: A speech recognition method uses continuous mixture Hidden Markov Models (HMM) for probability processing including a first type of HMM having a small number of mixtures and a second type of HMM having a larger number of mixtures. First output probabilities are formed for inputted speech using the small number of mixtures type HMM and second output probabilities are formed for the input speech using the large number of mixtures type HMM for selected states corresponding to the highest output probabilities of the first type HMM. The input speech is recognized from both the first and second output probabilities.

Type: Grant

Filed: September 18, 1995

Date of Patent: July 28, 1998

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Yasunori Ohora, Masayuki Yamada
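
The two-stage scoring in the abstract for patent 5787396 can be illustrated with one-dimensional Gaussian mixtures. This is only a sketch under assumptions: the state names, mixture parameters, the `top_k` selection rule, and the decision to keep first-pass probabilities for unselected states are all hypothetical choices.

```python
import math

def gauss(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_prob(x, mixture):
    """mixture is a list of (weight, mean, variance) components."""
    return sum(w * gauss(x, m, v) for w, m, v in mixture)

def two_pass_output_probs(x, coarse, fine, top_k):
    """Score all states with small mixtures, then rescore the best top_k
    states with their larger mixtures."""
    first = {s: mixture_prob(x, mix) for s, mix in coarse.items()}
    selected = sorted(first, key=first.get, reverse=True)[:top_k]
    probs = dict(first)
    for s in selected:
        probs[s] = mixture_prob(x, fine[s])
    return probs, selected

# Hypothetical models: one component per state in the coarse set,
# two components per state in the fine set.
coarse = {"s1": [(1.0, 0.0, 1.0)], "s2": [(1.0, 5.0, 1.0)], "s3": [(1.0, 10.0, 1.0)]}
fine = {
    "s1": [(0.5, -0.5, 0.5), (0.5, 0.5, 0.5)],
    "s2": [(0.5, 4.5, 0.5), (0.5, 5.5, 0.5)],
    "s3": [(0.5, 9.5, 0.5), (0.5, 10.5, 0.5)],
}
probs, selected = two_pass_output_probs(0.5, coarse, fine, top_k=1)
```

Only the selected state pays the cost of the larger mixture; the remaining states keep their cheaper first-pass probabilities.
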
Method of speech recognition using decoded state sequences having constrained state likelihoods

Patent number: 5778341

Abstract: The invention is a speech recognition system and method for transmitting information including the receipt and decoding of speech information such as that modeled by hidden Markov models (HMMs). In this invention, the state likelihoods of the modeled state sequences contained within the speech information are assigned penalties based on the difference between those state likelihoods and a maximum possible state likelihood. Once penalties have been assigned, the modified state sequence with the modified state likelihoods having the highest cumulative state likelihoods is used in further speech recognition processing. In this manner, state sequences having no extremely poor state likelihoods are favored over those having both extremely high and extremely poor state likelihoods.

Type: Grant

Filed: January 26, 1996

Date of Patent: July 7, 1998

Assignee: Lucent Technologies Inc.

Inventor: Ilija Zeljkovic
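
To make the penalty idea in the abstract for patent 5778341 concrete, the sketch below scores two state sequences by their log-likelihoods, subtracting a penalty that grows with each state's gap from the maximum possible likelihood. The quadratic penalty, the `alpha` weight, and the example values are assumptions; the abstract does not give the penalty's form.

```python
def penalized_score(state_log_likelihoods, max_log_likelihood, alpha=0.1):
    """Sum of log-likelihoods, each reduced by alpha * (gap from maximum)^2."""
    total = 0.0
    for ll in state_log_likelihoods:
        gap = max_log_likelihood - ll
        total += ll - alpha * gap ** 2
    return total

def best_sequence(sequences, max_log_likelihood, alpha=0.1):
    """Name of the sequence with the highest cumulative penalized score."""
    return max(
        sequences,
        key=lambda name: penalized_score(sequences[name], max_log_likelihood, alpha),
    )

# Hypothetical per-state log-likelihoods for two competing sequences.
sequences = {
    "steady": [-2.0, -2.0, -2.0],  # no state is extremely poor
    "spiky": [-0.5, -0.5, -4.5],   # two very good states, one very poor
}
raw_winner = best_sequence(sequences, max_log_likelihood=0.0, alpha=0.0)
penalized_winner = best_sequence(sequences, max_log_likelihood=0.0, alpha=0.1)
```

Without penalties the "spiky" sequence has the higher total; with them the "steady" sequence wins, matching the abstract's stated preference for sequences that have no extremely poor state likelihoods.
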
