Markov Patents (Class 704/256)
  • Patent number: 5963906
    Abstract: A method and system performs speech recognition training using Hidden Markov Models. Initially, preprocessed speech signals that include a plurality of observations are stored by the system. Initial Hidden Markov Model (HMM) parameters are then assigned. Summations are then calculated using modified equations derived substantially from the following equations, wherein $u \le v < w$: $P(x_u^w) = P(x_u^v)\,P(x_{v+1}^w)$ and $\Omega_{ij}(x_u^w) = \Omega_{ij}(x_u^v)\,P(x_{v+1}^w) + P(x_u^v)\,\Omega_{ij}(x_{v+1}^w)$. The calculated summations are then used to perform HMM parameter reestimation. The system then determines whether the HMM parameters have converged. If they have, the HMM parameters are stored. However, if the HMM parameters have not converged, the system again calculates summations, performs HMM parameter reestimation using the summations, and determines whether the parameters have converged.
    Type: Grant
    Filed: May 20, 1997
    Date of Patent: October 5, 1999
    Assignee: AT&T Corp
    Inventor: William Turin
  • Patent number: 5960397
    Abstract: A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The speech recognition system is trained by making a set of secondary models in a first stage of training, and integrating the set of secondary models with a base set of recognition models in a second stage of training.
    Type: Grant
    Filed: May 27, 1997
    Date of Patent: September 28, 1999
    Assignee: AT&T Corp
    Inventor: Mazin G. Rahim
  • Patent number: 5956676
    Abstract: A pattern adapting apparatus including an input pattern forming unit, a tree structure standard pattern storing unit for storing a tree structure standard pattern including a tree structure indicative of inclusive relationships among categories and a parameter set at each node of the tree structure, a pattern matching unit for matching categories of the tree structure standard pattern with input samples of an input pattern, a tree structure standard pattern modifying unit for modifying a tree structure standard pattern based on the results of pattern matching, a node set selecting unit for calculating a description length with respect to a plurality of node sets in a tree structure pattern to select an appropriate node set, a modified standard pattern forming unit for forming a modified standard pattern by using a parameter set of a selected node set, and a standard pattern for recognition storing unit for storing a modified standard pattern.
    Type: Grant
    Filed: August 27, 1996
    Date of Patent: September 21, 1999
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Patent number: 5956678
    Abstract: In the recognition of coherently spoken words, a plurality of hypotheses is usually built up which end in various words during the recognition process and are then to be continued with further words. To keep the number of words yet to be continued as small as possible, especially in the case of a large vocabulary, it is known to carry out a look-ahead in a limited time space. It is suggested according to the invention to use the same phonemes for the look-ahead as for the actual recognition and to add together the differential sums obtained in the look-ahead for the evaluation of the partial hypothesis which has just ended and which is to be continued, and to compare this sum with a threshold value which depends on the extrapolated minimum total evaluation at the end of the time space of the look-ahead. The searching space for hypotheses to be continued can be limited by this in a particularly favorable manner.
    Type: Grant
    Filed: April 17, 1995
    Date of Patent: September 21, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Reinhold Hab-Umbach, Hermann Ney
  • Patent number: 5956679
    Abstract: A speech processing apparatus includes a noise model production device for extracting a noise-speech interval from input speech data and producing a noise model by using the data of the extracted interval. The apparatus also includes a composite distribution production device for dividing the distributions of a speech model into a plurality of groups, producing a composite distribution of each group, and determining the positional relationship of each distribution within each group. In addition, the apparatus includes a memory for storing each composite distribution and the positional relationship of each distribution within the group, and a PMC conversion device for PMC-converting each produced composite distribution. Also provided is a noise-adaptive speech model production device for producing a noise-adaptive speech model on the basis of the composite distribution which is PMC-converted by the PMC conversion device and the positional relationship stored by the memory.
    Type: Grant
    Filed: December 2, 1997
    Date of Patent: September 21, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Hiroki Yamamoto
  • Patent number: 5950158
    Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a plurality of reduced size models are generated using a Lagrange multiplier from an input model and input size constraints. The plurality of reduced size models are stored in a buffer and scored using a likelihood scoring technique.
    Type: Grant
    Filed: July 30, 1997
    Date of Patent: September 7, 1999
    Assignee: Nynex Science and Technology, Inc.
    Inventor: Kuansan Wang
  • Patent number: 5946653
    Abstract: An improved method of training a SISRS uses less processing and memory resources by operating on vectors instead of matrices which represent spoken commands. Memory requirements are linearly proportional to the number of spoken commands for storing each command model. A spoken command is identified from the set of spoken commands by a command recognition procedure (200). The command recognition procedure (200) includes sampling the speaker's speech, deriving cepstral coefficients and delta-cepstral coefficients, and performing a polynomial expansion on cepstral coefficients. The identified spoken command is selected using the dot product of the command model data and the average command structure representing the unidentified spoken command.
    Type: Grant
    Filed: October 1, 1997
    Date of Patent: August 31, 1999
    Assignee: Motorola, Inc.
    Inventors: William Michael Campbell, John Eric Kleider, Charles Conway Broun, Carl Steven Gifford, Khaled Assaleh
  • Patent number: 5946655
    Abstract: When a language model is to be used for the recognition of a speech signal and the vocabulary is composed as a tree, the language model value cannot be taken into account before the word end. Customarily, after each word end the comparison with a tree root is started anew, be it with a score which has been increased by the language model value so that the threshold value for the scores at which hypotheses are terminated must be high and hence many, even unattractive hypotheses remain active for a prolonged period of time. In order to avoid this, in accordance with the invention a correction value is added to the score for at least a part of the nodes of the vocabulary tree; the sum of the correction values on the path to a word then may not be greater than the language model value for the relevant word. As a result, for each test signal the scores of all hypotheses are of a comparable order of magnitude.
    Type: Grant
    Filed: March 29, 1995
    Date of Patent: August 31, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Volker Steinbiss, Bach-Hiep Tran, Hermann Ney
  • Patent number: 5946656
    Abstract: Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
    Type: Grant
    Filed: November 17, 1997
    Date of Patent: August 31, 1999
    Assignee: AT&T Corp
    Inventors: Mazin G. Rahim, Lawrence K. Saul
  • Patent number: 5937384
    Abstract: A method and system for achieving an improved recognition accuracy in speech recognition systems which utilize continuous density hidden Markov models to represent phonetic units of speech present in spoken speech utterances is provided. An acoustic score which reflects the likelihood that a speech utterance matches a modeled linguistic expression is dependent on the output probability associated with the states of the hidden Markov model. Context-independent and context-dependent continuous density hidden Markov models are generated for each phonetic unit. The output probability associated with a state is determined by weighing the output probabilities of the context-dependent and context-independent states in accordance with a weighting factor. The weighting factor indicates the robustness of the output probability associated with each state of each model, especially in predicting unseen speech utterances.
    Type: Grant
    Filed: May 1, 1996
    Date of Patent: August 10, 1999
    Assignee: Microsoft Corporation
    Inventors: Xuedong D. Huang, Milind V. Mahajan
  • Patent number: 5933806
    Abstract: A system and method are used for recognising a time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech. The system has input means (30), which accesses the physical quantity and therefrom generates a plurality of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing a plurality of reference patterns. Each reference pattern includes a sequence of reference units, where each reference unit is represented by at least one associated reference vector $\mu_a$ in a set $\{\mu_a\}$ of reference vectors. A localizer (50) is used for locating among the reference patterns stored in the reference pattern database (40), a recognised reference pattern, which corresponds to the input pattern. The locating includes selecting a subset $\{\mu_s\}$ of reference vectors from said set {.mu..sub.
    Type: Grant
    Filed: August 28, 1996
    Date of Patent: August 3, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Peter Beyerlein, Meinhard D. Ullrich
  • Patent number: 5930753
    Abstract: Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks. In all cases, frequency warping was found to significantly improve recognition performance by reducing the mismatch between test utterances presented to the recognizer and the speaker independent HMM model. This invention relates to a procedure which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30%.
    Type: Grant
    Filed: March 20, 1997
    Date of Patent: July 27, 1999
    Assignee: AT&T Corp
    Inventors: Alexandros Potamianos, Richard Cameron Rose
  • Patent number: 5924067
    Abstract: An apparatus and method for speech recognition includes a device and a step for obtaining a mean of the time of a speech portion in the Cepstrum dimension from the speech portion of the input speech, a device and step for obtaining a mean of a time of the non-speech portion in the Cepstrum dimension from the non-speech portion of the input speech, a device and step for converting each mean time from a Cepstrum region to a linear region, and after that, subtracting it on a linear spectrum dimension, converting the subtracted mean into a Cepstrum dimension, subtracting a mean of a time of a speech portion in a Cepstrum dimension in a speech database for learning from the converted result, and adding the subtracted result to a speech model expressed by Cepstrum. By this arrangement, even when noise is large, the presumed precision of a line fluctuation is raised and the recognition rate can be improved.
    Type: Grant
    Filed: March 20, 1997
    Date of Patent: July 13, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Tetsuo Kosaka, Yasunori Ohora
  • Patent number: 5924066
    Abstract: A system and method for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes are provided. Stochastic models include a plurality of states having state transitions and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of a speech signal. The method includes extracting a frame sequence, and determining a state sequence for each stochastic model with each state sequence having full state segmentation. Representative frames are determined to provide speech signal time normalization. A likely speech signal class is determined from a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes. An output signal is generated based on the likely stochastic model.
    Type: Grant
    Filed: September 26, 1997
    Date of Patent: July 13, 1999
    Assignees: U S WEST, Inc., MediaOne, Inc.
    Inventor: Amlan Kundu
  • Patent number: 5920839
    Abstract: A pattern recognition technology includes a set of a control signal vector and a covariance matrix for respective states of a reference pattern of an objective word for recognition, which reference pattern is expressed by a plurality of states and transitions between the states, and transition probabilities between respective states. A prediction vector of the t-th feature vector is derived on the basis of the (t-1)-th feature vector and the control signal vector for the current (n-th) state, determined beforehand for each of the states. A feature vector output probability for outputting the t-th feature vector in the n-th state of the reference pattern of the objective word for recognition is derived from a multi-dimensional Gaussian distribution determined by the prediction vector and the covariance matrix, taking the prediction vector as the average vector.
    Type: Grant
    Filed: February 10, 1997
    Date of Patent: July 6, 1999
    Assignee: NEC Corporation
    Inventor: Ken-Ichi Iso
  • Patent number: 5913192
    Abstract: A speaker identification system includes a speaker-independent phrase recognizer. The speaker-independent phrase recognizer scores a password utterance against all the sets of phonetic transcriptions in a lexicon database to determine the N best speaker-independent scores, determines the N best sets of phonetic transcriptions based on the N best speaker-independent scores, and determines the N best possible identities. A speaker-dependent phrase recognizer retrieves the hidden Markov model corresponding to each of the N best possible identities, and scores the password utterance against each of the N hidden Markov models to generate a speaker-dependent score for each of the N best possible identities. A score processor coupled to the outputs of the speaker-independent phrase recognizer and the speaker-dependent phrase recognizer determines a putative identity. A verifier coupled to the score processor authenticates the determined putative identity.
    Type: Grant
    Filed: August 22, 1997
    Date of Patent: June 15, 1999
    Assignee: AT&T Corp
    Inventors: Sarangarajan Parthasarathy, Aaron Edward Rosenberg
  • Patent number: 5913193
    Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances, thereby producing more natural sounding speech.
    Type: Grant
    Filed: April 30, 1996
    Date of Patent: June 15, 1999
    Assignee: Microsoft Corporation
    Inventors: Xuedong D. Huang, Michael D. Plumpe, Alejandro Acero, James L. Adcock
  • Patent number: 5907825
    Abstract: A method for determining the location of a pattern, when input in isolation, within a representative input signal is provided. The method aligns the input signal with a signal representative of a plurality of connected patterns, one of which is the same as the pattern within the input signal. The method then determines the location from the results of the aligning step. The location determined using this apparatus can be used to determine an isolated reference model by extracting features of the input signal from the location found. This isolated reference model can then be used to generate a continuous reference model for the pattern, by aligning the isolated reference model with the signals representative of a plurality of connected patterns, one of which is the pattern to be modelled.
    Type: Grant
    Filed: February 6, 1997
    Date of Patent: May 25, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventor: Eli Tzirkel-Hancock
  • Patent number: 5907826
    Abstract: A speech recognition apparatus includes a feature extraction section, and a recognition section. The feature extraction section extracts the feature vectors of input speech. The feature extraction section includes at least a pitch intensity extraction section. The pitch intensity extraction section extracts the intensities of the fundamental frequency components of the input speech. The recognition section performs speech recognition by using the feature vectors from the feature extraction section.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: May 25, 1999
    Assignee: NEC Corporation
    Inventor: Keizaburo Takagi
  • Patent number: 5903865
    Abstract: A speech model preparing method capable of easily preparing a new Hidden Markov Model (HMM) of an input speech from very few utterances, such as one or two, and a speech recognition apparatus using this method, are described. The speech recognition apparatus uses, as a speech model, a continuous distribution type HMM defined by three parameters: a state transition probability, an average vector and a variance. The apparatus computes an average vector of an input speech to be learned, selects an HMM approximate to the input to-be-learned speech as an initial model from a registration dictionary, replaces at least an average vector of the selected HMM with the computed average vector of the to-be-learned speech, and adds the obtained HMM to the dictionary as an HMM for the input to-be-learned speech.
    Type: Grant
    Filed: August 29, 1996
    Date of Patent: May 11, 1999
    Assignee: Pioneer Electronic Corporation
    Inventors: Shunsuke Ishimitsu, Ikuo Fujita
  • Patent number: 5899973
    Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.
    Type: Grant
    Filed: September 25, 1997
    Date of Patent: May 4, 1999
    Assignee: International Business Machines Corporation
    Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
  • Patent number: 5895448
    Abstract: Methods and apparatus for generating and using both speaker dependent and speaker independent garbage models in speaker dependent speech recognition applications are described. The present invention recognizes that in some speech recognition systems, e.g., systems where multiple speech recognition operations are performed on the same signal, it may be desirable to recognize and treat words or phrases in one part of the speech recognition system as garbage or out of vocabulary utterances with the understanding that the very same words or phrases will be recognized and treated as in-vocabulary by another portion of the system. In accordance with the present invention, in systems where both speaker independent and speaker dependent speech recognition operations are performed independently, e.g.
    Type: Grant
    Filed: April 30, 1997
    Date of Patent: April 20, 1999
    Assignee: Nynex Science and Technology, Inc.
    Inventors: George J. Vysotsky, Vijay R. Raman
  • Patent number: 5893059
    Abstract: Methods and apparatus for transitioning from one speech recognition system to another and for reusing existing speech recognition data are described. In particular, various methods of converting speech recognition templates or models from a first format to a second format are described. Methods for improving the recognition rate achieved using converted templates or models are also described. These methods involve storing source and/or scoring information for templates or models so that converted models or templates can be scored differently than original models or templates to thereby reflect the effect the conversion process has on recognition scores. In order to enhance recognition results in one embodiment an available compressed voice recording is used in the conversion process. The methods and apparatus of the present invention can be applied to a wide variety of speech recognition template and model conversion applications. Methods and apparatus for generating garbage models are also described.
    Type: Grant
    Filed: April 17, 1997
    Date of Patent: April 6, 1999
    Assignee: Nynex Science and Technology, Inc.
    Inventor: Vijay R. Raman
  • Patent number: 5890111
    Abstract: Injection noise and silence are detected in an input speech signal and an external amplifier is switched on or off based on the detected injection noise or silence. The input speech signal is digitized and a first copy of the digitized signal is preemphasized. After the input speech signal is preemphasized, a predetermined number of Mel-frequency cepstral coefficients (MFCCs) and difference cepstra are calculated for each window of the speech signal. A measure of signal energy and a measure of the rate of change of the signal energy is computed. A second copy of the digitized input speech signal is processed using amplitude summation or by differencing a center-clipped signal. The measures of signal energy, rate of change of the signal energy, the Mel coefficients, difference cepstra, and either the amplitude summation value or the differenced value are combined to form an observation vector.
    Type: Grant
    Filed: December 24, 1996
    Date of Patent: March 30, 1999
    Assignee: Technology Research Association of Medical Welfare Apparatus
    Inventors: Hector Raul Javkin, Michael Galler, Nancy Niedzielski
  • Patent number: 5890114
    Abstract: An HMM training method comprising a first parameter predicting step, a centroid state set calculating step, a reconstructing step, a second parameter predicting step and a control step. In the first parameter predicting step, a parameter of an HMM (hidden Markov model) is predicted based on training data. In the centroid state set calculating step, a centroid state set is calculated by clustering the states of said HMM whose parameter is predicted in the first parameter predicting step. In the reconstructing step, an HMM is reconstructed using the centroid state set calculated in the centroid state set calculating step. In the second parameter predicting step, a parameter of the HMM reconstructed in the reconstructing step is predicted using the training data. The centroid state set calculating step is reexecuted by the control step in the case that a likelihood of the HMM whose parameter is predicted in the second parameter predicting step does not satisfy a predetermined condition.
    Type: Grant
    Filed: February 28, 1997
    Date of Patent: March 30, 1999
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Jie Yi
  • Patent number: 5884259
    Abstract: A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.
    Type: Grant
    Filed: February 12, 1997
    Date of Patent: March 16, 1999
    Assignee: International Business Machines Corporation
    Inventors: Lalit Rai Bahl, Ellen Marie Eide
  • Patent number: 5878390
    Abstract: A speech recognition apparatus includes a speech recognition section for performing a speech recognition process on an uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability of the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to the calculated functional value.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: March 2, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Jun Kawai, Yumi Wakita
  • Patent number: 5870706
    Abstract: Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses, and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.
    Type: Grant
    Filed: April 10, 1996
    Date of Patent: February 9, 1999
    Assignee: Lucent Technologies, Inc.
    Inventor: Hiyan Alshawi
  • Patent number: 5864810
    Abstract: A method and apparatus for automatic recognition of speech adapts to a particular speaker by using adaptation data to develop a transformation through which speaker independent models are transformed into speaker adapted models. The speaker adapted models are then used for speaker recognition and achieve better recognition accuracy than non-adapted models. In a further embodiment, the transformation-based adaptation technique is combined with a known Bayesian adaptation technique.
    Type: Grant
    Filed: January 20, 1995
    Date of Patent: January 26, 1999
    Assignee: SRI International
    Inventors: Vassilios Digalakis, Leonardo Neumeyer, Dimitry Rtischev
  • Patent number: 5864806
    Abstract: For equalizing a speech signal constituted by an observed sequence of successive input sound frames, which speech signal is liable to be affected by disturbances, the speech signal is modelled by means of a hidden Markov model and, at each instant t: equalization filters are constituted in association with the paths in the Markov sense at instant t; at least a plurality of the equalization filters are applied to the frames to obtain, at instant t, a plurality of filtered sound frame sequences and an utterance probability for each of the paths respectively associated with the equalization filters applied; the equalization filter corresponding to the most probable path in the Markov sense is selected; and the filtered frame supplied by the selected equalization filter is selected as the equalized frame.
    Type: Grant
    Filed: May 5, 1997
    Date of Patent: January 26, 1999
    Assignee: France Telecom
    Inventors: Chafic Mokbel, Denis Jouvet, Jean Monne
  • Patent number: 5860062
    Abstract: A speech recognition apparatus and method learns in advance a plurality of kinds of noises that can occur in the environment of use to determine a plurality of noise HMMs, synthesizes these noise HMMs into one noise HMM, generates a NOVO-HMM by executing NOVO (voice mixed with noise) conversion for a speech HMM of a reference pattern by using this composite noise HMM, and uses this NOVO-HMM for a speech recognition processing. Since a plurality of noises are incorporated in the NOVO-HMM generated in this manner, the speech can be recognized with high accuracy even when the noise changes.
    Type: Grant
    Filed: June 13, 1997
    Date of Patent: January 12, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kenichi Taniguchi, Nobuyuki Kono, Toshimichi Tokuda, Yoshio Ikura
  • Patent number: 5857169
    Abstract: A time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech, is recognized. The system includes input means (30), which accesses the physical quantity and therefrom generates a sequence of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing reference patterns, which consist of a sequence of reference units. Each reference unit is represented by associated reference probability densities. A tree builder (60) represents for each reference unit the set of associated reference probability densities as a tree structure. Each leaf node of the tree corresponds to a reference probability density. Each non-leaf node corresponds to a cluster probability density, which is derived from all reference probability densities corresponding to leaf nodes in branches below the non-leaf node.
    Type: Grant
    Filed: August 28, 1996
    Date of Patent: January 5, 1999
    Assignee: U.S. Philips Corporation
    Inventor: Frank Seide
  • Patent number: 5850627
    Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary
    Type: Grant
    Filed: June 26, 1997
    Date of Patent: December 15, 1998
    Assignee: Dragon Systems, Inc.
    Inventors: Joel M. Gould, Elizabeth E. Steele, Frank J. McGrath, Steven D. Squires, Peter S. Heitman, Joel W. Parke, Dean G. Sturtevant, Jed M. Roberts, James K. Baker
  • Patent number: 5842165
    Abstract: Methods and apparatus for the generation of speaker dependent garbage models from the very same data used to generate speaker dependent speech recognition models, e.g., word models, are described. The technique involves processing the data included in the speaker dependent speech recognition models to create one or more speaker dependent garbage models. The speaker dependent garbage model generation technique involves what may be described as distorting or morphing of a speaker dependent speech recognition model to generate a speaker dependent garbage model therefrom. One or more speaker dependent speech recognition models may then be combined with the generated speaker dependent garbage model to produce an updated garbage model. The scoring of speaker dependent garbage models is varied in accordance with the present invention as a function of the number of speech recognition models from which the speaker dependent garbage model was created.
    Type: Grant
    Filed: April 30, 1997
    Date of Patent: November 24, 1998
    Assignee: Nynex Science & Technology, Inc.
    Inventors: Vijay R. Raman, George J. Vysotsky
  • Patent number: 5839105
    Abstract: There is provided a speaker-independent model generation apparatus and a speech recognition apparatus which require a processing unit to have less memory capacity and which allow its computation time to be reduced, as compared with a conventional counterpart. A single Gaussian HMM is generated with a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers. A state having a maximum increase in likelihood as a result of splitting one state in contextual or temporal domains is searched for. Then, the state having a maximum increase in likelihood is split in a contextual or temporal domain corresponding to the maximum increase in likelihood. Thereafter, a single Gaussian HMM is generated with the Baum-Welch training algorithm, and these steps are iterated until the states within the single Gaussian HMM can no longer be split or until a predetermined number of splits is reached. Thus, a speaker-independent HMM is generated.
    Type: Grant
    Filed: November 29, 1996
    Date of Patent: November 17, 1998
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Mari Ostendorf, Harald Singer
  • Patent number: 5835890
    Abstract: In a speaker adaptation method for speech models, input speech is transformed to a feature parameter sequence like a cepstral sequence, and N model sequences of maximum likelihood for the feature parameter sequence are extracted from speaker-independent speech HMMs by an N-best hypothesis extraction method. The extracted model sequences are each provisionally adapted to maximize its likelihood for the feature parameter sequence of the input speech while changing the HMM parameters of each sequence, and that one of the provisionally adapted model sequences which has the maximum likelihood for the feature parameter sequence of the input speech is selected and speech models of the selected sequence are provided as adapted HMMs of the speaker to be recognized.
    Type: Grant
    Filed: April 9, 1997
    Date of Patent: November 10, 1998
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Tomoko Matsui, Sadaoki Furui
  • Patent number: 5835893
    Abstract: In a word clustering apparatus for clustering words, a plurality of words is clustered to obtain a total tree diagram of a word dictionary representing a word clustering result, where the total tree diagram includes tree diagrams of an upper layer, a middle layer and a lower layer. In a speech recognition apparatus, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal. Then, a speech recognition controller executes a speech recognition process on the extracted acoustic feature parameters with reference to a predetermined Hidden Markov Model and the obtained total tree diagram of the word dictionary, and outputs a result of the speech recognition.
    Type: Grant
    Filed: April 18, 1996
    Date of Patent: November 10, 1998
    Assignee: ATR Interpreting Telecommunications Research Labs
    Inventor: Akira Ushioda
  • Patent number: 5832430
    Abstract: Devices and methods for speech recognition enable simultaneous word hypothesis detection and verification in a one-pass procedure that provides for different segmentations of the speech input. A confidence measure of a target hypothesis for a known word is determined according to a recursion formula that operates on parameters of target models and alternate models of known words, a language model and a lexicon, and feature vectors of the speech input in a likelihood ratio decoder. The confidence measure is processed to determine an accept/reject signal for the target hypothesis that is output with a target hypothesis signal. The recursion formula is based on hidden Markov models with a single optimum state sequence and may take the form of a modified Viterbi algorithm.
    Type: Grant
    Filed: December 8, 1995
    Date of Patent: November 3, 1998
    Assignee: Lucent Technologies, Inc.
    Inventors: Eduardo Lleida, Richard Cameron Rose
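The accept/reject decision described in the abstract of patent 5832430 can be illustrated with a simple likelihood ratio test. This is only a minimal sketch and not the patent's one-pass recursion: the function names, the per-frame normalization, and the threshold value are all assumptions made for illustration.

```python
def confidence(target_loglik, alternate_loglik, n_frames):
    """Per-frame log-likelihood ratio of the target model against an
    alternate model (the normalization by frame count is an assumption)."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (target_loglik - alternate_loglik) / n_frames


def verify(target_loglik, alternate_loglik, n_frames, threshold=0.0):
    """Return (accept, score): accept the target hypothesis when the
    confidence score reaches the (hypothetical) threshold."""
    score = confidence(target_loglik, alternate_loglik, n_frames)
    return score >= threshold, score


# The target model explains the 100 frames better than the alternate model.
accept, score = verify(-120.0, -150.0, n_frames=100, threshold=0.2)
```

A higher threshold rejects more hypotheses, trading missed detections for fewer false acceptances.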
  • Patent number: 5822731
    Abstract: A system for parsing information representative of a sequence of words having parts of speech. The sequence of words forms a sentence or sentence fragment. A hidden Markov model is provided for determining the most likely part of speech of a selected word of the sequence of words. The hidden Markov model has an initial transition matrix and a subsequent transition matrix for storing probabilities of occurrence of the parts of speech. The initial transition matrix of the hidden Markov model is removed to provide a modified hidden Markov model. The modified hidden Markov model is applied to the sequence of words to determine the most likely part of speech of a selected word within a sentence fragment with increased accuracy.
    Type: Grant
    Filed: September 15, 1995
    Date of Patent: October 13, 1998
    Assignee: Infonautics Corporation
    Inventor: John Michael Schultz
  • Patent number: 5819222
    Abstract: A speech recognition system recognizes connected speech using a plurality of vocabulary nodes, at least one of which has an associated signature. In use, partial recognition paths are examined at decision nodes intermediate the beginning and end of the recognition path, each decision node having an associated set of valid accumulated signatures. A token received by a decision node is only propagated if the accumulated signature of that token is one of those in the set of valid accumulated signatures associated with that decision node.
    Type: Grant
    Filed: October 11, 1995
    Date of Patent: October 6, 1998
    Assignee: British Telecommunications public limited company
    Inventors: Samuel Gavin Smyth, Simon Patrick Alexander Ringland
  • Patent number: 5819223
    Abstract: A speech adaptation device comprises a vocabulary independent reference pattern memory for memorizing a plurality of vocabulary independent reference patterns having one or more categories. Each category has one or more acoustic units, and has such a connection relation of the acoustic units that allows reception of any sequence of the acoustic units appearing in the input speech. A preliminary matching unit is for use in making time-alignment between the time series of the feature vectors of the input speech obtained from the analysis unit and the vocabulary independent reference pattern to obtain mean vectors for individual categories of the input speech and the vocabulary independent reference pattern from the aligned portion for the individual categories of the feature vectors of the input speech and the vocabulary independent reference pattern.
    Type: Grant
    Filed: January 26, 1996
    Date of Patent: October 6, 1998
    Assignee: NEC Corporation
    Inventor: Keizaburo Takagi
  • Patent number: 5812975
    Abstract: A method of designing a state transition model capable of high speed voice recognition and a voice recognition method and apparatus using the state transition model are provided. The methods provide a state transition model in which a state shared structure of the state transition model is designed. The method includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short distance cluster among clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
    Type: Grant
    Filed: June 18, 1996
    Date of Patent: September 22, 1998
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Yasunori Ohora
  • Patent number: 5812974
    Abstract: This is a speech recognition method for modeling adjacent word context, comprising: dividing a first word or period of silence into two portions; dividing a second word or period of silence, adjacent to the first word, into two portions; and combining the last portion of the first word or period of silence and the first portion of the second word or period of silence to make an acoustic model. The method includes constructing a grammar to restrict the acoustic models to the middle-to-middle context.
    Type: Grant
    Filed: April 10, 1996
    Date of Patent: September 22, 1998
    Assignee: Texas Instruments Incorporated
    Inventors: Charles T. Hemphill, Lorin P. Netsch, Christopher M. Kribs
  • Patent number: 5806034
    Abstract: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data; providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM; and parameterizing speech utterances according to one of linear prediction parameters or Mel-scale filter bank parameters. The method further includes selecting a frame period for accommodating the parameters and generating HMMs and decoding to specified speech utterances by causing the user to utter predefined training speech utterances for each HMM. The method then statistically computes the generated HMMs with the prototype HMM to provide a set of fully trained HMMs for each utterance indicative of the speaker.
    Type: Grant
    Filed: August 2, 1995
    Date of Patent: September 8, 1998
    Assignee: ITT Corporation
    Inventors: Joe A. Naylor, William Y. Huang, Lawrence G. Bahler
  • Patent number: 5799278
    Abstract: A speech recognition system for discrete words uses a single Hidden Markov Model (HMM), which is nominally adapted to recognise N different isolated words, but which is trained to recognise M different words, where M>N. This is achieved by providing M sets of audio recordings, each set comprising multiple recordings of a respective one of said M words being spoken. Only N different labels are assigned to the M sets of audio recordings, so that at least one of the N labels has two or more sets of audio recordings assigned thereto. These two or more sets of audio recordings correspond to phonetically dissimilar words. The HMM is then trained by inputting each set of audio recordings and its assigned label. The HMM can effectively compensate for the phonetic variations between the different words assigned the same label, thereby avoiding the need to utilise a larger model (i.e., to use M labels).
    Type: Grant
    Filed: July 2, 1996
    Date of Patent: August 25, 1998
    Assignee: International Business Machines Corporation
    Inventors: Michael Cobbett, John Brian Pickering
  • Patent number: 5799277
    Abstract: The acoustic model generating method for speech recognition enables a high representation effect on the basis of the minimum possible model parameters. In an initial model having a smaller number of signal sources, the acoustic model for speech recognition is generated by selecting the splitting processing or the merging processing for the signal sources successively and repeatedly. The merging processing is executed prior to the splitting processing. In the merging processing, when the merged result is not appropriate, the splitting processing is executed for the model obtained before merging processing (without use of the merged result).
    Type: Grant
    Filed: October 25, 1995
    Date of Patent: August 25, 1998
    Assignee: Victor Company of Japan, Ltd.
    Inventor: Junichi Takami
  • Patent number: 5797123
    Abstract: A key-phrase detection and verification method that can be advantageously used to realize understanding of flexible (i.e., unconstrained) speech. A "multiple pass" procedure is applied to a spoken utterance comprising a sequence of words (i.e., a "sentence"). First, a plurality of key-phrases are detected (i.e., recognized) based on a set of phrase sub-grammars which may, for example, be specific to the state of the dialogue. These key-phrases are then verified by assigning confidence measures thereto and comparing these confidence measures to a threshold, resulting in a set of verified key-phrase candidates. Next, the verified key-phrase candidates are connected into sentence hypotheses based upon the confidence measures and predetermined (e.g., task-specific) semantic information. And, finally, one or more of these sentence hypotheses are verified to produce a verified sentence hypothesis and, from that, a resultant understanding of the spoken utterance.
    Type: Grant
    Filed: December 20, 1996
    Date of Patent: August 18, 1998
    Assignee: Lucent Technologies Inc.
    Inventors: Wu Chou, Biing-Hwang Juang, Tatsuya Kawahara, Chin-Hui Lee
  • Patent number: 5794198
    Abstract: For the one-dimensional normal distributions in the respective dimensions of the continuous multi-dimensional normal distribution of each state of HMMs representing speech units, mean and variance values are tied among similar one-dimensional distributions. As a result, the total number of normal distributions for representing the model is reduced without degrading recognition performance.
    Type: Grant
    Filed: October 24, 1995
    Date of Patent: August 11, 1998
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Satoshi Takahashi, Shigeki Sagayama
  • Patent number: 5787396
    Abstract: A speech recognition method uses continuous mixture Hidden Markov Models (HMM) for probability processing including a first type of HMM having a small number of mixtures and a second type of HMM having a larger number of mixtures. First output probabilities are formed for inputted speech using the small number of mixtures type HMM and second output probabilities are formed for the input speech using the large number of mixtures type HMM for selected states corresponding to the highest output probabilities of the first type HMM. The input speech is recognized from both the first and second output probabilities.
    Type: Grant
    Filed: September 18, 1995
    Date of Patent: July 28, 1998
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Yasunori Ohora, Masayuki Yamada
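The two-stage scoring in the abstract of patent 5787396 can be sketched as follows: every state is scored with a small mixture, and only the best-scoring states are rescored with a larger mixture. This is a hedged illustration under simplifying assumptions, not the patented method: observations are one-dimensional, and `gmm_loglik`, `two_stage_scores`, `top_k`, and the model dictionaries are hypothetical names.

```python
import math


def gmm_loglik(x, mixture):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture given as
    a list of (weight, mean, variance) tuples."""
    total = sum(
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in mixture
    )
    return math.log(total)


def two_stage_scores(x, coarse, fine, top_k):
    """Score all states with the coarse (few-mixture) models, then rescore
    only the top_k states with the fine (many-mixture) models."""
    scores = {state: gmm_loglik(x, mix) for state, mix in coarse.items()}
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    for state in best:
        scores[state] = gmm_loglik(x, fine[state])
    return scores, best


# Illustrative models (made-up parameters): one component per state in the
# coarse set, two components per state in the fine set.
coarse = {"a": [(1.0, 0.0, 1.0)], "b": [(1.0, 1.0, 1.0)], "c": [(1.0, 5.0, 1.0)]}
fine = {
    "a": [(0.5, -0.2, 0.5), (0.5, 0.2, 0.5)],
    "b": [(0.5, 0.8, 0.5), (0.5, 1.2, 0.5)],
    "c": [(0.5, 4.8, 0.5), (0.5, 5.2, 0.5)],
}
scores, best = two_stage_scores(0.1, coarse, fine, top_k=2)
```

The saving comes from evaluating the expensive mixtures for only `top_k` states per frame instead of all of them.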
  • Patent number: 5778341
    Abstract: The invention is a speech recognition system and method for transmitting information including the receipt and decoding of speech information such as that modeled by hidden Markov models (HMMs). In this invention, the state likelihoods of the modeled state sequences contained within the speech information are assigned penalties based on the difference between those state likelihoods and a maximum possible state likelihood. Once penalties have been assigned, the modified state sequence with the modified state likelihoods having the highest cumulative state likelihoods is used in further speech recognition processing. In this manner, state sequences having no extremely poor state likelihoods are favored over those having both extremely high and extremely poor state likelihoods.
    Type: Grant
    Filed: January 26, 1996
    Date of Patent: July 7, 1998
    Assignee: Lucent Technologies Inc.
    Inventor: Ilija Zeljkovic
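The penalty idea in the abstract of patent 5778341 can be illustrated with a small example. The abstract does not state the penalty function, so the quadratic penalty, the `weight` parameter, and the function name below are assumptions chosen only to show the effect: a path with one very poor state likelihood loses to a path that is uniformly mediocre.

```python
def penalized_score(log_likes, max_log_like, weight=0.5):
    """Cumulative score of a state sequence where each state log-likelihood
    is reduced by a penalty that grows with its gap from the maximum
    possible log-likelihood (quadratic penalty is an assumption)."""
    total = 0.0
    for ll in log_likes:
        gap = max_log_like - ll
        total += ll - weight * gap * gap
    return total


# Path A is uniformly mediocre; path B is mostly excellent with one very
# poor frame. Values are made up for illustration.
path_a = [-1.0, -1.0, -1.0, -1.0]
path_b = [0.0, 0.0, 0.0, -3.5]

raw_a, raw_b = sum(path_a), sum(path_b)        # raw sums favor path B
pen_a = penalized_score(path_a, max_log_like=0.0)
pen_b = penalized_score(path_b, max_log_like=0.0)  # penalties favor path A
```

With these numbers the unpenalized sums rank path B first, while the penalized scores rank path A first, which matches the abstract's stated preference for sequences without extremely poor state likelihoods.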