Markov Patents (Class 704/256)
-
Patent number: 5963906
Abstract: A method and system performs speech recognition training using Hidden Markov Models. Initially, preprocessed speech signals that include a plurality of observations are stored by the system. Initial Hidden Markov Model (HMM) parameters are then assigned. Summations are then calculated using modified equations derived substantially from the following equations, wherein $u \le v < w$: $P(x_u^w) = P(x_u^v)\,P(x_{v+1}^w)$ and $\Omega_{ij}(x_u^w) = \Omega_{ij}(x_u^v)\,P(x_{v+1}^w) + P(x_u^v)\,\Omega_{ij}(x_{v+1}^w)$. The calculated summations are then used to perform HMM parameter reestimation. The system then determines whether the HMM parameters have converged. If they have, the HMM parameters are then stored. However, if the HMM parameters have not converged, the system again calculates summations, performs HMM parameter reestimation using the summations, and determines whether the parameters have converged.
Type: Grant
Filed: May 20, 1997
Date of Patent: October 5, 1999
Assignee: AT & T Corp
Inventor: William Turin
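The two splitting identities quoted in the abstract of patent 5963906 can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: the 2-state HMM parameters are made up, the single-observation matrix `P1` uses the common convention that entry (i, j) is a_ij * b_j(x), and the single-observation term `Omega1` (the matrix `P1` with only entry (i, j) kept) is a guess at a definition that makes the recursion meaningful. None of these names or definitions come from the patent text.

```python
# Sketch only: matrix-valued sequence probabilities for a toy 2-state HMM.
# All numbers and the definition of Omega1 are illustrative assumptions.

def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Element-wise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij (assumed)
B = [[0.9, 0.1], [0.2, 0.8]]   # B[j][x]: probability of symbol x in state j (assumed)

def P1(x):
    """Matrix probability of one observation: entry (i, j) = a_ij * b_j(x)."""
    n = len(A)
    return [[A[i][j] * B[j][x] for j in range(n)] for i in range(n)]

def Omega1(x, i, j):
    """Assumed single-observation term: P1(x) with only entry (i, j) kept."""
    n = len(A)
    p = P1(x)
    return [[p[r][c] if (r, c) == (i, j) else 0.0 for c in range(n)]
            for r in range(n)]

def P(xs):
    """Divide and conquer: P(x_u^w) = P(x_u^v) P(x_{v+1}^w)."""
    if len(xs) == 1:
        return P1(xs[0])
    mid = len(xs) // 2
    return matmul(P(xs[:mid]), P(xs[mid:]))

def Omega(xs, i, j):
    """Omega_ij(x_u^w) = Omega_ij(x_u^v) P(x_{v+1}^w) + P(x_u^v) Omega_ij(x_{v+1}^w)."""
    if len(xs) == 1:
        return Omega1(xs[0], i, j)
    mid = len(xs) // 2
    left, right = xs[:mid], xs[mid:]
    return matadd(matmul(Omega(left, i, j), P(right)),
                  matmul(P(left), Omega(right, i, j)))

def Omega_direct(xs, i, j):
    """Reference value: sum over t of P(x_1..x_{t-1}) Omega1(x_t) P(x_{t+1}..x_T)."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    for t in range(len(xs)):
        term = Omega1(xs[t], i, j)
        if t > 0:
            term = matmul(P(xs[:t]), term)
        if t < len(xs) - 1:
            term = matmul(term, P(xs[t + 1:]))
        total = matadd(total, term)
    return total
```

Under these assumptions, `Omega(obs, i, j)` computed by recursive splitting agrees with the term-by-term sum in `Omega_direct`, which is the property the second identity expresses.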
-
Patent number: 5960397
Abstract: A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The speech recognition system is trained by making a set of secondary models in a first stage of training, and integrating the set of secondary models with a base set of recognition models in a second stage of training.
Type: Grant
Filed: May 27, 1997
Date of Patent: September 28, 1999
Assignee: AT&T Corp
Inventor: Mazin G. Rahim
-
Patent number: 5956676
Abstract: A pattern adapting apparatus including an input pattern forming unit, a tree structure standard pattern storing unit for storing a tree structure standard pattern including a tree structure indicative of inclusive relationships among categories and a parameter set at each node of the tree structure, a pattern matching unit for matching categories of the tree structure standard pattern with input samples of an input pattern, a tree structure standard pattern modifying unit for modifying a tree structure standard pattern based on the results of pattern matching, a node set selecting unit for calculating a description length with respect to a plurality of node sets in a tree structure pattern to select an appropriate node set, a modified standard pattern forming unit for forming a modified standard pattern by using a parameter set of a selected node set, and a standard pattern for recognition storing unit for storing a modified standard pattern.
Type: Grant
Filed: August 27, 1996
Date of Patent: September 21, 1999
Assignee: NEC Corporation
Inventor: Koichi Shinoda
-
Patent number: 5956678
Abstract: In the recognition of coherently spoken words, a plurality of hypotheses is usually built up which end in various words during the recognition process and are then to be continued with further words. To keep the number of words yet to be continued as small as possible, especially in the case of a large vocabulary, it is known to carry out a look-ahead in a limited time space. It is suggested according to the invention to use the same phonemes for the look-ahead as for the actual recognition and to add together the differential sums obtained in the look-ahead for the evaluation of the partial hypothesis which has just ended and which is to be continued, and to compare this sum with a threshold value which depends on the extrapolated minimum total evaluation at the end of the time space of the look-ahead. The searching space for hypotheses to be continued can be limited by this in a particularly favorable manner.
Type: Grant
Filed: April 17, 1995
Date of Patent: September 21, 1999
Assignee: U.S. Philips Corporation
Inventors: Reinhold Hab-Umbach, Hermann Ney
-
Patent number: 5956679
Abstract: A speech processing apparatus includes a noise model production device for extracting a noise-speech interval from input speech data and producing a noise model by using the data of the extracted interval. The apparatus also includes a composite distribution production device for dividing the distributions of a speech model into a plurality of groups, producing a composite distribution of each group, and determining the positional relationship of each distribution within each group. In addition, the apparatus includes a memory for storing each composite distribution and the positional relationship of each distribution within the group, and a PMC conversion device for PMC-converting each produced composite distribution. Also provided is a noise-adaptive speech model production device for producing a noise-adaptive speech model on the basis of the composite distribution which is PMC-converted by the PMC conversion device and the positional relationship stored by the memory.
Type: Grant
Filed: December 2, 1997
Date of Patent: September 21, 1999
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuhiro Komori, Hiroki Yamamoto
-
Patent number: 5950158
Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a plurality of reduced size models are generated using a Lagrange multiplier from an input model and input size constraints. The plurality of reduced size models are stored in a buffer and scored using a likelihood scoring technique.
Type: Grant
Filed: July 30, 1997
Date of Patent: September 7, 1999
Assignee: Nynex Science and Technology, Inc.
Inventor: Kuansan Wang
-
Patent number: 5946653
Abstract: An improved method of training a SISRS uses less processing and memory resources by operating on vectors instead of matrices which represent spoken commands. Memory requirements are linearly proportional to the number of spoken commands for storing each command model. A spoken command is identified from the set of spoken commands by a command recognition procedure (200). The command recognition procedure (200) includes sampling the speaker's speech, deriving cepstral coefficients and delta-cepstral coefficients, and performing a polynomial expansion on cepstral coefficients. The identified spoken command is selected using the dot product of the command model data and the average command structure representing the unidentified spoken command.
Type: Grant
Filed: October 1, 1997
Date of Patent: August 31, 1999
Assignee: Motorola, Inc.
Inventors: William Michael Campbell, John Eric Kleider, Charles Conway Broun, Carl Steven Gifford, Khaled Assaleh
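The recognition step described in the abstract of patent 5946653 (polynomial expansion of cepstral frames, averaging, then a dot product against each command model) can be sketched as follows. This is a minimal illustration, not the patented procedure: the expansion order, the inclusion of a constant term, the function names and the toy model vectors are all assumptions, and delta-cepstral features are left out for brevity.

```python
from itertools import combinations_with_replacement

def poly_expand(x, order=2):
    """All monomials of the entries of x up to `order`, starting with the constant 1.
    The ordering and the constant term are illustrative choices."""
    terms = [1.0]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms

def average_expansion(frames, order=2):
    """Average the expanded feature vectors over all frames of the utterance."""
    expanded = [poly_expand(f, order) for f in frames]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]

def identify(frames, command_models, order=2):
    """Score each (hypothetical) command model by a dot product with the
    averaged expansion and return the best-scoring command name and all scores."""
    avg = average_expansion(frames, order)
    scores = {name: sum(w * a for w, a in zip(model, avg))
              for name, model in command_models.items()}
    return max(scores, key=scores.get), scores
```

Because each command is represented by a single vector, storage grows linearly with the number of commands, which matches the memory claim in the abstract.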
-
Patent number: 5946655
Abstract: When a language model is to be used for the recognition of a speech signal and the vocabulary is composed as a tree, the language model value cannot be taken into account before the word end. Customarily, after each word end the comparison with a tree root is started anew, be it with a score which has been increased by the language model value so that the threshold value for the scores at which hypotheses are terminated must be high and hence many, even unattractive hypotheses remain active for a prolonged period of time. In order to avoid this, in accordance with the invention a correction value is added to the score for at least a part of the nodes of the vocabulary tree; the sum of the correction values on the path to a word then may not be greater than the language model value for the relevant word. As a result, for each test signal the scores of all hypotheses are of a comparable order of magnitude.
Type: Grant
Filed: March 29, 1995
Date of Patent: August 31, 1999
Assignee: U.S. Philips Corporation
Inventors: Volker Steinbiss, Bach-Hiep Tran, Hermann Ney
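One way to satisfy the constraint stated in the abstract of patent 5946655 (the correction values along the path to a word never sum to more than that word's language model value) is to give each tree node the increase in the cheapest language model cost reachable below it. The sketch below assumes scores are non-negative costs such as negative log probabilities; the tree layout, function names and toy lexicon are hypothetical, and the patent may distribute the corrections differently.

```python
def build_tree(lexicon):
    """Prefix tree over phoneme sequences; a node is a dict with
    'children' and, at word ends, the 'word' that ends there."""
    root = {"children": {}, "word": None}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node["children"].setdefault(ph, {"children": {}, "word": None})
        node["word"] = word
    return root

def annotate(node, lm_cost):
    """Store in node['best'] the smallest LM cost of any word at or below the node."""
    best = lm_cost[node["word"]] if node["word"] is not None else float("inf")
    for child in node["children"].values():
        best = min(best, annotate(child, lm_cost))
    node["best"] = best
    return best

def assign_corrections(node, parent_best=0.0):
    """Correction at a node = increase of the best reachable LM cost
    relative to its parent (the root is compared against 0)."""
    node["corr"] = node["best"] - parent_best
    for child in node["children"].values():
        assign_corrections(child, node["best"])

def path_correction(root, phones):
    """Sum of corrections from the root down to the node reached by `phones`."""
    total, node = root["corr"], root
    for ph in phones:
        node = node["children"][ph]
        total += node["corr"]
    return total
```

With this choice the corrections telescope, so the sum along any path equals the best cost below the last node, which cannot exceed the cost of a word ending there. The remainder (word cost minus the accumulated corrections) would be added at the word end.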
-
Patent number: 5946656
Abstract: Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
Type: Grant
Filed: November 17, 1997
Date of Patent: August 31, 1999
Assignee: AT & T Corp.
Inventors: Mazin G. Rahim, Lawrence K. Saul
-
Patent number: 5937384
Abstract: A method and system for achieving an improved recognition accuracy in speech recognition systems which utilize continuous density hidden Markov models to represent phonetic units of speech present in spoken speech utterances is provided. An acoustic score which reflects the likelihood that a speech utterance matches a modeled linguistic expression is dependent on the output probability associated with the states of the hidden Markov model. Context-independent and context-dependent continuous density hidden Markov models are generated for each phonetic unit. The output probability associated with a state is determined by weighing the output probabilities of the context-dependent and context-independent states in accordance with a weighting factor. The weighting factor indicates the robustness of the output probability associated with each state of each model, especially in predicting unseen speech utterances.
Type: Grant
Filed: May 1, 1996
Date of Patent: August 10, 1999
Assignee: Microsoft Corporation
Inventors: Xuedong D. Huang, Milind V. Mahajan
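The weighting of context-dependent and context-independent output probabilities described in the abstract of patent 5937384 can be illustrated with a simple linear interpolation. This is a sketch under assumptions: the abstract does not say the combination is linear, and the count-based weight below (with its smoothing constant `tau`) is a hypothetical stand-in for however the patent estimates robustness.

```python
def combined_output_prob(p_cd, p_ci, weight):
    """Interpolate a context-dependent (p_cd) and a context-independent (p_ci)
    output probability; `weight` in [0, 1] is the trust placed in p_cd."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_cd + (1.0 - weight) * p_ci

def count_based_weight(n_samples, tau=10.0):
    """Hypothetical robustness heuristic: states trained on more samples
    get a weight closer to 1. Not taken from the patent."""
    return n_samples / (n_samples + tau)
```

A state seen rarely in training falls back toward the context-independent estimate, while a well-trained state relies mostly on its own context-dependent density.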
-
Patent number: 5933806
Abstract: A system and method are used for recognising a time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech. The system has input means (30), which accesses the physical quantity and therefrom generates a plurality of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing a plurality of reference patterns. Each reference pattern includes a sequence of reference units, where each reference unit is represented by at least one associated reference vector $\mu_a$ in a set $\{\mu_a\}$ of reference vectors. A localizer (50) is used for locating among the reference patterns stored in the reference pattern database (40), a recognised reference pattern, which corresponds to the input pattern. The locating includes selecting a subset $\{\mu_s\}$ of reference vectors from said set $\{\mu_a\}$
Type: Grant
Filed: August 28, 1996
Date of Patent: August 3, 1999
Assignee: U.S. Philips Corporation
Inventors: Peter Beyerlein, Meinhard D. Ullrich
-
Patent number: 5930753
Abstract: Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks. In all cases, frequency warping was found to significantly improve recognition performance by reducing the mismatch between test utterances presented to the recognizer and the speaker independent HMM model. This invention relates to a procedure which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30%.
Type: Grant
Filed: March 20, 1997
Date of Patent: July 27, 1999
Assignee: AT&T Corp
Inventors: Alexandros Potamianos, Richard Cameron Rose
-
Patent number: 5924067
Abstract: An apparatus and method for speech recognition includes a device and a step for obtaining a mean of the time of a speech portion in the Cepstrum dimension from the speech portion of the input speech, a device and step for obtaining a mean of a time of the non-speech portion in the Cepstrum dimension from the non-speech portion of the input speech, a device and step for converting each mean time from a Cepstrum region to a linear region, and after that, subtracting it on a linear spectrum dimension, converting the subtracted mean into a Cepstrum dimension, subtracting a mean of a time of a speech portion in a Cepstrum dimension in a speech database for learning from the converted result, and adding the subtracted result to a speech model expressed by Cepstrum. By this arrangement, even when noise is large, the presumed precision of a line fluctuation is raised and the recognition rate can be improved.
Type: Grant
Filed: March 20, 1997
Date of Patent: July 13, 1999
Assignee: Canon Kabushiki Kaisha
Inventors: Tetsuo Kosaka, Yasunori Ohora
-
Patent number: 5924066
Abstract: A system and method for classifying a speech signal within a likely speech signal class of a plurality of speech signal classes are provided. Stochastic models include a plurality of states having state transitions and output probabilities to generate state sequences which model evolutionary characteristics and durational variability of a speech signal. The method includes extracting a frame sequence, and determining a state sequence for each stochastic model with each state sequence having full state segmentation. Representative frames are determined to provide speech signal time normalization. A likely speech signal class is determined from a neural network having a plurality of inputs receiving the representative frames and a plurality of outputs corresponding to the plurality of speech signal classes. An output signal is generated based on the likely stochastic model.
Type: Grant
Filed: September 26, 1997
Date of Patent: July 13, 1999
Assignees: U S WEST, Inc., MediaOne, Inc.
Inventor: Amlan Kundu
-
Patent number: 5920839
Abstract: A pattern recognition technology includes a set of a control signal vector and a covariance matrix for each state of a reference pattern of an objective word for recognition, which reference pattern is expressed by a plurality of states and transitions between the states, and transition probabilities between respective states. A prediction vector of the t-th feature vector is derived on the basis of the (t-1)-th feature vector and the control signal vector for the current (n-th) state, determined beforehand for each of the states. A feature vector output probability for outputting the t-th feature vector in the n-th state of the reference pattern of the objective word for recognition is derived from a multi-dimensional Gaussian distribution determined by the prediction vector and the covariance matrix, taking the prediction vector as the average vector.
Type: Grant
Filed: February 10, 1997
Date of Patent: July 6, 1999
Assignee: NEC Corporation
Inventor: Ken-Ichi Iso
-
Patent number: 5913192
Abstract: A speaker identification system includes a speaker-independent phrase recognizer. The speaker-independent phrase recognizer scores a password utterance against all the sets of phonetic transcriptions in a lexicon database to determine the N best speaker-independent scores, determines the N best sets of phonetic transcriptions based on the N best speaker-independent scores, and determines the N best possible identities. A speaker-dependent phrase recognizer retrieves the hidden Markov model corresponding to each of the N best possible identities, and scores the password utterance against each of the N hidden Markov models to generate a speaker-dependent score for each of the N best possible identities. A score processor coupled to the outputs of the speaker-independent phrase recognizer and the speaker-dependent phrase recognizer determines a putative identity. A verifier coupled to the score processor authenticates the determined putative identity.
Type: Grant
Filed: August 22, 1997
Date of Patent: June 15, 1999
Assignee: AT&T Corp
Inventors: Sarangarajan Parthasarathy, Aaron Edward Rosenberg
-
Patent number: 5913193
Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces a more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances thereby producing more natural sounding speech.
Type: Grant
Filed: April 30, 1996
Date of Patent: June 15, 1999
Assignee: Microsoft Corporation
Inventors: Xuedong D. Huang, Michael D. Plumpe, Alejandro Acero, James L. Adcock
-
Patent number: 5907825
Abstract: A method for determining the location of a pattern, when input in isolation, within a representative input signal is provided. The method aligns the input signal with a signal representative of a plurality of connected patterns, one of which is the same as the pattern within the input signal. The method then determines the location from the results of the aligning step. The location determined using this apparatus can be used to determine an isolated reference model by extracting features of the input signal from the location found. This isolated reference model can then be used to generate a continuous reference model for the pattern, by aligning the isolated reference model with the signals representative of a plurality of connected patterns, one of which is the pattern to be modelled.
Type: Grant
Filed: February 6, 1997
Date of Patent: May 25, 1999
Assignee: Canon Kabushiki Kaisha
Inventor: Eli Tzirkel-Hancock
-
Patent number: 5907826
Abstract: A speech recognition apparatus includes a feature extraction section, and a recognition section. The feature extraction section extracts the feature vectors of input speech. The feature extraction section includes at least a pitch intensity extraction section. The pitch intensity extraction section extracts the intensities of the fundamental frequency components of the input speech. The recognition section performs speech recognition by using the feature vectors from the feature extraction section.
Type: Grant
Filed: October 28, 1997
Date of Patent: May 25, 1999
Assignee: NEC Corporation
Inventor: Keizaburo Takagi
-
Patent number: 5903865
Abstract: A speech model preparing method capable of easily preparing a new Hidden Markov Model (HMM) of an input speech from very few utterances, such as one or two, and a speech recognition apparatus using this method. The speech recognition apparatus uses, as a speech model, a continuous distribution type HMM defined by three parameters: a state transition probability, an average vector and a variance. The apparatus computes an average vector of an input speech to be learned, selects an HMM approximate to the input to-be-learned speech as an initial model from a registration dictionary, replaces at least the average vector of the selected HMM with the computed average vector of the to-be-learned speech, and adds the obtained HMM to the dictionary as an HMM for the input to-be-learned speech.
Type: Grant
Filed: August 29, 1996
Date of Patent: May 11, 1999
Assignee: Pioneer Electronic Corporation
Inventors: Shunsuke Ishimitsu, Ikuo Fujita
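The registration procedure in the abstract of patent 5903865 (compute the mean of the new utterance, pick a similar HMM from the dictionary, swap in the new mean, store the result) can be sketched in a few lines. The sketch makes simplifying assumptions that are not in the abstract: each model carries a single mean vector rather than one per state, similarity is measured by squared Euclidean distance between means, and the dictionary layout and function names are hypothetical.

```python
import copy

def mean_vector(frames):
    """Average feature vector over all frames of the utterance."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def register_new_word(dictionary, new_word, frames):
    """Clone the dictionary model whose mean is closest to the new utterance,
    replace its mean with the utterance mean, and add it under `new_word`.
    Transition probabilities and variances are inherited unchanged."""
    target = mean_vector(frames)
    closest = min(dictionary, key=lambda w: sq_dist(dictionary[w]["mean"], target))
    model = copy.deepcopy(dictionary[closest])
    model["mean"] = target
    dictionary[new_word] = model
    return closest
```

Inheriting the transition probabilities and variance from an existing model is what lets one or two utterances suffice: only the mean has to be estimated from the new data.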
-
Patent number: 5899973
Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.
Type: Grant
Filed: September 25, 1997
Date of Patent: May 4, 1999
Assignee: International Business Machines Corporation
Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
-
Patent number: 5895448
Abstract: Methods and apparatus for generating and using both speaker dependent and speaker independent garbage models in speaker dependent speech recognition applications are described. The present invention recognizes that in some speech recognition systems, e.g., systems where multiple speech recognition operations are performed on the same signal, it may be desirable to recognize and treat words or phrases in one part of the speech recognition system as garbage or out of vocabulary utterances with the understanding that the very same words or phrases will be recognized and treated as in-vocabulary by another portion of the system. In accordance with the present invention, in systems where both speaker independent and speaker dependent speech recognition operations are performed independently, e.g.
Type: Grant
Filed: April 30, 1997
Date of Patent: April 20, 1999
Assignee: Nynex Science and Technology, Inc.
Inventors: George J. Vysotsky, Vijay R. Raman
-
Patent number: 5893059
Abstract: Methods and apparatus for transitioning from one speech recognition system to another and for reusing existing speech recognition data are described. In particular, various methods of converting speech recognition templates or models from a first format to a second format are described. Methods for improving the recognition rate achieved using converted templates or models are also described. These methods involve storing source and/or scoring information for templates or models so that converted models or templates can be scored differently than original models or templates to thereby reflect the effect the conversion process has on recognition scores. In order to enhance recognition results in one embodiment an available compressed voice recording is used in the conversion process. The methods and apparatus of the present invention can be applied to a wide variety of speech recognition template and model conversion applications. Methods and apparatus for generating garbage models are also described.
Type: Grant
Filed: April 17, 1997
Date of Patent: April 6, 1999
Assignee: Nynex Science and Technology, Inc.
Inventor: Vijay R. Raman
-
Patent number: 5890111
Abstract: Injection noise and silence are detected in an input speech signal and an external amplifier is switched on or off based on the detected injection noise or silence. The input speech signal is digitized and a first copy of the digitized signal is preemphasized. After the input speech signal is preemphasized, a predetermined number of Mel-frequency cepstral coefficients (MFCCs) and difference cepstra are calculated for each window of the speech signal. A measure of signal energy and a measure of the rate of change of the signal energy is computed. A second copy of the digitized input speech signal is processed using amplitude summation or by differencing a center-clipped signal. The measures of signal energy, rate of change of the signal energy, the Mel coefficients, difference cepstra, and either the amplitude summation value or the differenced value are combined to form an observation vector.
Type: Grant
Filed: December 24, 1996
Date of Patent: March 30, 1999
Assignee: Technology Research Association of Medical Welfare Apparatus
Inventors: Hector Raul Javkin, Michael Galler, Nancy Niedzielski
-
Patent number: 5890114
Abstract: An HMM training method comprises a first parameter predicting step, a centroid state set calculating step, a reconstructing step, a second parameter predicting step and a control step. In the first parameter predicting step, a parameter of an HMM (hidden Markov model) is predicted based on training data. In the centroid state set calculating step, a centroid state set is calculated by clustering the states of the HMM whose parameter is predicted in the first parameter predicting step. In the reconstructing step, an HMM is reconstructed using the centroid state set calculated in the centroid state set calculating step. In the second parameter predicting step, a parameter of the HMM reconstructed in the reconstructing step is predicted using the training data. The centroid step is reexecuted by the control step if the likelihood of the HMM whose parameter is predicted in the second parameter predicting step does not satisfy a predetermined condition.
Type: Grant
Filed: February 28, 1997
Date of Patent: March 30, 1999
Assignee: Oki Electric Industry Co., Ltd.
Inventor: Jie Yi
-
Patent number: 5884259
Abstract: A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.
Type: Grant
Filed: February 12, 1997
Date of Patent: March 16, 1999
Assignee: International Business Machines Corporation
Inventors: Lalit Rai Bahl, Ellen Marie Eide
-
Patent number: 5878390
Abstract: A speech recognition apparatus includes a speech recognition section for performing a speech recognition process on uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability of the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to that calculated functional value.
Type: Grant
Filed: June 23, 1997
Date of Patent: March 2, 1999
Assignee: ATR Interpreting Telecommunications Research Laboratories
Inventors: Jun Kawai, Yumi Wakita
-
Patent number: 5870706
Abstract: Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses, and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.
Type: Grant
Filed: April 10, 1996
Date of Patent: February 9, 1999
Assignee: Lucent Technologies, Inc.
Inventor: Hiyan Alshawi
-
Patent number: 5864810
Abstract: A method and apparatus for automatic recognition of speech adapts to a particular speaker by using adaptation data to develop a transformation through which speaker independent models are transformed into speaker adapted models. The speaker adapted models are then used for speaker recognition and achieve better recognition accuracy than non-adapted models. In a further embodiment, the transformation-based adaptation technique is combined with a known Bayesian adaptation technique.
Type: Grant
Filed: January 20, 1995
Date of Patent: January 26, 1999
Assignee: SRI International
Inventors: Vassilios Digalakis, Leonardo Neumeyer, Dimitry Rtischev
-
Patent number: 5864806
Abstract: For equalizing a speech signal constituted by an observed sequence of successive input sound frames, which speech signal is liable to be affected by disturbances, the speech signal is modelled by means of a hidden Markov model and, at each instant t: equalization filters are constituted in association with the paths in the Markov sense at instant t; at least a plurality of the equalization filters are applied to the frames to obtain, at instant t, a plurality of filtered sound frame sequences and an utterance probability for each of the paths respectively associated with the equalization filters applied; the equalization filter corresponding to the most probable path in the Markov sense is selected; and the filtered frame supplied by the selected equalization filter is selected as the equalized frame.
Type: Grant
Filed: May 5, 1997
Date of Patent: January 26, 1999
Assignee: France Telecom
Inventors: Chafic Mokbel, Denis Jouvet, Jean Monne
-
Patent number: 5860062
Abstract: A speech recognition apparatus and method learns in advance a plurality of kinds of noises that can occur in the environment of use to determine a plurality of noise HMMs, synthesizes these noise HMMs into one noise HMM, generates a NOVO-HMM by executing NOVO (voice mixed with noise) conversion for a speech HMM of a reference pattern by using this composite noise HMM, and uses this NOVO-HMM for a speech recognition processing. Since a plurality of noises are incorporated in the NOVO-HMM generated in this manner, the speech can be recognized with high accuracy even when the noise changes.
Type: Grant
Filed: June 13, 1997
Date of Patent: January 12, 1999
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Kenichi Taniguchi, Nobuyuki Kono, Toshimichi Tokuda, Yoshio Ikura
-
Patent number: 5857169
Abstract: A time-sequential input pattern (20), which is derived from a continual physical quantity, such as speech, is recognized. The system includes input means (30), which accesses the physical quantity and therefrom generates a sequence of input observation vectors. The input observation vectors represent the input pattern. A reference pattern database (40) is used for storing reference patterns, which consist of a sequence of reference units. Each reference unit is represented by associated reference probability densities. A tree builder (60) represents for each reference unit the set of associated reference probability densities as a tree structure. Each leaf node of the tree corresponds to a reference probability density. Each non-leaf node corresponds to a cluster probability density, which is derived from all reference probability densities corresponding to leaf nodes in branches below the non-leaf node.
Type: Grant
Filed: August 28, 1996
Date of Patent: January 5, 1999
Assignee: U.S. Philips Corporation
Inventor: Frank Seide
-
Patent number: 5850627
Abstract: A word recognition system can: respond to the input of a character string from a user by limiting the words it will recognize to words having a related, but not necessarily the same, string; score signals generated after a user has been prompted to generate a given word against words other than the prompted word to determine if the signal should be used to train the prompted word; vary the number of signals a user is prompted to generate to train a given word as a function of how well the training signals score against each other or prior models for the prompted word; create a new acoustic model of a phrase by concatenating prior acoustic models of the words in the phrase; obtain information from another program running on the same computer, such as its commands or the context of text being entered into it, and use that information to vary which words it can recognize; determine which program unit, such as an application program or dialog box, currently has input focus on its computer and create a vocabulary
Type: Grant
Filed: June 26, 1997
Date of Patent: December 15, 1998
Assignee: Dragon Systems, Inc.
Inventors: Joel M. Gould, Elizabeth E. Steele, Frank J. McGrath, Steven D. Squires, Peter S. Heitman, Joel W. Parke, Dean G. Sturtevant, Jed M. Roberts, James K. Baker
-
Patent number: 5842165Abstract: Methods and apparatus for the generation of speaker dependent garbage models from the very same data used to generate speaker dependent speech recognition models, e.g., word models, are described. The technique involves processing the data included in the speaker dependent speech recognition models to create one or more speaker dependent garbage models. The speaker dependent garbage model generation technique involves what may be described as distorting or morphing of a speaker dependent speech recognition model to generate a speaker dependent garbage model therefrom. One or more speaker dependent speech recognition models may then be combined with the generated speaker dependent garbage model to produce an updated garbage model. The scoring of speaker dependent garbage models is varied in accordance with the present invention as a function of the number of speech recognition models from which the speaker dependent garbage model was created.Type: GrantFiled: April 30, 1997Date of Patent: November 24, 1998Assignee: Nynex Science & Technology, Inc.Inventors: Vijay R. Raman, George J. Vysotsky
-
Patent number: 5839105Abstract: There is provided a speaker-independent model generation apparatus and a speech recognition apparatus which require a processing unit to have less memory capacity and which allow its computation time to be reduced, as compared with a conventional counterpart. A single Gaussian HMM is generated with a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers. A state having a maximum increase in likelihood as a result of splitting one state in contextual or temporal domains is searched. Then, the state having a maximum increase in likelihood is split in a contextual or temporal domain corresponding to the maximum increase in likelihood. Thereafter, a single Gaussian HMM is generated with the Baum-Welch training algorithm, and these steps are iterated until the states within the single Gaussian HMM can no longer be split or until a predetermined number of splits is reached. Thus, a speaker-independent HMM is generated.Type: GrantFiled: November 29, 1996Date of Patent: November 17, 1998Assignee: ATR Interpreting Telecommunications Research LaboratoriesInventors: Mari Ostendorf, Harald Singer
-
Patent number: 5835890Abstract: In a speaker adaptation method for speech models, input speech is transformed to a feature parameter sequence like a cepstral sequence, and N model sequences of maximum likelihood for the feature parameter sequence are extracted from speaker-independent speech HMMs by an N-best hypothesis extraction method. The extracted model sequences are each provisionally adapted to maximize its likelihood for the feature parameter sequence of the input speech while changing the HMM parameters of each sequence, and that one of the provisionally adapted model sequences which has the maximum likelihood for the feature parameter sequence of the input speech is selected and speech models of the selected sequence are provided as adapted HMMs of the speaker to be recognized.Type: GrantFiled: April 9, 1997Date of Patent: November 10, 1998Assignee: Nippon Telegraph and Telephone CorporationInventors: Tomoko Matsui, Sadaoki Furui
-
Patent number: 5835893Abstract: In a word clustering apparatus for clustering words, a plurality of words is clustered to obtain a total tree diagram of a word dictionary representing a word clustering result, where the total tree diagram includes tree diagrams of an upper layer, a middle layer and a lower layer. In a speech recognition apparatus, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal. Then, a speech recognition controller executes a speech recognition process on the extracted acoustic feature parameters with reference to a predetermined Hidden Markov Model and the obtained total tree diagram of the word dictionary, and outputs a result of the speech recognition.Type: GrantFiled: April 18, 1996Date of Patent: November 10, 1998Assignee: ATR Interpreting Telecommunications Research LabsInventor: Akira Ushioda
-
Patent number: 5832430Abstract: Devices and methods for speech recognition enable simultaneous word hypothesis detection and verification in a one-pass procedure that provides for different segmentations of the speech input. A confidence measure of a target hypothesis for a known word is determined according to a recursion formula that operates on parameters of target models and alternate models of known words, a language model and a lexicon, and feature vectors of the speech input in a likelihood ratio decoder. The confidence measure is processed to determine an accept/reject signal for the target hypothesis that is output with a target hypothesis signal. The recursion formula is based on hidden Markov models with a single optimum state sequence and may take the form of a modified Viterbi algorithm.Type: GrantFiled: December 8, 1995Date of Patent: November 3, 1998Assignee: Lucent Technologies, Inc.Inventors: Eduardo Lleida, Richard Cameron Rose
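The accept/reject decision described in this abstract rests on a likelihood ratio between a target model and an alternate model. The sketch below is only an illustration of that general idea, not the patented one-pass recursion: it assumes per-frame log-likelihoods have already been computed for both models, and the names `confidence`, `accept`, and `threshold` are hypothetical.

```python
def confidence(target_log_likes, alternate_log_likes):
    """Average per-frame log-likelihood ratio of target vs. alternate model.

    Both arguments are equal-length lists of per-frame log-likelihoods
    (assumed to be precomputed by some decoder; not part of the abstract).
    """
    if not target_log_likes or len(target_log_likes) != len(alternate_log_likes):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [t - a for t, a in zip(target_log_likes, alternate_log_likes)]
    return sum(diffs) / len(diffs)


def accept(target_log_likes, alternate_log_likes, threshold=0.0):
    """Accept the target hypothesis when the averaged ratio reaches the threshold."""
    return confidence(target_log_likes, alternate_log_likes) >= threshold


# Target model explains the frames better than the alternate: accepted.
print(accept([-1.0, -2.0], [-3.0, -4.0]))
# Alternate model explains the frames better: rejected.
print(accept([-5.0, -5.0], [-1.0, -1.0]))
```

Averaging over frames is a design choice made here so that the threshold does not depend on utterance length; the abstract does not say how the measure is normalized.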
-
Patent number: 5822731Abstract: A system for parsing information representative of a sequence of words having parts of speech. The sequence of words forms a sentence or sentence fragment. A hidden Markov model is provided for determining the most likely part of speech of a selected word of the sequence of words. The hidden Markov model has an initial transition matrix and a subsequent transition matrix for storing probabilities of occurrence of the parts of speech. The initial transition matrix of the hidden Markov model is removed to provide a modified hidden Markov model. The modified hidden Markov model is applied to the sequence of words to determine the most likely part of speech of a selected word within a sentence fragment with increased accuracy.Type: GrantFiled: September 15, 1995Date of Patent: October 13, 1998Assignee: Infonautics CorporationInventor: John Michael Schultz
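To illustrate why dropping sentence-initial statistics can help on fragments, here is a minimal Viterbi tagger sketch. It is not the patented system: the tag set, the probability tables, and the choice to treat the removed "initial transition matrix" as an optional start distribution (`initial=None`) are all assumptions made for the example.

```python
def viterbi(words, tags, trans, emit, initial=None):
    """Most likely tag sequence; if `initial` is None, no start prior is applied."""
    unseen = 1e-6  # assumed floor for words missing from the emission table
    start = {t: (initial[t] if initial else 1.0) * emit[t].get(words[0], unseen)
             for t in tags}
    trellis, back = [start], []
    for word in words[1:]:
        prev, col, ptr = trellis[-1], {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev[p] * trans[p][t])
            col[t] = prev[best_prev] * trans[best_prev][t] * emit[t].get(word, unseen)
            ptr[t] = best_prev
        trellis.append(col)
        back.append(ptr)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: trellis[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model (all numbers invented for illustration).
tags = ["NOUN", "VERB"]
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit = {"NOUN": {"dogs": 0.7, "bark": 0.3}, "VERB": {"dogs": 0.1, "bark": 0.9}}
initial = {"NOUN": 0.9, "VERB": 0.1}  # sentence-initial prior

print(viterbi(["bark"], tags, trans, emit, initial))  # prior pulls toward NOUN
print(viterbi(["bark"], tags, trans, emit))           # without it: VERB
```

In this toy case the sentence-initial prior biases a fragment that does not actually start a sentence; removing it lets the emission and transition probabilities decide.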
-
Patent number: 5819222Abstract: A speech recognition system recognizes connected speech using a plurality of vocabulary nodes, at least one of which has an associated signature. In use, partial recognition paths are examined at decision nodes intermediate the beginning and end of the recognition path, each decision node having an associated set of valid accumulated signatures. A token received by a decision node is only propagated if the accumulated signature of that token is one of those in the set of valid accumulated signatures associated with that decision node.Type: GrantFiled: October 11, 1995Date of Patent: October 6, 1998Assignee: British Telecommunications public limited companyInventors: Samuel Gavin Smyth, Simon Patrick Alexander Ringland
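The signature check at a decision node can be pictured with a small token-passing sketch. This is an illustration under assumptions, not the patented implementation: signatures are accumulated here by appending to a tuple, and the names `Token`, `pass_vocab_node`, and `propagate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    score: float
    signature: tuple = ()


def pass_vocab_node(token, node_score, node_signature=None):
    """Return the token after a vocabulary node; nodes without a signature add nothing."""
    added = (node_signature,) if node_signature is not None else ()
    return Token(token.score + node_score, token.signature + added)


def propagate(tokens, valid_signatures):
    """Keep only tokens whose accumulated signature is valid at this decision node."""
    return [t for t in tokens if t.signature in valid_signatures]


start = Token(0.0)
digit_then_digit = pass_vocab_node(pass_vocab_node(start, -1.0, "digit"), -1.5, "digit")
digit_then_name = pass_vocab_node(pass_vocab_node(start, -0.5, "digit"), -0.5, "name")

# This decision node only allows paths made of two digits.
survivors = propagate([digit_then_digit, digit_then_name], {("digit", "digit")})
print(survivors)
```

Note that the better-scoring token is dropped here because its accumulated signature is not in the valid set, which is the pruning behaviour the abstract describes.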
-
Patent number: 5819223Abstract: A speech adaptation device comprises a vocabulary independent reference pattern memory for memorizing a plurality of vocabulary independent reference patterns having one or more categories. Each category has one or more acoustic units, and has such a connection relation of the acoustic units that allows reception of any sequence of the acoustic units appearing in the input speech. A preliminary matching unit is for use in making time-alignment between the time series of the feature vectors of the input speech obtained from the analysis unit and the vocabulary independent reference pattern to obtain mean vectors for individual categories of the input speech and the vocabulary independent reference pattern from the aligned portion for the individual categories of the feature vectors of the input speech and the vocabulary independent reference pattern.Type: GrantFiled: January 26, 1996Date of Patent: October 6, 1998Assignee: NEC CorporationInventor: Keizaburo Takagi
-
Patent number: 5812975Abstract: A method of designing a state transition model capable of high speed voice recognition and a voice recognition method and apparatus using the state transition model is provided. The methods provide a state transition model in which a state shared structure of the state transition model is designed. The method includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short distance cluster among clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.Type: GrantFiled: June 18, 1996Date of Patent: September 22, 1998Assignee: Canon Kabushiki KaishaInventors: Yasuhiro Komori, Yasunori Ohora
-
Patent number: 5812974Abstract: This is a speech recognition method for modeling adjacent word context, comprising: dividing a first word or period of silence into two portions; dividing a second word or period of silence, adjacent to the first word, into two portions; and combining the last portion of the first word or period of silence and the first portion of the second word or period of silence to make an acoustic model. The method includes constructing a grammar to restrict the acoustic models to the middle-to-middle context.Type: GrantFiled: April 10, 1996Date of Patent: September 22, 1998Assignee: Texas Instruments IncorporatedInventors: Charles T. Hemphill, Lorin P. Netsch, Christopher M. Kribs
-
Patent number: 5806034Abstract: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data; providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM; and parameterizing speech utterances according to one of linear prediction parameters or Mel-scale filter bank parameters. The method further includes selecting a frame period for accommodating the parameters and generating HMMs and decoding to specified speech utterances by causing the user to utter predefined training speech utterances for each HMM. The method then statistically computes the generated HMMs with the prototype HMM to provide a set of fully trained HMMs for each utterance indicative of the speaker.Type: GrantFiled: August 2, 1995Date of Patent: September 8, 1998Assignee: ITT CorporationInventors: Joe A. Naylor, William Y. Huang, Lawrence G. Bahler
-
Patent number: 5799278Abstract: A speech recognition system for discrete words uses a single Hidden Markov Model (HMM), which is nominally adapted to recognise N different isolated words, but which is trained to recognise M different words, where M>N. This is achieved by providing M sets of audio recordings, each set comprising multiple recordings of a respective one of said M words being spoken. Only N different labels are assigned to the M sets of audio recordings, so that at least one of the N labels has two or more sets of audio recordings assigned thereto. These two or more sets of audio recordings correspond to phonetically dissimilar words. The HMM is then trained by inputting each set of audio recordings and its assigned label. The HMM can effectively compensate for the phonetic variations between the different words assigned the same label, thereby avoiding the need to utilise a larger model (i.e., to use M labels).Type: GrantFiled: July 2, 1996Date of Patent: August 25, 1998Assignee: International Business Machines CorporationInventors: Michael Cobbett, John Brian Pickering
-
Patent number: 5799277Abstract: The acoustic model generating method for speech recognition enables a high representation effect on the basis of the minimum possible model parameters. In an initial model having a smaller number of signal sources, the acoustic model for speech recognition is generated by selecting the splitting processing or the merging processing for the signal sources successively and repeatedly. The merging processing is executed prior to the splitting processing. In the merging processing, when the merged result is not appropriate, the splitting processing is executed for the model obtained before merging processing (without use of the merged result).Type: GrantFiled: October 25, 1995Date of Patent: August 25, 1998Assignee: Victor Company of Japan, Ltd.Inventor: Junichi Takami
-
Patent number: 5797123Abstract: A key-phrase detection and verification method that can be advantageously used to realize understanding of flexible (i.e., unconstrained) speech. A "multiple pass" procedure is applied to a spoken utterance comprising a sequence of words (i.e., a "sentence"). First, a plurality of key-phrases are detected (i.e., recognized) based on a set of phrase sub-grammars which may, for example, be specific to the state of the dialogue. These key-phrases are then verified by assigning confidence measures thereto and comparing these confidence measures to a threshold, resulting in a set of verified key-phrase candidates. Next, the verified key-phrase candidates are connected into sentence hypotheses based upon the confidence measures and predetermined (e.g., task-specific) semantic information. And, finally, one or more of these sentence hypotheses are verified to produce a verified sentence hypothesis and, from that, a resultant understanding of the spoken utterance.Type: GrantFiled: December 20, 1996Date of Patent: August 18, 1998Assignee: Lucent Technologies Inc.Inventors: Wu Chou, Biing-Hwang Juang, Tatsuya Kawahara, Chin-Hui Lee
-
Patent number: 5794198Abstract: In the one-dimensional normal distributions that make up the respective dimensions of the continuous multi-dimensional normal distribution of each state of HMMs representing speech units, mean and variance values are tied among similar one-dimensional distributions. As a result, the total number of normal distributions for representing the model is reduced without degrading recognition performance.Type: GrantFiled: October 24, 1995Date of Patent: August 11, 1998Assignee: Nippon Telegraph and Telephone CorporationInventors: Satoshi Takahashi, Shigeki Sagayama
-
Patent number: 5787396Abstract: A speech recognition method uses continuous mixture Hidden Markov Models (HMM) for probability processing including a first type of HMM having a small number of mixtures and a second type of HMM having a larger number of mixtures. First output probabilities are formed for inputted speech using the small number of mixtures type HMM and second output probabilities are formed for the input speech using the large number of mixtures type HMM for selected states corresponding to the highest output probabilities of the first type HMM. The input speech is recognized from both the first and second output probabilities.Type: GrantFiled: September 18, 1995Date of Patent: July 28, 1998Assignee: Canon Kabushiki KaishaInventors: Yasuhiro Komori, Yasunori Ohora, Masayuki Yamada
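The two-stage scoring in this abstract can be sketched as follows. This is a simplified illustration, not the patented method: it assumes scalar observations, represents each mixture as a list of hypothetical `(weight, mean, variance)` tuples, and uses an assumed `top_k` parameter to choose which states get rescored.

```python
import math


def gmm_log_prob(x, mixture):
    """Log density of scalar x under a list of (weight, mean, variance) components."""
    density = sum(
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in mixture
    )
    return math.log(density)


def two_stage_scores(x, coarse, fine, top_k=2):
    """Score every state with the small mixtures, then rescore the top_k with the large ones."""
    first = {state: gmm_log_prob(x, mix) for state, mix in coarse.items()}
    best = sorted(first, key=first.get, reverse=True)[:top_k]
    scores = dict(first)
    for state in best:
        scores[state] = gmm_log_prob(x, fine[state])
    return scores, best


# Toy models (numbers invented): one component per state in the coarse set,
# two components per state in the fine set.
coarse = {"a": [(1.0, 0.0, 1.0)], "b": [(1.0, 1.0, 1.0)], "c": [(1.0, 10.0, 1.0)]}
fine = {
    "a": [(0.5, -0.2, 0.5), (0.5, 0.3, 0.5)],
    "b": [(0.5, 0.8, 0.5), (0.5, 1.2, 0.5)],
    "c": [(0.5, 9.5, 0.5), (0.5, 10.5, 0.5)],
}

scores, rescored = two_stage_scores(0.2, coarse, fine)
print(rescored)  # the states that received the more expensive evaluation
```

The point of the split is cost: the large mixtures are evaluated only for the few states the cheap pass ranks highest, while the remaining states keep their first-pass scores.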
-
Patent number: 5778341Abstract: The invention is a speech recognition system and method for transmitting information including the receipt and decoding of speech information such as that modeled by hidden Markov models (HMMs). In this invention, the state likelihoods of the modeled state sequences contained within the speech information are assigned penalties based on the difference between those state likelihoods and a maximum possible state likelihood. Once penalties have been assigned, the modified state sequence with the modified state likelihoods having the highest cumulative state likelihoods is used in further speech recognition processing. In this manner, state sequences having no extremely poor state likelihoods are favored over those having both extremely high and extremely poor state likelihoods.Type: GrantFiled: January 26, 1996Date of Patent: July 7, 1998Assignee: Lucent Technologies Inc.Inventor: Ilija Zeljkovic
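The effect described here, favoring sequences with no extremely poor state likelihoods, can be shown with a small sketch. It is an illustration only: the abstract does not give the penalty function, so the quadratic penalty, the `weight` parameter, and the function names below are assumptions.

```python
def penalized_score(log_likes, max_log_like, weight=0.5):
    """Cumulative log-likelihood after a penalty that grows with the gap to the maximum.

    The quadratic form is an assumption chosen so that one very poor frame
    costs more than several mildly poor ones.
    """
    total = 0.0
    for ll in log_likes:
        gap = max_log_like - ll
        total += ll - weight * gap * gap
    return total


def pick_sequence(candidates, max_log_like, weight=0.5):
    """Name of the candidate state sequence with the highest penalized score."""
    return max(candidates, key=lambda name: penalized_score(candidates[name], max_log_like, weight))


# Per-frame state log-likelihoods for two candidate sequences (invented numbers).
candidates = {
    "steady": [-2.0, -2.0, -2.0, -2.0],  # uniformly mediocre
    "spiky": [-0.5, -0.5, -0.5, -6.0],   # mostly excellent, one very poor frame
}

plain_winner = max(candidates, key=lambda name: sum(candidates[name]))
penalized_winner = pick_sequence(candidates, max_log_like=0.0)
print(plain_winner, penalized_winner)
```

Without the penalty the sequence with one very poor frame has the higher total; with it, the uniformly mediocre sequence is selected instead.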