Markov Patents (Class 704/256)
-
Patent number: 6182038
Abstract: A method and apparatus for generating a context-dependent phoneme network as an intermediate step of encoding speech information. The context-dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context-dependent phoneme network is then transmitted to a first application (52).
Type: Grant
Filed: December 1, 1997
Date of Patent: January 30, 2001
Assignee: Motorola, Inc.
Inventors: Sreeram Balakrishnan, Stephen Austin
-
Patent number: 6163767
Abstract: A Chinese speech recognition (SR) method and system for single or uncorrelated Chinese characters. The method uses various types of Character Description Language (CDL) to describe the single or uncorrelated Chinese characters to be entered. The SR system uses a CDL-grammar-directed speech recognizer to accept CDL descriptions spoken by the user. Based on the CDL parser's analysis, a character generator outputs the corresponding character. As a result, single or uncorrelated Chinese characters can be recognized reliably out of context.
Type: Grant
Filed: August 28, 1998
Date of Patent: December 19, 2000
Assignee: International Business Machines Corporation
Inventors: Donald T. Tang, Li Qin Shen, Xiao Jin Zhu
-
Patent number: 6163769
Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used, and each decision-tree-based context-dependent phoneme-based unit is arranged according to the context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other, non-stored context-dependent phoneme units that sound similar because their contexts are similar. A text analyzer obtains a string of phonetic symbols representative of the text to be converted to speech. A concatenation module selects stored decision-tree-based context-dependent phoneme-based units from the stored set based on the context of the phonetic symbols, and synthesizes the selected units to generate speech corresponding to the text.
Type: Grant
Filed: October 2, 1997
Date of Patent: December 19, 2000
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, Hsiao-Wuen Hon, Xuedong D. Huang
-
Patent number: 6161091
Abstract: A speech-recognition/synthesis-based encoding and decoding method. From an input speech signal, the method recognizes phonetic segments, syllables, words, or similar units as character information, and detects pitch periods, phoneme or syllable durations, and similar quantities as information for prosody generation. The character information and the prosody information are transferred or stored as code data. The transferred or stored code data is then decoded to recover the character information and the prosody information, from which a speech signal is synthesized.
Type: Grant
Filed: March 17, 1998
Date of Patent: December 12, 2000
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masami Akamine, Ryosuke Koshiba
-
Patent number: 6157912
Abstract: Language models that take into account the probabilities of word sequences are used in speech recognition, particularly for fluently spoken language with a large vocabulary, to increase recognition reliability. These models are estimated from comparatively large amounts of text and therefore represent values averaged over many texts. As a result, the language model is not well adapted to the peculiarities of a particular text. To adapt a given language model to a particular text on the basis of only a short text fragment, the invention first adapts the unigram language model with the short text and then, depending on that result, adapts the M-gram language model. A method is described for adapting the unigram language model values that automatically subdivides the words into semantic classes.
Type: Grant
Filed: March 2, 1998
Date of Patent: December 5, 2000
Assignee: U.S. Philips Corporation
Inventors: Reinhard Kneser, Jochen Peters, Dietrich Klakow
-
Patent number: 6151573
Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided, in which the environment is modeled as a hidden (non-observable) variable. By applying an expectation-maximization algorithm and extending the Baum-Welch forward and backward variables (Steps 23a-23d), source normalization is achieved without having to label the database by environment, such as speaker identity, channel, microphone, and noise type.
Type: Grant
Filed: August 15, 1998
Date of Patent: November 21, 2000
Assignee: Texas Instruments Incorporated
Inventor: Yifan Gong
-
Patent number: 6151574
Abstract: A speech recognition system learns characteristics of a user's speech during a learning phase to improve its performance. Adaptation data derived from the user's speech and its recognized result is collected during the learning phase. Parameters characterizing the hidden Markov models (HMMs) used by the system for speech recognition are modified based on the adaptation data. To that end, a hierarchical structure is defined in an HMM parameter space. This structure may take the form of a tree with multiple layers, each of which includes one or more nodes. Each node on each layer is connected to at least one node on another layer. The nodes on the lowest layer of the tree are referred to as "leaf nodes." Each node in the tree represents a subset of the HMM parameters and is associated with a probability measure derived from the adaptation data.
Type: Grant
Filed: September 8, 1998
Date of Patent: November 21, 2000
Assignee: Lucent Technologies Inc.
Inventors: Chin-Hui Lee, Koichi Shinoda
-
Patent number: 6148284
Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. Input speech utterances are received and represented as multidimensional curves. Each curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve, and the process may be repeated to produce a more precise statistical model for recognition. As a result, feature vectors extracted from the input speech contribute to the recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
Type: Grant
Filed: December 11, 1998
Date of Patent: November 14, 2000
Assignee: AT&T Corporation
Inventor: Lawrence Kevin Saul
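The arc-length weighting in 6148284 can be approximated on sampled frames. The sketch below illustrates the idea under our own assumptions and is not the patent's method: each frame's contribution is weighted by the Euclidean length of the segment to the next frame, and the per-frame log-likelihoods are hypothetical inputs from some acoustic model.

```python
import math

def arc_length_weights(frames):
    """One weight per frame: the Euclidean length of the curve segment
    from that frame to the next. The last frame has no outgoing segment,
    so it gets weight 0 in this discrete approximation."""
    weights = [math.dist(a, b) for a, b in zip(frames, frames[1:])]
    weights.append(0.0)
    return weights

def arc_length_score(frames, frame_log_likelihoods):
    """Weight each frame's log-likelihood by its arc length, so frames in
    stationary stretches (little movement along the curve) count for little."""
    weights = arc_length_weights(frames)
    return sum(w * ll for w, ll in zip(weights, frame_log_likelihoods))
```

A repeated frame adds a zero-length segment, so slowing down a trajectory by repeating frames (with the same log-likelihoods) leaves the score unchanged, which is the speaking-rate robustness the abstract describes.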
-
Patent number: 6138097
Abstract: A recognition test matches a speech segment supplied to the system with a set of parameters associated with a reference and stored in a dictionary. A provisional version of each set of parameters to be stored in the dictionary in association with a reference is estimated from one or more speech segments, after which the provisional version is stored in the dictionary in association with the reference. At least one repetition of the speech segment is submitted to a recognition test; then, depending on whether the test has matched the speech segment with the provisional version, the provisional version is modified and the modified provisional version is stored.
Type: Grant
Filed: September 28, 1998
Date of Patent: October 24, 2000
Assignee: Matra Nortel Communications
Inventors: Philip Lockwood, Catherine Glorion, Laurent Lelievre
-
Patent number: 6133904
Abstract: An apparatus for manipulating the colour of an image is provided, having a microphone for providing electrical speech signals representative of a user command, a speech recognition unit for recognizing the input speech signal, a command interpreter for interpreting the recognized speech, a graphics package responsive to the command interpreter, and a display for displaying the current image being edited. The apparatus also accepts other inputs, for example from a pointing device.
Type: Grant
Filed: February 4, 1997
Date of Patent: October 17, 2000
Assignee: Canon Kabushiki Kaisha
Inventor: Eli Tzirkel-Hancock
-
Patent number: 6128596
Abstract: A method (700), device (1101), and system (1100) provide generalized bidirectional island-driven chart parsing based on congruency checking, which prevents edge overgeneration and so allows robust and efficient parsing of a word graph. The method prevents edge overgeneration by: selecting from an agenda of edges, according to a predetermined scheme, a candidate edge for entry in a chart, the candidate edge having a starting vertex, an ending vertex, a label, and a congruence key; selecting the edge equivalence set in the chart that matches the starting vertex, ending vertex, and label of the candidate edge; and entering the candidate edge into the chart only if its congruence key fails to match the congruence key of every edge in that equivalence set.
Type: Grant
Filed: April 3, 1998
Date of Patent: October 3, 2000
Assignee: Motorola, Inc.
Inventor: Andrew William Mackie
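The congruence check in 6128596 reduces to a lookup keyed by starting vertex, ending vertex, and label. The sketch below is illustrative only: the `Edge` and `Chart` classes are hypothetical, the congruence key is modeled as an arbitrary hashable tuple because the abstract does not say how keys are computed, and agenda ordering is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int             # starting vertex in the word graph
    end: int               # ending vertex
    label: str             # grammar category, e.g. "NP"
    congruence_key: tuple  # hypothetical stand-in for the patent's congruence key

class Chart:
    def __init__(self):
        # Edge equivalence sets, indexed by (start, end, label).
        self._sets = {}

    def try_enter(self, edge):
        """Enter the candidate edge unless its equivalence set already holds
        an edge with the same congruence key. Returns True if entered."""
        eq_set = self._sets.setdefault((edge.start, edge.end, edge.label), [])
        if any(e.congruence_key == edge.congruence_key for e in eq_set):
            return False  # congruent edge already present: suppress it
        eq_set.append(edge)
        return True
```

Edges that share vertices and label but differ in congruence key are all kept, so only truly redundant edges are blocked.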
-
Patent number: 6119087
Abstract: A system and method for efficiently distributing voice call data received from speech recognition servers over a telephone network having a shared processing resource is disclosed. Incoming calls are received from phone lines and assigned grammar types by the speech recognition servers. A request to process the voice call data is sent to a resource manager, which monitors the shared processing resource and identifies a preferred processor within it. The resource manager then instructs the speech recognition server to send the voice call data to the preferred processor for processing. The preferred processor is determined from known processor efficiencies for voice call data of the assigned grammar type and from a measure of processor loads. While the system is operating, the resource manager develops and updates a history for each processor; the histories include processing efficiency values for all grammar types received.
Type: Grant
Filed: March 13, 1998
Date of Patent: September 12, 2000
Assignee: Nuance Communications
Inventors: Thomas Murray Kuhn, Matthew Lennig, Peter Christopher Monaco, David Bruce Peters
-
Patent number: 6112175
Abstract: A method and apparatus that combine maximum likelihood linear regression (MLLR) and minimum classification error (MCE) training to estimate the time-varying polynomial Gaussian mean functions of the trended HMM. This integrated approach, referred to as minimum classification error linear regression (MCELR), has been developed and implemented in speaker adaptation experiments using a large body of utterances from different types of speakers. Experimental results show that adapting the time-varying mean parameters by linear regression is always better when fewer than three adaptation tokens are used.
Type: Grant
Filed: March 2, 1998
Date of Patent: August 29, 2000
Assignee: Lucent Technologies Inc.
Inventor: Rathinavelu Chengalvarayan
-
Patent number: 6108628
Abstract: A high-speed speech recognition method with a high recognition rate that uses speaker models. It includes the steps of: performing acoustic processing on the input speech; calculating a coarse output probability using an unspecified speaker model; and, for the states that the coarse calculation estimates will contribute to the recognition result, calculating a fine output probability using the unspecified speaker model and clustered speaker models. Recognition candidates are then extracted by a common language search based on these results, and a fine language search is conducted on the extracted candidates to determine the recognition result.
Type: Grant
Filed: September 16, 1997
Date of Patent: August 22, 2000
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuhiro Komori, Tetsuo Kosaka, Masayuki Yamada
-
Patent number: 6092045
Abstract: A series of observations representing unknown speech is compared with stored models representing known speech, in an order that makes better use of memory. The series of observations is divided into at least two blocks, each comprising two or more observations. First, the observations in one block are compared (31) with a subset of one or more of the models to determine the likelihood of a match to each model in the subset. This step is repeated (33) for the models outside the subset, and the whole process is repeated (34) for each block.
Type: Grant
Filed: July 21, 1998
Date of Patent: July 18, 2000
Assignee: Nortel Networks Corporation
Inventors: Peter R. Stubley, Andre Gillet, Vishwa N. Gupta, Christopher K. Toulson, David B. Peters
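The loop order in 6092045 can be illustrated as follows. This is a simplified sketch, not the patented implementation: `log_likelihood` is a hypothetical per-observation scoring function, and each model's score is a plain sum of frame scores, ignoring the time alignment a real HMM decoder would perform. The point is the nesting: one block of observations is scored against one subset of models before the next subset is loaded, so that subset's parameters can stay in fast memory.

```python
def score_blockwise(observations, models, block_size, subset_size, log_likelihood):
    """Accumulate per-model scores block by block and subset by subset.

    models: dict mapping model name -> parameters.
    log_likelihood: hypothetical callable (observation, params) -> float.
    """
    scores = {name: 0.0 for name in models}
    names = list(models)
    for b in range(0, len(observations), block_size):   # (34) repeat for each block
        block = observations[b:b + block_size]
        for s in range(0, len(names), subset_size):     # (33) repeat for other models
            for name in names[s:s + subset_size]:       # (31) compare block to subset
                params = models[name]
                for obs in block:
                    scores[name] += log_likelihood(obs, params)
    return scores
```

Because the scores here are simple sums, the result equals scoring each model against the whole utterance in one pass; only the memory access pattern changes.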
-
Patent number: 6085160
Abstract: A speech recognition system uses language-independent acoustic models, derived from speech data in multiple languages, to represent speech units that are concatenated into words. In addition, the input speech signal that is compared with the language-independent acoustic models may be vector quantized according to a codebook derived from speech data in multiple languages.
Type: Grant
Filed: July 10, 1998
Date of Patent: July 4, 2000
Assignee: Lernout & Hauspie Speech Products N.V.
Inventors: Bart D'hoore, Dirk Van Compernolle
-
Patent number: 6078883
Abstract: To train a speech recognizer on a multi-item repertoire, the following steps are executed: the user presents a speech item, and its distinctiveness within the repertoire is assessed. Depending on the distinctiveness found, the speech item is inserted into the repertoire. These steps are repeated until the repertoire is sufficient. In particular, the assessment determines the likeness between the presented speech item and every item already in the repertoire; undue likeness to one particular stored item triggers a contingency procedure, which offers the user a choice between ignoring the presented speech item and inserting it at the price of deleting that stored item.
Type: Grant
Filed: December 17, 1997
Date of Patent: June 20, 2000
Assignee: U.S. Philips Corporation
Inventors: Benoit Guilhaumon, Gilles Miet
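One insertion step of the enrollment loop in 6078883 might look like the sketch below. Assumptions, none of which come from the patent: items are stored as templates in a dict, `distance` is a hypothetical similarity measure supplied by the caller, a fixed `threshold` stands in for "undue likeness", and the user's choice in the contingency procedure is passed in as a boolean.

```python
def enroll(repertoire, name, template, distance, threshold, replace_on_conflict):
    """Try to add a presented speech item to the repertoire.

    repertoire: dict mapping item name -> stored template (modified in place).
    distance: hypothetical callable (template, template) -> float.
    replace_on_conflict: the user's choice if the item is too similar to a
    stored one (True = delete the clashing item and insert the new one).
    Returns True if the new item was inserted.
    """
    if repertoire:
        closest = min(repertoire, key=lambda k: distance(template, repertoire[k]))
        if distance(template, repertoire[closest]) < threshold:
            if not replace_on_conflict:
                return False          # user chose to ignore the new item
            del repertoire[closest]   # price of insertion: drop the clashing item
    repertoire[name] = template
    return True
```

The outer loop of the patent, repeating until the repertoire is sufficient, would simply call this function once per presented item.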
-
Patent number: 6078884
Abstract: Pattern recognition apparatus uses a recognition processor to process an input signal and indicate its similarity to allowed sequences of reference patterns to be recognised. A speech recognition processor includes a classification arrangement that identifies a sequence of patterns corresponding to the input signal and repeatedly partitions the input signal into a speech-containing portion and, preceding and/or following it, noise or silence portions. A noise model generator generates a pattern of the noise or silence portion for subsequent use by the classification means in pattern identification. The noise model generator may generate a noise model for each noise portion of the input signal, which may be used to adapt the reference patterns.
Type: Grant
Filed: March 26, 1998
Date of Patent: June 20, 2000
Assignee: British Telecommunications public limited company
Inventor: Simon N. Downey
-
Patent number: 6076056
Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained on both the isolated and the continuous speech training data, and speech is recognized using the trained speech unit models.
Type: Grant
Filed: September 19, 1997
Date of Patent: June 13, 2000
Assignee: Microsoft Corporation
Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
-
Patent number: 6076057
Abstract: An unsupervised, discriminative, sentence-level HMM adaptation based on speech-silence classification is presented. Silence and speech regions are determined either with a speech end-pointer or from the segmentation obtained by the recognizer in a first pass. A discriminative training procedure using GPD or any other discriminative training algorithm, employed together with the HMM-based recognizer, is then used to increase the discrimination between silence and speech.
Type: Grant
Filed: May 21, 1997
Date of Patent: June 13, 2000
Assignee: AT&T Corp
Inventors: Shrikanth Sambasivan Narayanan, Alexandros Potamianos, Ilija Zeljkovic
-
Patent number: 6076058
Abstract: The proposed model finds an optimal linear transformation of the Mel-warped DFT features according to the minimum classification error (MCE) criterion. This linear transformation, along with the NSHMM parameters, is trained automatically by gradient descent. On a standard 39-class TIMIT phone classification task, this yields an error-rate reduction compared with an MCE-trained NSHMM that uses conventional preprocessing techniques.
Type: Grant
Filed: March 2, 1998
Date of Patent: June 13, 2000
Assignee: Lucent Technologies Inc.
Inventor: Rathinavelu Chengalvarayan
-
Patent number: 6073099
Abstract: A confusability tool generates a confusability cost for a pair of phonemic transcriptions. The confusability cost measures the likelihood that a human or machine hearing the first word will mistakenly hear the second word. The cost calculation is based on a weighting of the Levenshtein distance between the two transcriptions.
Type: Grant
Filed: November 4, 1997
Date of Patent: June 6, 2000
Assignee: Nortel Networks Corporation
Inventors: Michael Sabourin, Marc Fabiani
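The weighted Levenshtein distance at the core of 6073099 is a standard dynamic program. The sketch below takes the substitution, insertion, and deletion costs as caller-supplied functions; the abstract gives neither the patent's actual weights nor how the distance is mapped to a confusability cost, so any costs you plug in are hypothetical. A lower distance means the two transcriptions are easier to confuse.

```python
def weighted_levenshtein(a, b, sub_cost, ins_cost, del_cost):
    """Edit distance between phoneme sequences a and b.

    sub_cost(x, y): cost of hearing phoneme y when x was spoken (x != y).
    ins_cost(y), del_cost(x): costs of inserting y / deleting x.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete all of a[:i]
        d[i][0] = d[i - 1][0] + del_cost(a[i - 1])
    for j in range(1, m + 1):          # insert all of b[:j]
        d[0][j] = d[0][j - 1] + ins_cost(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(d[i - 1][j] + del_cost(a[i - 1]),
                          d[i][j - 1] + ins_cost(b[j - 1]),
                          d[i - 1][j - 1] + sub)
    return d[n][m]
```

With unit costs this is the ordinary Levenshtein distance. Making the substitution /p/ to /b/ cheap (say 0.3) makes "pat" and "bat" closer than "cat" and "bat", which is the kind of acoustic weighting the abstract refers to.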
-
Patent number: 6073098
Abstract: An approximate weighted finite-state automaton can be constructed in place of a weighted finite-state automaton, provided that the approximate automaton keeps a sufficient portion of the original best strings and introduces sufficiently few spurious strings compared with the original. An approximate weighted finite-state automaton can be created from a non-deterministic weighted finite-state automaton during determinization by dropping the requirement that an old state be used in place of a new state only when the two are identical. Instead, in an approximate weighted finite-state automaton, an old state is used in place of a new state when each remainder of the new state is sufficiently close to the corresponding remainder of the old state. An error tolerance parameter τ …
Type: Grant
Filed: November 21, 1997
Date of Patent: June 6, 2000
Assignee: AT&T Corporation
Inventors: Adam Louis Buchsbaum, Raffaele Giancarlo, Jeffery Rex Westbrook
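The relaxed state-reuse test of 6073098 can be written as a small predicate. In this sketch a determinized state is a dict from original states to remainder weights, and "sufficiently close" is taken to mean an absolute difference of at most `tau`. Both choices are assumptions, since the abstract is cut off before it defines how the tolerance is applied. Setting `tau = 0` recovers the exact identity test of ordinary weighted determinization.

```python
def find_reusable_state(new_subset, existing_subsets, tau):
    """Return the index of an existing determinized state that may stand in
    for new_subset, or None if a genuinely new state must be created.

    new_subset, existing_subsets[i]: dict original_state -> remainder weight.
    tau: assumed absolute tolerance on each remainder.
    """
    for idx, old in enumerate(existing_subsets):
        if old.keys() != new_subset.keys():
            continue  # must cover exactly the same original states
        if all(abs(new_subset[q] - old[q]) <= tau for q in new_subset):
            return idx
    return None
```

Reusing a nearby state instead of creating a new one is what lets determinization terminate on automata whose remainders would otherwise keep drifting by tiny amounts.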
-
Patent number: 6070136
Abstract: A speech recognition system uses both matrix and vector quantizers as front ends to a second-stage speech classifier. Matrix quantization exploits input signal information in both the frequency and time domains, while the vector quantizer operates primarily on frequency-domain information. In some circumstances, however, time-domain information may be substantially limited, which can introduce error into the matrix quantization. Information derived from vector quantization may be used by a hybrid decision generator to compensate for errors in the information derived from matrix quantization. Additionally, fuzzy quantization methods and robust distance measures may be introduced to further enhance speech recognition accuracy. Other speech classification stages may also be used, such as hidden Markov models, which introduce probabilistic processes to further enhance recognition accuracy.
Type: Grant
Filed: October 27, 1997
Date of Patent: May 30, 2000
Assignee: Advanced Micro Devices, Inc.
Inventors: Lin Cong, Safdar M. Asghar
-
Patent number: 6067517
Abstract: A technique to improve recognition accuracy when transcribing speech data that comes from a wide range of environments. In many situations, input data contains data from a variety of sources in different environments. These classes of data include clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first identified automatically, and each class is then transcribed by a system built specifically for it. The invention also describes a segmentation algorithm that builds an acoustic model characterizing the data in each class and then uses a dynamic programming algorithm (the Viterbi algorithm) to automatically identify the segments that belong to each class. The acoustic models are built in a certain feature space, and the invention also describes different feature spaces for use with different classes.
Type: Grant
Filed: February 2, 1996
Date of Patent: May 23, 2000
Assignee: International Business Machines Corporation
Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, Ramesh Ambat Gopinath, Stephane Herman Maes, Mukund Panmanabhan, Lazaros Polymenakos
-
Patent number: 6067515
Abstract: A speech recognition system uses both split matrix and split vector quantizers as front ends to a second-stage speech classifier, such as hidden Markov models (HMMs), to, for example, use processing resources efficiently and improve speech recognition performance. Fuzzy split matrix quantization (FSMQ) exploits the "evolution" of the short-term spectral envelopes of speech as well as frequency-domain information, while fuzzy split vector quantization (FSVQ) operates primarily on frequency-domain information. Time-domain information may be substantially limited, which can introduce error into the matrix quantization, and the FSVQ may provide error compensation. Additionally, acoustic noise may affect particular frequency-domain subbands. The system also, for example, exploits such localized noise by efficiently allocating enhanced processing to the noise-affected input signal parameters, minimizing the influence of noise.
Type: Grant
Filed: October 27, 1997
Date of Patent: May 23, 2000
Assignee: Advanced Micro Devices, Inc.
Inventors: Lin Cong, Safdar M. Asghar
-
Patent number: 6067513
Abstract: A speech recognition method is provided for recognizing input speech in a noisy environment using a plurality of clean speech models. Each clean speech model has a clean speech feature parameter S representing a cepstrum parameter of its clean speech. The method comprises: detecting a noise feature parameter N, representing a cepstrum parameter of the noise in the noisy environment, immediately before the speech is input; detecting an input speech feature parameter X, representing a cepstrum parameter of the input speech in the noisy environment; and calculating a modified clean speech feature parameter Y according to the equation Y = k·S + (1 − k)·N (0 < k ≤ …
Type: Grant
Filed: October 22, 1998
Date of Patent: May 23, 2000
Assignee: Pioneer Electronic Corporation
Inventor: Shunsuke Ishimitsu
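The model-combination step of 6067513 is a per-coefficient interpolation between the clean-speech parameter S and the noise parameter N. The sketch below computes Y for each clean model and then picks the model whose Y is nearest to the observed X. That nearest-match decision with Euclidean distance is our assumption: the abstract is truncated before it says how X and Y are compared or what the upper bound on k is, so `k` is not range-checked here.

```python
import math

def adapt_model(clean, noise, k):
    """Y = k*S + (1 - k)*N, applied element-wise to cepstral vectors."""
    return [k * s + (1 - k) * n for s, n in zip(clean, noise)]

def recognize_noisy(x, clean_models, noise, k):
    """Return the name of the clean model whose noise-adapted parameter Y
    lies closest (Euclidean distance, an assumption) to the input X."""
    return min(clean_models,
               key=lambda w: math.dist(x, adapt_model(clean_models[w], noise, k)))
```

Adapting the stored models toward the measured noise, rather than cleaning the input, means the noise estimate taken just before the utterance is the only thing that has to be updated per call.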
-
Patent number: 6067520
Abstract: A Mandarin speech input method for directly translating arbitrary sentences of Mandarin speech into the corresponding Chinese characters. The invention can process a sequence of "mono-syllables," "(but each of the characters in the poly-character word is continuous)," "prosodic segments," or even a "whole sentence of continuous Mandarin speech." A prosodic segment, comprising one or more words, is a segment that the speaker isolates naturally by pausing, and the characters within it are spoken continuously.
Type: Grant
Filed: December 29, 1995
Date of Patent: May 23, 2000
Assignee: Lee and Li
Inventor: Lin-Shan Lee
-
Patent number: 6064958
Abstract: A pattern recognition scheme using probabilistic models that reduces the cost of calculating output probabilities while improving recognition performance, even when the number of mixture component distributions in each state is small, by using distributions with low calculation cost and high expressive power as the mixture components. In this scheme, the probability of each probabilistic model, which expresses the features of a recognition category, is calculated for each input feature vector derived from each input signal. Each probabilistic model represents a feature parameter subspace in which the feature vectors of its recognition category lie, and this subspace is expressed by mixtures of one-dimensional discrete distributions with arbitrary shapes, arranged in the respective dimensions.
Type: Grant
Filed: September 19, 1997
Date of Patent: May 16, 2000
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Satoshi Takahashi, Shigeki Sagayama
-
Patent number: 6061653
Abstract: A method of operating a speech recognition system. The method loads a speech model from a storage facility into a memory accessible by a processor. This loading has two steps: the first loads process-independent state data representative of a plurality of states of the speech model, and the second loads process-specific state data representative of the same states. The system then performs a first speech recognition process with the processor by accessing the process-independent state data and a first portion of the process-specific state data. The system also performs a second speech recognition process, which likewise accesses the process-independent state data but also accesses a second portion of the process-specific state data, different from the first.
Type: Grant
Filed: July 14, 1998
Date of Patent: May 9, 2000
Assignee: Alcatel USA Sourcing, L.P.
Inventors: Thomas D. Fisher, Dearborn R. Mowry, Jeffrey J. Spiess
-
Patent number: 6058363
Abstract: A method and system for determining an out-of-vocabulary score for speaker-independent recognition of user-defined phrases comprises enrolling a user-defined phrase with a set of speaker-independent (SI) recognition models using an enrollment grammar. An enrollment grammar score for the spoken phrase may be determined by comparing features of the spoken phrase with the SI recognition models using the enrollment grammar. The enrollment grammar score may then be penalized to generate an out-of-vocabulary score.
Type: Grant
Filed: December 29, 1997
Date of Patent: May 2, 2000
Assignee: Texas Instruments Incorporated
Inventor: Coimbatore S. Ramalingam
-
Patent number: 6058365
Abstract: Continuous speech is recognized by selecting among hypotheses, which are candidate symbol strings obtained by connecting the phonemes corresponding to the Hidden Markov Model (HMM) with the highest probability, by referring to a phoneme-context-dependent HMM for the input speech in an HMM phoneme verification portion. A phoneme-context-dependent LR (Left-Right) parser portion predicts the next phoneme by referring to an action specifying item stored in an LR (Left to Right) parsing table, and predicts the phoneme context around the predicted phoneme using an action specifying item of the LR table.
Type: Grant
Filed: July 6, 1993
Date of Patent: May 2, 2000
Assignee: ATR Interpreting Telephony Research Laboratories
Inventors: Akito Nagai, Kenji Kita, Shigeki Sagayama
-
Patent number: 6052662
Abstract: Speech processing is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. A method for learning this probabilistic mapping is also described; it uses training data composed only of speech sounds. The speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Type: Grant
Filed: January 29, 1998
Date of Patent: April 18, 2000
Assignee: Regents of the University of California
Inventor: John E. Hogden
-
Patent number: 6047256
Abstract: In a system for recognizing a time sequence of feature vectors of a speech signal, representative of an unknown utterance, as one of a plurality of reference patterns, a generator (11) for generating the reference patterns has a converter (15) that converts a plurality of time sequences of feature vectors of an input speech pattern with variances into a plurality of time sequences of feature codes, with reference to code vectors (14) prepared in advance by known clustering. A first pattern former (16) generates a state transition probability distribution and an occurrence probability distribution of feature codes for each state in a state transition network. A function generator (17) calculates the parameters of a continuous Gaussian density function from the code vectors and the occurrence probability distribution, producing a continuous Gaussian density function that approximates the occurrence probability distribution.
Type: Grant
Filed: June 17, 1993
Date of Patent: April 4, 2000
Assignee: NEC Corporation
Inventors: Shinji Koga, Takao Watanabe, Kazunaga Yoshida
-
Patent number: 6044344
Abstract: A method is provided for training a statistical pattern recognition decoder on new data while preserving its accuracy on old, previously learned data. Previously learned data are represented as constraint equations that define a constrained domain (T) in the space of statistical parameters (K) of the decoder. Part of the previously learned data is represented as a feasible point on the constrained domain. The training procedure is reformulated as the optimization of objective functions over the constrained domain, and the constrained optimization problems are then solved. This training method ensures that previously learned data is preserved during iterative training steps. While an exemplary speech recognition decoder is discussed, the method is also suited to other pattern recognition problems such as handwriting recognition, image recognition, machine translation, or natural language processing.
Type: Grant
Filed: January 3, 1997
Date of Patent: March 28, 2000
Assignee: International Business Machines Corporation
Inventor: Dimitri Kanevsky
-
Patent number: 6044343
Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction, followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal-to-noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequences O to train hidden Markov model (HMM) processes λj, and it produces fuzzy distance measure output data for each vocabulary word codebook. A processor uses a fuzzy Viterbi algorithm to compute the maximum likelihood probabilities Pr(O | λj) for each vocabulary word. The fuzzy distance measures and the maximum likelihood probabilities are mixed in a variety of ways to optimize speech recognition accuracy and speed.
Type: Grant
Filed: June 27, 1997
Date of Patent: March 28, 2000
Assignee: Advanced Micro Devices, Inc.
Inventors: Lin Cong, Safdar M. Asghar
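For reference, the best-path score Pr(O | λj) that 6044343 computes with a fuzzy Viterbi algorithm is, in its ordinary crisp form, the standard Viterbi recursion below. This sketch uses discrete codeword indices as observations and does not model the patent's fuzzy variant, whose details the abstract does not give; the per-word model tuples `(pi, A, B)` are hypothetical inputs.

```python
import math

def viterbi_log_prob(obs, pi, A, B):
    """Log probability of the single best state path for a discrete
    observation sequence obs under the HMM (pi, A, B), where
    pi[i] is the initial probability of state i, A[i][j] the transition
    probability i -> j, and B[i][o] the probability of codeword o in state i."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")
    n = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        # Best predecessor for each state j, then emit o from j.
        delta = [max(delta[i] + log(A[i][j]) for i in range(n)) + log(B[j][o])
                 for j in range(n)]
    return max(delta)

def recognize_word(obs, word_models):
    """Return the vocabulary word whose HMM gives the highest Viterbi score."""
    return max(word_models, key=lambda w: viterbi_log_prob(obs, *word_models[w]))
```

Working in log probabilities avoids underflow on long utterances; zero probabilities map to negative infinity, which `max` handles without special cases.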
-
Patent number: 6026359
Abstract: A model adaptation scheme in pattern recognition, which is capable of realizing a fast, real time model adaptation and improving the recognition performance. This model adaptation scheme determines a change in a parameter expressing a condition of pattern recognition and probabilistic model training between an initial condition at a time of acquiring training data used in obtaining a model parameter of each probabilistic model and a current condition at a time of actual recognition. Then, the probabilistic models are adapted by obtaining a model parameter after a condition change by updating a model parameter before a condition change according to the determined change, when the initial condition and the current condition are mismatching. The adaptation processing uses a Taylor expansion expressing a change in the model parameter in terms of a change in the parameter expressing the condition.
Type: Grant
Filed: September 15, 1997
Date of Patent: February 15, 2000
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Yoshikazu Yamaguchi, Shigeki Sagayama, Jun-ichi Takahashi, Satoshi Takahashi
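A first-order Taylor expansion of a model parameter μ in a condition vector c gives μ(c') ≈ μ(c) + J·(c' − c), where J holds the derivatives ∂μ/∂c. The sketch below applies only that generic update to a mean vector, assuming J has already been computed; how the patent obtains J, and which conditions it tracks, are not shown in the abstract. The name `taylor_adapt` and the example values are hypothetical.

```python
from typing import List, Sequence

def taylor_adapt(mean: Sequence[float], jacobian: Sequence[Sequence[float]],
                 cond_init: Sequence[float], cond_now: Sequence[float]) -> List[float]:
    """First-order update: mean(c_now) ~ mean(c_init) + J (c_now - c_init).

    jacobian[i][k] is the (assumed precomputed) derivative of mean component i
    with respect to condition parameter k.
    """
    delta = [c1 - c0 for c0, c1 in zip(cond_init, cond_now)]
    if not any(delta):
        return list(mean)  # conditions match: no adaptation needed
    return [m + sum(j * d for j, d in zip(row, delta))
            for m, row in zip(mean, jacobian)]

# hypothetical: two mean components, one condition parameter (e.g. a noise level in dB)
print(taylor_adapt([1.0, 2.0], [[0.5], [-1.0]], [10.0], [12.0]))  # [2.0, 0.0]
```

The update costs one matrix-vector product per model, which is why a linearized adaptation of this kind can run in real time compared with retraining.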
-
Patent number: 6006182
Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.
Type: Grant
Filed: September 22, 1997
Date of Patent: December 21, 1999
Assignee: Northern Telecom Limited
Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
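The decision procedure above can be illustrated with a weighted linear feature function and a threshold. This is a sketch under stated assumptions, not the patented system: the two rejection features (`score_gap`, `best_score`), the rule that weights are each feature's accuracy divided by the total, and the threshold are all hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

def accuracy_weights(accuracies: Dict[str, float]) -> Dict[str, float]:
    # hypothetical rule: weight each feature in proportion to its stand-alone accuracy
    total = sum(accuracies.values())
    return {name: acc / total for name, acc in accuracies.items()}

def rejection_features(nbest: List[Tuple[str, float]]) -> Dict[str, float]:
    # nbest is sorted best-first; scores are higher-is-better (e.g. log-likelihoods)
    best = nbest[0][1]
    runner_up = nbest[1][1] if len(nbest) > 1 else best
    return {"score_gap": best - runner_up, "best_score": best}

def accept(nbest: List[Tuple[str, float]], accuracies: Dict[str, float],
           threshold: float) -> Tuple[Optional[str], float]:
    # decision value = weighted sum of feature values; accept the top result if high enough
    weights = accuracy_weights(accuracies)
    feats = rejection_features(nbest)
    value = sum(weights[name] * feats[name] for name in weights)
    return (nbest[0][0] if value >= threshold else None), value

nbest = [("call home", 5.0), ("call mom", 2.0)]
acc = {"score_gap": 0.75, "best_score": 0.25}
print(accept(nbest, acc, threshold=3.0))  # ('call home', 3.5)
```

In practice the features would be normalized to comparable scales before weighting; the raw values here only keep the arithmetic easy to follow.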
-
Patent number: 6006186
Abstract: A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the prespecified threshold.
Type: Grant
Filed: October 16, 1997
Date of Patent: December 21, 1999
Assignees: Sony Corporation, Sony Electronics, Inc.
Inventors: Ruxin Chen, Miyuki Tanaka, Duanpei Wu, Lex S. Olorenshaw
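The threshold-driven sharing steps above can be sketched as a mapping from each triphone to the unit whose model it uses. Assumptions: the "common biphone" is taken to be the (left context, center phone) pair, and because the abstract does not define the phonemic-context equivalence used in the third rule, a context-independent monophone fallback stands in for it here. `share_models` and the frame counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (left context, center phone, right context)

def share_models(frame_counts: Dict[Triphone, int], threshold: int) -> Dict[Triphone, tuple]:
    """Map every triphone to the unit whose model it will use."""
    mapping: Dict[Triphone, tuple] = {}
    sparse: Dict[Tuple[str, str], List[Triphone]] = defaultdict(list)
    for tri, frames in frame_counts.items():
        if frames > threshold:
            mapping[tri] = tri              # enough data: keep its own model
        else:
            sparse[tri[:2]].append(tri)     # group by (left, center) biphone
    for biphone, members in sparse.items():
        pooled = sum(frame_counts[t] for t in members)
        # shared biphone model if pooled data suffices, else a monophone stand-in
        unit = biphone if pooled > threshold else (biphone[1],)
        for tri in members:
            mapping[tri] = unit
    return mapping

counts = {("s", "ih", "t"): 500, ("s", "ih", "k"): 40,
          ("s", "ih", "p"): 30, ("b", "ih", "t"): 10}
print(share_models(counts, threshold=50))
```

Here ("s","ih","k") and ("s","ih","p") are too sparse alone (40 and 30 frames) but pool to 70 frames, so they share one ("s","ih") model.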
-
Patent number: 6003003
Abstract: In one embodiment, a speech recognition system is organized with a fuzzy matrix quantizer with a single codebook representing u codewords. The single codebook is designed with entries from u codebooks which are designed with respective words at multiple signal to noise ratio levels. Such entries are, in one embodiment, centroids of clustered training data. The training data is, in one embodiment, derived from line spectral frequency pairs representing respective speech input signals at various signal to noise ratios. The single codebook trained in this manner provides a codebook for a robust front end speech processor, such as the fuzzy matrix quantizer, for training a speech classifier such as u hidden Markov models and a speech post classifier such as a neural network. In one embodiment, a fuzzy Viterbi algorithm is used with the hidden Markov models to describe the speech input signal probabilistically.
Type: Grant
Filed: June 27, 1997
Date of Patent: December 14, 1999
Assignee: Advanced Micro Devices, Inc.
Inventors: Safdar M. Asghar, Lin Cong
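The abstract does not define the fuzzy quantizer's distance measure. As a stand-in, the sketch below soft-assigns a feature vector to the codewords of a single codebook using the standard fuzzy c-means membership u_k = 1 / Σ_j (d_k / d_j)^(2/(m−1)), then computes the membership-weighted squared distortion that fuzzy c-means minimizes. The function names, the fuzziness exponent m = 2, and the two-codeword codebook are assumptions.

```python
import math
from typing import List, Sequence

def fuzzy_memberships(x: Sequence[float], codebook: Sequence[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means style membership of x in each codeword (sums to 1)."""
    dists = [math.dist(x, c) for c in codebook]
    for k, d in enumerate(dists):
        if d == 0.0:  # x sits exactly on a codeword: crisp assignment
            return [1.0 if j == k else 0.0 for j in range(len(codebook))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

def fuzzy_distortion(x: Sequence[float], codebook: Sequence[Sequence[float]],
                     m: float = 2.0) -> float:
    # membership-weighted squared distance to every codeword
    u = fuzzy_memberships(x, codebook, m)
    return sum((uk ** m) * math.dist(x, c) ** 2 for uk, c in zip(u, codebook))

codebook = [(0.0, 0.0), (3.0, 0.0)]  # hypothetical two-codeword codebook
print(fuzzy_memberships((1.0, 0.0), codebook))  # approximately [0.8, 0.2]
```

Unlike hard vector quantization, every codeword contributes to the distortion, which tends to make the measure less brittle when noise pushes a vector near a cell boundary.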
-
Patent number: 5995934
Abstract: A recognition method for alpha-numeric strings in a Chinese speech recognition system uses a special coding scheme to map each of 36 alpha-numeric symbols into an easily remembered Chinese idiom or word consisting of multiple Chinese characters. When representing a numeral, each idiom/word starts with the Chinese character for that numeral. When representing an English alphabet letter, each idiom/word will have a first character which starts with that English alphabet letter in its Pinyin form. If it is necessary to include some control words, idioms/words similar in semantics can be used. The method resolves the problem of unreliable recognition when a string of random alpha-numeric symbols or some control words are inputted by voice to a Chinese speech recognition system.
Type: Grant
Filed: August 28, 1998
Date of Patent: November 30, 1999
Assignee: International Business Machines Corporation
Inventor: Donald T. Tang
-
Patent number: 5995927
Abstract: A method and an apparatus for performing stochastic matching of a set of input test speech data with a corresponding set of training speech data. In particular, a set of input test speech feature information, having been generated from an input test speech utterance, is transformed so that the stochastic characteristics thereof more closely match the stochastic characteristics of a corresponding set of training speech feature information. The corresponding set of training speech data may, for example, comprise training data which was generated from a speaker having the claimed identity of the speaker of the input test speech utterance. Specifically, in accordance with the present invention, a first covariance matrix representative of stochastic characteristics of input test speech feature information is generated based on the input test speech feature information.
Type: Grant
Filed: March 14, 1997
Date of Patent: November 30, 1999
Assignee: Lucent Technologies Inc.
Inventor: Qi P. Li
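The abstract describes transforming test features so their statistics match the training data's covariance. One common full-covariance form is x' = C_train^(1/2) C_test^(−1/2) (x − μ_test) + μ_train, which gives the transformed features covariance C_train; whether the patent uses exactly this form is not stated. To stay dependency-free, the sketch below uses the diagonal-covariance simplification, matching only the per-dimension mean and standard deviation. `match_statistics` and the sample frames are hypothetical.

```python
import statistics
from typing import List, Sequence

def match_statistics(test_frames: Sequence[Sequence[float]],
                     train_mean: Sequence[float],
                     train_std: Sequence[float]) -> List[List[float]]:
    """Shift and scale each feature dimension of the test utterance so its mean and
    standard deviation equal those of the training data (diagonal covariance only)."""
    columns = list(zip(*test_frames))
    test_mean = [statistics.fmean(col) for col in columns]
    test_std = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
    return [[train_mean[d] + train_std[d] * (frame[d] - test_mean[d]) / test_std[d]
             for d in range(len(columns))]
            for frame in test_frames]

print(match_statistics([[1.0], [3.0]], train_mean=[10.0], train_std=[2.0]))  # [[8.0], [12.0]]
```

The diagonal version ignores correlations between feature dimensions, which the full covariance transform would also remove.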
-
Patent number: 5991720
Abstract: The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.
Type: Grant
Filed: April 16, 1997
Date of Patent: November 23, 1999
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Michael Galler, Jean-Claude Junqua
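The abstract does not say how the N-best letter strings are aligned with the name dictionary. As a hedged stand-in, the sketch below ranks valid names by Levenshtein edit distance to the closest recognized string, then merges the per-pass lists into an order-preserving candidate set playing the role of the dynamic grammar; the final Viterbi pass is not shown. All function names and the sample names are hypothetical.

```python
from typing import List, Sequence

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def align_to_dictionary(nbest: Sequence[str], names: Sequence[str], keep: int = 2) -> List[str]:
    # rank valid names by their closest distance to any recognized letter string
    ranked = sorted(names, key=lambda name: min(edit_distance(h, name) for h in nbest))
    return ranked[:keep]

def dynamic_grammar(*name_lists: Sequence[str]) -> List[str]:
    # order-preserving union of the per-pass candidate lists
    merged: List[str] = []
    for names in name_lists:
        for name in names:
            if name not in merged:
                merged.append(name)
    return merged

names = ["smith", "smyth", "jones", "johnson"]  # hypothetical name dictionary
pass1 = align_to_dictionary(["smit", "snith"], names)   # ['smith', 'smyth']
pass2 = align_to_dictionary(["jons"], names)            # ['jones', 'johnson']
print(dynamic_grammar(pass1, pass2))  # ['smith', 'smyth', 'jones', 'johnson']
```

Restricting the last pass to this short merged list is what keeps a full Viterbi rescoring cheap even when the name dictionary is large.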
-
Patent number: 5991442
Abstract: The present invention provides a method and apparatus for performing pattern recognition on given information such as speech data or image data with a reduced amount of calculations of the degree of matching associated with reference patterns. The method and apparatus provide a high speed operation without an increase in the amount of calculations of the degree of matching even when there are a great number of reference patterns.
Type: Grant
Filed: May 1, 1996
Date of Patent: November 23, 1999
Assignee: Canon Kabushiki Kaisha
Inventors: Masayuki Yamada, Yasuhiro Komori
-
Patent number: 5987410
Abstract: A method and device for recognizing speech that has a sequence of words each including one or more letters. The words and letters form a recognition database. The method receives and recognizes the speech by preliminary modelling among various probably recognized sequences. The method selects one or more model sequences as result. In particular, the method allows in a model sequence of exclusively letters, various words as a subset. Such words are used to qualify one or more neighbouring or included letters in the sequence. An applicable model is a mixed information unit model.
Type: Grant
Filed: November 10, 1997
Date of Patent: November 16, 1999
Assignee: U.S. Philips Corporation
Inventors: Andreas Kellner, Frank Seide
-
Patent number: 5983180
Abstract: In a method of automatically recognizing data which comprises sequential data units represented as sequential tokens grouped into one or more items, known items are stored as respective finite state sequence models. Each state corresponds to a token and the models which have common prefix states are organized in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state. Each sequential data unit is compared with stored reference data units identified by reference tokens to generate scores indicating the similarity of the data units to reference data units.
Type: Grant
Filed: February 27, 1998
Date of Patent: November 9, 1999
Assignee: SoftSound Limited
Inventor: Anthony John Robinson
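The tree organization described above, where items sharing a prefix share states and each distinct first token starts its own tree, is essentially a forest of prefix trees (tries). The sketch below shows only that storage structure; the scoring of data units against reference units is omitted. The phone-symbol tokens, the item names, and the `State`/`build_forest` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

@dataclass
class State:
    children: Dict[str, "State"] = field(default_factory=dict)
    item: Optional[str] = None  # set when a complete item's model ends here

def build_forest(items: Dict[str, Sequence[str]]) -> Dict[str, State]:
    """One prefix tree per distinct first token; shared prefixes share states."""
    roots: Dict[str, State] = {}
    for name, tokens in items.items():
        state = roots.setdefault(tokens[0], State())
        for tok in tokens[1:]:
            state = state.children.setdefault(tok, State())
        state.item = name
    return roots

def count_states(roots: Dict[str, State]) -> int:
    # iterative depth-first walk over every tree in the forest
    stack: List[State] = list(roots.values())
    total = 0
    while stack:
        state = stack.pop()
        total += 1
        stack.extend(state.children.values())
    return total

def lookup(roots: Dict[str, State], tokens: Sequence[str]) -> Optional[str]:
    state = roots.get(tokens[0])
    for tok in tokens[1:]:
        if state is None:
            return None
        state = state.children.get(tok)
    return state.item if state is not None else None

items = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
forest = build_forest(items)
print(sorted(forest), count_states(forest))  # ['d', 'k'] 7
```

The three items need 9 states if stored separately but only 7 in the forest, because "cat" and "cap" share their first two states.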
-
Patent number: 5983178
Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.
Type: Grant
Filed: December 10, 1998
Date of Patent: November 9, 1999
Assignee: ATR Interpreting Telecommunications Research Laboratories
Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
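The abstract does not name its clustering algorithm. For illustration only, the sketch below groups N speakers into K clusters by agglomerative clustering on Euclidean distances between cluster centroids; training one HMM per cluster is not shown. The two-dimensional feature vectors and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

def centroid(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]

def cluster_speakers(features: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters: List[List[str]] = [[name] for name in features]
    while len(clusters) > k:
        cents = [centroid([features[n] for n in c]) for c in clusters]
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: math.dist(cents[p[0]], cents[p[1]]))
        clusters[i].extend(clusters.pop(j))  # i < j, so index i is unaffected by the pop
    return clusters

# hypothetical 2-D vocal-tract feature vectors for four speakers
speakers = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (5.1, 5.0)}
print(cluster_speakers(speakers, k=2))  # [['a', 'b'], ['c', 'd']]
```

This naive version recomputes all pairwise centroid distances on every merge, which is fine for a few dozen speakers but would need a cached distance matrix for larger N.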
-
Patent number: 5970452
Abstract: The method recognizes a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models. In a first signal processing stage, feature vectors are formed periodically for pattern recognition, which describe a signal curve of a measurement signal within a time slice. No speech pause is detected by a pause detector contained therein in a first time slice based on present features of a first feature vector. In a second signal processing stage, in a second time slice that follows the first time slice the first feature vector is compared with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause.
Type: Grant
Filed: September 4, 1997
Date of Patent: October 19, 1999
Assignee: Siemens Aktiengesellschaft
Inventors: Abdulmesih Aktas, Klaus Zunkler
-
Patent number: 5963905
Abstract: Methods and apparatus for performing a tree search based acoustic fast match in a speech recognition system for decoding a speech utterance, the tree having a tree root and tree nodes connected by tree branches, the tree nodes having phonetic models associated therewith, are provided.
Type: Grant
Filed: October 24, 1997
Date of Patent: October 5, 1999
Assignee: International Business Machines Corporation
Inventors: Miroslav Novak, Michael Alan Picheny
-
Patent number: 5963902
Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a new reduced size model is generated using a Lagrange multiplier from an input model and input size constraints during each iteration of the size reducing model training process. The reduced size model generated during one iteration of the process serves as the input to the next iteration. Scoring, e.g.
Type: Grant
Filed: July 30, 1997
Date of Patent: October 5, 1999
Assignee: Nynex Science & Technology, Inc.
Inventor: Kuansan Wang