Markov Patents (Class 704/256)
  • Patent number: 6182038
    Abstract: A method and apparatus for generating a context dependent phoneme network as an intermediate step of encoding speech information. The context dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context dependent phoneme network is then transmitted to a first application (52).
    Type: Grant
    Filed: December 1, 1997
    Date of Patent: January 30, 2001
    Assignee: Motorola, Inc.
    Inventors: Sreeram Balakrishnan, Stephen Austin
  • Patent number: 6163767
    Abstract: A Chinese speech recognition (SR) method and system for single or un-correlated Chinese character(s). The method uses various types of Character Description Language (CDL) to describe the single or un-correlated Chinese character(s) to be inputted. The SR system uses CDL grammar directed speech recognizer to accept CDLs, which are inputted by voice. On the basis of analysis of CDL parser, the character generator gives a corresponding character. Therefore, recognition of single or un-correlated Chinese character(s) out of context can be made reliably.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: December 19, 2000
    Assignee: International Business Machines Corporation
    Inventors: Donald T. Tang, Li Qin Shen, Xiao Jin Zhu
  • Patent number: 6163769
    Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set of decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.
    Type: Grant
    Filed: October 2, 1997
    Date of Patent: December 19, 2000
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Hsiao-Wuen Hon, Xuedong D. Huang
  • Patent number: 6161091
    Abstract: A speech recognition synthesis based encoding/decoding method recognizes phonetic segments, syllables, words or the like as character information from an input speech signal and detects pitch periods, phoneme or syllable durations or the like, as information for prosody generation, from the input speech signal, transfers or stores the character information and information for prosody generation as code data, decodes the transferred or stored code data to acquire the character information and information for prosody generation, and synthesizes the acquired character information and information for prosody generation to obtain a speech signal.
    Type: Grant
    Filed: March 17, 1998
    Date of Patent: December 12, 2000
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masami Akamine, Ryosuke Koshiba
  • Patent number: 6157912
    Abstract: Language models which take into account the probabilities of word sequences are used in speech recognition, in particular in the recognition of fluently spoken language with a wide vocabulary, in order to increase the recognition reliability. These models are obtained from comparatively large quantities of text and accordingly represent values which were averaged over several texts. This means, however, that the language model is not well adapted to peculiarities of a special text. To achieve such an adaptation of a given language model to a special text on the basis of only a short text fragment, according to the invention, it is suggested that first the unigram language model is adapted with the short text and, in dependence thereon, the M-gram language model is subsequently adapted. A method is described for adapting the unigram language model values which automatically carries out a subdivision of the words into semantic classes.
    Type: Grant
    Filed: March 2, 1998
    Date of Patent: December 5, 2000
    Assignee: U.S. Philips Corporation
    Inventors: Reinhard Kneser, Jochen Peters, Dietrich Klakow
  • Patent number: 6151573
    Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a-23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.
    Type: Grant
    Filed: August 15, 1998
    Date of Patent: November 21, 2000
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6151574
    Abstract: A speech recognition system learns characteristics of speech by a user during a learning phase to improve its performance. Adaptation data derived from the user's speech and its recognized result is collected during the learning phase. Parameters characterizing hidden Markov Models (HMMs) used in the system for speech recognition are modified based on the adaptation data. To that end, a hierarchical structure is defined in an HMM parameter space. This structure may assume the form of a tree structure having multiple layers, each of which includes one or more nodes. Each node on each layer is connected to at least one node on another layer. The nodes on the lowest layer of the tree structure are referred to as "leaf nodes." Each node in the tree structure represents a subset of the HMM parameters, and is associated with a probability measure which is derived from the adaptation data.
    Type: Grant
    Filed: September 8, 1998
    Date of Patent: November 21, 2000
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Koichi Shinoda
  • Patent number: 6148284
    Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
    Type: Grant
    Filed: December 11, 1998
    Date of Patent: November 14, 2000
    Assignee: AT&T Corporation
    Inventor: Lawrence Kevin Saul
  • Patent number: 6138097
    Abstract: A recognition test matches a speech segment supplied to the system with a set of parameters associated with a reference and memorized in a dictionary. A provisional version of each set of parameters to be memorized in the dictionary in association with a reference is estimated on the basis of one or more segments of speech, after which the provisional version is memorized in the dictionary in association with the reference. At least one repetition of the speech segment is submitted to a recognition test, after which depending on whether it has matched the speech segment with the provisional version, the provisional version is modified and the modified provisional version is memorized.
    Type: Grant
    Filed: September 28, 1998
    Date of Patent: October 24, 2000
    Assignee: Matra Nortel Communications
    Inventors: Philip Lockwood, Catherine Glorion, Laurent Lelievre
  • Patent number: 6133904
    Abstract: An apparatus for manipulating the colour of an image is provided, having a microphone for providing electrical speech signals representative of a user command, a speech recognition unit for recognizing the input speech signal, a command interpreter for interpreting the recognized speech, a graphics package responsive to the command interpreter and a display for displaying the current image being edited. The apparatus accepts other inputs, for example, from a pointing device.
    Type: Grant
    Filed: February 4, 1997
    Date of Patent: October 17, 2000
    Assignee: Canon Kabushiki Kaisha
    Inventor: Eli Tzirkel-Hancock
  • Patent number: 6128596
    Abstract: A method (700), device (1101), and system (1100) provide generalized bidirectional island-driven chart parsing based on congruency checking to prevent edge overgeneration for robust and efficient parsing of a word graph. The method prevents edge overgeneration by selecting, in accordance with a predetermined scheme, a candidate edge with a starting vertex, an ending vertex, a label, and a congruence key for entry in a chart from an agenda of edges, selecting an edge equivalence set in the chart that matches the starting vertex, the ending vertex, and the label of the candidate edge, and entering the candidate edge into the chart if the congruence key of the candidate edge fails to match the congruence key of any edge in the edge equivalence set.
    Type: Grant
    Filed: April 3, 1998
    Date of Patent: October 3, 2000
    Assignee: Motorola, Inc.
    Inventor: Andrew William Mackie
  • Patent number: 6119087
    Abstract: A system and method for efficiently distributing voice call data received from speech recognition servers over a telephone network having a shared processing resource is disclosed. Incoming calls are received from phone lines and assigned grammar types by speech recognition servers. A request for processing the voice call data is sent to a resource manager which monitors the shared processing resource and identifies a preferred processor within the shared resource. The resource manager sends an instruction to the speech recognition server to send the voice call data to a preferred processor for processing. The preferred processor is determined by known processor efficiencies for voice call data having the assigned grammar type of the incoming voice call data and a measure of processor loads. While the system is operating, the resource manager develops and updates a history of each processor. The histories include processing efficiency values for all grammar types received.
    Type: Grant
    Filed: March 13, 1998
    Date of Patent: September 12, 2000
    Assignee: Nuance Communications
    Inventors: Thomas Murray Kuhn, Matthew Lennig, Peter Christopher Monaco, David Bruce Peters
  • Patent number: 6112175
    Abstract: A method and apparatus using a combined MLLR and MCE approach to estimating the time-varying polynomial Gaussian mean functions in the trended HMM has advantageous results. This integrated approach is referred to as the minimum classification error linear regression (MCELR), which has been developed and implemented in speaker adaptation experiments using a large body of utterances from different types of speakers. Experimental results show that the adaptation of linear regression on time-varying mean parameters is always better when fewer than three adaptation tokens are used.
    Type: Grant
    Filed: March 2, 1998
    Date of Patent: August 29, 2000
    Assignee: Lucent Technologies Inc.
    Inventor: Rathinavelu Chengalvarayan
  • Patent number: 6108628
    Abstract: A high-speed speech recognition method with a high recognition rate, utilizing speaker models, includes the steps of executing an acoustic process on the input speech, calculating a coarse output probability utilizing an unspecified speaker model, and calculating a fine output probability utilizing an unspecified speaker model and clustered speaker models, for the states estimated, by the result of coarse calculation, to contribute to the results of recognition. Candidates of recognition are then extracted by a common language search based on the obtained result, and a fine language search is conducted on the thus extracted candidates to determine the result of recognition.
    Type: Grant
    Filed: September 16, 1997
    Date of Patent: August 22, 2000
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Tetsuo Kosaka, Masayuki Yamada
  • Patent number: 6092045
    Abstract: Comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, is carried out in an order which makes better use of memory. First, the observations in one of the blocks are compared (31), to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models. This step is repeated (33) for models other than those in the subset; and the whole process is repeated (34) for each block.
    Type: Grant
    Filed: July 21, 1998
    Date of Patent: July 18, 2000
    Assignee: Nortel Networks Corporation
    Inventors: Peter R. Stubley, Andre Gillet, Vishwa N. Gupta, Christopher K. Toulson, David B. Peters
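The loop order described in the abstract for patent 6092045 (one block of observations, then each subset of models, then the next block) can be sketched as follows. This is a simplified illustration, not the patented method: the function names, the scalar "models", and the negative squared distance score are all assumptions made for the example, and a real recognizer would carry HMM state between blocks rather than summing independent frame scores.

```python
def blockwise_scores(observations, models, score, block_size=2, subset_size=1):
    """Accumulate a score per model, visiting the data block by block.

    For each block of observations, every subset of models is scored
    against that block before moving on, so only one block of
    observations has to be held in fast memory at a time.
    """
    totals = {name: 0.0 for name in models}
    names = list(models)
    for start in range(0, len(observations), block_size):
        block = observations[start:start + block_size]
        # Repeat the comparison for each subset of models (hypothetical grouping).
        for s in range(0, len(names), subset_size):
            for name in names[s:s + subset_size]:
                for obs in block:
                    totals[name] += score(obs, models[name])
    return totals


def toy_score(obs, model_mean):
    # Assumed scoring rule for illustration: negative squared distance.
    return -(obs - model_mean) ** 2


if __name__ == "__main__":
    result = blockwise_scores([1.0, 2.0, 3.0, 4.0], {"a": 1.0, "b": 3.0}, toy_score)
    print(result)
```

Because the scores here are simply summed, the blockwise order gives the same totals as scoring each model against the whole sequence; only the memory access pattern differs.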
  • Patent number: 6085160
    Abstract: A speech recognition system uses language independent acoustic models derived from speech data from multiple languages to represent speech units which are concatenated into words. In addition, the input speech signal which is compared to the language independent acoustic models may be vector quantized according to a codebook which is derived from speech data from multiple languages.
    Type: Grant
    Filed: July 10, 1998
    Date of Patent: July 4, 2000
    Assignee: Lernout & Hauspie Speech Products N.V.
    Inventors: Bart D'hoore, Dirk Van Compernolle
  • Patent number: 6078883
    Abstract: For training a speech recognition to a multi-item repertoire, the following steps are executed: a speech item is presented by a user person, and the distinctivity thereof in the repertoire is asserted. Under control of a distinctivity found the speech item is inserted into the repertoire. These steps are repeated until reaching repertoire sufficiency. In particular, the asserting determines a likeness among the actually presented speech item and all items already in the repertoire, wherein undue likeness with one particular stored item creates a contingency procedure. This implies offering to the user a choice between ignoring the actually presented speech item and alternatively inserting the actually presented speech item at a price of deleting the particular stored item.
    Type: Grant
    Filed: December 17, 1997
    Date of Patent: June 20, 2000
    Assignee: U.S. Philips Corporation
    Inventors: Benoit Guilhaumon, Gilles Miet
  • Patent number: 6078884
    Abstract: Pattern recognition apparatus uses a recognition processor for processing an input signal to indicate its similarity to allowed sequences of reference patterns to be recognised. A speech recognition processor includes a classification arrangement to identify a sequence of patterns corresponding to said input signal and for repeatedly partitioning the input signal into a speech-containing portion and, preceding and/or following said speech-containing portion, noise or silence portions. A noise model generator is provided to generate a pattern of the noise or silence portion, for subsequent use by said classification means for pattern identification purposes. The noise model generator may generate a noise model for each noise portion of the input signal, which may be used to adapt the reference patterns.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: June 20, 2000
    Assignee: British Telecommunications public limited company
    Inventor: Simon N. Downey
  • Patent number: 6076056
    Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.
    Type: Grant
    Filed: September 19, 1997
    Date of Patent: June 13, 2000
    Assignee: Microsoft Corporation
    Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
  • Patent number: 6076057
    Abstract: An unsupervised, discriminative, sentence level, HMM adaptation based on speech-silence classification is presented. Silence and speech regions are determined either using a speech end-pointer or the segmentation obtained from the recognizer in a first pass. The discriminative training procedure using a GPD or any other discriminative training algorithm, employed in conjunction with the HMM-based recognizer, is then used to increase the discrimination between silence and speech.
    Type: Grant
    Filed: May 21, 1997
    Date of Patent: June 13, 2000
    Assignee: AT&T Corp
    Inventors: Shrikanth Sambasivan Narayanan, Alexandros Potamianos, Ilija Zeljkovic
  • Patent number: 6076058
    Abstract: The proposed model aims at finding an optimal linear transformation on the Mel-warped DFT features according to the minimum classification error (MCE) criterion. This linear transformation, along with the NSHMM parameters, is automatically trained using the gradient descent method. An advantageous error rate reduction can be realized on a standard 39-class TIMIT phone classification task in comparison with the MCE-trained NSHMM using conventional preprocessing techniques.
    Type: Grant
    Filed: March 2, 1998
    Date of Patent: June 13, 2000
    Assignee: Lucent Technologies Inc.
    Inventor: Rathinavelu Chengalvarayan
  • Patent number: 6073099
    Abstract: A confusability tool generates a confusability cost associated with two phonemic transcriptions. The confusability cost measures the likelihood that a human or machine hearing the first word will mistakenly hear the second word. The cost calculation is based on a weighting of the Levenshtein distance between the transcription pair.
    Type: Grant
    Filed: November 4, 1997
    Date of Patent: June 6, 2000
    Assignee: Nortel Networks Corporation
    Inventors: Michael Sabourin, Marc Fabiani
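A weighted Levenshtein distance between two phonemic transcriptions, as mentioned in the abstract for patent 6073099, might look like the sketch below. The abstract does not give the weighting, so the substitution costs, the `SIMILAR` phoneme pairs, and the function names here are hypothetical choices for illustration only.

```python
def weighted_edit_distance(a, b, sub_cost=None, indel_cost=1.0):
    """Weighted Levenshtein distance between two phoneme sequences."""
    if sub_cost is None:
        # Default: plain Levenshtein (unit cost for any mismatch).
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, pa in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete pa
                curr[j - 1] + indel_cost,        # insert pb
                prev[j - 1] + sub_cost(pa, pb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Assumed, illustrative set of acoustically similar phoneme pairs.
SIMILAR = {frozenset(("m", "n")), frozenset(("b", "p"))}


def phoneme_sub_cost(x, y):
    """Hypothetical weighting: similar phonemes are cheaper to confuse."""
    if x == y:
        return 0.0
    return 0.5 if frozenset((x, y)) in SIMILAR else 1.0


if __name__ == "__main__":
    print(weighted_edit_distance(["m", "ae", "t"], ["n", "ae", "t"], phoneme_sub_cost))
```

A lower cost means the two transcriptions are easier to confuse; with the assumed weights, "m ae t" and "n ae t" score lower than a pair that differs in a dissimilar phoneme.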
  • Patent number: 6073098
    Abstract: An approximate weighted finite-state automaton can be constructed in place of a weighted finite-state automaton so long as the approximate weighted finite-state automaton maintains a sufficient portion of the original best strings in the weighted finite-state automaton and sufficiently few spurious strings are introduced into the approximate weighted finite-state automaton compared to the weighted finite-state automaton. An approximate weighted finite-state automaton can be created from a non-deterministic weighted finite-state automaton during determinization by discarding the requirement that old states be used in place of new states only when an old state is identical to a new state. Instead, in an approximate weighted finite-state automaton, old states will be used in place of new states when each of the remainders of the new state is sufficiently close to the corresponding remainder of the old state. An error tolerance parameter τ
    Type: Grant
    Filed: November 21, 1997
    Date of Patent: June 6, 2000
    Assignee: AT&T Corporation
    Inventors: Adam Louis Buchsbaum, Raffaele Giancarlo, Jeffery Rex Westbrook
  • Patent number: 6070136
    Abstract: A speech recognition system utilizes both matrix and vector quantizers as front ends to a second stage speech classifier. Matrix quantization exploits input signal information in both frequency and time domains, and the vector quantizer primarily operates on frequency domain information. However, in some circumstances, time domain information may be substantially limited which may introduce error into the matrix quantization. Information derived from vector quantization may be utilized by a hybrid decision generator to error compensate information derived from matrix quantization. Additionally, fuzzy methods of quantization and robust distance measures may be introduced to also enhance speech recognition accuracy. Furthermore, other speech classification stages may be used, such as hidden Markov models which introduce probabilistic processes to further enhance speech recognition accuracy.
    Type: Grant
    Filed: October 27, 1997
    Date of Patent: May 30, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lin Cong, Safdar M. Asghar
  • Patent number: 6067517
    Abstract: A technique to improve the recognition accuracy when transcribing speech data that contains data from a wide range of environments. Input data in many situations contains data from a variety of sources in different environments. Such classes include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system that is made specifically for it. The invention also describes a segmentation algorithm that is based on making up an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are made in a certain feature space, and the invention also describes different feature spaces for use with different classes.
    Type: Grant
    Filed: February 2, 1996
    Date of Patent: May 23, 2000
    Assignee: International Business Machines Corporation
    Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, Ramesh Ambat Gopinath, Stephane Herman Maes, Mukund Panmanabhan, Lazaros Polymenakos
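The segmentation step in the abstract for patent 6067517 (per-class acoustic models plus a Viterbi pass to label segments) can be illustrated with a small sketch. It assumes the per-frame log-likelihoods for each class are already available and uses a fixed penalty for switching class between frames; the penalty, the class names, and the function name are assumptions for the example and do not come from the patent.

```python
def viterbi_segment(frame_loglik, switch_penalty=2.0):
    """Label each frame with a class using Viterbi dynamic programming.

    frame_loglik: list of dicts mapping class name -> log-likelihood of
    that frame under the class's acoustic model (assumed precomputed).
    switch_penalty: assumed cost subtracted when the class changes.
    """
    classes = list(frame_loglik[0])
    best = {c: frame_loglik[0][c] for c in classes}
    back = []  # back[t][c] = best predecessor class of c at frame t + 1
    for ll in frame_loglik[1:]:
        new_best, ptr = {}, {}
        for c in classes:
            def arrive(p, c=c):
                # Score of reaching class c from predecessor class p.
                return best[p] - (0.0 if p == c else switch_penalty)
            prev_c = max(classes, key=arrive)
            new_best[c] = arrive(prev_c) + ll[c]
            ptr[c] = prev_c
        back.append(ptr)
        best = new_best
    # Trace back from the best final class.
    path = [max(classes, key=lambda c: best[c])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    frames = (
        [{"speech": -1.0, "music": -5.0}] * 2
        + [{"speech": -3.0, "music": -2.0}]  # one ambiguous frame
        + [{"speech": -1.0, "music": -5.0}]
        + [{"speech": -5.0, "music": -1.0}] * 3
    )
    print(viterbi_segment(frames))
```

The switching penalty keeps a single ambiguous frame from splitting a segment, so contiguous runs of one label can be handed to the recognizer built for that class.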
  • Patent number: 6067515
    Abstract: A speech recognition system utilizes both split matrix and split vector quantizers as front ends to a second stage speech classifier such as hidden Markov models (HMMs) to, for example, efficiently utilize processing resources and improve speech recognition performance. Fuzzy split matrix quantization (FSMQ) exploits the "evolution" of the speech short-term spectral envelopes as well as frequency domain information, and fuzzy split vector quantization (FSVQ) primarily operates on frequency domain information. Time domain information may be substantially limited which may introduce error into the matrix quantization, and the FSVQ may provide error compensation. Additionally, acoustic noise influence may affect particular frequency domain subbands. This system also, for example, exploits the localized noise by efficiently allocating enhanced processing technology to target noise-affected input signal parameters and minimize noise influence.
    Type: Grant
    Filed: October 27, 1997
    Date of Patent: May 23, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lin Cong, Safdar M. Asghar
  • Patent number: 6067513
    Abstract: A speech recognition method of recognizing an input speech in a noisy environment by using a plurality of clean speech models is provided. Each of the clean speech models has a clean speech feature parameter S representing a cepstrum parameter of a clean speech thereof. The speech recognition method has the processes of: detecting a noise feature parameter N representing a cepstrum parameter of a noise in the noisy environment, immediately before the input speech is input; detecting an input speech feature parameter X representing a cepstrum parameter of the input speech in the noisy environment; calculating a modified clean speech feature parameter Y according to a following equation: Y = k·S + (1−k)·N (0 < k ≤
    Type: Grant
    Filed: October 22, 1998
    Date of Patent: May 23, 2000
    Assignee: Pioneer Electronic Corporation
    Inventor: Shunsuke Ishimitsu
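The interpolation in the abstract for patent 6067513, Y = k·S + (1−k)·N, can be written out directly. The abstract is cut off before the upper bound on k, so the sketch below assumes 0 < k ≤ 1; that bound, the function name, and the representation of cepstrum parameters as plain lists are assumptions for illustration.

```python
def modified_clean_feature(S, N, k):
    """Return Y = k*S + (1 - k)*N, element by element.

    S: clean speech cepstrum parameters for one model.
    N: noise cepstrum parameters measured just before the utterance.
    k: mixing weight; the range 0 < k <= 1 is an assumption, since the
       source abstract is truncated before the upper bound.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k is assumed to lie in (0, 1]")
    if len(S) != len(N):
        raise ValueError("S and N must have the same length")
    return [k * s + (1.0 - k) * n for s, n in zip(S, N)]


if __name__ == "__main__":
    print(modified_clean_feature([2.0, 4.0], [0.0, 0.0], 0.5))
```

With k = 1 the noise term vanishes and Y equals the clean parameter S; smaller k pulls each model toward the measured noise.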
  • Patent number: 6067520
    Abstract: A Mandarin speech input method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters. The present invention is capable of processing a sequence of "mono-syllables," "(but each of the characters in the poly-character word is continuous)," "prosodic segments," or even a "whole sentence of continuous Mandarin speech." A prosodic segment comprising one or more words is a segment that is automatically isolated by a speaker by pausing where characters in the prosodic segment are continuous.
    Type: Grant
    Filed: December 29, 1995
    Date of Patent: May 23, 2000
    Assignee: Lee and Li
    Inventor: Lin-Shan Lee
  • Patent number: 6064958
    Abstract: A pattern recognition scheme using probabilistic models that are capable of reducing a calculation cost for the output probability while improving a recognition performance even when a number of mixture component distributions of respective states is small, by arranging distributions with low calculation cost and high expressive power as the mixture component distribution. In this pattern recognition scheme, a probability of each probabilistic model expressing features of each recognition category with respect to each input feature vector derived from each input signal is calculated, where the probabilistic model represents a feature parameter subspace in which feature vectors of each recognition category exist and the feature parameter subspace is expressed by using mixture distributions of one-dimensional discrete distributions with arbitrary distribution shapes which are arranged in respective dimensions.
    Type: Grant
    Filed: September 19, 1997
    Date of Patent: May 16, 2000
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Satoshi Takahashi, Shigeki Sagayama
  • Patent number: 6061653
    Abstract: A method of operating a speech recognition system. The method loads a speech model from a storage facility into a memory accessible by a processor. This loading step includes two steps. A first of these steps loads process-independent state data representative of a plurality of states of the speech model. A second of these steps loads process-specific state data representative of the plurality of states of the speech model. The speech recognition system then performs a first speech recognition process with the processor by accessing the process-independent state data and a first portion of the process-specific state data. The speech recognition system also performs a second speech recognition process with the processor, where the second process also accesses the process-independent state data but further accesses a second portion of the process-specific state data different than the first portion of the process-specific state data.
    Type: Grant
    Filed: July 14, 1998
    Date of Patent: May 9, 2000
    Assignee: Alcatel USA Sourcing, L.P.
    Inventors: Thomas D. Fisher, Dearborn R. Mowry, Jeffrey J. Spiess
  • Patent number: 6058363
    Abstract: Method and system of determining an out-of-vocabulary score for speaker-independent recognition of user-defined phrases comprises enrolling a user-defined phrase with a set of speaker-independent (SI) recognition models using an enrollment grammar. An enrollment grammar score of the spoken phrase may be determined by comparing features of the spoken phrase to the SI recognition models using the enrollment grammar. The enrollment grammar score may be penalized to generate an out-of-vocabulary score.
    Type: Grant
    Filed: December 29, 1997
    Date of Patent: May 2, 2000
    Assignee: Texas Instruments Incorporated
    Inventor: Coimbatore S. Ramalingam
  • Patent number: 6058365
    Abstract: Continuous speech is recognized by selecting among hypotheses, consisting of candidates of symbol strings obtained by connecting phonemes corresponding to a Hidden Markov Model (HMM) having the highest probability, by referring to a phoneme context dependent type HMM from input speech using a HMM phoneme verification portion. A phoneme context dependent type LR (Left-Right) parser portion predicts a subsequent phoneme by referring to an action specifying item stored in an LR (Left to Right) parsing table to predict a phoneme context around the predicted phoneme using an action specifying item of the LR table.
    Type: Grant
    Filed: July 6, 1993
    Date of Patent: May 2, 2000
    Assignee: ATR Interpreting Telephony Research Laboratories
    Inventors: Akito Nagai, Kenji Kita, Shigeki Sagayama
  • Patent number: 6052662
    Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
    Type: Grant
    Filed: January 29, 1998
    Date of Patent: April 18, 2000
    Assignee: Regents of the University of California
    Inventor: John E. Hogden
  • Patent number: 6047256
    Abstract: In a system for recognizing a time sequence of feature vectors of a speech signal representative of an unknown utterance as one of a plurality of reference patterns, a generator (11) for generating the reference patterns has a converter (15) for converting a plurality of time sequences of feature vectors of an input pattern of a speech signal with variances to a plurality of time sequences of feature codes with reference to code vectors (14) which are previously prepared by the known clustering. A first pattern former (16) generates a state transition probability distribution and an occurrence probability distribution of feature codes for each state in a state transition network. A function generator (17) calculates parameters of continuous Gaussian density function from the code vectors and the occurrence probability distribution to produce the continuous Gaussian density function approximating the occurrence probability distribution.
    Type: Grant
    Filed: June 17, 1993
    Date of Patent: April 4, 2000
    Assignee: NEC Corporation
    Inventors: Shinji Koga, Takao Watanabe, Kazunaga Yoshida
  • Patent number: 6044344
    Abstract: A method is provided for training a statistical pattern recognition decoder on new data while preserving its accuracy of old, previously learned data. Previously learned data are represented as constrained equations that define a constrained domain (T) in a space of statistical parameters (K) of the decoder. Some part of a previously learned data is represented as a feasible point on the constrained domain. A training procedure is reformulated as optimization of objective functions over the constrained domain. Finally, the constrained optimization functions are solved. This training method ensures that previously learned data is preserved during iterative training steps. While an exemplary speech recognition decoder is discussed, the inventive method is also suited to other pattern recognition problems such as, for example, handwriting recognition, image recognition, machine translation, or natural language processing.
    Type: Grant
    Filed: January 3, 1997
    Date of Patent: March 28, 2000
    Assignee: International Business Machines Corporation
    Inventor: Dimitri Kanevsky
  • Patent number: 6044343
    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequences O output data to train hidden Markov model (HMM) processes λj and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities PR(O|λj) for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.
    Type: Grant
    Filed: June 27, 1997
    Date of Patent: March 28, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lin Cong, Safdar M. Asghar
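    Note: the per-word scoring step in the abstract above can be illustrated with a minimal sketch. It uses the standard log-domain Viterbi algorithm on a discrete HMM, not the fuzzy variant the patent describes, and the two toy word models, their probabilities, and the function names are assumptions made only for this example.

```python
import math

def log_matrix(rows):
    """Convert rows of probabilities to log-probabilities (0 maps to -inf)."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row] for row in rows]

def viterbi_log_score(obs, log_init, log_trans, log_emit):
    """Log-probability of the single best state path for a discrete HMM."""
    n_states = len(log_init)
    # delta[s]: best log score of any path that ends in state s at the current time
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        delta = [
            max(delta[p] + log_trans[p][s] for p in range(n_states)) + log_emit[s][symbol]
            for s in range(n_states)
        ]
    return max(delta)

def recognize(obs, word_models):
    """Score the observation sequence against every word model; return the best word."""
    scores = {word: viterbi_log_score(obs, *model) for word, model in word_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state, two-symbol left-to-right models for two vocabulary words.
init = log_matrix([[1.0, 0.0]])[0]
trans = log_matrix([[0.5, 0.5], [0.0, 1.0]])
word_models = {
    "yes": (init, trans, log_matrix([[0.9, 0.1], [0.1, 0.9]])),
    "no": (init, trans, log_matrix([[0.1, 0.9], [0.9, 0.1]])),
}

best_word, scores = recognize([0, 0, 1, 1], word_models)
print(best_word)
```

    Working in the log domain avoids numerical underflow on long observation sequences; the word whose model gives the highest best-path score is selected.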
  • Patent number: 6026359
    Abstract: A model adaptation scheme for pattern recognition, which is capable of realizing fast, real-time model adaptation and improving recognition performance. This model adaptation scheme determines a change in a parameter expressing a condition of pattern recognition and probabilistic model training between an initial condition at a time of acquiring training data used in obtaining a model parameter of each probabilistic model and a current condition at a time of actual recognition. Then, when the initial condition and the current condition are mismatched, the probabilistic models are adapted by updating the model parameter before the condition change according to the determined change, to obtain the model parameter after the condition change. The adaptation processing uses a Taylor expansion expressing a change in the model parameter in terms of a change in the parameter expressing the condition.
    Type: Grant
    Filed: September 15, 1997
    Date of Patent: February 15, 2000
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Yoshikazu Yamaguchi, Shigeki Sagayama, Jun-ichi Takahashi, Satoshi Takahashi
  • Patent number: 6006182
    Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.
    Type: Grant
    Filed: September 22, 1997
    Date of Patent: December 21, 1999
    Assignee: Northern Telecom Limited
    Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
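    Note: the accept/reject step in the abstract above can be sketched as follows. The abstract does not give the form of the feature function, so the weighted linear sum, the threshold comparison, and all names and numbers here are assumptions for illustration only.

```python
def rejection_decision(feature_values, weights, threshold):
    """Combine rejection feature values into one decision value and compare it to a threshold.

    Assumes a linear feature function: each weight reflects how reliable
    its rejection feature is, and higher decision values favor acceptance.
    """
    if len(feature_values) != len(weights):
        raise ValueError("need exactly one weight per rejection feature")
    decision_value = sum(w * v for w, v in zip(weights, feature_values))
    return decision_value, decision_value >= threshold

# Hypothetical values: two rejection features, the first weighted more heavily.
value, accept = rejection_decision([0.8, 0.4], [0.75, 0.25], threshold=0.5)
print(value, accept)
```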
  • Patent number: 6006186
    Abstract: A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the prespecified threshold.
    Type: Grant
    Filed: October 16, 1997
    Date of Patent: December 21, 1999
    Assignees: Sony Corporation, Sony Electronics, Inc.
    Inventors: Ruxin Chen, Miyuki Tanaka, Duanpei Wu, Lex S. Olorenshaw
  • Patent number: 6003003
    Abstract: In one embodiment, a speech recognition system is organized with a fuzzy matrix quantizer with a single codebook representing u codewords. The single codebook is designed with entries from u codebooks which are designed with respective words at multiple signal to noise ratio levels. Such entries are, in one embodiment, centroids of clustered training data. The training data is, in one embodiment, derived from line spectral frequency pairs representing respective speech input signals at various signal to noise ratios. The single codebook trained in this manner provides a codebook for a robust front end speech processor, such as the fuzzy matrix quantizer, for training a speech classifier such as u hidden Markov models and a speech post classifier such as a neural network. In one embodiment, a fuzzy Viterbi algorithm is used with the hidden Markov models to describe the speech input signal probabilistically.
    Type: Grant
    Filed: June 27, 1997
    Date of Patent: December 14, 1999
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Safdar M. Asghar, Lin Cong
  • Patent number: 5995934
    Abstract: A recognition method for alpha-numeric strings in a Chinese speech recognition system uses a special coding scheme to map each of 36 alpha-numeric symbols into an easily remembered Chinese idiom or word consisting of multiple Chinese characters. When representing a numeral, each idiom/word starts with the Chinese character for that numeral. When representing an English alphabet letter, each idiom/word will have a first character which starts with that English alphabet letter in its Pinyin form. If it is necessary to include some control words, idiom/words similar in semantics can be used. The method resolves the problem of unreliable recognition when a string of random alpha-numeric symbols or some control words are inputted by voice to a Chinese speech recognition system.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: November 30, 1999
    Assignee: International Business Machines Corporation
    Inventor: Donald T. Tang
  • Patent number: 5995927
    Abstract: A method and an apparatus for performing stochastic matching of a set of input test speech data with a corresponding set of training speech data. In particular, a set of input test speech feature information, having been generated from an input test speech utterance, is transformed so that the stochastic characteristics thereof more closely match the stochastic characteristics of a corresponding set of training speech feature information. The corresponding set of training speech data may, for example, comprise training data which was generated from a speaker having the claimed identity of the speaker of the input test speech utterance. Specifically, in accordance with the present invention, a first covariance matrix representative of stochastic characteristics of input test speech feature information is generated based on the input test speech feature information.
    Type: Grant
    Filed: March 14, 1997
    Date of Patent: November 30, 1999
    Assignee: Lucent Technologies Inc.
    Inventor: Qi P. Li
  • Patent number: 5991720
    Abstract: The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.
    Type: Grant
    Filed: April 16, 1997
    Date of Patent: November 23, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Michael Galler, Jean-Claude Junqua
  • Patent number: 5991442
    Abstract: The present invention provides a method and apparatus for performing pattern recognition on given information such as speech data or image data with a reduced amount of calculations of the degree of matching associated with reference patterns. The method and apparatus provides a high speed operation without an increase in the amount of calculations of the degree of matching even when there are a great number of reference patterns.
    Type: Grant
    Filed: May 1, 1996
    Date of Patent: November 23, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori
  • Patent number: 5987410
    Abstract: A method and device for recognizing speech that has a sequence of words each including one or more letters. The words and letters form a recognition data base. The method receives and recognizes the speech by preliminary modelling among various probably recognized sequences. The method selects one or more model sequences as a result. In particular, the method allows, in a model sequence of exclusively letters, various words as a subset. Such words are used to qualify one or more neighbouring or included letters in the sequence. An applicable model is a mixed information unit model.
    Type: Grant
    Filed: November 10, 1997
    Date of Patent: November 16, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Andreas Kellner, Frank Seide
  • Patent number: 5983180
    Abstract: In a method of automatically recognizing data which comprises sequential data units represented as sequential tokens grouped into one or more items, known items are stored as respective finite state sequence models. Each state corresponds to a token and the models which have common prefix states are organized in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state. Each sequential data unit is compared with stored reference data units identified by reference tokens to generate scores indicating the similarity of the data units to reference data units.
    Type: Grant
    Filed: February 27, 1998
    Date of Patent: November 9, 1999
    Assignee: SoftSound Limited
    Inventor: Anthony John Robinson
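    Note: the tree organization in the abstract above can be illustrated with a small sketch. Items that share a token prefix share states, and each distinct first token starts its own tree. The node layout, function names, and the toy vocabulary are assumptions for this example; the scoring of data units against reference tokens is not shown.

```python
def new_node():
    """One state in a tree: child states keyed by token, plus items ending here."""
    return {"children": {}, "items": []}

def build_prefix_forest(items):
    """Build one tree per distinct first token; common prefixes share states."""
    roots = {}  # first token -> root state of that tree
    for name, tokens in items.items():
        if not tokens:
            continue  # an empty token sequence has no states to store
        node = roots.setdefault(tokens[0], new_node())
        for token in tokens[1:]:
            node = node["children"].setdefault(token, new_node())
        node["items"].append(name)
    return roots

def count_states(roots):
    """Count all states across every tree in the forest."""
    stack = list(roots.values())
    total = 0
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node["children"].values())
    return total

# Hypothetical vocabulary: "cat" and "cap" share the prefix states k, ae.
vocabulary = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}
forest = build_prefix_forest(vocabulary)
print(sorted(forest), count_states(forest))
```

    Sharing prefix states means a comparison against a common prefix is computed once for all items that begin with it, rather than once per item.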
  • Patent number: 5983178
    Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus is provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.
    Type: Grant
    Filed: December 10, 1998
    Date of Patent: November 9, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
  • Patent number: 5970452
    Abstract: The method recognizes a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models. In a first signal processing stage, feature vectors are formed periodically for pattern recognition, which describe a signal curve of a measurement signal within a time slice. No speech pause is detected by a pause detector contained therein in a first time slice based on present features of a first feature vector. In a second signal processing stage, in a second time slice that follows the first time slice the first feature vector is compared with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause.
    Type: Grant
    Filed: September 4, 1997
    Date of Patent: October 19, 1999
    Assignee: Siemens Aktiengesellschaft
    Inventors: Abdulmesih Aktas, Klaus Zunkler
  • Patent number: 5963905
    Abstract: Methods and apparatus for performing a tree search based acoustic fast match in a speech recognition system for decoding a speech utterance, the tree having a tree root and tree nodes connected by tree branches, the tree nodes having phonetic models associated therewith, are provided.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: October 5, 1999
    Assignee: International Business Machines Corporation
    Inventors: Miroslav Novak, Michael Alan Picheny
  • Patent number: 5963902
    Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a new reduced size model is generated using a Lagrange multiplier from an input model and input size constraints during each iteration of the size reducing model training process. The reduced size model generated during one iteration of the process serves as the input to the next iteration. Scoring, e.g.
    Type: Grant
    Filed: July 30, 1997
    Date of Patent: October 5, 1999
    Assignee: Nynex Science & Technology, Inc.
    Inventor: Kuansan Wang