Markov Patents (Class 704/256)

Hidden Markov model (HMM) (EPO) (Class 704/256.1)

Context dependent phoneme networks for encoding speech information

Patent number: 6182038

Abstract: A method and apparatus for generating a context dependent phoneme network as an intermediate step of encoding speech information. The context dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context dependent phoneme network is then transmitted to a first application (52).

Type: Grant

Filed: December 1, 1997

Date of Patent: January 30, 2001

Assignee: Motorola, Inc.

Inventors: Sreeram Balakrishnan, Stephen Austin

Speech recognition method and system for recognizing single or un-correlated Chinese characters

Patent number: 6163767

Abstract: A Chinese speech recognition (SR) method and system for single or un-correlated Chinese character(s). The method uses various types of Character Description Language (CDL) to describe the single or un-correlated Chinese character(s) to be inputted. The SR system uses CDL grammar directed speech recognizer to accept CDLs, which are inputted by voice. On the basis of analysis of CDL parser, the character generator gives a corresponding character. Therefore, recognition of single or un-correlated Chinese character(s) out of context can be made reliably.

Type: Grant

Filed: August 28, 1998

Date of Patent: December 19, 2000

Assignee: International Business Machines Corporation

Inventors: Donald T. Tang, Li Qin Shen, Xiao Jin Zhu

Text-to-speech using clustered context-dependent phoneme-based units

Patent number: 6163769

Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.

Type: Grant

Filed: October 2, 1997

Date of Patent: December 19, 2000

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Hsiao-Wuen Hon, Xuedong D. Huang

Speech recognition-synthesis based encoding/decoding method, and speech encoding/decoding system

Patent number: 6161091

Abstract: A speech recognition synthesis based encoding/decoding method recognizes phonetic segments, syllables, words or the like as character information from an input speech signal and detects pitch periods, phoneme or syllable durations or the like, as information for prosody generation, from the input speech signal, transfers or stores the character information and information for prosody generation as code data, decodes the transferred or stored code data to acquire the character information and information for prosody generation, and synthesizes the acquired character information and information for prosody generation to obtain a speech signal.

Type: Grant

Filed: March 17, 1998

Date of Patent: December 12, 2000

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masami Akamine, Ryosuke Koshiba

Speech recognition method with language model adaptation

Patent number: 6157912

Abstract: Language models which take into account the probabilities of word sequences are used in speech recognition, in particular in the recognition of fluently spoken language with a wide vocabulary, in order to increase the recognition reliability. These models are obtained from comparatively large quantities of text and accordingly represent values which were averaged over several texts. This means, however, that the language model is not well adapted to peculiarities of a special text. To achieve such an adaptation of a given language model to a special text on the basis of only a short text fragment, according to the invention, it is suggested that first the unigram language model is adapted with the short text and, in dependence thereon, the M-gram language model is subsequently adapted. A method is described for adapting the unigram language model values which automatically carries out a subdivision of the words into semantic classes.

Type: Grant

Filed: March 2, 1998

Date of Patent: December 5, 2000

Assignee: U.S. Philips Corporation

Inventors: Reinhard Kneser, Jochen Peters, Dietrich Klakow

Source normalization training for HMM modeling of speech

Patent number: 6151573

Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a-23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.

Type: Grant

Filed: August 15, 1998

Date of Patent: November 21, 2000

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong

Technique for adaptation of hidden Markov models for speech recognition

Patent number: 6151574

Abstract: A speech recognition system learns characteristics of speech by a user during a learning phase to improve its performance. Adaptation data derived from the user's speech and its recognized result is collected during the learning phase. Parameters characterizing hidden Markov Models (HMMs) used in the system for speech recognition are modified based on the adaptation data. To that end, a hierarchical structure is defined in an HMM parameter space. This structure may assume the form of a tree structure having multiple layers, each of which includes one or more nodes. Each node on each layer is connected to at least one node on another layer. The nodes on the lowest layer of the tree structure are referred to as "leaf nodes." Each node in the tree structure represents a subset of the HMM parameters, and is associated with a probability measure which is derived from the adaptation data.

Type: Grant

Filed: September 8, 1998

Date of Patent: November 21, 2000

Assignee: Lucent Technologies Inc.

Inventors: Chin-Hui Lee, Koichi Shinoda

Method and apparatus for automatic speech recognition using Markov processes on curves

Patent number: 6148284

Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

Type: Grant

Filed: December 11, 1998

Date of Patent: November 14, 2000

Assignee: AT&T Corporation

Inventor: Lawrence Kevin Saul
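
The arc-length weighting mentioned in the abstract above can be pictured with a minimal Python sketch. It assumes plain Euclidean distance between successive feature vectors and a caller-supplied log score per segment. Both are illustrative assumptions, and the function names are hypothetical rather than taken from the patent.

```python
import math

def arc_lengths(curve):
    """Euclidean length of each segment between successive feature vectors."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))
        for p, q in zip(curve, curve[1:])
    ]

def arc_weighted_score(curve, segment_log_scores):
    """Sum per-segment log scores, each weighted by its arc length."""
    lengths = arc_lengths(curve)
    if len(lengths) != len(segment_log_scores):
        raise ValueError("need one log score per segment")
    return sum(length * score for length, score in zip(lengths, segment_log_scores))
```

In this sketch a stretch where the feature vector does not move has zero arc length and adds nothing to the score, which is one way to see why such weighting reduces sensitivity to speaking rate.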

Method of learning in a speech recognition system

Patent number: 6138097

Abstract: A recognition test matches a speech segment supplied to the system with a set of parameters associated with a reference and memorized in a dictionary. A provisional version of each set of parameters to be memorized in the dictionary in association with a reference is estimated on the basis of one or more segments of speech, after which the provisional version is memorized in the dictionary in association with the reference. At least one repetition of the speech segment is submitted to a recognition test, after which depending on whether it has matched the speech segment with the provisional version, the provisional version is modified and the modified provisional version is memorized.

Type: Grant

Filed: September 28, 1998

Date of Patent: October 24, 2000

Assignee: Matra Nortel Communications

Inventors: Philip Lockwood, Catherine Glorion, Laurent Lelievre

Image manipulation

Patent number: 6133904

Abstract: An apparatus for manipulating the colour of an image is provided, having a microphone for providing electrical speech signals representative of a user command, a speech recognition unit for recognizing the input speech signal, a command interpreter for interpreting the recognized speech, a graphics package responsive to the command interpreter and a display for displaying the current image being edited. The apparatus accepts other inputs, for example, from a pointing device.

Type: Grant

Filed: February 4, 1997

Date of Patent: October 17, 2000

Assignee: Canon Kabushiki Kaisha

Inventor: Eli Tzirkel-Hancock

Method, device and system for generalized bidirectional island-driven chart parsing

Patent number: 6128596

Abstract: A method (700), device (1101), and system (1100) provide generalized bidirectional island-driven chart parsing based on congruency checking to prevent edge overgeneration for robust and efficient parsing of a word graph. The method prevents edge overgeneration by selecting, in accordance with a predetermined scheme, a candidate edge with a starting vertex, an ending vertex, a label, and a congruence key for entry in a chart from an agenda of edges, selecting an edge equivalence set in the chart that matches the starting vertex, the ending vertex, and the label of the candidate edge, and entering the candidate edge into the chart if the congruence key of the candidate edge fails to match the congruence key of any edge in the edge equivalence set.

Type: Grant

Filed: April 3, 1998

Date of Patent: October 3, 2000

Assignee: Motorola, Inc.

Inventor: Andrew William Mackie
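
The congruency check in the abstract above can be sketched in a few lines of Python. The `Edge` and `Chart` names are hypothetical, and the agenda selection scheme is left out. The sketch only shows the rule that a candidate edge enters the chart when no edge in its equivalence set (same start vertex, end vertex, and label) has the same congruence key.

```python
from collections import defaultdict, namedtuple

# Hypothetical edge record: vertices, label, and congruence key.
Edge = namedtuple("Edge", ["start", "end", "label", "key"])

class Chart:
    def __init__(self):
        # Equivalence sets indexed by (start, end, label).
        self._sets = defaultdict(list)

    def add(self, edge):
        """Enter the edge unless a congruent edge is already present."""
        equivalence_set = self._sets[(edge.start, edge.end, edge.label)]
        if any(existing.key == edge.key for existing in equivalence_set):
            return False  # congruent edge exists, so skip to avoid overgeneration
        equivalence_set.append(edge)
        return True
```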

System architecture for and method of voice processing

Patent number: 6119087

Abstract: A system and method for efficiently distributing voice call data received from speech recognition servers over a telephone network having a shared processing resource is disclosed. Incoming calls are received from phone lines and assigned grammar types by speech recognition servers. A request for processing the voice call data is sent to a resource manager which monitors the shared processing resource and identifies a preferred processor within the shared resource. The resource manager sends an instruction to the speech recognition server to send the voice call data to a preferred processor for processing. The preferred processor is determined by known processor efficiencies for voice call data having the assigned grammar type of the incoming voice call data and a measure of processor loads. While the system is operating, the resource manager develops and updates a history of each processor. The histories include processing efficiency values for all grammar types received.

Type: Grant

Filed: March 13, 1998

Date of Patent: September 12, 2000

Assignee: Nuance Communications

Inventors: Thomas Murray Kuhn, Matthew Lennig, Peter Christopher Monaco, David Bruce Peters
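
As a rough illustration of the selection step in the abstract above, the Python sketch below picks a processor from per-grammar efficiency values and current load. The scoring formula (efficiency scaled by free capacity), the dictionary layout, and the function name are all assumptions made for this example. The abstract does not say how efficiency and load are combined.

```python
def preferred_processor(processors, grammar_type):
    """Return the name of the processor with the best assumed score."""
    def score(proc):
        # Assumed combination: known efficiency for this grammar type,
        # scaled by the fraction of the processor that is currently idle.
        efficiency = proc["efficiency"].get(grammar_type, 0.0)
        return efficiency * (1.0 - proc["load"])
    return max(processors, key=score)["name"]
```

A processor that is very efficient for a grammar type can still lose to a less efficient one when it is heavily loaded, which is the trade-off the abstract describes.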

Speaker adaptation using discriminative linear regression on time-varying mean parameters in trended HMM

Patent number: 6112175

Abstract: A method and apparatus using a combined MLLR and MCE approach to estimating the time-varying polynomial Gaussian mean functions in the trended HMM has advantageous results. This integrated approach is referred to as the minimum classification error linear regression (MCELR), which has been developed and implemented in speaker adaptation experiments using a large body of utterances from different types of speakers. Experimental results show that the adaptation of linear regression on time-varying mean parameters is always better when fewer than three adaptation tokens are used.

Type: Grant

Filed: March 2, 1998

Date of Patent: August 29, 2000

Assignee: Lucent Technologies Inc.

Inventor: Rathinavelu Chengalvarayan

Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model

Patent number: 6108628

Abstract: A high-speed speech recognition method with a high recognition rate, utilizing speaker models, includes the steps of executing an acoustic process on the input speech, calculating a coarse output probability utilizing an unspecified speaker model, and calculating a fine output probability utilizing an unspecified speaker model and clustered speaker models, for the states estimated, by the result of coarse calculation, to contribute to the results of recognition. Candidates of recognition are then extracted by a common language search based on the obtained result, and a fine language search is conducted on the thus extracted candidates to determine the result of recognition.

Type: Grant

Filed: September 16, 1997

Date of Patent: August 22, 2000

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Tetsuo Kosaka, Masayuki Yamada

Method and apparatus for speech recognition

Patent number: 6092045

Abstract: Comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, is carried out in an order which makes better use of memory. First, the observations in one of the blocks are compared (31), to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models. This step is repeated (33) for models other than those in the subset; and the whole process is repeated (34) for each block.

Type: Grant

Filed: July 21, 1998

Date of Patent: July 18, 2000

Assignee: Nortel Networks Corporation

Inventors: Peter R. Stubley, Andre Gillet, Vishwa N. Gupta, Christopher K. Toulson, David B. Peters
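
The evaluation order described in the abstract above (block by block, and within each block one subset of models at a time) can be sketched as nested loops. In this simplified Python example each model is a hypothetical function that returns a log-likelihood for one observation. A real recognizer would also carry model state from one block to the next, which is omitted here.

```python
def blockwise_scores(observations, models, block_size, subset_size):
    """Score observations against models, one block and one model subset at a time."""
    names = list(models)
    totals = {name: 0.0 for name in names}
    order = []  # records (block index, model name) in evaluation order
    for start in range(0, len(observations), block_size):
        block = observations[start:start + block_size]
        # Finish every model subset on this block before moving to the next block.
        for first in range(0, len(names), subset_size):
            for name in names[first:first + subset_size]:
                totals[name] += sum(models[name](x) for x in block)
                order.append((start // block_size, name))
    return totals, order
```

The totals are the same as scoring each model over the whole utterance. Only the order of work changes, so a block of observations stays in fast memory while several models are applied to it.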

Language independent speech recognition

Patent number: 6085160

Abstract: A speech recognition system uses language independent acoustic models derived from speech data from multiple languages to represent speech units which are concatenated into words. In addition, the input speech signal which is compared to the language independent acoustic models may be vector quantized according to a codebook which is derived from speech data from multiple languages.

Type: Grant

Filed: July 10, 1998

Date of Patent: July 4, 2000

Assignee: Lernout & Hauspie Speech Products N.V.

Inventors: Bart D'hoore, Dirk Van Compernolle
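
Vector quantization against a codebook, as mentioned in the abstract above, maps each input frame to the index of its nearest codeword. The sketch below assumes squared Euclidean distance and a small hand-written codebook. How the multilingual codebook is trained is outside its scope, and the function name is hypothetical.

```python
def quantize(frames, codebook):
    """Replace each feature frame with the index of its nearest codeword."""
    def nearest(frame):
        return min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])),
        )
    return [nearest(frame) for frame in frames]
```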

Method for training a speech recognition system and an apparatus for practising the method, in particular, a portable telephone apparatus

Patent number: 6078883

Abstract: For training a speech recognition to a multi-item repertoire, the following steps are executed: a speech item is presented by a user person, and the distinctivity thereof in the repertoire is asserted. Under control of a distinctivity found the speech item is inserted into the repertoire. These steps are repeated until reaching repertoire sufficiency. In particular, the asserting determines a likeness among the actually presented speech item and all items already in the repertoire, wherein undue likeness with one particular stored item creates a contingency procedure. This implies offering to the user a choice between ignoring the actually presented speech item and alternatively inserting the actually presented speech item at a price of deleting the particular stored item.

Type: Grant

Filed: December 17, 1997

Date of Patent: June 20, 2000

Assignee: U.S. Philips Corporation

Inventors: Benoit Guilhaumon, Gilles Miet

Pattern recognition

Patent number: 6078884

Abstract: Pattern recognition apparatus uses a recognition processor for processing an input signal to indicate its similarity to allowed sequences of reference patterns to be recognised. A speech recognition processor includes a classification arrangement to identify a sequence of patterns corresponding to said input signal and for repeatedly partitioning the input signal into a speech-containing portion and, preceding and/or following said speech-containing portion, noise or silence portions. A noise model generator is provided to generate a pattern of the noise or silence portion, for subsequent use by said classification means for pattern identification purposes. The noise model generator may generate a noise model for each noise portion of the input signal, which may be used to adapt the reference patterns.

Type: Grant

Filed: March 26, 1998

Date of Patent: June 20, 2000

Assignee: British Telecommunications public limited company

Inventor: Simon N. Downey

Speech recognition system for recognizing continuous and isolated speech

Patent number: 6076056

Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.

Type: Grant

Filed: September 19, 1997

Date of Patent: June 13, 2000

Assignee: Microsoft Corporation

Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang

Unsupervised HMM adaptation based on speech-silence discrimination

Patent number: 6076057

Abstract: An unsupervised, discriminative, sentence level, HMM adaptation based on speech-silence classification is presented. Silence and speech regions are determined either using a speech end-pointer or the segmentation obtained from the recognizer in a first pass. The discriminative training procedure using a GPD or any other discriminative training algorithm, employed in conjunction with the HMM-based recognizer, is then used to increase the discrimination between silence and speech.

Type: Grant

Filed: May 21, 1997

Date of Patent: June 13, 2000

Assignee: AT&T Corp

Inventors: Shrikanth Sambasivan Narayanan, Alexandros Potamianos, Ilija Zeljkovic

Linear trajectory models incorporating preprocessing parameters for speech recognition

Patent number: 6076058

Abstract: The proposed model aims at finding an optimal linear transformation on the Mel-warped DFT features according to the minimum classification error (MCE) criterion. This linear transformation, along with the (NSHMM) parameters, are automatically trained using the gradient descent method. An advantageous error rate reduction can be realized on a standard 39-class TIMIT phone classification task in comparison with the MCE-trained NSHMM using conventional preprocessing techniques.

Type: Grant

Filed: March 2, 1998

Date of Patent: June 13, 2000

Assignee: Lucent Technologies Inc.

Inventor: Rathinavelu Chengalvarayan

Predicting auditory confusions using a weighted Levinstein distance

Patent number: 6073099

Abstract: A confusability tool generates a confusability cost associated with two phonemic transcriptions. The confusability cost measures the likelihood that a human or machine hearing the first word will mistakenly hear the second word. The cost calculation is based on a weighting of the Levinstein distance between the transcription pair.

Type: Grant

Filed: November 4, 1997

Date of Patent: June 6, 2000

Assignee: Nortel Networks Corporation

Inventors: Michael Sabourin, Marc Fabiani
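
A weighted edit distance of the kind the abstract above refers to can be sketched with the standard dynamic-programming recurrence. The substitution weights below (a lower cost for the pairs p/b and m/n) are invented for illustration. The patent's actual weighting is not given in the abstract.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Weighted edit distance between two phoneme sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + indel_cost
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                       # deletion
                d[i][j - 1] + indel_cost,                       # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return d[-1][-1]

# Hypothetical weights: acoustically similar pairs are cheap to confuse.
SIMILAR = {frozenset(("p", "b")), frozenset(("m", "n"))}

def example_sub_cost(x, y):
    if x == y:
        return 0.0
    return 0.3 if frozenset((x, y)) in SIMILAR else 1.0
```

With these weights, "pat" and "bat" come out closer than "kat" and "bat", so a lower cost stands for a more likely confusion.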

Method and apparatus for generating deterministic approximate weighted finite-state automata

Patent number: 6073098

Abstract: An approximate weighted finite-state automaton can be constructed in place of a weighted finite-state automaton so long as the approximate weighted finite-state automaton maintains a sufficient portion of the original best strings in the weighted finite-state automaton and sufficiently few spurious strings are introduced into the approximate weighted finite-state automaton compared to the weighted finite-state automaton. An approximate weighted finite-state automaton can be created from a non-deterministic weighted finite-state automaton during determinization by discarding the requirement that old states be used in place of new states only when an old state is identical to a new state. Instead, in an approximate weighted finite-state automaton, old states will be used in place of new states when each of the remainders of the new state is sufficiently close to the corresponding remainder of the old state. An error tolerance parameter τ

Type: Grant

Filed: November 21, 1997

Date of Patent: June 6, 2000

Assignee: AT&T Corporation

Inventors: Adam Louis Buchsbaum, Raffaele Giancarlo, Jeffery Rex Westbrook
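
The state-reuse rule in the abstract above can be illustrated with a short Python sketch. Each determinized state is modeled as a dictionary from substate to remainder weight. "Sufficiently close" is assumed here to mean an absolute difference of at most `tau`, because the abstract is cut off before it defines the test. The function name is hypothetical.

```python
def find_reusable_state(new_state, old_states, tau):
    """Return the index of an old state that can stand in for new_state, or None."""
    for index, old in enumerate(old_states):
        same_substates = old.keys() == new_state.keys()
        # Assumed closeness test: every remainder differs by at most tau.
        if same_substates and all(abs(old[q] - new_state[q]) <= tau for q in old):
            return index
    return None
```

Setting `tau` to zero falls back to the exact case, where an old state is reused only when it is identical to the new one.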

Matrix quantization with vector quantization error compensation for robust speech recognition

Patent number: 6070136

Abstract: A speech recognition system utilizes both matrix and vector quantizers as front ends to a second stage speech classifier. Matrix quantization exploits input signal information in both frequency and time domains, and the vector quantizer primarily operates on frequency domain information. However, in some circumstances, time domain information may be substantially limited which may introduce error into the matrix quantization. Information derived from vector quantization may be utilized by a hybrid decision generator to error compensate information derived from matrix quantization. Additionally, fuzzy methods of quantization and robust distance measures may be introduced to also enhance speech recognition accuracy. Furthermore, other speech classification stages may be used, such as hidden Markov models which introduce probabilistic processes to further enhance speech recognition accuracy.

Type: Grant

Filed: October 27, 1997

Date of Patent: May 30, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Lin Cong, Safdar M. Asghar

Transcription of speech data with segments from acoustically dissimilar environments

Patent number: 6067517

Abstract: A technique to improve the recognition accuracy when transcribing speech data that contains data from a wide range of environments. Input data in many situations contains data from a variety of sources in different environments. Such classes include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system that is made specifically for it. The invention also describes a segmentation algorithm that is based on making up an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are made in a certain feature space, and the invention also describes different feature spaces for use with different classes.

Type: Grant

Filed: February 2, 1996

Date of Patent: May 23, 2000

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, Ramesh Ambat Gopinath, Stephane Herman Maes, Mukund Panmanabhan, Lazaros Polymenakos
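
The segmentation step in the abstract above relies on the Viterbi algorithm. A minimal sketch follows, under two assumptions that are not from the patent: each frame arrives with a precomputed log-likelihood per class, and changing class between frames costs a fixed `switch_penalty`.

```python
def viterbi_segment(frame_loglik, switch_penalty):
    """Label each frame with a class using Viterbi decoding.

    frame_loglik: list of dicts mapping class name to log-likelihood.
    """
    classes = list(frame_loglik[0])
    score = {c: frame_loglik[0][c] for c in classes}
    backpointers = []
    for loglik in frame_loglik[1:]:
        new_score, pointer = {}, {}
        for c in classes:
            # Best predecessor, paying the penalty when the class changes.
            def arrival(prev, current=c):
                return score[prev] - (0.0 if prev == current else switch_penalty)
            best_prev = max(classes, key=arrival)
            new_score[c] = arrival(best_prev) + loglik[c]
            pointer[c] = best_prev
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final class.
    path = [max(classes, key=lambda c: score[c])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path
```

Runs of identical labels in the returned path are the segments. Each run could then be passed to the recognizer built for that class.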

Split matrix quantization with split vector quantization error compensation and selective enhanced processing for robust speech recognition

Patent number: 6067515

Abstract: A speech recognition system utilizes both split matrix and split vector quantizers as front ends to a second stage speech classifier such as hidden Markov models (HMMs) to, for example, efficiently utilize processing resources and improve speech recognition performance. Fuzzy split matrix quantization (FSMQ) exploits the "evolution" of the speech short-term spectral envelopes as well as frequency domain information, and fuzzy split vector quantization (FSVQ) primarily operates on frequency domain information. Time domain information may be substantially limited which may introduce error into the matrix quantization, and the FSVQ may provide error compensation. Additionally, acoustic noise influence may affect particular frequency domain subbands. This system also, for example, exploits the localized noise by efficiently allocating enhanced processing technology to target noise-affected input signal parameters and minimize noise influence.

Type: Grant

Filed: October 27, 1997

Date of Patent: May 23, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Lin Cong, Safdar M. Asghar

Speech recognition method and speech recognition apparatus

Patent number: 6067513

Abstract: A speech recognition method of recognizing an input speech in a noisy environment by using a plurality of clean speech models is provided. Each of the clean speech models has a clean speech feature parameter S representing a cepstrum parameter of a clean speech thereof. The speech recognition method has the processes of: detecting a noise feature parameter N representing a cepstrum parameter of a noise in the noisy environment, immediately before the input speech is input; detecting an input speech feature parameter X representing a cepstrum parameter of the input speech in the noisy environment; calculating a modified clean speech feature parameter Y according to a following equation: Y = k·S + (1 − k)·N (0 < k ≤

Type: Grant

Filed: October 22, 1998

Date of Patent: May 23, 2000

Assignee: Pioneer Electronic Corporation

Inventor: Shunsuke Ishimitsu
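
The interpolation in the abstract above, which blends the clean speech parameter S with the noise parameter N using a weight k, is simple to write out. The Python sketch below applies it element by element to cepstrum vectors. Only the lower bound on k is checked, because the abstract is truncated before the upper bound is stated. The function name is hypothetical.

```python
def modified_clean_parameter(s, n, k):
    """Compute k*S + (1 - k)*N element-wise for cepstrum vectors S and N."""
    if k <= 0:
        raise ValueError("k must be greater than 0")  # upper bound is truncated in the source
    if len(s) != len(n):
        raise ValueError("S and N must have the same length")
    return [k * si + (1 - k) * ni for si, ni in zip(s, n)]
```

For example, `modified_clean_parameter([2.0, 4.0], [0.0, 2.0], 0.5)` gives `[1.0, 3.0]`, halfway between the clean and noise parameters.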

System and method of recognizing continuous Mandarin speech utilizing Chinese hidden Markov models

Patent number: 6067520

Abstract: A Mandarin speech input method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters. The present invention is capable of processing a sequence of "mono-syllables," "(but each of the characters in the poly-character word is continuous)," "prosodic segments," or even a "whole sentence of continuous Mandarin speech." A prosodic segment comprising one or more words is a segment that is automatically isolated by a speaker by pausing where characters in the prosodic segment are continuous.

Type: Grant

Filed: December 29, 1995

Date of Patent: May 23, 2000

Assignee: Lee and Li

Inventor: Lin-Shan Lee

Pattern recognition scheme using probabilistic models based on mixtures distribution of discrete distribution

Patent number: 6064958

Abstract: A pattern recognition scheme using probabilistic models that are capable of reducing a calculation cost for the output probability while improving a recognition performance even when a number of mixture component distributions of respective states is small, by arranging distributions with low calculation cost and high expressive power as the mixture component distribution. In this pattern recognition scheme, a probability of each probabilistic model expressing features of each recognition category with respect to each input feature vector derived from each input signal is calculated, where the probabilistic model represents a feature parameter subspace in which feature vectors of each recognition category exist and the feature parameter subspace is expressed by using mixture distributions of one-dimensional discrete distributions with arbitrary distribution shapes which are arranged in respective dimensions.

Type: Grant

Filed: September 19, 1997

Date of Patent: May 16, 2000

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Satoshi Takahashi, Shigeki Sagayama

Speech recognition system using shared speech models for multiple recognition processes

Patent number: 6061653

Abstract: A method of operating a speech recognition system. The method loads a speech model from a storage facility into a memory accessible by a processor. This loading step includes two steps. A first of these steps loads process-independent state data representative of a plurality of states of the speech model. A second of these steps loads process-specific state data representative of the plurality of states of the speech model. The speech recognition system then performs a first speech recognition process with the processor by accessing the process-independent state data and a first portion of the process-specific state data. The speech recognition system also performs a second speech recognition process with the processor, where the second process also accesses the process-independent state data but further accesses a second portion of the process-specific state data different than the first portion of the process-specific state data.

Type: Grant

Filed: July 14, 1998

Date of Patent: May 9, 2000

Assignee: Alcatel USA Sourcing, L.P.

Inventors: Thomas D. Fisher, Dearborn R. Mowry, Jeffrey J. Spiess

Method and system for speaker-independent recognition of user-defined phrases

Patent number: 6058363

Abstract: Method and system of determining an out-of-vocabulary score for speaker-independent recognition of user-defined phrases comprises enrolling a user-defined phrase with a set of speaker-independent (SI) recognition models using an enrollment grammar. An enrollment grammar score of the spoken phrase may be determined by comparing features of the spoken phrase to the SI recognition models using the enrollment grammar. The enrollment grammar score may be penalized to generate an out-of-vocabulary score.

Type: Grant

Filed: December 29, 1997

Date of Patent: May 2, 2000

Assignee: Texas Instruments Incorporated

Inventor: Coimbatore S. Ramalingam

Speech processing using an expanded left to right parser

Patent number: 6058365

Abstract: Continuous speech is recognized by selecting among hypotheses, consisting of candidates of symbol strings obtained by connecting phonemes corresponding to a Hidden Markov Model (HMM) having the highest probability, by referring to a phoneme context dependent type HMM from input speech using a HMM phoneme verification portion. A phoneme context dependent type LR (Left-Right) parser portion predicts a subsequent phoneme by referring to an action specifying item stored in an LR (Left to Right) parsing table to predict a phoneme context around the predicted phoneme using an action specifying item of the LR table.

Type: Grant

Filed: July 6, 1993

Date of Patent: May 2, 2000

Assignee: ATR Interpreting Telephony Research Laboratories

Inventors: Akito Nagai, Kenji Kita, Shigeki Sagayama

Speech processing using maximum likelihood continuity mapping

Patent number: 6052662

Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Type: Grant

Filed: January 29, 1998

Date of Patent: April 18, 2000

Assignee: Regents of the University of California

Inventor: John E. Hogden

Device for generating a reference pattern with a continuous probability density function derived from feature code occurrence probability distribution

Patent number: 6047256

Abstract: In a system for recognizing a time sequence of feature vectors of a speech signal representative of an unknown utterance as one of a plurality of reference patterns, a generator (11) for generating the reference patterns has a converter (15) for converting a plurality of time sequences of feature vectors of an input pattern of a speech signal with variances to a plurality of time sequences of feature codes with reference to code vectors (14) which are previously prepared by the known clustering. A first pattern former (16) generates a state transition probability distribution and an occurrence probability distribution of feature codes for each state in a state transition network. A function generator (17) calculates parameters of continuous Gaussian density function from the code vectors and the occurrence probability distribution to produce the continuous Gaussian density function approximating the occurrence probability distribution.

Type: Grant

Filed: June 17, 1993

Date of Patent: April 4, 2000

Assignee: NEC Corporation

Inventors: Shinji Koga, Takao Watanabe, Kazunaga Yoshida
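
The abstract above says only that the Gaussian parameters are calculated from the code vectors and the occurrence probability distribution; it does not give the formula. One plausible reading is moment matching, sketched below as an assumption rather than the patented method. The function names, the diagonal covariance, and the toy codebook are all hypothetical.

```python
import math

def fit_gaussian(code_vectors, occurrence_probs):
    """Moment-match a diagonal Gaussian to a discrete distribution over code vectors.

    Assumption: mean and variance are the probability-weighted moments of the
    code vectors. The abstract does not confirm this exact formula.
    """
    total = sum(occurrence_probs)
    probs = [p / total for p in occurrence_probs]  # normalize defensively
    dim = len(code_vectors[0])
    mean = [sum(p * c[d] for p, c in zip(probs, code_vectors)) for d in range(dim)]
    var = [
        sum(p * (c[d] - mean[d]) ** 2 for p, c in zip(probs, code_vectors))
        for d in range(dim)
    ]
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal Gaussian density at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Toy state: three 2-D code vectors with their occurrence probabilities.
codebook = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
probs = [0.5, 0.25, 0.25]
mean, var = fit_gaussian(codebook, probs)
```

With the fitted parameters, `log_density` can score any continuous feature vector for that state, instead of only the discrete codes.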
Constrained corrective training for continuous parameter system

Patent number: 6044344

Abstract: A method is provided for training a statistical pattern recognition decoder on new data while preserving its accuracy on old, previously learned data. Previously learned data are represented as constrained equations that define a constrained domain (T) in a space of statistical parameters (K) of the decoder. Some part of the previously learned data is represented as a feasible point on the constrained domain. A training procedure is reformulated as optimization of objective functions over the constrained domain. Finally, the constrained optimization functions are solved. This training method ensures that previously learned data is preserved during iterative training steps. While an exemplary speech recognition decoder is discussed, the inventive method is also suited to other pattern recognition problems such as, for example, handwriting recognition, image recognition, machine translation, or natural language processing.

Type: Grant

Filed: January 3, 1997

Date of Patent: March 28, 2000

Assignee: International Business Machines Corporation

Inventor: Dimitri Kanevsky
Adaptive speech recognition with selective input data to a speech classifier

Patent number: 6044343

Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequences O output data to train a hidden Markov model (HMM) processes λj and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities PR(O|λj) for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.

Type: Grant

Filed: June 27, 1997

Date of Patent: March 28, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Lin Cong, Safdar M. Asghar
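
For orientation, the sketch below scores an observation sequence against one HMM per vocabulary word and picks the best word. It uses the standard log-domain Viterbi recursion, not the fuzzy variant named in the abstract, whose details are not given here. The two toy word models and all their probabilities are made up for illustration.

```python
import math

def viterbi_log_score(obs, start, trans, emit):
    """Return the log-probability of the single best state path for `obs`.

    Standard (non-fuzzy) Viterbi over a discrete-observation HMM.
    """
    n = len(start)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # Initialization with the first observation.
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: best predecessor for each state at each later frame.
    for o in obs[1:]:
        score = [
            max(score[r] + log(trans[r][s]) for r in range(n)) + log(emit[s][o])
            for s in range(n)
        ]
    return max(score)

# Hypothetical two-state left-to-right models over a two-symbol codebook.
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "word_a": ([1.0, 0.0], trans, [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], trans, [[0.1, 0.9], [0.9, 0.1]]),
}

observations = [0, 0, 1]
scores = {w: viterbi_log_score(observations, *m) for w, m in models.items()}
best_word = max(scores, key=scores.get)
```

Working in the log domain avoids underflow on long observation sequences; zero probabilities map to negative infinity and are never selected by `max` when a finite alternative exists.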
Scheme for model adaptation in pattern recognition based on Taylor expansion

Patent number: 6026359

Abstract: A model adaptation scheme in the pattern recognition, which is capable of realizing a fast, real time model adaptation and improving the recognition performance. This model adaptation scheme determines a change in a parameter expressing a condition of pattern recognition and probabilistic model training between an initial condition at a time of acquiring training data used in obtaining a model parameter of each probabilistic model and a current condition at a time of actual recognition. Then, the probabilistic models are adapted by obtaining a model parameter after a condition change by updating a model parameter before a condition change according to the determined change, when the initial condition and the current condition are mismatching. The adaptation processing uses a Taylor expansion expressing a change in the model parameter in terms of a change in the parameter expressing the condition.

Type: Grant

Filed: September 15, 1997

Date of Patent: February 15, 2000

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Yoshikazu Yamaguchi, Shigeki Sagayama, Jun-ichi Takahashi, Satoshi Takahashi
Speech recognition rejection method using generalized additive models

Patent number: 6006182

Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.

Type: Grant

Filed: September 22, 1997

Date of Patent: December 21, 1999

Assignee: Northern Telecom Limited

Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
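
The accept/reject flow in the abstract above can be pictured as a weighted sum of rejection features compared against a threshold. The sketch below is a minimal illustration under that assumption: the two features, the weights, and the threshold are hypothetical, and a plain linear sum stands in for whatever additive model the patent actually specifies.

```python
def feature_values(nbest, num_frames):
    """Compute two hypothetical rejection features from an N-best list.

    `nbest` is a list of (hypothesis, log_score) pairs, best first.
    """
    best, second = nbest[0][1], nbest[1][1]
    return {
        "score_margin": best - second,         # gap between top two hypotheses
        "per_frame_score": best / num_frames,  # length-normalized top score
    }

def rejection_decision_value(values, weights):
    """Weighted sum of feature values (the simplest additive feature function)."""
    return sum(weights[name] * value for name, value in values.items())

def accept_top_result(values, weights, threshold):
    """Accept the top intermediate result when the decision value clears the threshold."""
    return rejection_decision_value(values, weights) >= threshold

# Made-up numbers; in the abstract, weights reflect each feature's accuracy.
nbest = [("ottawa", -120.0), ("oshawa", -131.0)]
weights = {"score_margin": 0.5, "per_frame_score": 1.0}
values = feature_values(nbest, num_frames=60)
decision = rejection_decision_value(values, weights)
accepted = accept_top_result(values, weights, threshold=2.0)
```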
Method and apparatus for a parameter sharing speech recognition system

Patent number: 6006186

Abstract: A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceed the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceed the prespecified threshold.

Type: Grant

Filed: October 16, 1997

Date of Patent: December 21, 1999

Assignees: Sony Corporation, Sony Electronics, Inc.

Inventors: Ruxin Chen, Miyuki Tanaka, Duanpei Wu, Lex S. Olorenshaw
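
The first two sharing rules in the abstract above can be sketched as follows. This is an illustrative reading, not the patented procedure: the `l-c+r` triphone notation, grouping by the left biphone, and the `None` placeholder for triphones left to a later sharing step are all assumptions, and the third rule (grouping by equivalent effect on phonemic context) is not sketched.

```python
from collections import defaultdict

def assign_models(frame_counts, threshold):
    """Map each triphone 'l-c+r' to the name of the model that represents it."""
    assignment = {}
    biphone_groups = defaultdict(list)
    for tri, frames in frame_counts.items():
        if frames > threshold:
            assignment[tri] = tri  # enough training frames: keep a separate model
        else:
            left_biphone = tri.split("+")[0]  # the 'l-c' part (assumed grouping key)
            biphone_groups[left_biphone].append(tri)
    for biphone, members in biphone_groups.items():
        pooled = sum(frame_counts[t] for t in members)
        for tri in members:
            # Share one model across sparse triphones with a common biphone,
            # but only when the pooled frame count exceeds the threshold.
            assignment[tri] = f"{biphone}+*" if pooled > threshold else None
    return assignment

# Hypothetical trained-frame counts per triphone.
counts = {"k-ae+t": 500, "k-ae+p": 60, "k-ae+b": 70, "s-ih+t": 20}
models = assign_models(counts, threshold=100)
```

Here `k-ae+t` keeps its own model, the two sparse `k-ae` triphones share one, and `s-ih+t` stays unassigned for a later step.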
Speech recognition system having a quantizer using a single robust codebook designed at multiple signal to noise ratios

Patent number: 6003003

Abstract: In one embodiment, a speech recognition system is organized with a fuzzy matrix quantizer with a single codebook representing u codewords. The single codebook is designed with entries from u codebooks which are designed with respective words at multiple signal to noise ratio levels. Such entries are, in one embodiment, centroids of clustered training data. The training data is, in one embodiment, derived from line spectral frequency pairs representing respective speech input signals at various signal to noise ratios. The single codebook trained in this manner provides a codebook for a robust front end speech processor, such as the fuzzy matrix quantizer, for training a speech classifier such as u hidden Markov models and a speech post classifier such as a neural network. In one embodiment, a fuzzy Viterbi algorithm is used with the hidden Markov models to describe the speech input signal probabilistically.

Type: Grant

Filed: June 27, 1997

Date of Patent: December 14, 1999

Assignee: Advanced Micro Devices, Inc.

Inventors: Safdar M. Asghar, Lin Cong
Method for recognizing alpha-numeric strings in a Chinese speech recognition system

Patent number: 5995934

Abstract: A recognition method for alpha-numeric strings in a Chinese speech recognition system uses a special coding scheme to map each of 36 alpha-numeric symbols into an easily remembered Chinese idiom or word consisting of multiple Chinese characters. When representing a numeral, each idiom/word starts with the Chinese character for that numeral. When representing an English alphabet letter, each idiom/word will have a first character which starts with that English alphabet letter in its Pinyin form. If it is necessary to include some control words, idiom/words similar in semantics can be used. The method resolves the problem of unreliable recognition when a string of random alpha-numeric symbols or some control words are inputted by voice to a Chinese speech recognition system.

Type: Grant

Filed: August 28, 1998

Date of Patent: November 30, 1999

Assignee: International Business Machines Corporation

Inventor: Donald T. Tang
Method for performing stochastic matching for use in speaker verification

Patent number: 5995927

Abstract: A method and an apparatus for performing stochastic matching of a set of input test speech data with a corresponding set of training speech data. In particular, a set of input test speech feature information, having been generated from an input test speech utterance, is transformed so that the stochastic characteristics thereof more closely match the stochastic characteristics of a corresponding set of training speech feature information. The corresponding set of training speech data may, for example, comprise training data which was generated from a speaker having the claimed identity of the speaker of the input test speech utterance. Specifically, in accordance with the present invention, a first covariance matrix representative of stochastic characteristics of input test speech feature information is generated based on the input test speech feature information.

Type: Grant

Filed: March 14, 1997

Date of Patent: November 30, 1999

Assignee: Lucent Technologies Inc.

Inventor: Qi P. Li
Speech recognition system employing multiple grammar networks

Patent number: 5991720

Abstract: The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.

Type: Grant

Filed: April 16, 1997

Date of Patent: November 23, 1999

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Michael Galler, Jean-Claude Junqua
Method and apparatus for pattern recognition utilizing gaussian distribution functions

Patent number: 5991442

Abstract: The present invention provides a method and apparatus for performing pattern recognition on given information such as speech data or image data with a reduced amount of calculations of the degree of matching associated with reference patterns. The method and apparatus provides a high speed operation without an increase in the amount of calculations of the degree of matching even when there are a great number of reference patterns.

Type: Grant

Filed: May 1, 1996

Date of Patent: November 23, 1999

Assignee: Canon Kabushiki Kaisha

Inventors: Masayuki Yamada, Yasuhiro Komori
Method and device for recognizing speech in a spelling mode including word qualifiers

Patent number: 5987410

Abstract: A method and device for recognizing speech that has a sequence of words each including one or more letters. The word and letters form a recognition data base. The method receives and recognizes the speech by preliminary modelling among various probably recognized sequences. The method selects one or more model sequences as result. In particular, the method allows in a model sequence of exclusively letters, various words as a subset. Such words are used to qualify one or more neighbouring or included letters in the sequence. An applicable model is a mixed information unit model.

Type: Grant

Filed: November 10, 1997

Date of Patent: November 16, 1999

Assignee: U.S. Philips Corporation

Inventors: Andreas Kellner, Frank Seide
Recognition of sequential data using finite state sequence models organized in a tree structure

Patent number: 5983180

Abstract: In a method of automatically recognizing data which comprises sequential data units represented as sequential tokens grouped into one or more items, known items are stored as respective finite state sequence models. Each state corresponds to a token and the models which have common prefix states are organized in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state. Each sequential data unit is compared with stored reference data units identified by reference tokens to generate scores indicating the similarity of the data units to reference data units.

Type: Grant

Filed: February 27, 1998

Date of Patent: November 9, 1999

Assignee: SoftSound Limited

Inventor: Anthony John Robinson
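
The storage layout described in the abstract above, where items with common prefix states share branches and each tree has a different first state, resembles a forest of tries. The sketch below shows only that layout with exact-match lookup; the similarity scoring against reference data units is not modeled. The `END` marker, the function names, and the toy phone sequences are hypothetical.

```python
END = "<end>"  # hypothetical marker for "an item ends at this state"

def build_forest(items):
    """Store token sequences as nested dicts; each top-level key roots one tree."""
    forest = {}
    for name, tokens in items.items():
        node = forest
        for tok in tokens:
            node = node.setdefault(tok, {})  # reuse the shared prefix state if present
        node[END] = name
    return forest

def lookup(forest, tokens):
    """Return the item whose token sequence matches exactly, else None."""
    node = forest
    for tok in tokens:
        if tok not in node:
            return None
        node = node[tok]
    return node.get(END)

# Toy vocabulary: two items share the prefix k-ae, one starts a second tree.
items = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "dog": ["d", "ao", "g"],
}
forest = build_forest(items)
match = lookup(forest, ["k", "ae", "t"])
```

Because `cat` and `cab` share the states `k` and `ae`, any per-state work on that prefix is done once for both items.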
Speaker clustering apparatus based on feature quantities of vocal-tract configuration and speech recognition apparatus therewith

Patent number: 5983178

Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and Formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.

Type: Grant

Filed: December 10, 1998

Date of Patent: November 9, 1999

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
Method for detecting a signal pause between two patterns which are present on a time-variant measurement signal using hidden Markov models

Patent number: 5970452

Abstract: The method recognizes a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models. In a first signal processing stage, feature vectors are formed periodically for pattern recognition, which describe a signal curve of a measurement signal within a time slice. No speech pause is detected by a pause detector contained therein in a first time slice based on present features of a first feature vector. In a second signal processing stage, in a second time slice that follows the first time slice the first feature vector is compared with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause.

Type: Grant

Filed: September 4, 1997

Date of Patent: October 19, 1999

Assignee: Siemens Aktiengesellschaft

Inventors: Abdulmesih Aktas, Klaus Zunkler
Method and apparatus for improving acoustic fast match speed using a cache for phone probabilities

Patent number: 5963905

Abstract: Methods and apparatus for performing a tree search based acoustic fast match in a speech recognition system for decoding a speech utterance, the tree having a tree root and tree nodes connected by tree branches, the tree nodes having phonetic models associated therewith, are provided.

Type: Grant

Filed: October 24, 1997

Date of Patent: October 5, 1999

Assignee: International Business Machines Corporation

Inventors: Miroslav Novak, Michael Alan Picheny
Methods and apparatus for decreasing the size of generated models trained for automatic pattern recognition

Patent number: 5963902

Abstract: Methods and apparatus for producing efficiently sized models suitable for pattern recognition purposes are described. Various embodiments are directed to the automated generation, evaluation, and selection of reduced size models from an initial model having a relatively large number of components, e.g., more components than can be stored for a particular intended application. To achieve model size reduction in an automated iterative manner, expectation maximization (EM) model training techniques are combined, in accordance with the present invention, with model size constraints. In one embodiment, a new reduced size model is generated using a Lagrange multiplier from an input model and input size constraints during each iteration of the size reducing model training process. The reduced size model generated during one iteration of the process serves as the input to the next iteration. Scoring, e.g.

Type: Grant

Filed: July 30, 1997

Date of Patent: October 5, 1999

Assignee: Nynex Science & Technology, Inc.

Inventor: Kuansan Wang
