Abstract: The invention provides a method of speech recognition comprising the steps of receiving a signal comprising one or more spoken words, extracting a spoken word from the signal using a Hidden Markov Model, passing the spoken word to a plurality of word models, one or more of the word models based on a Hidden Markov Model, determining the word model most likely to represent the spoken word, and outputting the word model representing the spoken word. The invention also provides a related speech recognition system and a speech recognition computer program.
Type:
Application
Filed:
September 5, 2003
Publication date:
March 4, 2004
Inventors:
Nikola Kirilov Kasabov, Waleed Habib Abdulla
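The core step of the method in the abstract above, scoring a spoken word against several HMM word models and selecting the most likely one, can be sketched with a toy discrete-observation forward algorithm. This is a minimal illustration only; the model names and parameters below are hypothetical, not taken from the patent.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: initial state probabilities, A: transition matrix, B: emission probs."""
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then absorb the next observation
    return np.log(alpha.sum())

def best_word_model(obs, word_models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```

With two single-state toy models, one favoring symbol 0 and one favoring symbol 1, an utterance consisting mostly of 0s selects the first model.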
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
Type:
Grant
Filed:
July 5, 2000
Date of Patent:
February 24, 2004
Assignee:
Matsushita Electric Industrial Co., Ltd.
Inventors:
Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
Abstract: A technique for audio searches by statistical pattern matching is disclosed. The audio to be located is processed for feature extraction and decoded using a maximum likelihood (“ML”) search. A left-right Hidden Markov Model (“HMM”) is constructed from the ML state sequence. Transition probabilities are defined as normalized state occupancies from the most likely state sequence of the decoding operation. Utterance duration is measured from the search sample. Other model parameters are gleaned from an acoustic model. A ML search of an audio corpus is conducted with respect to the HMM and a garbage model. New start states are added at each frame. Low scoring and long state sequences (with respect to the search sample duration) are discarded at each frame. Locations where scores of the new model are higher than those of the garbage model are marked as potential matches. The highest scoring matches are presented as results.
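One reading of the "transition probabilities defined as normalized state occupancies" step above: a state occupied for n frames in the decoded sequence gets a self-loop probability of (n−1)/n and an exit probability of 1/n. The sketch below follows that interpretation, which is ours and is not spelled out in the abstract.

```python
from collections import Counter

def left_right_transitions(state_seq):
    """Estimate self-loop/advance probabilities for a left-right HMM from
    state occupancies in a decoded most-likely state sequence."""
    occ = Counter(state_seq)
    trans = {}
    for s, n in occ.items():
        # n frames in a state imply n-1 self-loops and one exit transition
        trans[s] = {"self": (n - 1) / n, "next": 1 / n}
    return trans
```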
Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
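The delta-model idea above, storing a speaker model as a further-compressible difference from a baseline, can be sketched for GMM mean vectors. The quantization step size is an arbitrary illustration, not a value from the patent.

```python
import numpy as np

def delta_compress(speaker_means, baseline_means, step=0.05):
    """Encode a speaker model's means as quantized deltas from a baseline;
    small integer codes compress well with a generic entropy coder."""
    delta = speaker_means - baseline_means
    return np.round(delta / step).astype(np.int16)

def delta_expand(codes, baseline_means, step=0.05):
    """Reconstruct approximate speaker means from delta codes and the baseline."""
    return baseline_means + codes.astype(np.float64) * step
```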
Abstract: A method and system that improves voice recognition by improving the storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory; the more VR models stored, the more robust and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used. VR models are compressed during a training process and expanded during voice recognition.
Abstract: An arrangement is provided for an embedded coupled hidden Markov model. To train an embedded coupled hidden Markov model, training data is first segmented into uniform segments at the different layers of the model. At each layer, a uniform segment corresponds to a state of a coupled hidden Markov model at that layer. An optimal segmentation is generated at the lower layer based on the uniform segmentation and is then used to update the parameters of the models associated with the states of the coupled hidden Markov models at the lower layer. The updated model parameters at the lower layer are then used to update the model parameters associated with states at the super layer.
Abstract: A method and system that combines voice recognition engines and resolves any differences between the results of individual voice recognition engines. A speaker independent (SI) Hidden Markov Model (HMM) engine, a speaker independent Dynamic Time Warping (DTW-SI) engine and a speaker dependent Dynamic Time Warping (DTW-SD) engine are combined. Combining and resolving the results of these engines results in a system with better recognition accuracy and lower rejection rates than using the results of only one engine.
Type:
Grant
Filed:
July 18, 2000
Date of Patent:
December 30, 2003
Assignee:
Qualcomm Incorporated
Inventors:
Harinath Garudadri, David Puig Oses, Ning Bi, Yingyong Qi
Abstract: Methods and systems for recognizing speech include receiving information reflecting the speech, determining at least one broad-class of the received information, classifying the received information based on the determined broad-class, selecting a model based on the classification of the received information, and recognizing the speech using the selected model and the received information.
Type:
Application
Filed:
June 13, 2002
Publication date:
December 18, 2003
Applicant:
INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
Abstract: Audio coding processes like quantization can cause spectral components of an encoded audio signal to be set to zero, creating spectral holes in the signal. These spectral holes can degrade the perceived quality of audio signals that are reproduced by audio coding systems. An improved decoder avoids or reduces the degradation by filling the spectral holes with synthesized spectral components. An improved encoder may also be used to realize further improvements in the decoder.
Type:
Application
Filed:
June 17, 2002
Publication date:
December 18, 2003
Inventors:
Michael Mead Truman, Grant Allen Davidson, Matthew Conrad Fellers, Mark Stuart Vinton, Matthew Aubrey Watson, Charles Quito Robinson
Abstract: An adaptive speech recognition method with noise compensation is disclosed. In speech recognition, optimal equalization factors for the feature vectors of a plurality of speech frames corresponding to each probability density function in a speech model are determined based on the plurality of speech frames of the input speech and the speech model. The parameters of the speech model are adapted by the optimal equalization factor and a bias compensation vector, which corresponds to and is retrieved by the optimal equalization factor. The optimal equalization factor is provided to adjust a distance of the mean vector in the speech model. The bias compensation vector is provided to adjust a direction change of the mean vector in the speech model.
Abstract: A method and apparatus is provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.
Abstract: Speech recognition is performed by matching, for each speech frame of the inputted speech, a characteristic quantity of the inputted speech against a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM.
Type:
Application
Filed:
March 14, 2003
Publication date:
December 4, 2003
Applicant:
International Business Machines Corporation
Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low-noise environment R, providing rich information on co-articulation and speaker variation, together with a smaller database in a noisier target environment T. A set H of HMMs is trained with data provided in the low-noise environment R, and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of classes C. For each subclass c of classes C, the transformation Φc is found that maximizes the likelihood of utterances in T, given H. The HMMs are transformed and the steps repeated until the likelihood stabilizes.
Abstract: A true/false judgment on a result of speech recognition is made with high accuracy using less processing. By comparing acoustic models HMMsb against the feature vector sequence V(n) of utterances, a recognition result RCG specifying the acoustic model HMMsb having the maximum likelihood, a first score FSCR indicating the value of the maximum likelihood, and a second score SSCR indicating the value of the second-highest likelihood are found. Then, by comparing an evaluation value FSCR×(FSCR−SSCR), based on the first score FSCR and the second score SSCR, with a pre-set threshold value THD, a true/false judgment on the recognition result RCG is made. When the recognition result RCG is judged to be true, speaker adaptation is applied to the acoustic models HMMsb; when it is judged to be false, speaker adaptation is not applied. It is thus possible to improve the accuracy of speaker adaptation.
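The accept/reject rule in the abstract above reduces to a one-line comparison; the threshold value is application-specific and the function name is ours:

```python
def accept_recognition(fscr, sscr, thd):
    """True/false judgment on a recognition result:
    accept when FSCR * (FSCR - SSCR) exceeds the threshold THD."""
    return fscr * (fscr - sscr) > thd
```

A large gap between the best and second-best likelihoods, scaled by the best likelihood itself, signals a trustworthy result; only then would speaker adaptation be applied.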
Abstract: A speech recognition device comprises an HMM model database which prestores keyword HMMs representing the feature patterns of keywords to be recognized, a likelihood calculator which calculates the likelihood of an extracted feature value of a speech signal in each frame by comparing it with the keyword HMMs and designated-speech HMMs, an extraneous-speech likelihood setting device which sets the extraneous-speech likelihood based on the calculated likelihood of a match with the designated-speech HMMs, a matching processor which performs a matching process based on the calculated likelihood and the extraneous-speech likelihood, and a determining device which determines the keywords contained in the spontaneous speech based on the matching process.
Abstract: A speech recognition method includes the use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
Abstract: A speech recognition method includes the use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
Abstract: A speech recognition apparatus comprises a speech analyzer which extracts feature patterns of spontaneous speech divided into frames; a keyword model database which prestores keyword models representing the feature patterns of a plurality of keywords to be recognized; a garbage model database which prestores feature patterns of components of extraneous speech to be identified; a first likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the keywords; and a second likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the extraneous speech. The device recognizes keywords contained in the spontaneous speech by calculating a cumulative likelihood based on the calculated likelihoods, adding a predetermined correction value in the second likelihood calculator.
Abstract: The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in order to decipher the song's structure. Those sections that correspond to different structural elements are then marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase corresponding to the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
Type:
Grant
Filed:
April 7, 2000
Date of Patent:
October 14, 2003
Assignee:
Hewlett-Packard Development Company, L.P.
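The label-based heuristic in the song-summary abstract above (pick the most frequent section label, then its longest contiguous run) can be sketched over a per-frame label sequence; the labels and function name are hypothetical illustrations:

```python
from collections import Counter
from itertools import groupby

def key_phrase(labels):
    """Return (most frequent label, (start index, length) of its longest run)."""
    most_common = Counter(labels).most_common(1)[0][0]
    best = (0, 0)  # (start, length) of the longest run seen so far
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label == most_common and n > best[1]:
            best = (i, n)
        i += n
    return most_common, best
```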
Abstract: An estimate of a clean speech vector, typically a Mel-Frequency Cepstral Coefficient (MFCC) vector, given its noisy observation is provided. The method makes use of two Gaussian mixtures. The first is trained on clean speech and the second is derived from the first using noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectation of clean speech given an observed noisy vector.
Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
Abstract: In many real applications, such as voice control in vehicles, there is the problem that the users change relatively frequently. The question then arises: which is the correct data set for the current user? The invention provides a process making it possible, automatically and for the duration of operation of the system, to recognize whether the speaker changes, or which (speaker-dependent) data set is correct for the actual user. This task is solved by a speech recognition system based on a so-called Semi-Continuous Hidden Markov Model (SCHMM). Codebooks are produced, a normal distribution is represented, and speaker-specific data sets are stored in addition to a so-called baseline data set; the inventive speech recognition system correlates the speech signal by means of vector quantization with the speaker-independent and the speaker-dependent codebooks, making it possible to ascertain the identity of the speaker.
Type:
Application
Filed:
March 3, 2003
Publication date:
October 2, 2003
Inventors:
Fritz Class, Udo Haiber, Alfred Kaltenmeier
Abstract: A speech recognition method and system utilize an acoustic model that is capable of providing probabilities for both a large acoustic unit and an acoustic sub-unit. Each of these probabilities describes the likelihood of a set of feature vectors from a series of feature vectors representing a speech signal. The large acoustic unit is formed from a plurality of acoustic sub-units. At least one sub-unit probability and at least one large-unit probability from the acoustic model are used by a decoder to generate a score for a sequence of hypothesized words. When combined, the acoustic sub-units associated with all of the sub-unit probabilities used to determine the score span fewer than all of the feature vectors in the series of feature vectors. An overlapping decoding technique is also provided.
Abstract: A model generation unit (17) is provided. The model generation unit includes an alignment module (80) arranged to receive pairs of sequences of parameter frame vectors from a buffer (16) and to perform dynamic time warping of the parameter frame vectors to align corresponding parts of the pair of utterances. A consistency checking module (82) is provided to determine whether the aligned parameter frame vectors correspond to the same word. If this is the case, the aligned parameter frame vectors are passed to a clustering module (84) which groups the parameter frame vectors into a number of clusters. Whilst clustering the parameter frame vectors, the clustering module (84) determines for each grouping an objective function calculating the best fit of a model to the clusters per degrees of freedom of that model.
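The dynamic time warping step used by the alignment module above can be sketched in its textbook form. This plain 1-D version is for illustration; a real system would align parameter frame vectors using a vector distance:

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Minimal dynamic time warping cost between two sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local distance plus the cheapest of the three allowed moves
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A repeated frame in one utterance aligns to a single frame in the other at no extra cost, which is exactly what makes DTW robust to differences in speaking rate.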
Abstract: A speech recognition system is disclosed including a model generation unit (20) and a speech recognition unit (22). When signals are received from a microphone (7) the model generation unit (20) utilises the signals to generate hidden Markov models that are stored in a hidden Markov model database (24). Subsequently, when utterances are to be recognised, the speech recognition unit (22) utilises the stored hidden Markov models to associate an utterance with a word. When a new hidden Markov model is generated by the model generation unit (20) the new hidden Markov model is processed by a confusability checker (26) against the hidden Markov models already stored in the database (24). A value indicative of the likelihood of utterances corresponding to the new model being confused with previously stored models is determined by the confusability checker (26) directly from the parameters for the new hidden Markov model and the other hidden Markov models stored in the database (24).
Abstract: An automatic speech recognition system for the condition in which an incoming caller's speech is quiet and a resulting echo (of a loudly playing prompt) can cause the residual (the portion of the echo remaining even after echo cancellation) to be of the magnitude of the incoming speech input. Such loud echoes can falsely trigger the speech recognition system and interfere with the recognition of valid input speech. An echo model has proven to alleviate this fairly common problem and to be effective in eliminating such false triggering. Further, this automatic speech recognition system's enhanced recognition of valid speech was provided within an existing hidden Markov modeling framework.
Type:
Grant
Filed:
August 31, 2000
Date of Patent:
August 12, 2003
Assignee:
Lucent Technologies Inc.
Inventors:
Rathinavelu Chengalvarayan, Richard Harry Ketchum, Anand Rangaswamy Setlur, David Lynn Thomson
Abstract: Embodiments of the present invention provide a spoken language interface to an information database. A grammars database based on the entries contained in the information database may be generated. The entries in the grammars database may be a compact representation of the entries in the information database. An index database based on the entries contained in the information database may be generated. The grammars database and the index database may be updated periodically based on updated entries contained in the information database. A recognized result of a user's communication based on the updated grammars database may be generated. The updated index database may be searched for a list of matching entries that match the recognized result. The list of matching entries may be output.
Type:
Application
Filed:
December 31, 2002
Publication date:
August 7, 2003
Inventors:
Esther Levin, Susan Boyce, Brian Helfrich, Yevgeniy Lyudovyk, Robert Burke, Ilija Zeljkovic
Abstract: A maximum a posteriori (MAP) processor employs a block processing technique for the MAP algorithm to provide a parallel architecture that allows for multiple-word memory read/write processing and voltage scaling of a given circuit implementation. The block processing technique forms a merged trellis with states having modified branch inputs to provide the parallel structure. When block processing occurs, the trellis may be modified to show transitions from the oldest state at time k−N to the present state at time k. For the merged trellis, the number of states remains the same, but each state receives 2^N input transitions instead of two. Branch metrics associated with the transitions in the merged trellis are cumulative, and are employed for the update process of forward and backward probabilities by the MAP algorithm.
Type:
Application
Filed:
January 22, 2002
Publication date:
July 24, 2003
Inventors:
Thaddeus J. Gabara, Inkyu Lee, Maria Luisa Lopez-Vallejo, Syed Mujtaba
Abstract: In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must perform iteratively on a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode.
Type:
Grant
Filed:
May 12, 2000
Date of Patent:
July 15, 2003
Inventors:
Thomas P. Minka, Dan S. Bloomberg, Ashok C. Popat
Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls the power delivered to the electrical device. The power control is responsive to the at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.
Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
Abstract: A method of signal modelling comprises inputting to a statistical signal modelling system the output of a deterministic modelling system to thereby effect a reduction in the overall computational overhead.
Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
Abstract: Unweighted finite state automata may be used in speech recognition systems, but considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task is time consuming and expensive, if it is even possible. Additionally, it is unlikely that a training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state. The directed graph optimizing systems and methods determine the shortest distances between source and end nodes of a weighted directed graph. These various directed graph optimizing systems and methods also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient.
Abstract: The invention relates to pre-processing of a pronunciation dictionary for compression in a data processing device, the pronunciation dictionary comprising at least one entry, the entry comprising a sequence of character units and a sequence of phoneme units. According to one aspect of the invention the sequence of character units and the sequence of phoneme units are aligned using a statistical algorithm. The aligned sequence of character units and aligned sequence of phoneme units are interleaved by inserting each phoneme unit at a predetermined location relative to the corresponding character unit.
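The interleaving step described above can be sketched for an aligned grapheme/phoneme pair. The `"_"` epsilon marker for characters with no aligned phoneme is our assumption, not notation from the patent:

```python
def interleave(chars, phonemes):
    """Interleave an aligned character sequence with its phoneme sequence,
    inserting each phoneme immediately after its corresponding character."""
    assert len(chars) == len(phonemes), "sequences must be aligned first"
    out = []
    for c, p in zip(chars, phonemes):
        out.append(c)
        if p != "_":          # skip epsilon (no phoneme aligned to this char)
            out.append(p)
    return "".join(out)
```

Interleaving places each phoneme at a predictable offset from its character, which is what makes the combined string compress better than the two sequences stored separately.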
Abstract: A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other decoder based on sequences of labels provided by feature analysis of input speech.
Abstract: A method and system of performing confidence measurement in a speech recognition system includes receiving an utterance of input speech and creating a near-miss pattern, or near-miss list, of possible word entries for the utterance. Each word entry includes an associated value of the probability that the utterance corresponds to the word entry. The near-miss list of possible word entries is compared with corresponding stored near-miss confidence templates. Each word in the vocabulary (or keyword list) has a near-miss confidence template, which includes a list of word entries, and each word entry in each list includes an associated value. The confidence measure for a particular hypothesis word is performed based on the comparison of the values in the near-miss list of possible word entries with the values of the corresponding near-miss confidence template.
Type:
Grant
Filed:
November 13, 1998
Date of Patent:
May 27, 2003
Assignee:
Microsoft Corporation
Inventors:
Hsiao-Wuen Hon, Asela J. R. Gunawardana
Abstract: An information processing apparatus inputs a document having a plurality of input items, and displays it using an information display unit. An active input item is discriminated from the plurality of input items in accordance with the display state of the document. A specific grammar corresponding to the discriminated active input item is selected from a grammar holding unit for holding a plurality of types of grammars, and the selected grammar is used in a speech recognition process.
Abstract: An HMM-based text-to-phoneme parser uses probability information within a probability database to generate one or more phoneme strings for a written input word. Techniques for training the text-to-phoneme parser are provided.
Abstract: A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.
Type:
Application
Filed:
December 20, 2002
Publication date:
May 8, 2003
Applicant:
International Business Machines Corporation
Abstract: A low-complexity speaker verification system that employs universal cohort models and automatic score thresholding. The universal cohort models are generated using a simplified cohort model generating scheme. In certain embodiments of the invention, a simplified hidden Markov modeling (HMM) scheme is used to generate the cohort models. In addition, the low-complexity speaker verification system is trained by its various users. The total number of users may be modified over time as required by the specific application, and the universal cohort models may be updated accordingly to accommodate the new users. The present invention employs a combination of universal cohort modeling and thresholding to ensure high performance.
Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.
Abstract: In speech recognition based on an HMM, in which recognition is performed by carrying out vector quantization and obtaining an output probability by table reference, the amount of computation and the use of memory are minimized while a high recognition ability is achieved.
Abstract: Briefly, in accordance with one embodiment of the invention, a recognition system may modify speech models using noise in an input signal that is received prior to a speech sample in the input sample.
Abstract: The improved noise adaptation technique applies a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tends to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced-dimensionality principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
Type:
Grant
Filed:
April 18, 2000
Date of Patent:
March 4, 2003
Assignee:
Matsushita Electric Industrial Co., Ltd.
Inventors:
Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
Abstract: A processor-based system may utilize a remote control unit which not only allows mouse input commands to be provided to the processor-based system but also includes a microphone and a speech engine for decoding spoken commands and providing code for presenting the commands to the processor-based unit. The processor-based system may provide information to the remote control unit about the vocabulary currently being used by applications active on the processor-based system. This allows the speech engine in the remote control unit to focus on a more limited vocabulary, increasing the accuracy of the speech recognition function and decreasing the capabilities necessary in the remote control unit based speech engine.
Abstract: The discriminative clustering technique tests a provided set of Gaussian distributions corresponding to an acoustic vector space. A distance metric, such as the Bhattacharyya distance, is used to assess which distributions are sufficiently proximal to be merged into a new distribution. Merging is accomplished by computing the centroid of the new distribution by minimizing the Bhattacharyya distance between the parameters of the Gaussian distributions being merged.
Type:
Grant
Filed:
November 29, 1999
Date of Patent:
February 25, 2003
Assignee:
Matsushita Electric Industrial Co., Ltd.
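The Bhattacharyya distance between two Gaussians, used by the discriminative clustering abstract above to decide which distributions to merge, follows a standard formula; the count-weighted, moment-matched merge shown here is one common choice, not necessarily the patented centroid computation:

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def merge(mu1, cov1, n1, mu2, cov2, n2):
    """Moment-matched centroid of two Gaussians weighted by their counts."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    cov = (n1 * (cov1 + np.outer(mu1 - mu, mu1 - mu)) +
           n2 * (cov2 + np.outer(mu2 - mu, mu2 - mu))) / n
    return mu, cov
```

Two identical Gaussians are at distance zero, so any threshold on the distance naturally marks them as mergeable.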
Abstract: A character recognition system is disclosed. In a feature extraction parameter storage section 22, a transformation matrix for reducing the number of dimensions of the feature parameters and a codebook for quantization are stored. In an HMM storage section 23, the constitution and parameters of a Hidden Markov Model (HMM) for character string expression are stored. A feature extraction section 32 scans a word image given from an image storage means from left to right in a predetermined cycle with a slit having a width sufficiently smaller than the character width, and thus outputs a feature symbol at each predetermined timing. A matching section 33 matches the feature symbol row with a probability-maximization HMM state, thereby recognizing the character string.
Abstract: A huge vocabulary speech recognition system for recognizing a sequence of spoken words, having an input means for receiving a time-sequential input pattern representative of the sequence of spoken words. The system further includes a plurality of large vocabulary speech recognizers each being associated with a respective, different large vocabulary recognition model. Each of the recognition models is targeted to a specific part of the huge vocabulary. The system comprises a controller operative to direct the input pattern to a plurality of the speech recognizers and to select a recognized word sequence from the word sequences recognized by the plurality of speech recognizers.
Type:
Grant
Filed:
August 9, 1999
Date of Patent:
February 25, 2003
Assignee:
Koninklijke Philips Electronics N.V.
Inventors:
Eric Thelen, Stefan Besling, Meinhard Ullrich