Markov Patents (Class 704/256)
  • Patent number: 6370505
    Abstract: The present invention relates to a method of processing speech, in which input speech is processed to determine an input speech vector (or) representing a sample of the speech. A number of possible output states are defined, with each output state (j) being represented by a number of state mixture components (m). Each state mixture component is then approximated by a weighted sum of a number of predetermined generic components (x), allowing the likelihood of each output state (j) corresponding to the input speech vector (or) to be determined.
    Type: Grant
    Filed: April 30, 1999
    Date of Patent: April 9, 2002
    Inventor: Julian Odell
  • Patent number: 6370504
    Abstract: A technique to perform speech recognition directly from audio files compressed using the MPEG/Audio coding standard. The technique works in the compressed domain and does not require the MPEG/Audio file to be decompressed. Only the encoded subband signals are extracted and processed for training and recognition. The underlying speech recognition engine is based on the Hidden Markov model. The technique is applicable to layers I and II of MPEG/Audio and training under one layer can be used to recognize the other.
    Type: Grant
    Filed: May 22, 1998
    Date of Patent: April 9, 2002
    Assignee: University of Washington
    Inventors: Gregory L. Zick, Lawrence Yapp
  • Publication number: 20020035473
    Abstract: A new method, which builds the models at the m-th step directly from the models at the initial step, is provided to minimize storage and calculation. The method therefore merges the M×N transformations into a single transformation. The merge guarantees the exactness of the transformations and makes it possible for recognizers on mobile devices to have adaptation capability.
    Type: Application
    Filed: June 22, 2001
    Publication date: March 21, 2002
    Inventor: Yifan Gong
  • Patent number: 6349281
    Abstract: A voice model learning data creation method and apparatus makes possible the creation of an inexpensive voice model in a short period of time when creating a voice model for a new word not in a preexisting database. Verbal data from several persons is selected from among the verbal data held in the database. This selected verbal data is referred to as standard speaker data, and is stored in a standard speaker data storage component. The remaining verbal data in the preexisting database is designated as learning speaker data, and is stored in a learning speaker data storage component. A data conversion function from the standard speaker data space to the learning speaker data space is derived. Then, the learning data for the new word is created by the data conversion function. Thus, the data which is obtained from the standard speaker speaking the new word is converted to the learning speaker data space.
    Type: Grant
    Filed: January 22, 1998
    Date of Patent: February 19, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Yasunaga Miyazawa, Hiroshi Hasegawa, Mitsuhiro Inazumi, Tadashi Aizawa
  • Patent number: 6341264
    Abstract: Electronic commerce (E-commerce) and Voice commerce (V-commerce) proceed by having the user speak into the system. The user's speech is converted by a speech recognizer into the form required by the transaction processor that effects the electronic commerce operation. A dimensionality reduction processor converts the user's input speech into a reduced-dimensionality set of values termed eigenvoice parameters. These parameters are compared with a set of previously stored eigenvoice parameters representing a speaker population (the eigenspace representing speaker space), and the comparison is used by the speech model adaptation system to rapidly adapt the speech recognizer to the user's speech characteristics. The user's eigenvoice parameters are also stored for subsequent use by the speaker verification and speaker identification modules.
    Type: Grant
    Filed: February 25, 1999
    Date of Patent: January 22, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua
  • Publication number: 20020002457
    Abstract: A method determines a representative sound on the basis of a structure which includes a set of sound models. Each sound model has at least one representative for the modeled sound. In the structure, a first sound model, matching with regard to a first quality criterion, is determined from the set of sound models. At least one second sound model is determined from the set of sound models dependent on a characteristic state criterion of the structure. At least some of the representatives of the first sound model and of the at least one second sound model are assessed in addition to the first quality criterion with regard to a second quality criterion. The at least one representative which has an adequate overall quality criterion with regard to the first and second quality criteria is determined as a representative sound from the representatives of the first sound model and the at least one second sound model.
    Type: Application
    Filed: August 21, 2001
    Publication date: January 3, 2002
    Inventor: Martin Holzapfel
  • Patent number: 6336108
    Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: January 1, 2002
    Assignee: Microsoft Corporation
    Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
  • Publication number: 20010051871
    Abstract: A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters.
    Type: Application
    Filed: March 23, 2001
    Publication date: December 13, 2001
    Inventor: John Kroeker
  • Patent number: 6327565
    Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
    Type: Grant
    Filed: April 30, 1998
    Date of Patent: December 4, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua
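    The eigenvoice construction in the abstract above (principal component analysis on per-speaker supervectors, then constraining a new speaker's supervector to the resulting eigenvoice space) can be sketched in a few lines of numpy. This is an illustrative reconstruction under simplifying assumptions (orthonormal eigenvoices taken from an SVD; the names `train_eigenspace` and `adapt` are hypothetical), not the patented implementation:

    ```python
    import numpy as np

    def train_eigenspace(supervectors, k):
        """PCA on speaker supervectors: returns the mean and top-k eigenvoices."""
        X = np.asarray(supervectors, dtype=float)
        mean = X.mean(axis=0)
        # SVD of the centered data yields the principal directions as rows of vt.
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, vt[:k]               # k eigenvoices, each of supervector length

    def adapt(mean, eigenvoices, adaptation_supervector):
        """Constrain a new speaker's supervector to lie in the eigenvoice space."""
        coeffs = eigenvoices @ (adaptation_supervector - mean)
        return mean + eigenvoices.T @ coeffs   # projected (adapted) supervector
    ```

    The `coeffs` here play the role of the abstract's eigenspace coefficients; the adapted supervector is then unpacked back into model parameters.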
  • Publication number: 20010047258
    Abstract: A speech control system and method is described, wherein state definition information is loaded from a network application server. The state definition information defines possible states of the network application server and is used for determining a set of valid commands of the network application server, such that the validity of a text command obtained by converting an input speech command can be checked by comparing said text command with said determined set of valid commands. Thereby, a transmission of erroneous text commands to the network application server can be prevented so as to reduce total processing time and response delays.
    Type: Application
    Filed: March 16, 2001
    Publication date: November 29, 2001
    Inventor: Anthony Rodrigo
  • Patent number: 6324510
    Abstract: A method of organizing an acoustic model for speech recognition is comprised of the steps of calculating a measure of acoustic dissimilarity of subphonetic units. A clustering technique is recursively applied to the subphonetic units based on the calculated measure of acoustic dissimilarity to automatically generate a hierarchically arranged model. Each application of the clustering technique produces another level of the hierarchy with the levels progressing from the least specific to the most specific. A technique for adapting the structure and size of a trained acoustic model to an unseen domain using only a small amount of adaptation data is also disclosed.
    Type: Grant
    Filed: November 6, 1998
    Date of Patent: November 27, 2001
    Assignee: Lernout & Hauspie Speech Products N.V.
    Inventors: Alex Waibel, Juergen Fritsch
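    The recursive clustering described above (each application of the clustering step adds one level, from least to most acoustically specific) can be illustrated with a toy divisive variant. The seed choice (the most dissimilar pair) and Euclidean dissimilarity are assumptions for the sketch; the patent's actual dissimilarity measure and clustering technique are not reproduced here:

    ```python
    import numpy as np

    def split(units):
        """One binary split: seeds are the two most dissimilar units."""
        units = np.asarray(units, dtype=float)
        d = np.linalg.norm(units[:, None] - units[None, :], axis=-1)
        i, j = np.unravel_index(np.argmax(d), d.shape)   # farthest pair as seeds
        left = (np.linalg.norm(units - units[i], axis=1)
                <= np.linalg.norm(units - units[j], axis=1))
        return units[left], units[~left]

    def build_hierarchy(units, depth):
        """Recursively split: each recursion level is more specific than its parent."""
        units = np.asarray(units, dtype=float)
        if depth == 0 or len(units) < 2:
            return {"members": units}
        a, b = split(units)
        return {"members": units,
                "children": [build_hierarchy(a, depth - 1),
                             build_hierarchy(b, depth - 1)]}
    ```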
  • Patent number: 6324512
    Abstract: Users of the system can access the TV contents and program media recorder by speaking in natural language sentences. The user interacts with the television and with other multimedia equipment, such as media recorders and VCRs, through the unified access controller. A speaker verification/identification module determines the identity of the speaker and this information is used to control how the dialog between user and system proceeds. Speech can be input through either a microphone or over the telephone. In addition, the user can interact with the system using a suitable computer attached via the internet. Regardless of the mode of access, the unified access controller interprets the semantic content of the user's request and supplies the appropriate control signals to the television tuner and/or recorder.
    Type: Grant
    Filed: August 26, 1999
    Date of Patent: November 27, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Jean-Claude Junqua, Roland Kuhn, Tony Davis, Yi Zhao, Weiying Li
  • Patent number: 6324513
    Abstract: There is provided a spoken dialog system, in which an interactive operation is effectively carried out in a natural manner as to a speech containing words out of a set vocabulary.
    Type: Grant
    Filed: September 24, 1999
    Date of Patent: November 27, 2001
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventors: Akito Nagai, Keisuke Watanabe, Yasushi Ishikawa
  • Patent number: 6317712
    Abstract: Phonetic modeling includes the steps of forming triphone grammars (11) from phonetic data, training triphone models (13), clustering triphones (14) that are acoustically close together and mapping unclustered triphone grammars into a clustered model (16). The clustering process includes using a decision tree based on the acoustic likelihood and allows sub-model clusters in user-definable units.
    Type: Grant
    Filed: January 21, 1999
    Date of Patent: November 13, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Yu-Hung Kao, Kazuhiro Kondo
  • Patent number: 6314399
    Abstract: An apparatus generates a statistical class sequence model called a class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable-length sequences of maximum length N units, and where class labels are assigned to the sequences. The number of times all sequences of units occur is counted, as well as the number of times all pairs of sequences of units co-occur in the input training strings. An initial bigram probability distribution of all the pairs of sequences is computed as the number of times the two sequences co-occur, divided by the number of times the first sequence occurs in the input training string. Then, the input sequences are classified into a pre-specified desired number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions.
    Type: Grant
    Filed: April 13, 1999
    Date of Patent: November 6, 2001
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Sabine Deligne, Yoshinori Sagisaka, Hideharu Nakajima
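    The initial bigram distribution defined above (co-occurrence count divided by first-position count) is a plain counting ratio. A minimal sketch, assuming the stream has already been segmented into sequences that can be treated as atomic tokens (the full method also re-estimates with EM, which is not shown):

    ```python
    from collections import Counter

    def initial_bigram_probs(stream):
        """P(s2 | s1) = count(s1 followed by s2) / count(s1 in first position)."""
        pair_counts = Counter(zip(stream, stream[1:]))
        first_counts = Counter(stream[:-1])
        return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}
    ```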
  • Patent number: 6308155
    Abstract: An automatic speech recognition apparatus and method with a front-end feature extractor that improves recognition performance under adverse acoustic conditions are disclosed. The inventive feature extractor is characterized by a critical bandwidth spectral resolution, an emphasis on slow changes in the spectral structure of the speech signal, and adaptive automatic gain control. In one embodiment, the feature extractor includes a feature generator configured to compute short-term parameters of the speech signal, a filter system configured to filter the time sequences of the short-term parameters, and a normalizer configured to normalize the filtered parameters with respect to one or more previous values of the filtered parameters.
    Type: Grant
    Filed: May 25, 1999
    Date of Patent: October 23, 2001
    Assignee: International Computer Science Institute
    Inventors: Brian E. D. Kingsbury, Steven Greenberg, Nelson H. Morgan
  • Publication number: 20010032075
    Abstract: Disclosed is a speech recognition method, in a speech recognition apparatus, for applying speech recognition to an input voice signal. The input voice signal is converted from an analog to a digital signal, and sequences of feature vectors are extracted from the digital signal (S12). A search space is defined by the sequences of feature vectors and an HMM (16) prepared beforehand for each unit of speech. The search space allows a transition between HMMs only in specific feature-vector sequences. A search is conducted in this space to find the optimum path, for which the largest acoustic likelihood regarding the voice signal is obtained, giving the result of recognition (S14), and this result is output (S15).
    Type: Application
    Filed: March 27, 2001
    Publication date: October 18, 2001
    Inventor: Hiroki Yamamoto
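    The optimum-path search referred to above is, in its standard formulation, a Viterbi search over HMM states. A minimal sketch over log-domain scores (uniform initial weights and a flat state space are simplifying assumptions; the patent's constrained inter-HMM transitions are not modeled):

    ```python
    import numpy as np

    def viterbi(log_trans, log_emit):
        """Best state path given an N x N log-transition matrix and a
        T x N matrix of per-frame log emission scores."""
        T, N = log_emit.shape
        score = log_emit[0].copy()
        back = np.zeros((T, N), dtype=int)
        for t in range(1, T):
            cand = score[:, None] + log_trans          # cand[prev, next]
            back[t] = np.argmax(cand, axis=0)          # best predecessor per state
            score = cand[back[t], np.arange(N)] + log_emit[t]
        path = [int(np.argmax(score))]
        for t in range(T - 1, 0, -1):                  # backtrace
            path.append(int(back[t, path[-1]]))
        return path[::-1], float(np.max(score))
    ```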
  • Patent number: 6304674
    Abstract: A method for recognizing user specified pen-based gestures uses Hidden Markov Models. A gesture recognizer is implemented which includes a fast pruning procedure. In addition, an incremental training method is utilized.
    Type: Grant
    Filed: August 3, 1998
    Date of Patent: October 16, 2001
    Assignee: Xerox Corporation
    Inventors: Todd A. Cass, Lynn D. Wilcox, Tichomir G. Tenev
  • Patent number: 6301561
    Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
    Type: Grant
    Filed: September 18, 2000
    Date of Patent: October 9, 2001
    Assignee: AT&T Corporation
    Inventor: Lawrence Kevin Saul
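    The arc-length weighting described above has a simple geometric core: each feature vector's contribution is proportional to the length of the curve segment it spans, so stationary stretches (repeated frames) contribute nothing, while short-lived but fast-moving sounds such as consonants are emphasized. A sketch of the per-segment arc length (an illustration of the weighting only, not the full segmentation or scoring method):

    ```python
    import numpy as np

    def arc_lengths(curve):
        """Per-segment arc length of a multidimensional curve (T x D array):
        the weight each feature vector would contribute to the score."""
        diffs = np.diff(np.asarray(curve, dtype=float), axis=0)
        return np.linalg.norm(diffs, axis=1)
    ```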
  • Patent number: 6301562
    Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: October 9, 2001
    Assignee: New Transducers Limited
    Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
  • Patent number: 6292779
    Abstract: A modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors. The system includes a common library of acoustic model states for arrangement in sequences that form acoustic models. Each acoustic model is composed of a sequence of segment models and each segment model is composed of a sequence of model states. An input processor compares each vector in a sequence of input vectors to a set of model states in the common library to produce a match score for each model state in the set, reflecting the likelihood that a state is represented by a vector. The system also includes a plurality of recognition modules and associated recognition grammars. The recognition modules operate in parallel and use the match scores with the acoustic models to determine at least one recognition result in each of the recognition modules.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: September 18, 2001
    Assignee: Lernout & Hauspie Speech Products N.V.
    Inventors: Brian Wilson, Manfred Grabherr, Ramesh Sarukkai, William F. Ganong, III
  • Patent number: 6292775
    Abstract: A speech processing system (10) incorporates an analogue to digital converter (16) to digitize input speech signals for Fourier transformation to produce short-term spectral cross-sections. These cross-sections are compared with one hundred and fifty reference patterns in a store (34), the patterns having respective stored sets of formant frequencies assigned thereto by a human expert. Six stored patterns most closely matching each input cross-section are selected for further processing by dynamic programming, which indicates the pattern which is a best match to the input cross-section by using frequency-scale warping to achieve alignment. The stored formant frequencies of the best matching pattern are modified by the frequency warping, and the results are used as formant frequency estimates for the input cross-section. The frequencies are further refined on the basis of the shape of the input cross-section near to the chosen formants.
    Type: Grant
    Filed: February 18, 1999
    Date of Patent: September 18, 2001
    Assignee: The Secretary of State for Defence in Her Britannic Majesty's Government of the United Kingdom of Great Britain and Northern Ireland
    Inventor: John N Holmes
  • Patent number: 6292778
    Abstract: An automated speech recognition system comprises a preprocessor, a speech recognizer, and a task-independent utterance verifier. The task independent utterance verifier employs a first subword acoustic Hidden Markov Model for determining a first likelihood that a speech segment contains a sound corresponding to a speech recognition hypothesis, and a second anti-subword acoustic Hidden Markov Model for determining a second likelihood that a speech segment contains a sound other than one corresponding to the speech recognition hypothesis. In operation, the utterance verifier employs the subword and anti-subword models to produce for each recognized subword in the input speech the first and second likelihoods. The utterance verifier determines a subword verification score as the log of the ratio of the first and second likelihoods.
    Type: Grant
    Filed: October 30, 1998
    Date of Patent: September 18, 2001
    Assignee: Lucent Technologies Inc.
    Inventor: Rafid Antoon Sukkar
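    The verification score above is a log likelihood ratio between the subword model and the anti-subword model. A minimal sketch (the all-subwords-must-pass acceptance rule and the `verify_utterance` name are assumptions for illustration; the patent leaves the combination of per-subword scores open):

    ```python
    def verification_score(subword_loglik, anti_loglik):
        """Per-subword score: log( L_subword / L_anti ) in the log domain."""
        return subword_loglik - anti_loglik

    def verify_utterance(segment_scores, threshold=0.0):
        """Accept the recognition hypothesis if every subword's log likelihood
        ratio clears the threshold (one simple combination rule)."""
        return all(s > threshold for s in segment_scores)
    ```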
  • Patent number: 6285981
    Abstract: A method for speeding up the speech recognition search is provided, wherein the number of HMM states is determined and a microslot is allocated for Hidden Markov Models (HMMs) below a given threshold level of states. A macroslot treats a whole HMM as a basic unit. The lowest level of macroslot is a phone. If the number of states exceeds the threshold level, a macroslot is allocated for this HMM.
    Type: Grant
    Filed: June 7, 1999
    Date of Patent: September 4, 2001
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
  • Patent number: 6285980
    Abstract: A natural number recognition method and system that uses minimum classification error trained inter-word context dependent models of the head-body-tail type over a specific vocabulary. One part of the method and system allows recognition of spoken monetary amounts in financial transactions. A second part of the method and system allows recognition of numbers such as credit card or U.S. telephone numbers. A third part of the method and system allows recognition of natural language expressions of time, such as time of day, day of the week and date of the month for applications such as scheduling or schedule inquiries. Even though limited natural language expressions are allowed, context sharing between similar sounds in the vocabulary within a head-body-tail model keeps storage and processing time requirements to manageable levels.
    Type: Grant
    Filed: November 2, 1998
    Date of Patent: September 4, 2001
    Assignee: Lucent Technologies Inc.
    Inventors: Malan Bhatki Gandhi, John Jacob
  • Publication number: 20010018653
    Abstract: In a speech recognition system, the received speech and the sequence of words recognized in the speech by a recognizer (100) are stored in a memory (320, 330). Markers are stored as well, indicating a correspondence between each word and the segment of the received signal in which the word was recognized. In a synchronous reproduction mode, a controller (310) ensures that the speech is played back via speakers (350) and that for each speech segment the word which has been recognized for the segment is indicated (e.g. highlighted) on a display (340). The controller can detect whether the user has provided an editing instruction while the synchronous reproduction is active. If so, the synchronous reproduction is automatically paused and the editing instruction executed.
    Type: Application
    Filed: December 19, 2000
    Publication date: August 30, 2001
    Inventor: Heribert Wutte
  • Publication number: 20010011218
    Abstract: A continuous, speaker independent, speech recognition method and system for recognizing a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is inputted into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multiprocessor machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.
    Type: Application
    Filed: March 13, 2001
    Publication date: August 2, 2001
    Inventors: Steven Phillips, Anne Rogers
  • Patent number: 6269334
    Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of non-Gaussian statistical probability densities, which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t^(α/2)) for t ≥ 0 and α > 0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.
    Type: Grant
    Filed: June 25, 1998
    Date of Patent: July 31, 2001
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Charles A. Micchelli
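    Reading the abstract's kernel as exp(−t^(α/2)) on t ≥ 0 (this reconstruction is an assumption; the published abstract is garbled at that point), the family reduces to the exponential density at α = 2, and other α give heavier or lighter tails. A sketch of one normalized component and a mixture of them, using the closed-form normalizer ∫₀^∞ exp(−t^(a/2)) dt = (2/a)·Γ(2/a):

    ```python
    import math

    def base_density(t, alpha):
        """Normalized density on t >= 0 built from exp(-t**(alpha/2))."""
        z = (2.0 / alpha) * math.gamma(2.0 / alpha)   # closed-form normalizer
        return math.exp(-t ** (alpha / 2.0)) / z

    def mixture_density(t, weights, alphas):
        """Weighted mixture of such non-Gaussian components."""
        return sum(w * base_density(t, a) for w, a in zip(weights, alphas))
    ```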
  • Patent number: 6266636
    Abstract: A process for removing, in a real-time manner, additive noise due to the influence of ambient circumstances, in order to improve the precision of real-time speech recognition, includes converting a selected speech model distribution into a representative distribution, combining a noise model with the converted distribution to generate a noise-superimposed speech model, performing a first likelihood calculation to recognize an input speech by using the noise-superimposed speech model, converting the noise-superimposed speech model to a noise-adapted distribution that retains the relationships of the selected speech model, and performing a second likelihood calculation to recognize the input speech by using the noise-adapted distribution.
    Type: Grant
    Filed: March 11, 1998
    Date of Patent: July 24, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Tetsuo Kosaka, Yasuhiro Komori
  • Patent number: 6253178
    Abstract: Speech recognition systems and methods consistent with the present invention process input speech signals organized into a series of frames. The input speech signal is decimated to select K frames out of every L frames of the input speech signal according to a decimation rate K/L. A first set of model distances is then calculated for each of the K selected frames of the input speech signal, and a Hidden Markov Model (HMM) topology of a first set of models is reduced according to the decimation rate K/L. The system then selects a reduced set of model distances from the computed first set of model distances according to the reduced HMM topology and selects a first plurality of candidate choices for recognition according to the reduced set of model distances. A second set of model distances is computed, using a second set of models, for a second plurality of candidate choices, wherein the second plurality of candidate choices correspond to at least a subset of the first plurality of candidate choices.
    Type: Grant
    Filed: September 22, 1997
    Date of Patent: June 26, 2001
    Assignee: Nortel Networks Limited
    Inventors: Serge Robillard, Nadia Girolamo, Andre Gillet, Waleed Fakhr
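    The K-of-every-L frame decimation above is straightforward to sketch; the evenly-spaced selection within each block is an assumption (the patent does not fix which K frames are kept), and the matching HMM topology reduction is not shown:

    ```python
    def decimate_frames(frames, k, l):
        """Keep k of every l frames, evenly spaced within each block of l."""
        keep = []
        for start in range(0, len(frames), l):
            block = frames[start:start + l]
            step_idx = [i * l // k for i in range(k)]   # evenly spaced indices
            keep.extend(block[j] for j in step_idx if j < len(block))
        return keep
    ```

    For example, a decimation rate of 2/4 keeps every other frame, halving the number of model-distance computations in the first recognition pass.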
  • Patent number: 6249761
    Abstract: A continuous, speaker independent, speech recognition method and system recognizes a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is input into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multiprocessor machine having a plurality of microprocessors. Threads are created and assigned to processors, and active state subsets and active arc subsets are created and assigned to specific threads and associated microprocessors. Active state subsets and active arc subsets are processed in parallel to produce a textual representation of the speech signal.
    Type: Grant
    Filed: September 30, 1997
    Date of Patent: June 19, 2001
    Assignee: AT&T Corp.
    Inventors: Steven Phillips, Anne Rogers
  • Patent number: 6246985
    Abstract: A method and apparatus is disclosed for automatic segregation of signals of different origin, using models that statistically characterize a wave signal. Feature vectors consisting of a plurality of parameters are extracted from a data stream of a known type and used to identify data types by comparison, for example with Hidden Markov Model based methods. This enables automatic data type identification and routing of received data streams to the appropriate destination device, further enabling a user to transmit different data types over the same communication channel without changing communication settings.
    Type: Grant
    Filed: August 20, 1998
    Date of Patent: June 12, 2001
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, Stephane H. Maes, Wlodek Wlodzimierz Zadrozny, Alexander Zlatsin
  • Patent number: 6243679
    Abstract: A pattern recognition system and method for optimal reduction of redundancy and size of a weighted and labeled graph includes receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph where the graph is labeled with word sequences and weighted with probabilities, and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown on a display device so that recognition of speech signals can be demonstrated.
    Type: Grant
    Filed: October 2, 1998
    Date of Patent: June 5, 2001
    Assignee: AT&T Corporation
    Inventors: Mehryar Mohri, Fernando Carlos Neves Pereira, Michael Dennis Riley
  • Patent number: 6236963
    Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration representing the anatomical configuration of the vocal tract of each normalization-target speaker, based on speech waveform data of each normalization-target speaker, by looking up a correspondence between vocal-tract configuration parameters and formant frequencies previously determined based on a vocal tract model of the standard speaker.
    Type: Grant
    Filed: March 16, 1999
    Date of Patent: May 22, 2001
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
  • Patent number: 6226612
    Abstract: The present invention provides a method of calculating, within the framework of a speaker dependent system, a standard filler, or garbage model, for the detection of out-of-vocabulary utterances. In particular, the method receives new training data in a speech recognition system (202); calculates statistical parameters for the new training data (204); calculates global statistical parameters based upon the statistical parameters for the new training data (206); and updates a garbage model based upon the global statistical parameters (208). This is carried out on-line while the user is enrolling the vocabulary. The garbage model described in this disclosure is preferably an average speaker model, representative of all the speech data enrolled by the user to date. Also, the garbage model is preferably obtained as a by-product of the vocabulary enrollment procedure and is similar in its characteristics and topology to all the other regular vocabulary HMMs.
    Type: Grant
    Filed: January 30, 1998
    Date of Patent: May 1, 2001
    Assignee: Motorola, Inc.
    Inventors: Edward Srenger, Jeffrey A. Meunier, William M. Kushner
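    The on-line update of global statistics described above can be sketched as a running mean/variance accumulator over enrolled feature frames. This is a minimal single-Gaussian illustration (the class name and per-dimension diagonal statistics are assumptions), not the patented HMM garbage model itself:

    ```python
    class GarbageModel:
        """Running average-speaker statistics, updated as each new word is
        enrolled; mean/variance are recomputable at any point on-line."""
        def __init__(self, dim):
            self.n = 0
            self.sum = [0.0] * dim
            self.sumsq = [0.0] * dim

        def update(self, frames):
            """Fold one enrollment token (a list of feature vectors) in."""
            for f in frames:
                self.n += 1
                for d, x in enumerate(f):
                    self.sum[d] += x
                    self.sumsq[d] += x * x

        def mean_var(self):
            """Current per-dimension global mean and variance."""
            m = [s / self.n for s in self.sum]
            v = [sq / self.n - mi * mi for sq, mi in zip(self.sumsq, m)]
            return m, v
    ```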
  • Patent number: 6226613
    Abstract: The invention provides an information decoding system which takes advantage of the finite duration of channel memory and other distortions to permit efficient decoding of hidden Markov modeled information while storing only a subset of matrices used by the previous art. The invention may be applied to the maximum a posteriori (MAP) estimation of the input symbols of an input-output hidden Markov model, which can be described by the input-output transition probability density matrices or, alternatively, by finite-state systems. The invention is also applied to MAP decoding of information transmitted over channels with bursts of errors, to handwriting and speech recognition and other probabilistic systems as well.
    Type: Grant
    Filed: October 30, 1998
    Date of Patent: May 1, 2001
    Assignee: AT&T Corporation
    Inventor: William Turin
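    MAP estimation for hidden Markov modeled information rests on the forward-backward recursion, which yields the posterior probability of each hidden state given the full observation sequence. The sketch below shows the standard recursion for a discrete HMM; the memory-saving matrix bookkeeping that is the subject of the patent is not reproduced here.

    ```python
    import numpy as np

    def map_state_posteriors(A, B, pi, obs):
        """Forward-backward posteriors P(state_t | all observations).

        A: (S, S) transition matrix; B: (S, V) emission matrix;
        pi: (S,) initial distribution; obs: list of observation indices.
        """
        T, S = len(obs), len(pi)
        alpha = np.zeros((T, S))
        beta = np.ones((T, S))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):                      # forward pass
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):             # backward pass
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)
    ```

    The MAP symbol estimate at each time step is then the argmax over the posterior row.
    
    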
  • Patent number: 6223159
    Abstract: A voice feature quantity extractor extracts feature vector time-series data by acoustic feature quantity analysis of the speaker's voice. A reference speaker-dependent conversion factor computation device computes reference speaker-dependent conversion factors through use of a reference speaker voice data feature vector and an initial standard pattern. The reference speaker-dependent conversion factors are stored in a reference speaker-dependent conversion factor storage device. A speaker-dependent conversion factor selector selects one or more sets of reference speaker-dependent conversion factors stored in the reference speaker-dependent conversion factor storage device. A speaker-dependent conversion factor computation device computes speaker-dependent conversion factors through use of the selected one or more sets of reference speaker-dependent conversion factors.
    Type: Grant
    Filed: December 22, 1998
    Date of Patent: April 24, 2001
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Jun Ishii
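    The final step of the abstract, computing speaker-dependent conversion factors from selected reference-speaker factors, might be sketched as a weighted combination of the selected factors applied to the initial standard-pattern means. The offset-vector form and equal default weights here are assumptions for illustration, not details from the patent.

    ```python
    import numpy as np

    def combine_conversion_factors(selected, weights=None):
        """Combine selected reference-speaker conversion factors (modeled here
        as offset vectors on standard-pattern means) into one factor."""
        selected = np.asarray(selected)
        if weights is None:
            weights = np.full(len(selected), 1.0 / len(selected))
        return np.average(selected, axis=0, weights=weights)

    def adapt_means(standard_means, factor):
        """Apply the speaker-dependent conversion factor to the initial standard pattern."""
        return standard_means + factor
    ```
    
    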
  • Patent number: 6223155
    Abstract: A speaker-dependent (SD) speech recognition system. The invention is specifically tailored to operate with very little training data, and also within hardware constraints such as limited memory and processing resources. A garbage model and a vocabulary model are generated and are subsequently used to perform comparison to a speech signal to decide if the speech signal is a specific vocabulary word. A word score is generated, and it is compared to a number of parameters, including an absolute threshold and another word score. Off-line training of the system is performed, in one embodiment, using compressed training tokens. A speech signal is segmented into scramble frames wherein the scramble frames have certain characteristics. For example, length is one characteristic of the scramble frames, each scramble frame having a length of an average vowel sound, or a predetermined length of nominally 40-50 msec. The invention is operable to be trained using as little as one single training token that is segmented.
    Type: Grant
    Filed: August 14, 1998
    Date of Patent: April 24, 2001
    Assignee: Conexant Systems, Inc.
    Inventor: Aruna Bayya
  • Patent number: 6219453
    Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (“OCR”) technique. Each recognized word is generated by first producing, for each character position of the corresponding word in the original document, the N-best characters for occupying that character position. If an incorrect word is found in the electronic document, the present invention generates a plurality of reference words from which one is selected for replacing the incorrect word. This selected reference word is determined by the present invention to be the reference word that is the most likely correct replacement for the incorrect recognized word. This selection is accomplished by computing for each reference word a replacement word value. The reference word that is selected to replace the incorrect recognized word corresponds to the highest replacement word value.
    Type: Grant
    Filed: August 11, 1997
    Date of Patent: April 17, 2001
    Assignee: AT&T Corp.
    Inventor: Randy G. Goldberg
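    The selection step described above, scoring each reference word against the N-best characters produced for each character position, can be sketched as a product of per-position character confidences. The confidence-dictionary representation and the floor value for characters outside a position's N-best list are assumptions for illustration.

    ```python
    def replacement_word_value(word, nbest_per_position):
        """Score a candidate word against per-position N-best character
        confidences from the OCR engine. nbest_per_position is a list, one
        entry per character position, of {char: confidence} dicts; characters
        absent from a position's N-best list get a small floor probability."""
        value = 1.0
        for ch, nbest in zip(word, nbest_per_position):
            value *= nbest.get(ch, 1e-6)
        return value

    def best_replacement(candidates, nbest_per_position):
        """Pick the reference word with the highest replacement word value."""
        return max(candidates,
                   key=lambda w: replacement_word_value(w, nbest_per_position))
    ```

    This assumes candidate words have the same length as the recognized word; handling insertions and deletions would require an alignment step.
    
    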
  • Patent number: 6219646
    Abstract: Methods and apparatus for performing translation between different language are provided. The present invention includes a translation system that performs translation having increased accuracy by providing a three-dimensional topical dual-language database. The topical database includes a set of source-to-target language translations for each topic that the database is being used for. In one embodiment, a user first selects the topic of conversation, then words spoken into a telephone are translated and produced as synthesized voice signals from another telephone so that a near real-time conversation may be had between two people speaking different languages. An additional feature of the present invention is the addition of a computer terminal that displays the input and output phrases so that either user may edit the input phrases, or indicate that the translation was ambiguous and request a rephrasing of the material.
    Type: Grant
    Filed: May 9, 2000
    Date of Patent: April 17, 2001
    Assignee: Gedanken Corp.
    Inventor: Julius Cherny
  • Patent number: 6219642
    Abstract: A speech recognition system utilizes multiple quantizers to process frequency parameters and mean compensated frequency parameters derived from an input signal. The quantizers may be matrix and vector quantizer pairs, and such quantizer pairs may also function as front ends to second-stage speech classifiers such as hidden Markov models (HMMs) and/or utilize neural network postprocessing to, for example, improve speech recognition performance. Mean compensating the frequency parameters can remove noise frequency components that remain approximately constant during the duration of the input signal. HMM initial state and state transition probabilities derived from common quantizer types and the same input signal may be consolidated to improve recognition system performance and efficiency. Matrix quantization exploits the “evolution” of the speech short-term spectral envelopes as well as frequency domain information, and vector quantization (VQ) primarily operates on frequency domain information.
    Type: Grant
    Filed: October 5, 1998
    Date of Patent: April 17, 2001
    Assignee: Legerity, Inc.
    Inventors: Safdar M. Asghar, Lin Cong
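    Two of the building blocks named in the abstract, mean compensation and vector quantization as an HMM front end, can be sketched compactly. This is generic VQ/mean-subtraction, not the patented matrix-quantizer pairing.

    ```python
    import numpy as np

    def mean_compensate(frames):
        """Subtract the per-utterance mean, removing components (e.g. stationary
        noise) that stay approximately constant over the input signal."""
        return frames - frames.mean(axis=0, keepdims=True)

    def vq_encode(frames, codebook):
        """Map each feature frame to its nearest codebook centroid.

        frames: (T, D) feature vectors; codebook: (K, D) centroids.
        Returns a length-T array of symbol indices usable as discrete HMM
        observations in a second-stage classifier."""
        d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)
    ```

    A matrix quantizer would extend this by quantizing blocks of consecutive frames jointly, capturing the spectral "evolution" the abstract mentions.
    
    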
  • Patent number: 6212500
    Abstract: In a method for determining the similarities of sounds across different languages, hidden Markov modelling of multilingual phonemes is employed wherein language-specific as well as language-independent properties are identified by combining the probability densities for different hidden Markov sound models in various languages.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: April 3, 2001
    Assignee: Siemens Aktiengesellschaft
    Inventor: Joachim Köhler
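    Combining probability densities across languages, as this abstract describes, can be illustrated by merging per-language Gaussian mixtures for one phoneme into a single multilingual mixture. The component-concatenation scheme and language weights below are assumptions for illustration.

    ```python
    import numpy as np

    def combine_densities(weights_per_lang, means_per_lang, lang_weights):
        """Merge per-language Gaussian mixtures for a phoneme into one
        multilingual mixture by re-weighting and pooling their components.

        weights_per_lang / means_per_lang: one list of component weights /
        means per language; lang_weights: prior weight of each language."""
        mix_w, mix_mu = [], []
        for w, mu, lw in zip(weights_per_lang, means_per_lang, lang_weights):
            mix_w.extend(lw * wi for wi in w)
            mix_mu.extend(mu)
        mix_w = np.array(mix_w)
        return mix_w / mix_w.sum(), np.array(mix_mu)
    ```
    
    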
  • Patent number: 6208967
    Abstract: For machine segmenting of speech, utterances from a database of known spoken words are first classified and segmented into three broad phonetic classes (BPC): voiced, unvoiced, and silence. Next, using preliminary segmentation positions as anchor points, sequence-constrained vector quantization is used for further segmentation into phoneme-like units. Finally, exact tuning to the segmented phonemes is done through Hidden Markov Modelling; after training, a diphone set is composed for further use.
    Type: Grant
    Filed: February 25, 1997
    Date of Patent: March 27, 2001
    Assignee: U.S. Philips Corporation
    Inventors: Stefan C. Pauws, Yves G. C. Kamp, Leonardus F. W. Willems
  • Patent number: 6202047
    Abstract: A method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients. In one embodiment, a speech input signal is received and cepstral features are extracted. An answer is generated using the extracted cepstral features and a fixed signal independent diagonal matrix as the covariance matrix for the cepstral components of the speech input signal and, for example, a hidden Markov model. In another embodiment, a noisy speech input signal is received and a cepstral vector representing a clean speech input signal is generated based on the noisy speech input signal and an explicit linear minimum mean square error cepstral estimator.
    Type: Grant
    Filed: March 30, 1998
    Date of Patent: March 13, 2001
    Assignee: AT&T Corp.
    Inventors: Yariv Ephraim, Mazin G. Rahim
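    The first embodiment of this abstract scores cepstral features against a Gaussian with a fixed, signal-independent diagonal covariance. A minimal sketch of that log-likelihood computation follows; the estimator details of the patent are not reproduced.

    ```python
    import numpy as np

    def diag_gauss_loglik(x, mean, var):
        """Log-likelihood of cepstral vector x under a Gaussian whose diagonal
        covariance (var, a (D,) vector) is fixed and signal-independent."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    ```

    In an HMM-based recognizer, this score would be evaluated per state and fed into the Viterbi or forward recursion.
    
    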
  • Patent number: 6195637
    Abstract: A method for correcting misrecognition errors comprises the steps of: dictating to a speech application; marking misrecognized words during the dictating step; and, after the dictating and marking steps, displaying and correcting the marked misrecognized words, whereby the correcting of the misrecognized words is deferred until after the dictating step is concluded and the dictating step is not significantly interrupted. The displaying and correcting step can be implemented by invoking a correction tool of the speech application, whereby the correcting of the misrecognized words trains the speech application.
    Type: Grant
    Filed: March 25, 1998
    Date of Patent: February 27, 2001
    Assignee: International Business Machines Corp.
    Inventors: Barbara E. Ballard, Kerry A. Ortega
  • Patent number: 6195636
    Abstract: In a system in which user equipment is connected to a packet network and a speech recognition application server is also connected to the packet network for performing speech recognition on speech data from the user equipment, a speech recognition system selectively performs feature extraction at a user end before transmitting speech data to be recognized. The feature extraction is performed only for speech which is to be recognized.
    Type: Grant
    Filed: February 19, 1999
    Date of Patent: February 27, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph A. Crupi, Zoran Mladenovic, Edward B. Morgan, Bogdan R. Kosanovic, Negendra Kumar
  • Patent number: 6188982
    Abstract: A system for adaptively generating a composite noisy speech model to process speech in, e.g., a nonstationary environment comprises a speech recognizer, a re-estimation circuit, a combiner circuit, a classifier circuit, and a discrimination circuit. In particular, the speech recognizer generates frames of current input utterances based on received speech data and determines which of the generated frames are aligned with noisy states to produce a current noise model. The re-estimation circuit re-estimates the produced current noise model by interpolating the number of frames in the current noise model with parameters from a previous noise model. The combiner circuit combines the parameters of the current noise model with model parameters of a corresponding current clean speech model to generate model parameters of a composite noisy speech model. The classifier circuit determines a discrimination function by generating a weighted PMC HMM model.
    Type: Grant
    Filed: December 1, 1997
    Date of Patent: February 13, 2001
    Assignee: Industrial Technology Research Institute
    Inventor: Tung-Hui Chiang
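    The combiner circuit above merges noise-model parameters with clean-speech parameters to form a composite noisy-speech model, the core idea behind parallel model combination (PMC). The sketch below shows the standard means-only log-add approximation as an illustration; it is not the patent's weighted PMC HMM formulation.

    ```python
    import numpy as np

    def pmc_combine(speech_mean_log, noise_mean_log, gain=1.0):
        """Means-only PMC: map clean-speech and noise log-spectral means to the
        linear spectral domain, add them (speech scaled by a gain term), and
        map back to the log domain to get the composite noisy-speech mean."""
        return np.log(gain * np.exp(speech_mean_log) + np.exp(noise_mean_log))
    ```
    
    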
  • Patent number: 6185531
    Abstract: A method for improving the association of articles of information or stories with topics associated with specific subjects (subject topics) and with a general topic of words that are not associated with any subject. The inventive method is trained using Hidden Markov Models (HMMs) to represent each story, with each state in the HMM representing a topic. A standard Expectation and Maximization algorithm, as is known in this art field, can be used to maximize the expected likelihood of the method relating the words associated with each topic to that topic. In the method, the probability that each word in a story is related to a subject topic is determined and evaluated, and the subject topics with the lowest probability are discarded. The remaining subject topics are evaluated, and the sub-set of subject topics with the highest probabilities over all the words in a story is considered to be the “correct” subject topic set.
    Type: Grant
    Filed: January 9, 1998
    Date of Patent: February 6, 2001
    Assignee: GTE Internetworking Incorporated
    Inventors: Richard M. Schwartz, Toru Imai
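    The scoring-and-pruning step of this abstract can be sketched as ranking subject topics by the log-likelihood of a story's words, backing off to the general topic for words a subject topic does not cover. The unigram simplification and back-off floor are assumptions; the patent's HMM/EM machinery is not reproduced.

    ```python
    import math

    def rank_topics(story_words, topic_word_probs, general_probs, keep=2):
        """Score each subject topic over all words in the story and keep the
        highest-scoring sub-set; words unseen under a topic back off to the
        general-topic probability (with a small floor)."""
        scores = {}
        for topic, probs in topic_word_probs.items():
            scores[topic] = sum(
                math.log(probs.get(w, general_probs.get(w, 1e-9)))
                for w in story_words
            )
        return sorted(scores, key=scores.get, reverse=True)[:keep]
    ```
    
    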
  • Patent number: 6185528
    Abstract: A method and a device for recognition of isolated words in large vocabularies are described, wherein recognition is performed through two sequential steps using neural networks and Markov models techniques, respectively, and the results of both techniques are adequately combined so as to improve recognition accuracy. The devices performing the combination also provide an evaluation of recognition reliability.
    Type: Grant
    Filed: April 29, 1999
    Date of Patent: February 6, 2001
    Assignee: CSELT - Centro Studi e Laboratori Telecomunicazioni S.p.A.
    Inventors: Luciano Fissore, Roberto Gemello, Franco Ravera
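    Combining neural-network and Markov-model results with a reliability estimate, as this abstract describes, can be illustrated by a weighted log-score combination whose top-two margin serves as a simple confidence measure. The linear interpolation and margin-based reliability are assumptions for illustration, not the patented combination rule.

    ```python
    def combine_recognizers(nn_scores, hmm_scores, alpha=0.5):
        """Combine per-word neural-network and HMM log scores, returning the
        best word and the margin between the top two combined scores as a
        crude reliability measure."""
        combined = {w: alpha * nn_scores[w] + (1 - alpha) * hmm_scores[w]
                    for w in nn_scores}
        ranked = sorted(combined, key=combined.get, reverse=True)
        margin = (combined[ranked[0]] - combined[ranked[1]]
                  if len(ranked) > 1 else float("inf"))
        return ranked[0], margin
    ```
    
    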
  • Patent number: 6182026
    Abstract: For translating a word-organized source text into a word-organized target text through mapping of source words onto target words, both a translation model and a language model are used. In particular, alignment probabilities are ascertained between various source word & target word pairs, whilst preemptively assuming that alignment between such word pairs is monotonous through at least substantial substrings of a particular sentence. This is done by incrementally evaluating the statistical translation performance of various target word strings, deciding on an optimum target word string, and outputting the latter.
    Type: Grant
    Filed: June 26, 1998
    Date of Patent: January 30, 2001
    Assignee: U.S. Philips Corporation
    Inventors: Christoph Tillmann, Stephan Vogel, Hermann Ney
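    Under the monotone-alignment assumption above, decoding reduces to a Viterbi-style search over candidate target words, one per source position, scored by translation and bigram language-model probabilities. The lexicon/bigram dictionaries and the floor value below are illustrative assumptions, not the patent's formulation.

    ```python
    import math

    def monotone_decode(source, lexicon, bigram, lm_floor=1e-4):
        """Monotone beam decoder: each source word maps to one of its candidate
        target words; hypotheses are scored incrementally by translation
        probability times a bigram language model.

        lexicon: {src: {tgt: p(tgt|src)}}; bigram: {(prev, cur): p(cur|prev)}."""
        beams = [("<s>", 0.0, [])]  # (last target word, log score, output so far)
        for s in source:
            expanded = []
            for prev, score, out in beams:
                for t, p in lexicon[s].items():
                    lm = bigram.get((prev, t), lm_floor)
                    expanded.append((t, score + math.log(p) + math.log(lm),
                                     out + [t]))
            # Viterbi pruning: keep the best hypothesis per last target word.
            best = {}
            for t, sc, out in expanded:
                if t not in best or sc > best[t][1]:
                    best[t] = (t, sc, out)
            beams = list(best.values())
        return max(beams, key=lambda b: b[1])[2]
    ```
    
    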