Markov Patents (Class 704/256)

Hidden markov model (hmm) (epo) (Class 704/256.1)

Speech recognition system and method

Patent number: 6370505

Abstract: The present invention relates to a method of processing speech, in which input speech is processed to determine an input speech vector (or) representing a sample of the speech. A number of possible output states are defined with each output state (j) being represented by a number of state mixture components (m). Each state mixture component is then approximated by a weighted sum of a number of predetermined generic components (x), allowing the likelihoods of each output states (j) corresponding to the input speech vector (or) to be determined.

Type: Grant

Filed: April 30, 1999

Date of Patent: April 9, 2002

Inventor: Julian Odell
Speech recognition on MPEG/Audio encoded files

Patent number: 6370504

Abstract: A technique to perform speech recognition directly from audio files compressed using the MPEG/Audio coding standard. The technique works in the compressed domain and does not require the MPEG/Audio file to be decompressed. Only the encoded subband signals are extracted and processed for training and recognition. The underlying speech recognition engine is based on the Hidden Markov model. The technique is applicable to layers I and II of MPEG/Audio and training under one layer can be used to recognize the other.

Type: Grant

Filed: May 22, 1998

Date of Patent: April 9, 2002

Assignee: University of Washington

Inventors: Gregory L. Zick, Lawrence Yapp
Accumulating transformations for hierarchical linear regression HMM adaptation

Publication number: 20020035473

Abstract: A new method, which builds the models at m-th step directly from models at the initial step, is provided to minimize the storage and calculation. The method therefore merges the M×N transformations into a single transformation. The merge guarantees the exactness of the transformations and make it possible for recognizers on mobile devices to have adaptation capability.

Type: Application

Filed: June 22, 2001

Publication date: March 21, 2002

Inventor: Yifan Gong
Voice model learning data creation method and its apparatus

Patent number: 6349281

Abstract: A voice model learning data creation method and apparatus makes possible the creation of an inexpensive voice model in a short period of time when creating a voice model for a new word not in a preexisting database. Verbal data from several persons is selected from among the verbal data held in the database. This selected verbal data is referred to as standard speaker data, and is stored in a standard speaker data storage component. The remaining verbal data in the preexisting database is designated as learning speaker data, as is stored in a learning speaker data storage component. A data conversion function from the standard speaker data space to the learning speaker data space is derived. Then, the learning data for the new word is created by the data conversion function. Thus, the data which is obtained from the standard speaker speaking the new word is converted to the learning speaker data space.

Type: Grant

Filed: January 22, 1998

Date of Patent: February 19, 2002

Assignee: Seiko Epson Corporation

Inventors: Yasunaga Miyazawa, Hiroshi Hasegawa, Mitsuhiro Inazumi, Tadashi Aizawa
Adaptation system and method for E-commerce and V-commerce applications

Patent number: 6341264

Abstract: Electronic commerce (E-commerce) and Voice commerce (V-commerce) proceeds by having the user speak into the system. The user's speech is converted by speech recognizer into a form required by the transaction processor that effects the electronic commerce operation. A dimensionality reduction processor converts the user's input speech into a reduced dimensionality set of values termed eigenvoice parameters. These parameters are compared with a set of previously stored eigenvoice parameters representing a speaker population (the eigenspace representing speaker space) and the comparison is used by the speech model adaptation system to rapidly adapt the speech recognizer to the user's speech characteristics. The user's eigenvoice parameters are also stored for subsequent use by the speaker verification and speaker identification modules.

Type: Grant

Filed: February 25, 1999

Date of Patent: January 22, 2002

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Jean-Claude Junqua
Method and configuration for determining a representative sound, method for synthesizing speech, and method for speech processing

Publication number: 20020002457

Abstract: A method determines a representative sound on the basis of a structure which includes a set of sound models. Each sound model has at least one representative for the modeled sound. In the structure, a first sound model, matching with regard to a first quality criterion, is determined from the set of sound models. At least one second sound model is determined from the set of sound models dependent on a characteristic state criterion of the structure. At least some of the representatives of the first sound model and of the at least one second sound model are assessed in addition to the first quality criterion with regard to a second quality criterion. The at least one representative which has an adequate overall quality criterion with regard to the first and second quality criteria is determined as a representative sound from the representatives of the first sound model and the at least one second sound model.

Type: Application

Filed: August 21, 2001

Publication date: January 3, 2002

Inventor: Martin Holzapfel
Speech recognition with mixtures of bayesian networks

Patent number: 6336108

Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state.

Type: Grant

Filed: December 23, 1998

Date of Patent: January 1, 2002

Assignee: Microsoft Corporation

Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
Novel approach to speech recognition

Publication number: 20010051871

Abstract: A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters.

Type: Application

Filed: March 23, 2001

Publication date: December 13, 2001

Inventor: John Kroeker
Speaker and environment adaptation based on eigenvoices

Patent number: 6327565

Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.

Type: Grant

Filed: April 30, 1998

Date of Patent: December 4, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Jean-Claude Junqua
Method and system of configuring a speech recognition system

Publication number: 20010047258

Abstract: A speech control system and method is described, wherein a state definition information is loaded from a network application server. The state definition information defines possible states of the network application server and is used for determining a set of valid commands of the network application server, such that a validity of a text command obtained by converting an input speech command can be checked by comparing said text command with said determined set of valid commands. Thereby, a transmission of erroneous text commands to the network application server can be prevented so as to reduce total processing time and response delays.

Type: Application

Filed: March 16, 2001

Publication date: November 29, 2001

Inventor: Anthony Rodrigo
Method and apparatus of hierarchically organizing an acoustic model for speech recognition and adaptation of the model to unseen domains

Patent number: 6324510

Abstract: A method of organizing an acoustic model for speech recognition is comprised of the steps of calculating a measure of acoustic dissimilarity of subphonetic units. A clustering technique is recursively applied to the subphonetic units based on the calculated measure of acoustic dissimilarity to automatically generate a hierarchically arranged model. Each application of the clustering technique produces another level of the hierarchy with the levels progressing from the least specific to the most specific. A technique for adapting the structure and size of a trained acoustic model to an unseen domain using only a small amount of adaptation data is also disclosed.

Type: Grant

Filed: November 6, 1998

Date of Patent: November 27, 2001

Assignee: Lernout & Hauspie Speech Products N.V.

Inventors: Alex Waibel, Juergen Fritsch
System and method for allowing family members to access TV contents and program media recorder over telephone or internet

Patent number: 6324512

Abstract: Users of the system can access the TV contents and program media recorder by speaking in natural language sentences. The user interacts with the television and with other multimedia equipment, such as media recorders and VCRs, through the unified access controller. A speaker verification/identification module determines the identity of the speaker and this information is used to control how the dialog between user and system proceeds. Speech can be input through either a microphone or over the telephone. In addition, the user can interact with the system using a suitable computer attached via the internet. Regardless of the mode of access, the unified access controller interprets the semantic content of the user's request and supplies the appropriate control signals to the television tuner and/or recorder.

Type: Grant

Filed: August 26, 1999

Date of Patent: November 27, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Jean-Claude Junqua, Roland Kuhn, Tony Davis, Yi Zhao, Weiying Li
Spoken dialog system capable of performing natural interactive access

Patent number: 6324513

Abstract: There is provided a spoken dialog system, in which an interactive operation is effectively carried out in a natural manner as to a speech containing words out of a set vocabulary.

Type: Grant

Filed: September 24, 1999

Date of Patent: November 27, 2001

Assignee: Mitsubishi Denki Kabushiki Kaisha

Inventors: Akito Nagai, Keisuke Watanabe, Yasushi Ishikawa
Method of phonetic modeling using acoustic decision tree

Patent number: 6317712

Abstract: Phonetic modeling includes the steps of forming triphone grammars (11) from phonetic data, training triphone models (13), clustering triphones (14) that are acoustically close together and mapping unclustered triphone grammars into a clustered model (16). The clustering process includes using a decision tree based on the acoustic likelihood and allows sub-model clusters in user-definable units.

Type: Grant

Filed: January 21, 1999

Date of Patent: November 13, 2001

Assignee: Texas Instruments Incorporated

Inventors: Yu-Hung Kao, Kazuhiro Kondo
Apparatus for generating a statistical sequence model called class bi-multigram model with bigram dependencies assumed between adjacent sequences

Patent number: 6314399

Abstract: An apparatus generates a statistical class sequence model called A class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the sequences. The number of times all sequences of units occur are counted, as well as the number of times all pairs of sequences of units co-occur in the input training strings. An initial bigram probability distribution of all the pairs of sequences is computed as the number of times the two sequences co-occur, divided by the number of times the first sequence occurs in the input training string. Then, the input sequences are classified into a pre-specified desired number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions.

Type: Grant

Filed: April 13, 1999

Date of Patent: November 6, 2001

Assignee: ATR Interpreting Telecommunications Research

Inventors: Sabine Deligne, Yoshinori Sagisaka, Hideharu Nakajima
Feature extraction for automatic speech recognition

Patent number: 6308155

Abstract: An automatic speech recognition apparatus and method with a front end feature extractor that improves recognition performance under adverse acoustic conditions are disclosed. The inventive feature extractor is characterized by a critical bandwidth spectral resolution, an emphasis on slow changes in the spectral structure of the speech signal, and adaptive automatic gain control. In one the feature extractor includes a feature generator configured to compute short-term parameters of the speech signal, a filter system configured to filter the time sequences of the short-term parameters, and a normalizer configured to normalize the filtered parameters with respect to one or more previous values of the filtered parameters.

Type: Grant

Filed: May 25, 1999

Date of Patent: October 23, 2001

Assignee: International Computer Science Institute

Inventors: Brian E. D. Kingsbury, Steven Greenberg, Nelson H. Morgan
Speech recognition method, apparatus and storage medium

Publication number: 20010032075

Abstract: Disclosed is a speech recognition method in a speech recognition apparatus to applying speech recognition to a voice signal applied thereto. The input voice signal is converted from an analog to a digital signal and sequences of feature vectors are extracted based upon the digital signal (S12). A search space is defined by the sequences of feature vectors and an HMM (16) prepared beforehand for each unit of speech. The search space allows a transition between HMMs only in specific feature-vector sequences. A search is conducted in this space to find an optimum path for which the largest acoustic likelihood regarding the voice signal is obtained to find the result of recognition (S14), and this result is output (S15).

Type: Application

Filed: March 27, 2001

Publication date: October 18, 2001

Inventor: Hiroki Yamamoto
System and method for recognizing user-specified pen-based gestures using hidden markov models

Patent number: 6304674

Abstract: A method for recognizing user specified pen-based gestures uses Hidden Markov Models. A gesture recognizer is implemented which includes a fast pruning procedure. In addition, an incremental training method is utilized.

Type: Grant

Filed: August 3, 1998

Date of Patent: October 16, 2001

Assignee: Xerox Corporation

Inventors: Todd A. Cass, Lynn D. Wilcox, Tichomir G. Tenev
Automatic speech recognition using multi-dimensional curve-linear representations

Patent number: 6301561

Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

Type: Grant

Filed: September 18, 2000

Date of Patent: October 9, 2001

Assignee: AT&T Corporation

Inventor: Lawrence Kevin Saul
Speech recognition using both time encoding and HMM in parallel

Patent number: 6301562

Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.

Type: Grant

Filed: April 27, 2000

Date of Patent: October 9, 2001

Assignee: New Transducers Limited

Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
System and method for modeless large vocabulary speech recognition

Patent number: 6292779

Abstract: A modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors. The system includes a common library of acoustic model states for arrangement in sequences that form acoustic models. Each acoustic model is composed of a sequence of segment models and each segment model is composed of a sequence of model states. An input processor compares each vector in a sequence of input vectors to a set of model states in the common library to produce a match score for each model state in the set, reflecting the likelihood that a state is represented by a vector. The system also includes a plurality of recognition modules and associated recognition grammars. The recognition modules operate in parallel and use the match scores with the acoustic models to determine at least one recognition result in each of the recognition modules.

Type: Grant

Filed: March 9, 1999

Date of Patent: September 18, 2001

Assignee: Lernout & Hauspie Speech Products N.V.

Inventors: Brian Wilson, Manfred Grabherr, Ramesh Sarukkai, William F. Ganong, III
Speech processing system using format analysis

Patent number: 6292775

Abstract: A speech processing system (10) incorporates an analogue to digital converter (16) to digitize input speech signals for Fourier transformation to produce short-term spectral cross-sections. These cross-sections are compared with one hundred and fifty reference patterns in a store (34), the patterns having respective stored sets of formant frequencies assigned thereto by a human expert. Six stored patterns most closely matching each input cross-section are selected for further processing by dynamic programming, which indicates the pattern which is a best match to the input cross-section by using frequency-scale warping to achieve alignment. The stores formant frequencies of the best matching pattern are modified by the frequency warping, and the results are used as formant frequency estimates for the input cross-section. The frequencies are further refined on the basis of the shape of the input cross-section near to the chosen formants.

Type: Grant

Filed: February 18, 1999

Date of Patent: September 18, 2001

Assignee: The Secretary of State for Defence in Her Britannic Majesty's Government of the United Kingdom of Great Britain and Northern Ireland

Inventor: John N Holmes
Task-independent utterance verification with subword-based minimum verification error training

Patent number: 6292778

Abstract: An automated speech recognition system comprises a preprocessor, a speech recognizer, and a task-independent utterance verifier. The task independent utterance verifier employs a first subword acoustic Hidden Markov Model for determining a first likelihood that a speech segment contains a sound corresponding to a speech recognition hypothesis, and a second anti-subword acoustic Hidden Markov Model for determining a second likelihood that a speech segment contains a sound other than one corresponding to the speech recognition hypothesis. In operation, the utterance verifier employs the subword and anti-subword models to produce for each recognized subword in the input speech the first and second likelihoods. The utterance verifier determines a subword verification score as the log of the ratio of the first and second likelihoods.

Type: Grant

Filed: October 30, 1998

Date of Patent: September 18, 2001

Assignee: Lucent Technologies Inc.

Inventor: Rafid Antoon Sukkar
Speed up speech recognition search using macro evaluator

Patent number: 6285981

Abstract: A speed up speech recognition search method is provided wherein the number of HMM states is determined and a microslot is allocated for Hidden Markov Models (HMMs) below a given threshold level of states. A macroslot treats a whole HMM as a basic unit. The lowest level of macroslot is a phone. If the number of states exceeds the threshold level a microslot is allocated for this HMM.

Type: Grant

Filed: June 7, 1999

Date of Patent: September 4, 2001

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao
Context sharing of similarities in context dependent word models

Patent number: 6285980

Abstract: A natural number recognition method and system that uses minimum classification error trained inter-word context dependent models of the head-body-tail type over a specific vocabulary. One part of the method and system allows recognition of spoken monetary amounts in financial transactions. A second part of the method and system allows recognition of numbers such as credit card or U.S. telephone numbers. A third part of the method and system allows recognition of natural language expressions of time, such as time of day, day of the week and date of the month for applications such as scheduling or schedule inquires. Even though limited natural language expressions are allowed, context sharing between similar sounds in the vocabulary within a head-body-tail model keeps storage and processing time requirements to manageable levels.

Type: Grant

Filed: November 2, 1998

Date of Patent: September 4, 2001

Assignee: Lucent Technologies Inc.

Inventors: Malan Bhatki Gandhi, John Jacob
Synchronous reproduction in a speech recognition system

Publication number: 20010018653

Abstract: In a speech recognition system, the received speech and the sequence of words, recognized in the speech by a recognizer (100), are stored in a memory (320, 330). Markers are stored as well, indicating a correspondence between the word and a segment of the received signal in which the word was recognized. In a synchronous reproduction mode, a controller (310) ensures that the speech is played-back via speakers (350) and that for each speech segment a word, which has been recognized for the segment, is indicated (e.g. highlighted) on a display (340) can detect whether the user has provided an editing instruction, while the synchronous reproduction is active. If so, the synchronous reproduction is automatically paused and the editing instruction executed.

Type: Application

Filed: December 19, 2000

Publication date: August 30, 2001

Inventor: Heribert Wutte
A SYSTEM AND APPARATUS FOR RECOGNIZING SPEECH

Publication number: 20010011218

Abstract: A continuous, speaker independent, speech recognition method and system for recognizing a variety of vocabulary input signals. A language model which is an implicit description of a graph consisting of a plurality of states and arcs is inputted into the system. An input speech signal, corresponding to a plurality of speech frames is received and processed using a shared memory multipurpose machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.

Type: Application

Filed: March 13, 2001

Publication date: August 2, 2001

Inventors: Steven Phillips, Anne Rogers
Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition

Patent number: 6269334

Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of nongaussion statistical probability densities which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t&agr;/2) for t≧0 and &agr;>0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.

Type: Grant

Filed: June 25, 1998

Date of Patent: July 31, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Charles A. Micchelli
Single distribution and mixed distribution model conversion in speech recognition method, apparatus, and computer readable medium

Patent number: 6266636

Abstract: A process for removing additive noise due to the influence of ambient circumstances in a real-time manner in order to improve the precision of speech recognition which is performed in a real-time manner includes a converting process for converting a selected speech model distribution into a representative distribution, combining a noise model with the converted to generate speech model a noise superimposed speech model, performing a first likelihood calculation to recognize an input speech by using the noise superimposed speech model, converting the noise superimposed speech model to a noise adapted distribution that retains the relationship of the selected speech model, and performing a second likelihood calculation to recognize the input speech by using the noise adapted distribution.

Type: Grant

Filed: March 11, 1998

Date of Patent: July 24, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Tetsuo Kosaka, Yasuhiro Komori
Search and rescoring method for a speech recognition system

Patent number: 6253178

Abstract: Speech recognition systems and methods consistent with the present invention process input speech signals organized into a series of frames. The input speech signal is decimated to select K frames out of every L frames of the input speech signal according to a decimation rate K/L. A first set of model distances is then calculated for each of the K selected frames of the input speech signal, and a Hidden Markov Model (HMM) topology of a first set of models is reduced according to the decimation rate K/L. The system then selects a reduced set of model distances from the computed first set of model distances according to the reduced HMM topology and selects a first plurality of candidate choices for recognition according to the reduced set of model distances. A second set of model distances is computed, using a second set of models, for a second plurality of candidate choices, wherein the second plurality of candidate choices correspond to at least a subset of the first plurality of candidate choices.

Type: Grant

Filed: September 22, 1997

Date of Patent: June 26, 2001

Assignee: Nortel Networks Limited

Inventors: Serge Robillard, Nadia Girolamo, Andre Gillet, Waleed Fakhr
Assigning and processing states and arcs of a speech recognition model in parallel processors

Patent number: 6249761

Abstract: A continuous, speaker independent, speech recognition method and system recognizes a variety of vocabulary input signals. A language model, which is an implicit description of a graph consisting of a plurality of states and arcs, is input into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared memory multipurpose machine having a plurality of microprocessors. Threads are created and assigned to processors, and active state subsets and active arc subsets are created and assigned to specific threads and associated microprocessors. Active state subsets and active arc subsets are processed in parallel to produce a textual representation of the speech signal.

Type: Grant

Filed: September 30, 1997

Date of Patent: June 19, 2001

Assignee: AT&T Corp.

Inventors: Steven Phillips, Anne Rogers
Method and apparatus for automatic segregation and routing of signals of different origins by using prototypes

Patent number: 6246985

Abstract: A method and apparatus is disclosed for automatic segregation of signals of different origin, using models that statistically characterize a wave signal, more particularly including feature vectors consisting of a plurality of parameters extracted from a data stream of a known type for use in identifying data types by comparison, which can be Hidden Markov Model based methods, thereby enabling automatic data type identification and routing of received data streams to the appropriate destination device, thereby further enabling a user to transmit different data types over the same communication channel without changing communication settings.

Type: Grant

Filed: August 20, 1998

Date of Patent: June 12, 2001

Assignee: International Business Machines Corporation

Inventors: Dimitri Kanevsky, Stephane H. Maes, Wlodek Wlodzimierz Zadrozny, Alexander Zlatsin
Systems and methods for determinization and minimization a finite state transducer for speech recognition

Patent number: 6243679

Abstract: A pattern recognition system and method for optimal reduction of redundancy and size of a weighted and labeled graph presents receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph where the graph is labeled with word sequences and weighted with probabilities and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown in a display device so that recognition of speech signals can be demonstrated.

Type: Grant

Filed: October 2, 1998

Date of Patent: June 5, 2001

Assignee: AT&T Corporation

Inventors: Mehryar Mohri, Fernando Carlos Neves Pereira, Michael Dennis Riley
Speaker normalization processor apparatus for generating frequency warping function, and speech recognition apparatus with said speaker normalization processor apparatus

Patent number: 6236963

Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration showing an anatomical configuration of a vocal tract of each normalization-target speaker, by looking up to a correspondence between vocal-tract configuration parameters and Formant frequencies previously determined based on a vocal tract model of the standard speaker, based on speech waveform data of each normalization-target speaker.

Type: Grant

Filed: March 16, 1999

Date of Patent: May 22, 2001

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
Method of evaluating an utterance in a speech recognition system

Patent number: 6226612

Abstract: The present invention provides a method of calculating, within the framework of a speaker dependent system, a standard filler, or garbage model, for the detection of out-of-vocabulary utterances. In particular, the method receives new training data in a speech recognition system (202); calculates statistical parameters for the new training data (204); calculates global statistical parameters based upon the statistical parameters for the new training data (206); and updates a garbage model based upon the global statistical parameters (208). This is carried out on-line while the user is enrolling the vocabulary. The garbage model described in this disclosure is preferably an average speaker model, representative of all the speech data enrolled by the user to date. Also, the garbage model is preferably obtained as a by-product of the vocabulary enrollment procedure and is similar in it characteristics and topology to all the other regular vocabulary HMMs.

Type: Grant

Filed: January 30, 1998

Date of Patent: May 1, 2001

Assignee: Motorola, Inc.

Inventors: Edward Srenger, Jeffrey A. Meunier, William M. Kushner
Decoding input symbols to input/output hidden markoff models

Patent number: 6226613

Abstract: The invention provides an information decoding system which takes advantage of the finite duration of channel memory and other distortions to permit efficient decoding of hidden Markov modeled information while storing only a subset of matrices used by the previous art. The invention may be applied to the maximum a posteriori (MAP) estimation of the input symbols of an input-output hidden Markov model, which can be described by the input-output transition probability density matrices or, alternatively, by finite-state systems. The invention is also applied to MAP decoding of information transmitted over channels with bursts of errors, to handwriting and speech recognition and other probabilistic systems as well.

Type: Grant

Filed: October 30, 1998

Date of Patent: May 1, 2001

Assignee: AT&T Corporation

Inventor: William Turin
Speaker adaptation device and speech recognition device

Patent number: 6223159

Abstract: Voice feature quantity extractor extracts feature vector time-series data by acoustic feature quantity analysis of the speaker's voice. Reference speaker-dependent conversion factor computation device computes reference speaker-dependent conversion factors through use of a reference speaker voice data feature vector and an initial standard pattern. The reference speaker-dependent conversion factors are stored in a reference speaker-dependent conversion factor storage device. Speaker-dependent conversion factor selector selects one or more sets of reference speaker-dependent conversion factors stored in the reference speaker-dependent conversion factor storage device. Speaker-dependent conversion factor computation device computes speaker-dependent conversion factors, through use of the selected one or more sets of reference speaker-dependent conversion factors.

Type: Grant

Filed: December 22, 1998

Date of Patent: April 24, 2001

Assignee: Mitsubishi Denki Kabushiki Kaisha

Inventor: Jun Ishii
Method of independently creating and using a garbage model for improved rejection in a limited-training speaker-dependent speech recognition system

Patent number: 6223155

Abstract: A speaker-dependent (SD) speech recognition system. The invention is specifically tailored to operate with very little training data, and also within hardware constraints such as limited memory and processing resources. A garbage model and a vocabulary model are generated and are subsequently used to perform comparison to a speech signal to decide if the speech signal is a specific vocabulary word. A word score is generated, and it is compared to a number of parameters, including an absolute threshold and another word score. Off-line training of the system is performed, in one embodiment, using compressed training tokens. A speech signal is segmented into scramble frames wherein the scramble frames have certain characteristics. For example, length is one characteristic of the scramble frames, each scramble frame having a length of an average vowel sound, or a predetermined length of nominally 40-50 msec. The invention is operable to be trained using as little as one single training token that is segmented.

Type: Grant

Filed: August 14, 1998

Date of Patent: April 24, 2001

Assignee: Conexant Systems, Inc.

Inventor: Aruna Bayya
Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm

Patent number: 6219453

Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (“OCR”) technique. Each recognized word is generated by first producing, for each character position of the corresponding word in the original document, the N-best characters for occupying that character position. If an incorrect word is found in the electronic document, the present invention generates a plurality of reference words from which one is selected for replacing the incorrect word. This selected reference word is determined by the present invention to be the reference word that is the most likely correct replacement for the incorrect recognized word. This selection is accomplished by computing for each reference word a replacement word value. The reference word that is selected to replace the incorrect recognized word corresponds to the highest replacement word value.

Type: Grant

Filed: August 11, 1997

Date of Patent: April 17, 2001

Assignee: AT&T Corp.

Inventor: Randy G. Goldberg
Methods and apparatus for translating between languages

Patent number: 6219646

Abstract: Methods and apparatus for performing translation between different language are provided. The present invention includes a translation system that performs translation having increased accuracy by providing a three-dimensional topical dual-language database. The topical database includes a set of source-to-target language translations for each topic that the database is being used for. In one embodiment, a user first selects the topic of conversation, then words spoken into a telephone are translated and produced as synthesized voice signals from another telephone so that a near real-time conversation may be had between two people speaking different languages. An additional feature of the present invention is the addition of a computer terminal that displays the input and output phrases so that either user may edit the input phrases, or indicate that the translation was ambiguous and request a rephrasing of the material.

Type: Grant

Filed: May 9, 2000

Date of Patent: April 17, 2001

Assignee: Gedanken Corp.

Inventor: Julius Cherny
Quantization using frequency and mean compensated frequency input data for robust speech recognition

Patent number: 6219642

Abstract: A speech recognition system utilizes multiple quantizers to process frequency parameters and mean compensated frequency parameters derived from an input signal. The quantizers may be matrix and vector quantizer pairs, and such quantizer pairs may also function as front ends to a second stage speech classifiers such as hidden Markov models (HMMs) and/or utilizes neural network postprocessing to, for example, improve speech recognition performance. Mean compensating the frequency parameters can remove noise frequency components that remain approximately constant during the duration of the input signal. HMM initial state and state transition probabilities derived from common quantizer types and the same input signal may be consolidated to improve recognition system performance and efficiency. Matrix quantization exploits the “evolution” of the speech short-term spectral envelopes as well as frequency domain information, and vector quantization (VQ) primarily operates on frequency domain information.

Type: Grant

Filed: October 5, 1998

Date of Patent: April 17, 2001

Assignee: Legerity, Inc.

Inventors: Safdar M. Asghar, Lin Cong
Process for the multilingual use of a hidden markov sound model in a speech recognition system

Patent number: 6212500

Abstract: In a method for determining the similarities of sounds across different languages, hidden Markov modelling of multilingual phonemes is employed wherein language-specific as well as language-independent properties are identified by combining of the probability densities for different hidden Markov sound models in various languages.

Type: Grant

Filed: March 9, 1999

Date of Patent: April 3, 2001

Assignee: Siemens Aktiengesellschaft

Inventor: Joachim Köhler
Method and apparatus for automatic speech segmentation into phoneme-like units for use in speech processing applications, and based on segmentation into broad phonetic classes, sequence-constrained vector quantization and hidden-markov-models

Patent number: 6208967

Abstract: For machine segmenting of speech, first utterances from a database of known spoken words are classified and segmented into three broad phonetic classes (BPC) voiced, unvoiced, and silence. Next, using preliminary segmentation positions as anchor points, sequence-constrained vector quantization is used for further segmentation into phoneme-like units. Finally, exact tuning to the segmented phonemes is done through Hidden-Markov Modelling and after training a diphone set is composed for further usage.

Type: Grant

Filed: February 25, 1997

Date of Patent: March 27, 2001

Assignee: U.S. Philips Corporation

Inventors: Stefan C. Pauws, Yves G. C. Kamp, Leonardus F. W. Willems
Method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients

Patent number: 6202047

Abstract: A method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients. In one embodiment, a speech input signal is received and cepstral features are extracted. An answer is generated using the extracted cepstral features and a fixed signal independent diagonal matrix as the covariance matrix for the cepstral components of the speech input signal and, for example, a hidden Markov model. In another embodiment, a noisy speech input signal is received and a cepstral vector representing a clean speech input signal is generated based on the noisy speech input signal and an explicit linear minimum mean square error cepstral estimator.

Type: Grant

Filed: March 30, 1998

Date of Patent: March 13, 2001

Assignee: AT&T Corp.

Inventors: Yariv Ephraim, Mazin G. Rahim
Marking and deferring correction of misrecognition errors

Patent number: 6195637

Abstract: A method for correcting misrecognition errors comprises the steps of: dictating to a speech application; marking misrecognized words during the dictating step; and, after the dictating and marking steps, displaying and correcting the marked misrecognized words, whereby the correcting of the misrecognized words is deferred until after the dictating step is concluded and the dictating step is not significantly interrupted. The displaying and correcting step can be implemented by invoking a correction tool of the speech application, whereby the correcting of the misrecognized words trains the speech application.

Type: Grant

Filed: March 25, 1998

Date of Patent: February 27, 2001

Assignee: International Business Machines Corp.

Inventors: Barbara E. Ballard, Kerry A. Ortega
Speech recognition over packet networks

Patent number: 6195636

Abstract: In a system in which user equipment is connected to a packet network and a speech recognition application server is also connected to the packet network for performing speech recognition on speech data from the user equipment, a speech recognition system selectively performs feature extraction at a user end before transmitting speech data to be recognized. The feature extraction is performed only for speech which is to be recognized.

Type: Grant

Filed: February 19, 1999

Date of Patent: February 27, 2001

Assignee: Texas Instruments Incorporated

Inventors: Joseph A. Crupi, Zoran Mladenovic, Edward B. Morgan, Bogdan R. Kosanovic, Negendra Kumar
On-line background noise adaptation of parallel model combination HMM with discriminative learning using weighted HMM for noisy speech recognition

Patent number: 6188982

Abstract: A system for adaptively generating a composite noisy speech model to process speech in, e.g., a nonstationary environment comprises a speech recognizer, a re-estimation circuit, a combiner circuit, a classifier circuit, and a discrimination circuit. In particular, the speech recognizer generates frames of current input utterances based on received speech data and determines which of the generated frames are aligned with noisy states to produce a current noise model. The re-estimation circuit re-estimates the produced current noise model by interpolating the number of frames in the current noise model with parameters from a previous noise model. The combiner circuit combines the parameters of the current noise model with model parameters of a corresponding current clean speech model to generate model parameters of a composite noisy speech model. The classifier circuit determines a discrimination function by generating a weighted PMC HMM model.

Type: Grant

Filed: December 1, 1997

Date of Patent: February 13, 2001

Assignee: Industrial Technology Research Institute

Inventor: Tung-Hui Chiang
Topic indexing method

Patent number: 6185531

Abstract: A method for improving the associating articles of information or stories with topics associated with specific subjects (subject topics) and with a general topic of words that are not associated with any subject. The inventive method is trained using Hidden Markov Models (HMM) to represent each story with each state in the HMM representing each topic. A standard Expectation and Maximization algorithm, as are known in this art field can be used to maximize the expected likelihood to the method relating the words associated with each topic to that topic. In the method, the probability that each word in a story is related to a subject topic is determined and evaluated, and the subject topics with the lowest probability are discarded. The remaining subject topics are evaluated and a sub-set of subject topics with the highest probabilities over all the words in a story are considered to be the “correct” subject topic set.

Type: Grant

Filed: January 9, 1998

Date of Patent: February 6, 2001

Assignee: GTE Internetworking Incorporated

Inventors: Richard M. Schwartz, Toru Imai
Method of and a device for speech recognition employing neural network and markov model recognition techniques

Patent number: 6185528

Abstract: A method and a device for recognition of isolated words in large vocabularies are described, wherein recognition is performed through two sequential steps using neural networks and Markov models techniques, respectively, and the results of both techniques are adequately combined so as to improve recognition accuracy. The devices performing the combination also provide an evaluation of recognition reliability.

Type: Grant

Filed: April 29, 1999

Date of Patent: February 6, 2001

Assignee: CSELT - Centro Studi e Laboratori Telecomunicazioni S.p.A.

Inventors: Luciano Fissore, Roberto Gemello, Franco Ravera
Method and device for translating a source text into a target using modeling and dynamic programming

Patent number: 6182026

Abstract: For translating a word-organized source text into a word-organized target text through mapping of source words on target words, both a translation model and a language model are used. In particular, alignment probabilities are ascertained between various source word & target word pairs, whilst preemptively assuming that alignment between such word pairs is monotonous through at least substantial substrings of a particular sentence. This is done by evaluating incrementally statistical translation performance of various target word strings, deciding on an optimum target word string, and outputting the latter.

Type: Grant

Filed: June 26, 1998

Date of Patent: January 30, 2001

Assignee: U.S. Philips Corporation

Inventors: Christoph Tillmann, Stephan Vogel, Hermann Ney

prev … 7 8 9 10 11 12 13 14 next