Clustering Patents (Class 704/245)

Adaptively compressing sound with multiple codebooks

Patent number: 6243674

Abstract: A sound compression system adaptively switches codebooks in and out based on a calculation carried out with the output of the codebook. The system uses three separate codebooks: adaptive vector quantization codebook, real pitch codebook, and noise codebook. The perceptually-weighted filter is generated adaptively using the predictive coefficients from the current sub-frame.

Type: Grant

Filed: March 2, 1998

Date of Patent: June 5, 2001

Assignee: America Online, Inc.

Inventor: Alfred Yu

Process for the multilingual use of a hidden markov sound model in a speech recognition system

Patent number: 6212500

Abstract: In a method for determining the similarities of sounds across different languages, hidden Markov modelling of multilingual phonemes is employed wherein language-specific as well as language-independent properties are identified by combining the probability densities for different hidden Markov sound models in various languages.

Type: Grant

Filed: March 9, 1999

Date of Patent: April 3, 2001

Assignee: Siemens Aktiengesellschaft

Inventor: Joachim Köhler

Two-staged cohort selection for speaker verification system

Patent number: 6205424

Abstract: Speech signals from speakers having known identities are used to create sets of acoustic models. The acoustic models along with their corresponding identities are stored in a memory. A plurality of sets of cohort models that characterize the speech signals are selected from the stored sets of acoustic models, and linked to the set of acoustic models of each identified speaker. During a testing session speech signals produced by an unknown speaker having a claimed identity are processed to generate processed speech signals. The processed speech signals are compared to the set of models of the claimed speaker to produce first scores. The processed speech signals are also compared to the sets of cohort models to produce second scores. A subset of scores is dynamically selected from the second scores according to a predetermined criterion.

Type: Grant

Filed: July 31, 1996

Date of Patent: March 20, 2001

Assignee: Compaq Computer Corporation

Inventors: William D. Goldenthal, Brian S. Eberman
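
The scoring flow described in the abstract of patent 6205424 can be sketched as below. This is only an illustration under stated assumptions, not the patented method: the `score` function, the top-k selection rule, and the margin-based decision are all hypothetical, because the abstract says only that a subset of the second scores is selected dynamically.

```python
def score(features, model_mean):
    """Hypothetical similarity: negative mean squared distance to a model mean."""
    return -sum((f - model_mean) ** 2 for f in features) / len(features)


def verify(features, claimed_model, cohort_models, k=2, threshold=0.0):
    # First score: processed speech against the claimed speaker's model.
    first = score(features, claimed_model)
    # Second scores: processed speech against every cohort model.
    second = [score(features, m) for m in cohort_models]
    # Select a subset of the second scores (assumed here: the k highest).
    selected = sorted(second, reverse=True)[:k]
    # Assumed decision rule: the claimed score must beat the cohort average.
    margin = first - sum(selected) / len(selected)
    return margin > threshold, margin


# Toy one-dimensional "models" and features, for illustration only.
accepted, margin = verify([1.0, 1.1, 0.9], claimed_model=1.0,
                          cohort_models=[1.5, 3.0, 0.2])
```

Normalizing the claimed-speaker score against cohort scores is one common way to make a fixed threshold less sensitive to recording conditions; other selection and normalization rules would fit the abstract equally well.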

Speaker recognition over large population with fast and detailed matches

Patent number: 6182037

Abstract: Fast and detailed match techniques for speaker recognition are combined into a hybrid system in which speakers are associated in groups when potential confusion is detected between a speaker being enrolled and a previously enrolled speaker. Thus the detailed match techniques are invoked only at the potential onset of saturation of the fast match technique while the detailed match is facilitated by limitation of comparisons to the group and the development of speaker-dependent models which principally function to distinguish between members of a group rather than to more fully characterize each speaker. Thus storage and computational requirements are limited and fast and accurate speaker recognition can be extended over populations of speakers which would degrade or saturate fast match systems and degrade performance of detailed match systems.

Type: Grant

Filed: May 6, 1997

Date of Patent: January 30, 2001

Assignee: International Business Machines Corporation

Inventor: Stephane Herman Maes

Text-to-speech using clustered context-dependent phoneme-based units

Patent number: 6163769

Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set of decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.

Type: Grant

Filed: October 2, 1997

Date of Patent: December 19, 2000

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Hsiao-Wuen Hon, Xuedong D. Huang

Dynamically configurable acoustic model for speech recognition system

Patent number: 6141641

Abstract: The present invention includes a system for recognizing speech based on an input data stream. The system includes an acoustic model which has a model size. The model is adjustable to a desired size based on characteristics of a computer system on which the recognition system is run.

Type: Grant

Filed: April 15, 1998

Date of Patent: October 31, 2000

Assignee: Microsoft Corporation

Inventors: Mei-Yuh Hwang, Xuedong D. Huang

Check-sum based method and apparatus for performing speech recognition

Patent number: 6122612

Abstract: A method and apparatus for matching at least a first input identifier with a reference identifier. A user provides an input identifier into a system, and the system produces a recognized identifier based on the input identifier. The system of the present invention performs a check-sum operation to determine whether the recognized identifier was recognized correctly. If the check-sum operation reveals that the recognized identifier is incorrect, the system of the present invention generates a plurality of substitute identifiers. The substitute identifiers are compared to a set of pre-stored reference identifiers. If a match is found between a reference identifier and a substitute identifier, the matched reference identifier is selected as corresponding to the input identifier provided by the user.

Type: Grant

Filed: November 20, 1997

Date of Patent: September 19, 2000

Assignee: AT&T Corp

Inventor: Randy G. Goldberg
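
The check-sum flow in the abstract of patent 6122612 can be sketched as follows. The check-sum scheme (last digit equals the sum of the other digits modulo 10) and the `CONFUSABLE` table of digits a recognizer might mix up are hypothetical choices made for this illustration; the abstract does not specify either.

```python
from itertools import product

# Hypothetical confusion sets: digits a recognizer might mistake for one another.
CONFUSABLE = {"1": "17", "7": "71", "5": "59", "9": "95"}


def checksum_ok(identifier):
    """Assumed scheme: the last digit equals the sum of the other digits modulo 10."""
    *body, check = (int(ch) for ch in identifier)
    return sum(body) % 10 == check


def substitutes(identifier):
    """Generate substitute identifiers by swapping in confusable digits."""
    options = [CONFUSABLE.get(ch, ch) for ch in identifier]
    return ["".join(combo) for combo in product(*options)]


def match(recognized, references):
    # If the check-sum passes, accept the recognized identifier when it is on file.
    if checksum_ok(recognized):
        return recognized if recognized in references else None
    # Otherwise compare substitute identifiers against the pre-stored references.
    for candidate in substitutes(recognized):
        if candidate in references:
            return candidate
    return None


references = {"7254", "1236"}
result = match("1254", references)  # "7" was misrecognized as "1"
```

Here `"1254"` fails the assumed check-sum, so substitutes are generated and `"7254"` is found among the references.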

Systems and methods for access filtering employing relaxed recognition constraints

Patent number: 6107935

Abstract: A speaker recognition system for selectively permitting access by a requesting speaker to one of a service and facility includes an acoustic front-end for computing at least one feature vector from a speech utterance provided by the requesting speaker; a speaker dependent codebook store for pre-storing sets of acoustic features, in the form of codebooks, respectively corresponding to a pool of previously enrolled speakers; a speaker identifier/verifier module operatively coupled to the acoustic front-end, wherein: the speaker identifier/verifier module identifies, from identifying indicia provided by the requesting speaker, a previously enrolled speaker as a claimed speaker; further, the speaker identifier/verifier module associates, with the claimed speaker, first and second groups of previously enrolled speakers, the first group being defined as speakers whose codebooks are respectively acoustically similar to the claimed speaker (i.e.

Type: Grant

Filed: February 11, 1998

Date of Patent: August 22, 2000

Assignee: International Business Machines Corporation

Inventors: Liam David Comerford, Stephane Herman Maes

Speaker adaptation system and method based on class-specific pre-clustering training speakers

Patent number: 6073096

Abstract: A method of speech recognition in accordance with the present invention includes the steps of grouping acoustics to form classes based on acoustic features, clustering training speakers by the classes to provide class-specific cluster systems, selecting from the cluster systems a subset of cluster systems closest to adaptation data from a test speaker, transforming the subset of cluster systems to bring the subset of cluster systems closer to the test speaker based on the adaptation data to form adapted cluster systems, and combining the adapted cluster systems to create a speaker adapted system for decoding speech from the test speaker. Systems and methods for building speech recognition systems as well as adapting speaker systems for class-specific speaker clusters are included.

Type: Grant

Filed: February 4, 1998

Date of Patent: June 6, 2000

Assignee: International Business Machines Corporation

Inventors: Yuqing Gao, Mukund Padmanabhan, Michael Alan Picheny

Pattern recognition scheme using probabilistic models based on mixtures distribution of discrete distribution

Patent number: 6064958

Abstract: A pattern recognition scheme using probabilistic models that are capable of reducing a calculation cost for the output probability while improving a recognition performance even when a number of mixture component distributions of respective states is small, by arranging distributions with low calculation cost and high expressive power as the mixture component distribution. In this pattern recognition scheme, a probability of each probabilistic model expressing features of each recognition category with respect to each input feature vector derived from each input signal is calculated, where the probabilistic model represents a feature parameter subspace in which feature vectors of each recognition category exist and the feature parameter subspace is expressed by using mixture distributions of one-dimensional discrete distributions with arbitrary distribution shapes which are arranged in respective dimensions.

Type: Grant

Filed: September 19, 1997

Date of Patent: May 16, 2000

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Satoshi Takahashi, Shigeki Sagayama

Speech recognition apparatus

Patent number: 6061652

Abstract: An HMM device and a DP matching device capable of performing word spotting accurately with a small amount of calculation are provided. For that purpose, a code book is provided in which representative vectors of respective clusters are stored in a form searchable by their labels, wherein in HMM, similarity degrees based on Kullback-Leibler Divergence of distributions of occurrence probabilities of the clusters under respective states and distribution of degrees of input feature vectors to be recognized to the respective clusters are rendered occurrence degrees of the feature vectors from the states and in DP matching, similarity degrees based on Kullback-Leibler Divergence of distributions of membership degrees of feature vectors forming reference patterns to the respective clusters and distribution of degrees of input feature vectors to the respective clusters are rendered inter-frame similarity degrees of frames of the input patterns and frames of corresponding reference patterns.

Type: Grant

Filed: June 18, 1996

Date of Patent: May 9, 2000

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Eiichi Tsuboka, Junichi Nakahashi

Robust speech processing with affine transform replicated data

Patent number: 6038528

Abstract: The present invention relates to a robust speech processing method and system which models channel and noise variations with affine transforms to reduce mismatched conditions between training and testing. The affine transform relating the training vectors $c_k$ to the vectors for the testing condition $c'_k$ is represented by the form $c_k'^{T} = A c_k^{T} + b$ for $k = 1$ to $N$, in which $A$ is a matrix of predictor coefficients representing noise distortions and the vector $b$ represents channel distortions. Alternatively, an affine invariant cepstrum is generated during testing and training for modeling speech to account for noise and channel effects. From the improved speech processing, improved speaker recognition with channel and noise variations is obtained.

Type: Grant

Filed: July 17, 1996

Date of Patent: March 14, 2000

Assignee: T-Netix, Inc.

Inventors: Richard Mammone, Xiaoyu Zhang
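
The affine relation in the abstract of patent 6038528 maps each training vector to its testing-condition counterpart by a matrix multiplication followed by a vector offset. A minimal sketch in plain Python follows; the values of `A`, `b`, and the training vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def affine_transform(A, b, c):
    """Return c' = A c + b for a single feature vector c (plain Python lists)."""
    return [sum(a_ij * c_j for a_ij, c_j in zip(row, c)) + b_i
            for row, b_i in zip(A, b)]


# Hypothetical values for illustration only.
A = [[0.5, 0.0],
     [0.0, 2.0]]   # matrix modeling noise distortion
b = [1.0, -1.0]    # vector modeling channel distortion
training_vectors = [[2.0, 1.0], [4.0, 3.0]]

# Apply the same transform to every training vector (k = 1 to N).
testing_vectors = [affine_transform(A, b, c) for c in training_vectors]
```

With these values the first training vector `[2.0, 1.0]` maps to `[0.5*2.0 + 1.0, 2.0*1.0 - 1.0]`, which is `[2.0, 1.0]`, and the second maps to `[3.0, 5.0]`.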

Training speech recognition by matching audio segment frequency of occurrence with frequency of words and letter combinations in a corpus

Patent number: 6009392

Abstract: A method is provided which trains acoustic models in an automatic speech recognizer ("ASR") without explicitly matching decoded scripts with correct scripts from which acoustic training data is generated. In the method, audio data is input and segmented to produce audio segments. The audio segments are clustered into groups of clustered audio segments such that the clustered audio segments in each of the groups have similar characteristics. Also, the groups respectively form audio similarity classes. Then, audio segment probability distributions for the clustered audio segments in the audio similarity classes are calculated, and audio segment frequencies for the clustered audio segments are determined based on the audio segment probability distributions. The audio segment frequencies are matched to known audio segment frequencies for at least one of letters, combination of letters, and words to determine frequency matches, and a textual corpus of words is formed based on the frequency matches.

Type: Grant

Filed: January 15, 1998

Date of Patent: December 28, 1999

Assignee: International Business Machines Corporation

Inventors: Dimitri Kanevsky, Wlodek Wlodzimierz Zadrozny

Technique for selective use of Gaussian kernels and mixture component weights of tied-mixture hidden Markov models for speech recognition

Patent number: 6009390

Abstract: In a speech recognition system, tied-mixture hidden Markov models (HMMs) are used to match, in the maximum likelihood sense, the phonemes of spoken words given the acoustic input thereof. In a well known manner, such speech recognition requires computation of state observation likelihoods (SOLs). Because of the use of HMMs, each SOL computation involves a substantial number of Gaussian kernels and mixture component weights. In accordance with the invention, the number of Gaussian kernels is cut down to reduce the computational complexity and increase the efficiency of memory access to the kernels. For example, only the non-zero mixture component weights and the Gaussian kernels associated therewith are considered in the SOL computation. In accordance with an aspect of the invention, only a subset of the Gaussian kernels of significant values, regardless of the values of the associated mixture component weights, are considered in the SOL computation.

Type: Grant

Filed: September 11, 1997

Date of Patent: December 28, 1999

Assignee: Lucent Technologies Inc.

Inventors: Sunil K. Gupta, Raziel Haimi-Cohen, Frank K. Soong
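
The pruning idea in the abstract of patent 6009390 (considering only non-zero mixture component weights and their kernels) can be sketched as below. This is an illustration under simplifying assumptions: kernels are one-dimensional Gaussians, and the function and variable names are hypothetical rather than taken from the patent.

```python
import math


def gaussian(x, mean, var):
    """One-dimensional Gaussian density (a stand-in for a shared Gaussian kernel)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def state_observation_likelihood(x, kernels, weights):
    """Sum w_k * N(x; mean_k, var_k), evaluating only kernels with non-zero weight."""
    total = 0.0
    evaluated = 0
    for (mean, var), w in zip(kernels, weights):
        if w == 0.0:
            continue  # a zero-weight kernel cannot contribute to this state's SOL
        total += w * gaussian(x, mean, var)
        evaluated += 1
    return total, evaluated


# Kernels are shared across states in a tied-mixture model; weights are per state.
kernels = [(0.0, 1.0), (2.0, 1.0), (5.0, 0.5), (-3.0, 2.0)]
weights = [0.7, 0.3, 0.0, 0.0]
sol, evaluated = state_observation_likelihood(0.5, kernels, weights)
```

Skipping zero-weight kernels leaves the likelihood unchanged while halving the kernel evaluations in this toy case. The second aspect in the abstract, selecting kernels by their own values regardless of weight, is an approximation and is not shown here.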

Tree structured cohort selection for speaker recognition system

Patent number: 6006184

Abstract: In a speaker recognition system, a tree-structured reference pattern storing unit has first through M-th node stages each of which has nodes that respectively store a reference pattern of inhibiting speakers. The reference pattern of each node of (N-1)-th node stage represents acoustic features in the reference patterns of predetermined ones of the nodes of the N-th node stage. An analysis unit analyzes input speech and converts the input speech into feature vectors. A similarities calculating unit calculates similarities between the feature vectors and the reference patterns of all of the inhibiting speakers. An inhibiting speaker selecting unit sorts the similarities and selects a predetermined number of inhibiting speakers.

Type: Grant

Filed: January 28, 1998

Date of Patent: December 21, 1999

Assignee: NEC Corporation

Inventors: Eiko Yamada, Hiroaki Hattori

Method and apparatus for recognizing spoken words in a speech signal by organizing the vocabulary in the form of a tree

Patent number: 5995930

Abstract: A method and apparatus for processing a sequence of words in a speech signal for speech recognition. The method includes the steps of sampling, at recurrent instants, said speech signal for generating a series of test signals. Signal-by-signal matching and scoring is generated between the test signals and a series of reference signals, where each of the series of reference signals forms one of a plurality of vocabulary words arranged as a vocabulary tree. The vocabulary tree includes a root and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end. Acoustic recombination determines both continuations of branches and the most probable partial hypotheses within a word because of the use of a vocabulary built up as a tree with branches having reference signals.

Type: Grant

Filed: November 19, 1996

Date of Patent: November 30, 1999

Assignee: U.S. Philips Corporation

Inventors: Reinhold Haeb-Umbach, Hermann Ney

Method and apparatus for selecting a vocabulary sub-set from a speech recognition dictionary for use in real time automated directory assistance

Patent number: 5987414

Abstract: A vocabulary sub-set is selected from a large speech recognition dictionary. The selected vocabulary sub-set may be used in a real time directory assistance system to improve the system's real-time performance. The selection process is effected on the basis of the cost-benefit ratio, the benefit being measured in savings in operator working time. On the other hand, the cost is measured in terms of hardware limitations, namely processor throughput. Typically, the vocabulary sub-set is limited to a maximum number of orthographies that would enable the system to achieve real-time performance.

Type: Grant

Filed: October 31, 1996

Date of Patent: November 16, 1999

Assignee: Nortel Networks Corporation

Inventors: Michael George Sabourin, Jeff Marcus

Speaker clustering apparatus based on feature quantities of vocal-tract configuration and speech recognition apparatus therewith

Patent number: 5983178

Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus is provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.

Type: Grant

Filed: December 10, 1998

Date of Patent: November 9, 1999

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka

Standard pattern production system employing information criterion

Patent number: 5960396

Abstract: The invention provides a standard pattern production system which produces an optimum recognition unit in terms of an information criterion to given learning data using an information criterion in learning of a standard pattern in pattern recognition. An input pattern production section holds an input pattern, and a standard pattern producing parameter production section calculates and outputs parameters necessary to produce standard patterns of individual categories. A cluster set production section divides a category set into cluster sets. A common standard pattern production section calculates standard patterns of individual clusters of the cluster sets. An optimum cluster selection section receives a plurality of cluster sets and common standard patterns and selects an optimum cluster using an information criterion. A standard pattern storage section stores the common standard pattern of the optimum cluster set as a standard pattern for the individual categories.

Type: Grant

Filed: April 21, 1997

Date of Patent: September 28, 1999

Assignee: NEC Corporation

Inventor: Koichi Shinoda

Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose

Patent number: 5895448

Abstract: Methods and apparatus for generating and using both speaker dependent and speaker independent garbage models in speaker dependent speech recognition applications are described. The present invention recognizes that in some speech recognition systems, e.g., systems where multiple speech recognition operations are performed on the same signal, it may be desirable to recognize and treat words or phrases in one part of the speech recognition system as garbage or out of vocabulary utterances with the understanding that the very same words or phrases will be recognized and treated as in-vocabulary by another portion of the system. In accordance with the present invention, in systems where both speaker independent and speaker dependent speech recognition operations are performed independently, e.g.

Type: Grant

Filed: April 30, 1997

Date of Patent: April 20, 1999

Assignee: Nynex Science and Technology, Inc.

Inventors: George J. Vysotsky, Vijay R. Raman

Method and apparatus for training Hidden Markov Model

Patent number: 5890114

Abstract: An HMM training method comprises a first parameter predicting step, a centroid state set calculating step, a reconstructing step, a second parameter predicting step and a control step. In the first parameter predicting step, a parameter of an HMM (hidden Markov model) is predicted based on training data. In the centroid state set calculating step, a centroid state set is calculated by clustering the states of said HMM whose parameter is predicted in the first parameter predicting step. In the reconstructing step, an HMM is reconstructed using the centroid state set calculated in the centroid state set calculating step. In the second parameter predicting step, a parameter of the HMM reconstructed in the reconstructing step is predicted using the training data. The centroid state set calculating step is re-executed by the control step if the likelihood of the HMM whose parameter is predicted in the second parameter predicting step does not satisfy a predetermined condition.

Type: Grant

Filed: February 28, 1997

Date of Patent: March 30, 1999

Assignee: Oki Electric Industry Co., Ltd.

Inventor: Jie Yi

Variable dimension vector quantization

Patent number: 5890110

Abstract: A variable dimension vector quantization method that uses a single "universal" codebook. The method can be interpreted as sampling full-dimensioned codevectors in the universal codebook and generating subcodevectors of the same dimension as the input data subvector, which dimension may vary in time. A subcodevector is selected from the codebook to have minimum distortion between it and the input data subvector. The subcodevector with minimum distortion corresponds to the representative, full-dimensioned codevector in the codebook. The codebook is designed by inverse sampling of training subvectors to obtain full-dimension vectors, then iteratively clustering the training set until a stable centroid vector is obtained.

Type: Grant

Filed: March 27, 1995

Date of Patent: March 30, 1999

Assignee: The Regents of the University of California

Inventors: Allen Gersho, Amitava Das, Ajit Venkat Rao
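
The encoding step in the abstract of patent 5890110 can be sketched as follows: each full-dimension codevector is sampled down to the dimension of the input subvector, and the codevector whose sampled version has the lowest distortion is chosen. The index-based sampling rule, squared-error distortion, and the toy codebook are assumptions for illustration; the abstract does not fix any of them.

```python
def subsample(codevector, indices):
    """Sample a full-dimension codevector at the given component indices."""
    return [codevector[i] for i in indices]


def quantize(subvector, indices, codebook):
    """Return the index of the codevector whose sampled version has minimum squared error."""
    def distortion(codevector):
        sub = subsample(codevector, indices)
        return sum((s - x) ** 2 for s, x in zip(sub, subvector))
    return min(range(len(codebook)), key=lambda j: distortion(codebook[j]))


# Hypothetical universal codebook of full-dimension (4-component) codevectors.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]

# The same codebook serves inputs of different dimensions.
best3 = quantize([1.1, 2.9, 4.2], indices=[0, 2, 3], codebook=codebook)
best2 = quantize([3.8, 1.1], indices=[0, 3], codebook=codebook)
```

Because only the codevector index is transmitted, one codebook covers every input dimension, provided the decoder knows which components were sampled.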

Method and system for editing phrases during continuous speech recognition

Patent number: 5884258

Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor.

Type: Grant

Filed: October 31, 1996

Date of Patent: March 16, 1999

Assignee: Microsoft Corporation

Inventors: Michael J. Rozak, Fileno A. Alleva

Speech recognition system for determining a recognition result at an intermediate state of processing

Patent number: 5875425

Abstract: A speech recognition system for recognizing a system user's speech can shorten a recognition period by reducing the amount of necessary calculations without deteriorating the accuracy rate of recognition. The speech recognition system successively calculates statistical probabilities of acoustic models, outputs a one sentence recognition result corresponding to acoustic models having the highest reliability when the one sentence is detected and stops the following calculations.

Type: Grant

Filed: December 23, 1996

Date of Patent: February 23, 1999

Assignee: Kokusai Denshin Denwa Co., Ltd.

Inventors: Makoto Nakamura, Naomi Inoue, Fumihiro Yato, Seiichi Yamamoto

Method and apparatus for training a speaker recognition system

Patent number: 5864807

Abstract: A method and apparatus for training a system to assess the identity of a person through the audio characteristics of their voice. The system inserts an audio input (10) into an A/D converter (20) for processing in a digital signal processor (30). The system then applies neural network type processing by using a polynomial pattern classifier (60) for training the speaker recognition system.

Type: Grant

Filed: February 25, 1997

Date of Patent: January 26, 1999

Assignee: Motorola, Inc.

Inventors: William Michael Campbell, Khaled Talal Assaleh

Automated meaningful phrase clustering

Patent number: 5860063

Abstract: A system and method for automated task selection is provided where a selected task is identified from the natural speech of the user making the selection. The system and method incorporate the selection of meaningful phrases through the use of a test for significance. The selected meaningful phrases are then clustered. The meaningful phrase clusters are input to a speech recognizer that determines whether any meaningful phrase clusters are present in the input speech. Task-type decisions are then made on the basis of the recognized meaningful phrase clusters.

Type: Grant

Filed: July 11, 1997

Date of Patent: January 12, 1999

Assignee: AT&T Corp

Inventors: Allen Louis Gorin, Jeremy Huntley Wright

Method and system for speech recognition with compensation for variations in the speech environment

Patent number: 5854999

Abstract: Compensatory values for compensating a reference pattern to match with an utterance environment of an input speech are employed for determining an environmental variation index to be input to a secondary matching controller, which is responsible for magnitudes of the index smaller than a threshold to hold a second matching section inoperative so that a recognition result of a primary matching of a previous compensated reference pattern is output, and for magnitudes of the index larger than the threshold to operate the second matching section to output a recognition result of a second matching based on a current compensated reference pattern to be stored as a subsequent reference pattern.

Type: Grant

Filed: June 24, 1996

Date of Patent: December 29, 1998

Assignee: NEC Corporation

Inventor: Hiroshi Hirayama

Method and apparatus for speech recognition

Patent number: 5852804

Abstract: A speech recognizing apparatus compares a speech command from a user with each of the registration patterns stored in a storage unit in turn. If the speech command coincides with one of the registration patterns, the speech recognizing apparatus controls a predetermined electronic apparatus associated with an operation related to the registration pattern. If the speech command does not coincide with any one of the registration patterns, the speech recognizing apparatus stores into a memory the speech command as a new registration pattern in which the speech command is related to a manipulation of the electronic apparatus produced by the user immediately after the speech command is produced.

Type: Grant

Filed: April 11, 1997

Date of Patent: December 22, 1998

Assignee: Fujitsu Limited

Inventor: Kazuya Sako

Speech recognition using clustered between word and/or phrase coarticulation

Patent number: 5819221

Abstract: Improved speech recognition is achieved according to the present invention by use of between word and/or between phrase coarticulation. The increase in the number of phonetic models required to model this additional vocabulary is reduced by clustering the inter-word/phrase models and grammar into only a few classes. By using one class for consonant inter-word context and two classes for vowel contexts, the accuracy for Japanese was almost as good as for unclustered models while the number of models was reduced by more than half.

Type: Grant

Filed: August 31, 1994

Date of Patent: October 6, 1998

Assignee: Texas Instruments Incorporated

Inventors: Kazuhiro Kondo, Ikuo Kudo, Yu-Hung Kao, Barbara J. Wheatley

State transition model design method and voice recognition method and apparatus using same

Patent number: 5812975

Abstract: A method of designing a state transition model capable of high speed voice recognition and a voice recognition method and apparatus using the state transition model are provided. The methods provide a state transition model in which a state shared structure of the state transition model is designed. The method includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short distance cluster among clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.

Type: Grant

Filed: June 18, 1996

Date of Patent: September 22, 1998

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Yasunori Ohora

Signal conditioned minimum error rate training for continuous speech recognition

Patent number: 5806029

Abstract: Hierarchical signal bias removal (HSBR) signal conditioning uses a codebook constructed from the set of recognition models and is updated as the recognition models are modified during recognition model training. As a result, HSBR signal conditioning and recognition model training are based on the same set of recognition model parameters, which provides significant reduction in recognition error rate for the speech recognition system.

Type: Grant

Filed: September 15, 1995

Date of Patent: September 8, 1998

Assignee: AT&T Corp

Inventors: Eric Rolfe Buhrke, Wu Chou, Mazin G. Rahim

Low complexity, high accuracy clustering method for speech recognizer

Patent number: 5806030

Abstract: The clustering technique produces a low complexity and yet high accuracy speech representation for use with speech recognizers. The task database comprising the test speech to be modeled is segmented into subword units such as phonemes and labeled to indicate each phoneme in its left and right context (triphones). Hidden Markov Models are constructed for each context-independent phoneme and trained. Then the center states are tied for all phonemes of the same class. Triphones are trained and all poorly-trained models are eliminated by merging their training data with the nearest well-trained model using a weighted divergence computation to ascertain distance. Before merging, the threshold for each class is adjusted until the number of good models for each phoneme class is within predetermined upper and lower limits. Finally, if desired, the number of mixture components used to represent each model may be increased and the models retrained. This latter step increases the accuracy.

Type: Grant

Filed: May 6, 1996

Date of Patent: September 8, 1998

Inventor: Jean-Claude Junqua
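
The threshold adjustment and merging steps described in the abstract for patent 5806030 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: a model counts as "good" when its training count reaches the threshold, the threshold moves in unit steps, and the distance is a plain absolute difference of scalar means standing in for the weighted divergence named in the abstract. All names and data are hypothetical.

```python
def pick_threshold(counts, lower, upper, start=50, max_steps=10_000):
    """Adjust the threshold until the number of good models lies in [lower, upper]."""
    threshold = start
    for _ in range(max_steps):
        good = sum(1 for c in counts.values() if c >= threshold)
        if good > upper:
            threshold += 1          # too many good models: be stricter
        elif good < lower and threshold > 0:
            threshold -= 1          # too few good models: be more lenient
        else:
            return threshold
    return threshold                # give up after max_steps (limits may be unreachable)


def merge_poor_models(counts, means, threshold):
    """Fold each poorly trained model's data into the nearest well-trained model."""
    good = [m for m, c in counts.items() if c >= threshold]
    merged = dict(counts)
    if not good:
        return merged
    for model, count in counts.items():
        if count < threshold:
            # Stand-in distance (assumption): absolute difference of scalar means.
            target = min(good, key=lambda g: abs(means[g] - means[model]))
            merged[target] += count
            del merged[model]
    return merged


# Hypothetical triphone models of one phoneme class, with training counts and means.
counts = {"a-b+c": 100, "a-b+d": 5, "x-b+c": 80, "x-b+d": 3}
means = {"a-b+c": 1.0, "a-b+d": 1.2, "x-b+c": 4.0, "x-b+d": 3.9}
threshold = pick_threshold(counts, lower=2, upper=2, start=50)
merged = merge_poor_models(counts, means, threshold)
```

Here the two sparsely trained triphones are absorbed by their nearest well-trained neighbours, leaving two models for the class.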
State-dependent speaker clustering for speaker adaptation

Patent number: 5787394

Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.

Type: Grant

Filed: December 13, 1995

Date of Patent: July 28, 1998

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, David Nahamoo, Mukund Padmanabhan
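
The per-subspace ranking described in the abstract for patent 5787394 can be illustrated with a small sketch. The abstract does not define the match score or how the new model is built, so two assumptions are made here: the score is squared Euclidean distance between characterization vectors (smaller is closer), and the adapted model is the plain mean of the test speaker's vector and the selected training speakers' vectors. All names and data are hypothetical.

```python
def closest_speakers(test_vec, training_vecs, k=2):
    """Rank training speakers by distance to the test speaker and keep the k closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(training_vecs, key=lambda spk: sq_dist(test_vec, training_vecs[spk]))
    return ranked[:k]


def adapt_per_subspace(test_data, training_data, k=2):
    """Repeat the ranking for each acoustic subspace and pool the closest speakers' data."""
    adapted = {}
    for subspace, test_vec in test_data.items():
        candidates = {spk: vecs[subspace] for spk, vecs in training_data.items()}
        chosen = closest_speakers(test_vec, candidates, k)
        pool = [test_vec] + [candidates[spk] for spk in chosen]
        # Assumption: the new model is the element-wise mean of the pooled vectors.
        adapted[subspace] = [sum(col) / len(pool) for col in zip(*pool)]
    return adapted


# Hypothetical data: one subspace, three training speakers.
test_data = {"s1": [0.0, 0.0]}
training_data = {
    "A": {"s1": [1.0, 1.0]},
    "B": {"s1": [5.0, 5.0]},
    "C": {"s1": [-1.0, -1.0]},
}
adapted = adapt_per_subspace(test_data, training_data, k=2)
```

Because the selection is made independently for each subspace, different training speakers can contribute to different parts of the adapted model.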
Word and pattern recognition through overlapping hierarchical tree defined by relational features

Patent number: 5787395

Abstract: A voice recognizing method in which a plurality of voice recognition objective words are provided. Scores are accumulated for an unknown input voice signal as compared to the voice recognition objective words by using parameters which are calculated in advance. Upon receipt of an unknown voice signal, a corresponding voice recognition objective word is extracted and recognized. The voice recognition objective words are structured into an overlapping hierarchical structure by using correlation values between each pair of voice recognition objective words. This correlation may be computed from acoustic features, HMM parameters or the like. Score calculation is performed on the unknown input voice signal by using a dictionary of the voice recognition objective words structured in the hierarchical structure. Upon preliminary recognition, the dictionary of the voice recognition objective words is resorted without recalculation of the correlation values.

Type: Grant

Filed: July 18, 1996

Date of Patent: July 28, 1998

Assignee: Sony Corporation

Inventor: Katsuki Minamino
Speech coding and joint data/channel bias estimation using finite state vector quantizer derived from sequential constraints

Patent number: 5778336

Abstract: A joint data (features) and channel (bias) estimation framework for robust processing of speech received over a channel is described. A trellis encoded vector quantizer is used as a pre-processor to estimate the channel bias using blind maximum likelihood sequence estimation. Sequential constraint in the feature vector sequence of a speech signal is applied for the selection of the quantized signal constellation and for the decoding process in joint data and channel estimation. A two state trellis encoded vector quantizer is designed for signal bias removal applications.

Type: Grant

Filed: October 1, 1996

Date of Patent: July 7, 1998

Assignee: Lucent Technologies Inc.

Inventors: Wu Chou, Nambirajan Seshadri
Communications device responsive to spoken commands and methods of using same

Patent number: 5749072

Abstract: A communications device (20) that is responsive to voice commands is provided. The communications device (20) can be a two-way radio, cellular telephone, PDA, or pager. The communications device (20) includes an interface (22) for allowing a user to access a communications channel according to a control signal and a speech-recognition system (24) for producing the control signal in response to a voice command. Included in the speech recognition system (24) are a feature extractor (26) and one or more classifiers (28) utilizing polynomial discriminant functions.

Type: Grant

Filed: December 28, 1995

Date of Patent: May 5, 1998

Assignee: Motorola Inc.

Inventors: Theodore Mazurkiewicz, Gil E. Levendel, Shay-Ping Thomas Wang
Method and apparatus for developing a neural network for phoneme recognition

Patent number: 5749066

Abstract: An automated speech recognition system converts a speech signal into a compact, coded representation that correlates to a speech phoneme set. A number of different neural network pattern matching schemes may be used to perform the necessary speech coding. An integrated user interface guides a user unfamiliar with the details of speech recognition or neural networks to quickly develop and test a neural network for phoneme recognition. To train the neural network, digitized voice data containing known phonemes that the user wants the neural network to ultimately recognize are processed by the integrated user interface. The digitized speech is segmented into phonemes with each segment being labelled with a corresponding phoneme code. Based on a user selected transformation method and transformation parameters, each segment is transformed into a series of multiple dimension vectors representative of the speech characteristics of that segment.

Type: Grant

Filed: April 24, 1995

Date of Patent: May 5, 1998

Assignee: Ericsson Messaging Systems Inc.

Inventor: Paul A. Nussbaum
Method of training a speaker-dependent speech recognizer with automated supervision of training sufficiency

Patent number: 5664058

Abstract: To train a speech recognizer, a new voice message (one or a few isolated words), after being spoken by a user, is converted into a token. The token is then compared with a plurality of templates stored in the recognizer and a recognition score is obtained each time. The templates previously stored in the recognizer include templates for previously trained voice messages and one or more previously formed templates of the new voice message. Three tests are applied to the recognition scores to determine if the token and one of the previously formed templates of the new voice message can become paradigm templates, if the new voice message is too close in pronunciation to a voice message the recognizer has been previously trained to recognize, or if the user should repeat the new voice message to form another token. This training procedure provides a certain level of automatic control over the training process of a speaker dependent speech recognizer in an otherwise unsupervised environment.

Type: Grant

Filed: May 12, 1993

Date of Patent: September 2, 1997

Assignee: NYNEX Science & Technology

Inventor: George Vysotsky
