Clustering Patents (Class 704/245)
  • Patent number: 6243674
    Abstract: A sound compression system adaptively switches codebooks in and out based on a calculation carried out with the output of the codebook. The system uses three separate codebooks: adaptive vector quantization codebook, real pitch codebook, and noise codebook. The perceptually-weighted filter is generated adaptively using the predictive coefficients from the current sub-frame.
    Type: Grant
    Filed: March 2, 1998
    Date of Patent: June 5, 2001
    Assignee: American Online, Inc.
    Inventor: Alfred Yu
  • Patent number: 6212500
    Abstract: In a method for determining the similarities of sounds across different languages, hidden Markov modelling of multilingual phonemes is employed wherein language-specific as well as language-independent properties are identified by combining the probability densities for different hidden Markov sound models in various languages.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: April 3, 2001
    Assignee: Siemens Aktiengesellschaft
    Inventor: Joachim Köhler
  • Patent number: 6205424
    Abstract: Speech signals from speakers having known identities are used to create sets of acoustic models. The acoustic models along with their corresponding identities are stored in a memory. A plurality of sets of cohort models that characterize the speech signals are selected from the stored sets of acoustic models, and linked to the set of acoustic models of each identified speaker. During a testing session speech signals produced by an unknown speaker having a claimed identity are processed to generate processed speech signals. The processed speech signals are compared to the set of models of the claimed speaker to produce first scores. The processed speech signals are also compared to the sets of cohort models to produce second scores. A subset of scores is dynamically selected from the second scores according to predetermined criteria.
    Type: Grant
    Filed: July 31, 1996
    Date of Patent: March 20, 2001
    Assignee: Compaq Computer Corporation
    Inventors: William D. Goldenthal, Brian S. Eberman
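    The scoring flow in this abstract can be illustrated with a minimal sketch. The abstract does not state the selection criteria or how the selected scores are used afterwards, so both are assumptions here: the hypothetical `select_cohort_scores` keeps the n highest cohort scores, and the hypothetical `normalized_score` subtracts their mean from the claimed speaker's score.

```python
def select_cohort_scores(second_scores, n):
    """Pick the n highest cohort scores (an assumed selection criterion)."""
    return sorted(second_scores, reverse=True)[:n]


def normalized_score(first_score, second_scores, n=3):
    """Claimed-speaker score minus the mean of the selected cohort scores.

    This normalization is an illustrative assumption; the abstract only
    says that a subset of the second scores is selected.
    """
    selected = select_cohort_scores(second_scores, n)
    return first_score - sum(selected) / len(selected)


# Scores are treated as log-likelihoods, so higher means a better match.
cohort = [-5.0, -2.0, -9.0, -3.0]
print(normalized_score(-1.0, cohort, n=2))
```

    A larger normalized score would suggest that the utterance fits the claimed speaker better than it fits the closest cohort models.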
  • Patent number: 6182037
    Abstract: Fast and detailed match techniques for speaker recognition are combined into a hybrid system in which speakers are associated in groups when potential confusion is detected between a speaker being enrolled and a previously enrolled speaker. Thus the detailed match techniques are invoked only at the potential onset of saturation of the fast match technique while the detailed match is facilitated by limitation of comparisons to the group and the development of speaker-dependent models which principally function to distinguish between members of a group rather than to more fully characterize each speaker. Thus storage and computational requirements are limited and fast and accurate speaker recognition can be extended over populations of speakers which would degrade or saturate fast match systems and degrade performance of detailed match systems.
    Type: Grant
    Filed: May 6, 1997
    Date of Patent: January 30, 2001
    Assignee: International Business Machines Corporation
    Inventor: Stephane Herman Maes
  • Patent number: 6163769
    Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set of decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.
    Type: Grant
    Filed: October 2, 1997
    Date of Patent: December 19, 2000
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Hsiao-Wuen Hon, Xuedong D. Huang
  • Patent number: 6141641
    Abstract: The present invention includes a system for recognizing speech based on an input data stream. The system includes an acoustic model which has a model size. The model is adjustable to a desired size based on characteristics of a computer system on which the recognition system is run.
    Type: Grant
    Filed: April 15, 1998
    Date of Patent: October 31, 2000
    Assignee: Microsoft Corporation
    Inventors: Mei-Yuh Hwang, Xuedong D. Huang
  • Patent number: 6122612
    Abstract: A method and apparatus for matching at least a first input identifier with a reference identifier. A user provides an input identifier into a system, and the system produces a recognized identifier based on the input identifier. The system of the present invention performs a check-sum operation to determine whether the recognized identifier was recognized correctly. If the check-sum operation reveals that the recognized identifier is incorrect, the system of the present invention generates a plurality of substitute identifiers. The substitute identifiers are compared to a set of pre-stored reference identifiers. If a match is found between a reference identifier and a substitute identifier, the matched reference identifier is selected as corresponding to the input identifier provided by the user.
    Type: Grant
    Filed: November 20, 1997
    Date of Patent: September 19, 2000
    Assignee: AT&T Corp
    Inventor: Randy G. Goldberg
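    The check-then-substitute flow described in this abstract can be sketched as follows. The check-sum rule, the confusion table, and all names (`CONFUSABLE`, `checksum_ok`, `substitutes`, `match_identifier`) are hypothetical and chosen only for illustration. The abstract does not say how a recognized identifier that passes the check-sum is handled, so accepting it directly is also an assumption.

```python
from itertools import product
from typing import Optional

# Hypothetical confusion table: each character maps to the characters a
# recognizer might have confused it with (including itself).
CONFUSABLE = {"5": "59", "9": "95", "1": "17", "7": "71"}


def checksum_ok(identifier: str) -> bool:
    """Assumed rule: the last digit equals the sum of the others modulo 10."""
    digits = [int(ch) for ch in identifier]
    return sum(digits[:-1]) % 10 == digits[-1]


def substitutes(identifier: str) -> list:
    """Generate every identifier reachable by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in identifier]
    return ["".join(combo) for combo in product(*options)]


def match_identifier(recognized: str, references: set) -> Optional[str]:
    """Return the reference identifier matching the recognized one, if any."""
    if checksum_ok(recognized):
        # Assumption: an identifier that passes the check is looked up as-is.
        return recognized if recognized in references else None
    for candidate in substitutes(recognized):
        if candidate in references:
            return candidate
    return None


references = {"1359", "2468"}
print(match_identifier("1399", references))  # "5" misheard as "9"
```

    Here "1399" fails the assumed check-sum, and the substitute "1359" is found among the reference identifiers.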
  • Patent number: 6107935
    Abstract: A speaker recognition system for selectively permitting access by a requesting speaker to one of a service and facility includes an acoustic front-end for computing at least one feature vector from a speech utterance provided by the requesting speaker; a speaker dependent codebook store for pre-storing sets of acoustic features, in the form of codebooks, respectively corresponding to a pool of previously enrolled speakers; a speaker identifier/verifier module operatively coupled to the acoustic front-end, wherein: the speaker identifier/verifier module identifies, from identifying indicia provided by the requesting speaker, a previously enrolled speaker as a claimed speaker; further, the speaker identifier/verifier module associates, with the claimed speaker, first and second groups of previously enrolled speakers, the first group being defined as speakers whose codebooks are respectively acoustically similar to the claimed speaker (i.e.
    Type: Grant
    Filed: February 11, 1998
    Date of Patent: August 22, 2000
    Assignee: International Business Machines Corporation
    Inventors: Liam David Comerford, Stephane Herman Maes
  • Patent number: 6073096
    Abstract: A method of speech recognition, in accordance with the present invention, includes the steps of grouping acoustics to form classes based on acoustic features, clustering training speakers by the classes to provide class-specific cluster systems, selecting from the cluster systems a subset of cluster systems closest to adaptation data from a test speaker, transforming the subset of cluster systems to bring the subset of cluster systems closer to the test speaker based on the adaptation data to form adapted cluster systems and combining the adapted cluster systems to create a speaker adapted system for decoding speech from the test speaker. System and methods for building speech recognition systems as well as adapting speaker systems for class-specific speaker clusters are included.
    Type: Grant
    Filed: February 4, 1998
    Date of Patent: June 6, 2000
    Assignee: International Business Machines Corporation
    Inventors: Yuqing Gao, Mukund Padmanabhan, Michael Alan Picheny
  • Patent number: 6064958
    Abstract: A pattern recognition scheme using probabilistic models that are capable of reducing a calculation cost for the output probability while improving a recognition performance even when a number of mixture component distributions of respective states is small, by arranging distributions with low calculation cost and high expressive power as the mixture component distribution. In this pattern recognition scheme, a probability of each probabilistic model expressing features of each recognition category with respect to each input feature vector derived from each input signal is calculated, where the probabilistic model represents a feature parameter subspace in which feature vectors of each recognition category exist and the feature parameter subspace is expressed by using mixture distributions of one-dimensional discrete distributions with arbitrary distribution shapes which are arranged in respective dimensions.
    Type: Grant
    Filed: September 19, 1997
    Date of Patent: May 16, 2000
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Satoshi Takahashi, Shigeki Sagayama
  • Patent number: 6061652
    Abstract: An HMM device and a DP matching device capable of performing word spotting accurately with a small amount of calculation are provided. For that purpose, a code book is provided in which representative vectors of respective clusters are stored in a form searchable by their labels, wherein in HMM, similarity degrees based on Kullback-Leibler Divergence of distributions of occurrence probabilities of the clusters under respective states and distribution of degrees of input feature vectors to be recognized to the respective clusters are rendered occurrence degrees of the feature vectors from the states and in DP matching, similarity degrees based on Kullback-Leibler Divergence of distributions of membership degrees of feature vectors forming reference patterns to the respective clusters and distribution of degrees of input feature vectors to the respective clusters are rendered inter-frame similarity degrees of frames of the input patterns and frames of corresponding reference patterns.
    Type: Grant
    Filed: June 18, 1996
    Date of Patent: May 9, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Eiichi Tsuboka, Junichi Nakahashi
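    As a rough illustration of a similarity degree based on Kullback-Leibler divergence, the sketch below compares an input vector's membership degrees over clusters with the cluster occurrence probabilities of two states. Using the negated divergence as the similarity, and the small `eps` smoothing term, are assumptions; the abstract does not give the exact formula.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions over the same clusters."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def similarity(p, q):
    """Assumed similarity degree: the negated divergence (higher is closer)."""
    return -kl_divergence(p, q)


# Membership degrees of one input feature vector over three clusters.
membership = [0.7, 0.2, 0.1]
# Cluster occurrence probabilities under two hypothetical HMM states.
state_a = [0.6, 0.3, 0.1]
state_b = [0.1, 0.2, 0.7]

print(similarity(membership, state_a) > similarity(membership, state_b))
```

    Under these assumptions the input vector is judged closer to `state_a`, whose cluster distribution resembles its membership degrees.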
  • Patent number: 6038528
    Abstract: The present invention relates to a robust speech processing method and system which models channel and noise variations with affine transforms to reduce mismatched conditions between training and testing. The affine transform relating the training vectors $c_k$ to the vectors for the testing condition $c_k'$ is represented by the form $c_k'^{T} = A c_k^{T} + b$ for $k = 1$ to $N$, in which $A$ is a matrix of predicator coefficients representing noise distortions and vector $b$ represents channel distortions. Alternatively, an affine invariant cepstrum is generated during testing and training for modeling speech to account for noise and channel effects. From the improved speech processing, improved speaker recognition with channel and noise variations is obtained.
    Type: Grant
    Filed: July 17, 1996
    Date of Patent: March 14, 2000
    Assignee: T-Netix, Inc.
    Inventors: Richard Mammone, Xiaoyu Zhang
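    The affine relation in this abstract (a testing-condition vector equals A times the training vector plus b) can be applied with a few lines of code. This is a minimal sketch using plain lists; the function name `affine_transform` and the example values of A and b are illustrative assumptions, and estimating A and b from data is not shown.

```python
def affine_transform(c, A, b):
    """Return c' = A c + b for a single feature vector c."""
    return [
        sum(a_ij * c_j for a_ij, c_j in zip(row, c)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical two-dimensional example: A scales the first coefficient
# (standing in for noise distortion) and b shifts both (channel distortion).
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [0.5, -1.0]

print(affine_transform([1.0, 2.0], A, b))
```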
  • Patent number: 6009392
    Abstract: A method is provided which trains acoustic models in an automatic speech recognizer ("ASR") without explicitly matching decoded scripts with correct scripts from which acoustic training data is generated. In the method, audio data is input and segmented to produce audio segments. The audio segments are clustered into groups of clustered audio segments such that the clustered audio segments in each of the groups have similar characteristics. Also, the groups respectively form audio similarity classes. Then, audio segment probability distributions for the clustered audio segments in the audio similarity classes are calculated, and audio segment frequencies for the clustered audio segments are determined based on the audio segment probability distributions. The audio segment frequencies are matched to known audio segment frequencies for at least one of letters, combination of letters, and words to determine frequency matches, and a textual corpus of words is formed based on the frequency matches.
    Type: Grant
    Filed: January 15, 1998
    Date of Patent: December 28, 1999
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, Wlodek Wlodzimierz Zadrozny
  • Patent number: 6009390
    Abstract: In a speech recognition system, tied-mixture hidden Markov models (HMMs) are used to match, in the maximum likelihood sense, the phonemes of spoken words given the acoustic input thereof. In a well known manner, such speech recognition requires computation of state observation likelihoods (SOLs). Because of the use of HMMs, each SOL computation involves a substantial number of Gaussian kernels and mixture component weights. In accordance with the invention, the number of Gaussian kernels is cut down to reduce the computational complexity and increase the efficiency of memory access to the kernels. For example, only the non-zero mixture component weights and the Gaussian kernels associated therewith are considered in the SOL computation. In accordance with an aspect of the invention, only a subset of the Gaussian kernels of significant values, regardless of the values of the associated mixture component weights, are considered in the SOL computation.
    Type: Grant
    Filed: September 11, 1997
    Date of Patent: December 28, 1999
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Raziel Haimi-Cohen, Frank K. Soong
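    The first pruning idea in this abstract, evaluating only kernels with non-zero mixture weights, can be sketched as below. One-dimensional Gaussian kernels and the names `gaussian` and `state_observation_likelihood` are simplifying assumptions made for illustration.

```python
import math


def gaussian(x, mean, var):
    """Density of a one-dimensional Gaussian kernel."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def state_observation_likelihood(x, weights, kernels):
    """Sum weight * kernel density, skipping kernels whose weight is zero.

    Returns the likelihood and the number of kernels actually evaluated.
    """
    total = 0.0
    evaluated = 0
    for w, (mean, var) in zip(weights, kernels):
        if w == 0.0:
            continue  # a zero-weight kernel contributes nothing
        total += w * gaussian(x, mean, var)
        evaluated += 1
    return total, evaluated


# Shared (tied) kernels as (mean, variance) pairs, with per-state weights.
kernels = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
weights = [0.5, 0.0, 0.5]

likelihood, count = state_observation_likelihood(0.0, weights, kernels)
print(likelihood, count)
```

    Skipping the zero-weight kernel leaves the likelihood unchanged while reducing the number of kernel evaluations from three to two.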
  • Patent number: 6006184
    Abstract: In a speaker recognition system, a tree-structured reference pattern storing unit has first through M-th node stages each of which has nodes that respectively store a reference pattern of inhibiting speakers. The reference pattern of each node of (N-1)-th node stage represents acoustic features in the reference patterns of predetermined ones of the nodes of the N-th node stage. An analysis unit analyzes input speech and converts the input speech into feature vectors. A similarities calculating unit calculates similarities between the feature vectors and the reference patterns of all of the inhibiting speakers. An inhibiting speaker selecting unit sorts the similarities and selects a predetermined number of inhibiting speakers.
    Type: Grant
    Filed: January 28, 1998
    Date of Patent: December 21, 1999
    Assignee: NEC Corporation
    Inventors: Eiko Yamada, Hiroaki Hattori
  • Patent number: 5995930
    Abstract: A method and apparatus for processing a sequence of words in a speech signal for speech recognition. The method includes the steps of sampling, at recurrent instants, said speech signal for generating a series of test signals. Signal-by-signal matching and scoring is generated between the test signals and a series of reference signals, where each of the series of reference signals forms one of a plurality of vocabulary words arranged as a vocabulary tree. The vocabulary tree includes a root and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end. Acoustic recombination determines both continuations of branches and the most probable partial hypotheses within a word because of the use of a vocabulary built up as a tree with branches having reference signals.
    Type: Grant
    Filed: November 19, 1996
    Date of Patent: November 30, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Reinhold Hab-Umbach, Hermann Ney
  • Patent number: 5987414
    Abstract: A vocabulary sub-set is selected from a large speech recognition dictionary. The selected vocabulary sub-set may be used in a real time directory assistance system to improve the system's real-time performance. The selection process is effected on the basis of the cost-benefit ratio, the benefit being measured in savings in operator working time. On the other hand, the cost is measured in terms of hardware limitations, namely processor throughput. Typically, the vocabulary sub-set is limited to a maximum number of orthographies that would enable the system to achieve real-time performance.
    Type: Grant
    Filed: October 31, 1996
    Date of Patent: November 16, 1999
    Assignee: Nortel Networks Corporation
    Inventors: Michael George Sabourin, Jeff Marcus
  • Patent number: 5983178
    Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus is provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.
    Type: Grant
    Filed: December 10, 1998
    Date of Patent: November 9, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
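    Grouping N speakers into K clusters from speaker-to-speaker distances can be sketched as below. The abstract does not name the clustering algorithm, so single-linkage agglomerative clustering with Euclidean distance is an assumption here, as are the two-element feature vectors and the function names.

```python
import math


def distance(a, b):
    """Euclidean speaker-to-speaker distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_speakers(features, k):
    """Merge the two closest clusters until k clusters remain (k >= 1)."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    distance(features[i], features[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical vocal-tract feature quantities for four speakers.
features = [[15.0, 8.0], [15.2, 8.1], [17.5, 9.0], [17.4, 9.3]]
print(cluster_speakers(features, 2))
```

    In the arrangement described by the abstract, each resulting cluster would then receive its own HMMs; that training step is not shown.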
  • Patent number: 5960396
    Abstract: The invention provides a standard pattern production system which produces an optimum recognition unit in terms of an information criterion to given learning data using an information criterion in learning of a standard pattern in pattern recognition. An input pattern production section holds an input pattern, and a standard pattern producing parameter production section calculates and outputs parameters necessary to produce standard patterns of individual categories. A cluster set production section divides a category set into cluster sets. A common standard pattern production section calculates standard patterns of individual clusters of the cluster sets. An optimum cluster selection section receives a plurality of cluster sets and common standard patterns and selects an optimum cluster using an information criterion. A standard pattern storage section stores the common standard pattern of the optimum cluster set as a standard pattern for the individual categories.
    Type: Grant
    Filed: April 21, 1997
    Date of Patent: September 28, 1999
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Patent number: 5895448
    Abstract: Methods and apparatus for generating and using both speaker dependent and speaker independent garbage models in speaker dependent speech recognition applications are described. The present invention recognizes that in some speech recognition systems, e.g., systems where multiple speech recognition operations are performed on the same signal, it may be desirable to recognize and treat words or phrases in one part of the speech recognition system as garbage or out of vocabulary utterances with the understanding that the very same words or phrases will be recognized and treated as in-vocabulary by another portion of the system. In accordance with the present invention, in systems where both speaker independent and speaker dependent speech recognition operations are performed independently, e.g.
    Type: Grant
    Filed: April 30, 1997
    Date of Patent: April 20, 1999
    Assignee: Nynex Science and Technology, Inc.
    Inventors: George J. Vysotsky, Vijay R. Raman
  • Patent number: 5890114
    Abstract: An HMM training method comprising a first parameter predicting step, a centroid state set calculating step, a reconstructing step, a second parameter predicting step and a control step. In the first parameter predicting step, a parameter of an HMM (hidden Markov model) is predicted based on training data. In the centroid state set calculating step, a centroid state set is calculated by clustering the state of said HMM whose parameter is predicted in the first parameter predicting step. In the reconstructing step, an HMM is reconstructed using the centroid state calculated in the centroid state set calculating step. In the second parameter predicting step, a parameter of the HMM reconstructed in the reconstructing step is predicted using the training data. The centroid state set calculating step is re-executed by the control step when the likelihood of the HMM whose parameter is predicted in the second parameter predicting step does not satisfy a predetermined condition.
    Type: Grant
    Filed: February 28, 1997
    Date of Patent: March 30, 1999
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Jie Yi
  • Patent number: 5890110
    Abstract: A variable dimension vector quantization method that uses a single "universal" codebook. The method can be given the interpretation of sampling full-dimensioned codevectors in the universal codebook and generating subcodevectors of the same dimension as input data subvector, which dimension may vary in time. A subcodevector is selected from the codebook to have minimum distortion between it and the input data subvector. The subcodevector with minimum distortion corresponds to the representative, full-dimensioned codevector in the codebook. The codebook is designed by inverse sampling of training subvectors to obtain full-dimension vectors, then iteratively clustering the training set until a stable centroid vector is obtained.
    Type: Grant
    Filed: March 27, 1995
    Date of Patent: March 30, 1999
    Assignee: The Regents of the University of California
    Inventors: Allen Gersho, Amitava Das, Ajit Venkat Rao
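    The encoding side of this scheme can be illustrated with a short sketch: each full-dimension codevector in the universal codebook is sampled at the positions of the input subvector, and the codevector whose sampled version has the least distortion is chosen. Squared error as the distortion measure, the explicit `indices` argument, and the function names are assumptions; codebook design by inverse sampling and iterative clustering is not shown.

```python
def squared_error(a, b):
    """Squared-error distortion between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(subvector, indices, codebook):
    """Return the index of the codevector whose sampled form is nearest.

    subvector: input data whose dimension len(indices) may vary over time
    indices:   positions at which full-dimension codevectors are sampled
    codebook:  list of full-dimension codevectors (the universal codebook)
    """
    best_index, best_distortion = None, float("inf")
    for i, codevector in enumerate(codebook):
        subcodevector = [codevector[j] for j in indices]
        d = squared_error(subvector, subcodevector)
        if d < best_distortion:
            best_index, best_distortion = i, d
    return best_index


codebook = [[0, 0, 0, 0], [1, 2, 3, 4], [4, 3, 2, 1]]
print(encode([2.1, 3.9], [1, 3], codebook))       # two-dimensional input
print(encode([4.2, 2.9, 2.0], [0, 1, 2], codebook))  # three-dimensional input
```

    The same codebook serves inputs of different dimension, which is the point of using a single universal codebook.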
  • Patent number: 5884258
    Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor.
    Type: Grant
    Filed: October 31, 1996
    Date of Patent: March 16, 1999
    Assignee: Microsoft Corporation
    Inventors: Michael J. Rozak, Fileno A. Alleva
  • Patent number: 5875425
    Abstract: A speech recognition system for recognizing a system user's speech can shorten a recognition period by reducing the amount of necessary calculations without deteriorating the accuracy rate of recognition. The speech recognition system successively calculates statistical probabilities of acoustic models, outputs a one sentence recognition result corresponding to acoustic models having the highest reliability when the one sentence is detected and stops the following calculations.
    Type: Grant
    Filed: December 23, 1996
    Date of Patent: February 23, 1999
    Assignee: Kokusai Denshin Denwa Co., Ltd.
    Inventors: Makoto Nakamura, Naomi Inoue, Fumihiro Yato, Seiichi Yamamoto
  • Patent number: 5864807
    Abstract: A method and apparatus for training a system to assess the identity of a person through the audio characteristics of their voice. The system inserts an audio input (10) into an A/D converter (20) for processing in a digital signal processor (30). The system then applies neural network type processing by using a polynomial pattern classifier (60) for training the speaker recognition system.
    Type: Grant
    Filed: February 25, 1997
    Date of Patent: January 26, 1999
    Assignee: Motorola, Inc.
    Inventors: William Michael Campbell, Khaled Talal Assaleh
  • Patent number: 5860063
    Abstract: A system and method for automated task selection is provided where a selected task is identified from the natural speech of the user making the selection. The system and method incorporate the selection of meaningful phrases through the use of a test for significance. The selected meaningful phrases are then clustered. The meaningful phrase clusters are input to a speech recognizer that determines whether any meaningful phrase clusters are present in the input speech. Task-type decisions are then made on the basis of the recognized meaningful phrase clusters.
    Type: Grant
    Filed: July 11, 1997
    Date of Patent: January 12, 1999
    Assignee: AT&T Corp
    Inventors: Allen Louis Gorin, Jeremy Huntley Wright
  • Patent number: 5854999
    Abstract: Compensatory values for compensating a reference pattern to match with an utterance environment of an input speech are employed for determining an environmental variation index to be input to a secondary matching controller, which is responsible for magnitudes of the index smaller than a threshold to hold a second matching section inoperative so that a recognition result of a primary matching of a previous compensated reference pattern is output, and for magnitudes of the index larger than the threshold to operate the second matching section to output a recognition result of a second matching based on a current compensated reference pattern to be stored as a subsequent reference pattern.
    Type: Grant
    Filed: June 24, 1996
    Date of Patent: December 29, 1998
    Assignee: NEC Corporation
    Inventor: Hiroshi Hirayama
  • Patent number: 5852804
    Abstract: A speech recognizing apparatus compares a speech command from a user with each of the registration patterns stored in a storage unit in turn. Then if the speech command coincides with one of the registration patterns, the speech recognizing apparatus controls a predetermined electronic apparatus associated with an operation related to the registration pattern. If the speech command does not coincide with any one of the registration patterns, the speech recognizing apparatus stores into a memory the speech command as a new registration pattern in which the speech command is related to a manipulation of the electronic apparatus produced by the user immediately after the speech command is produced.
    Type: Grant
    Filed: April 11, 1997
    Date of Patent: December 22, 1998
    Assignee: Fujitsu Limited
    Inventor: Kazuya Sako
  • Patent number: 5819221
    Abstract: Improved speech recognition is achieved according to the present invention by use of between-word and/or between-phrase coarticulation. The increase in the number of phonetic models required to model this additional vocabulary is reduced by clustering (19, 20) the inter-word/phrase models and grammar into only a few classes. By using one class for consonant inter-word context and two classes for vowel contexts, the accuracy for Japanese was almost as good as for unclustered models while the number of models was reduced by more than half.
    Type: Grant
    Filed: August 31, 1994
    Date of Patent: October 6, 1998
    Assignee: Texas Instruments Incorporated
    Inventors: Kazuhiro Kondo, Ikuo Kudo, Yu-Hung Kao, Barbara J. Wheatley
  • Patent number: 5812975
    Abstract: A method of designing a state transition model capable of high speed voice recognition and a voice recognition method and apparatus using the state transition model is provided. The methods provide a state transition model in which a state shared structure of the state transition model is designed. The method includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short distance cluster among clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
    Type: Grant
    Filed: June 18, 1996
    Date of Patent: September 22, 1998
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Yasunori Ohora
  • Patent number: 5806029
    Abstract: Hierarchical signal bias removal (HSBR) signal conditioning uses a codebook constructed from the set of recognition models and is updated as the recognition models are modified during recognition model training. As a result, HSBR signal conditioning and recognition model training are based on the same set of recognition model parameters, which provides significant reduction in recognition error rate for the speech recognition system.
    Type: Grant
    Filed: September 15, 1995
    Date of Patent: September 8, 1998
    Assignee: AT&T Corp
    Inventors: Eric Rolfe Buhrke, Wu Chou, Mazin G. Rahim
  • Patent number: 5806030
    Abstract: The clustering technique produces a low complexity and yet high accuracy speech representation for use with speech recognizers. The task database comprising the test speech to be modeled is segmented into subword units such as phonemes and labeled to indicate each phoneme in its left and right context (triphones). Hidden Markov Models are constructed for each context-independent phoneme and trained. Then the center states are tied for all phonemes of the same class. Triphones are trained and all poorly-trained models are eliminated by merging their training data with the nearest well-trained model using a weighted divergence computation to ascertain distance. Before merging, the threshold for each class is adjusted until the number of good models for each phoneme class is within predetermined upper and lower limits. Finally, if desired, the number of mixture components used to represent each model may be increased and the models retrained. This latter step increases the accuracy.
    Type: Grant
    Filed: May 6, 1996
    Date of Patent: September 8, 1998
    Inventor: Jean-Claude Junqua
  • Patent number: 5787394
    Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.
    Type: Grant
    Filed: December 13, 1995
    Date of Patent: July 28, 1998
    Assignee: International Business Machines Corporation
    Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, David Nahamoo, Mukund Padmanabhan
  • Patent number: 5787395
    Abstract: A voice recognizing method in which a plurality of voice recognition objective words are provided. Scores are accumulated for an unknown input voice signal as compared to the voice recognition objective words by using parameters which are calculated in advance. Upon receipt of an unknown voice signal, a corresponding voice recognition objective word is extracted and recognized. The voice recognition objective words are structured into an overlapping hierarchical structure by using correlation values between each pair of voice recognition objective words. This correlation may be computed from acoustic features, HMM parameters or the like. Score calculation is performed on the unknown input voice signal by using a dictionary of the voice recognition objective words structured in the hierarchical structure. Upon preliminary recognition, the dictionary of the voice recognition objective words is resorted without recalculation of the correlation values.
    Type: Grant
    Filed: July 18, 1996
    Date of Patent: July 28, 1998
    Assignee: Sony Corporation
    Inventor: Katsuki Minamino
  • Patent number: 5778336
    Abstract: A joint data (features) and channel (bias) estimation framework for robust processing of speech received over a channel is described. A trellis encoded vector quantizer is used as a pre-processor to estimate the channel bias using blind maximum likelihood sequence estimation. Sequential constraint in the feature vector sequence of a speech signal is applied for the selection of the quantized signal constellation and for the decoding process in joint data and channel estimation. A two state trellis encoded vector quantizer is designed for signal bias removal applications.
    Type: Grant
    Filed: October 1, 1996
    Date of Patent: July 7, 1998
    Assignee: Lucent Technologies Inc.
    Inventors: Wu Chou, Nambirajan Seshadri
  • Patent number: 5749072
    Abstract: A communications device (20) that is responsive to voice commands is provided. The communications device (20) can be a two-way radio, cellular telephone, PDA, or pager. The communications device (20) includes an interface (22) for allowing a user to access a communications channel according to a control signal and a speech-recognition system (24) for producing the control signal in response to a voice command. Included in the speech-recognition system (24) are a feature extractor (26) and one or more classifiers (28) utilizing polynomial discriminant functions.
    Type: Grant
    Filed: December 28, 1995
    Date of Patent: May 5, 1998
    Assignee: Motorola Inc.
    Inventors: Theodore Mazurkiewicz, Gil E. Levendel, Shay-Ping Thomas Wang
  • Patent number: 5749066
    Abstract: An automated speech recognition system converts a speech signal into a compact, coded representation that correlates to a speech phoneme set. A number of different neural network pattern matching schemes may be used to perform the necessary speech coding. An integrated user interface guides a user unfamiliar with the details of speech recognition or neural networks to quickly develop and test a neural network for phoneme recognition. To train the neural network, digitized voice data containing known phonemes that the user wants the neural network to ultimately recognize are processed by the integrated user interface. The digitized speech is segmented into phonemes with each segment being labelled with a corresponding phoneme code. Based on a user selected transformation method and transformation parameters, each segment is transformed into a series of multiple dimension vectors representative of the speech characteristics of that segment.
    Type: Grant
    Filed: April 24, 1995
    Date of Patent: May 5, 1998
    Assignee: Ericsson Messaging Systems Inc.
    Inventor: Paul A. Nussbaum
  • Patent number: 5664058
    Abstract: To train a speech recognizer, a new voice message (one or a few isolated words), after being spoken by a user, is converted into a token. The token is then compared with a plurality of templates stored in the recognizer and a recognition score is obtained each time. The templates previously stored in the recognizer include templates for previously trained voice messages and one or more previously formed templates of the new voice message. Three tests are applied to the recognition scores to determine if the token and one of the previously formed templates of the new voice message can become paradigm templates, if the new voice message is too close in pronunciation to a voice message the recognizer has been previously trained to recognize, or if the user should repeat the new voice message to form another token. This training procedure provides a certain level of automatic control over the training process of a speaker dependent speech recognizer in an otherwise unsupervised environment.
    Type: Grant
    Filed: May 12, 1993
    Date of Patent: September 2, 1997
    Assignee: NYNEX Science & Technology
    Inventor: George Vysotsky