Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 6438518
    Abstract: A method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions includes a speech coder configured to select from among various predictive coding modes. After a predefined number of speech frames have been predictively coded, the speech coder codes one frame with a nonpredictive coding mode or a mildly predictive coding mode. The predefined number of frames can be determined in advance from the subjective standpoint of a listener. The predefined number of frames may be varied periodically. An average coding bit rate may be maintained for the speech coder by ensuring that an average coding bit rate is maintained for each successive pattern, or group, of predictively coded speech frames including at least one nonpredictively coded or mildly predictively coded speech frame.
    Type: Grant
    Filed: October 28, 1999
    Date of Patent: August 20, 2002
    Assignee: Qualcomm Incorporated
    Inventors: Sharath Manjunath, Andrew P. Dejaco, Arasanipalai K. Ananthapadmanabhan, Eddie Lun Tik Choy
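The selection pattern in the abstract above can be sketched as a simple schedule. This is an illustration only, not the patented coder: the function name `coding_modes`, the mode labels, and the fixed run length are assumptions, and the bit-rate averaging step is not modelled.

```python
from typing import List


def coding_modes(num_frames: int, predictive_run: int) -> List[str]:
    """Return one coding mode per frame.

    After `predictive_run` predictively coded frames, one frame is coded
    nonpredictively, and then the pattern repeats.
    """
    if predictive_run < 0:
        raise ValueError("predictive_run must be non-negative")
    modes: List[str] = []
    run = 0  # predictive frames coded since the last nonpredictive frame
    for _ in range(num_frames):
        if run == predictive_run:
            modes.append("nonpredictive")
            run = 0
        else:
            modes.append("predictive")
            run += 1
    return modes
```

With `predictive_run=3`, every fourth frame is coded nonpredictively, which limits how far a frame error can propagate through the predictive chain.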
  • Publication number: 20020095286
    Abstract: A conversation manager processes spoken utterances from a user of a computer. The conversation manager includes a semantics analysis module and a syntax manager. A domain model that is used in processing the spoken utterances includes an ontology (i.e., world view for the relevant domain of the spoken utterances), lexicon, and syntax definitions. The syntax manager combines the ontology, lexicon, and syntax definitions to generate a grammatic specification. The semantics module uses the grammatic specification and the domain model to develop a set of frames (i.e., internal representation of the spoken utterance). The semantics module then develops a set of propositions from the set of frames. The conversation manager then uses the set of propositions in further processing to provide a reply to the spoken utterance.
    Type: Application
    Filed: October 25, 2001
    Publication date: July 18, 2002
    Applicant: International Business Machines Corporation
    Inventors: Steven I. Ross, Robert C. Armes, Julie F. Alweis, Elizabeth A. Brownholtz, Jeffrey G. MacAllister
  • Patent number: 6421644
    Abstract: An information apparatus is constructed for notifying output information to a remote terminal in response to an input signal of a sound. In the information apparatus, a first memory block memorizes characteristic data representing characteristics of various sounds. A second memory block memorizes various items of output information in correspondence to the characteristic data of the various sounds such that each item of the output information is associated with each sound. An input device collects a sound to provide an input signal of the collected sound. An analyzer device extracts characteristic data from the input signal of the collected sound. A controller device operates according to the extracted characteristic data for addressing the first memory block and the second memory block to identify the item of the output information corresponding to the collected sound. A transmitter device transmits the identified item of the output information to the remote terminal.
    Type: Grant
    Filed: July 26, 1999
    Date of Patent: July 16, 2002
    Assignee: Yamaha Corporation
    Inventor: Hiromi Okitsu
  • Patent number: 6421640
    Abstract: The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure and a plurality of second confidence measures determined for a recognition result is automatically combined for determining the first confidence measure. To reduce the resultant error rate in the assessment of the correctness of a recognition result, the method is characterized in that the determination of the parameters weighting the combination of the second confidence measures is based on a minimization of a cross-entropy-error measure. A further improvement is achieved by means of a post-processing operation based on the maximization of the Gardner-Derrida error function.
    Type: Grant
    Filed: September 13, 1999
    Date of Patent: July 16, 2002
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Jannes G. A. Dolfing, Andreas Wendemuth
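One common way to weight several confidence measures by minimizing a cross-entropy error is logistic regression. The sketch below takes that route as an assumption: the abstract does not state the model form, the names `train_combiner` and `combined_confidence` are hypothetical, and the Gardner-Derrida post-processing step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_combiner(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and a bias by batch gradient descent on cross-entropy.

    Each row of `features` holds the second confidence measures for one
    recognition result; `labels` is 1 for a correct result, 0 otherwise.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def combined_confidence(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """First (combined) confidence measure in the range (0, 1)."""
    return _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```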
  • Patent number: 6418409
    Abstract: A method for use in a detection unit that produces a score to be converted into a binary decision via the setting of a threshold is a method for generating the score as an error-derived score such that the threshold is a tolerable one-sided error probability. The method includes the steps of generating a primary score that is a monotonic function of the posterior probability, obtaining a distribution of primary scores of input signals that ought to lead to a particular binary decision, and translating, based on the distribution, the primary score of a current input signal to the error-derived score.
    Type: Grant
    Filed: October 26, 1999
    Date of Patent: July 9, 2002
    Assignee: Persay Inc.
    Inventor: Yaakov Metzger
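The translation from a primary score to an error-derived score can be illustrated with an empirical distribution. The sketch assumes that the reference distribution holds primary scores of inputs that ought to be rejected (called `impostor_scores` here, a hypothetical name), and that a higher primary score favours acceptance. Neither detail is fixed by the abstract.

```python
from bisect import bisect_left
from typing import Sequence


def error_derived_score(primary: float, impostor_scores: Sequence[float]) -> float:
    """Fraction of reference scores that are >= `primary`.

    Read as the empirical one-sided (false-accept) error probability that
    would result if the decision threshold sat exactly at `primary`.
    """
    if not impostor_scores:
        raise ValueError("need at least one reference score")
    ordered = sorted(impostor_scores)
    below = bisect_left(ordered, primary)  # number of scores strictly below
    return (len(ordered) - below) / len(ordered)
```

Accepting an input only when its error-derived score is at most, say, 0.01 then corresponds to tolerating roughly a 1% one-sided error rate on the reference data.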
  • Patent number: 6411930
    Abstract: Speaker identification is performed using a single Gaussian mixture model (GMM) for multiple speakers—referred to herein as a Discriminative Gaussian mixture model (DGMM). A likelihood sum of the single GMM is factored into two parts, one of which depends only on the Gaussian mixture model, and the other of which is a discriminative term. The discriminative term allows for the use of a binary classifier, such as a support vector machine (SVM). In one embodiment of the invention, a voice messaging system incorporates a DGMM to identify the speaker who generated a message, if that speaker is a member of a chosen list of target speakers, or to identify the speaker as a “non-target” otherwise.
    Type: Grant
    Filed: February 12, 1999
    Date of Patent: June 25, 2002
    Assignee: Lucent Technologies Inc.
    Inventor: Christopher John Burges
  • Patent number: 6411925
    Abstract: A speech processing apparatus is provided in which the distribution of energy with frequency within each frame of an input speech signal is determined and any energy components which are less than a masking level determined relative to the maximum energy within the frame are made equal to the masking level.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: June 25, 2002
    Assignee: Canon Kabushiki Kaisha
    Inventor: Robert Alexander Keiller
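The masking step above amounts to flooring each frame's spectrum relative to its peak. A minimal sketch follows, assuming that the masking level is a fixed number of decibels below the frame maximum; the parameter `drop_db` and its default are hypothetical.

```python
from typing import List, Sequence


def apply_masking_floor(energies: Sequence[float], drop_db: float = 30.0) -> List[float]:
    """Raise every energy component below the masking level up to that level.

    The masking level sits `drop_db` decibels below the frame's maximum energy.
    """
    if not energies:
        return []
    masking_level = max(energies) * 10.0 ** (-drop_db / 10.0)
    return [max(e, masking_level) for e in energies]
```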
  • Patent number: 6401064
    Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
    Type: Grant
    Filed: May 24, 2001
    Date of Patent: June 4, 2002
    Assignee: AT&T Corp.
    Inventor: Lawrence Kevin Saul
  • Publication number: 20020062211
    Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape, but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting each of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.
    Type: Application
    Filed: April 2, 2001
    Publication date: May 23, 2002
    Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
  • Publication number: 20020055838
    Abstract: The invention relates to a method for speech processing in which input variables containing speech features are mapped onto output variables. In the mapping process, the input variables are weighted and/or identical maps are produced for different sets of input variables and at least one output variable.
    Type: Application
    Filed: September 24, 2001
    Publication date: May 9, 2002
    Inventors: Achim Mueller, Hans-Georg Zimmermann
  • Publication number: 20020046024
    Abstract: A method for recognizing speech is proposed wherein the process of recognition is started using the starting acoustic model (SAM) and wherein the current acoustic model (CAM) is modified by removing or cancelling model function mixture components (MFMjk) which are negligible for the description of the speaking behaviour and quality of the current speaker. Therefore, the size of the acoustic model (SAM, CAM) is reduced by adaptation to the current speaker enabling fast performance and increased recognition efficiency.
    Type: Application
    Filed: September 5, 2001
    Publication date: April 18, 2002
    Inventors: Ralf Kompe, Silke Goronzy
  • Patent number: 6374216
    Abstract: A nonparametric family of density functions formed by histogram estimators for modeling acoustic vectors is used in automatic recognition of speech. A Gaussian kernel is set forth in the density estimator. When the densities are found for all the basic sounds in a training stage, an acoustic vector is assigned to a phoneme label corresponding to the highest likelihood for the basis of the decoding of acoustic vectors into text.
    Type: Grant
    Filed: September 27, 1999
    Date of Patent: April 16, 2002
    Assignee: International Business Machines Corporation
    Inventors: Charles A. Micchelli, Peder A. Olsen
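As a rough illustration of labelling an acoustic vector by highest likelihood under a kernel density estimate, the sketch below uses a plain isotropic Gaussian kernel estimator. This is a stand-in for the specific estimator family of the patent, which the abstract does not spell out; the names `kde_log_likelihood` and `classify` and the `bandwidth` parameter are assumptions.

```python
import math
from typing import Dict, Sequence


def kde_log_likelihood(
    x: Sequence[float], samples: Sequence[Sequence[float]], bandwidth: float
) -> float:
    """Log density of `x` under a Gaussian kernel estimate built on `samples`."""
    dim = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-dim / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    density = norm * total / len(samples)
    return math.log(density) if density > 0.0 else -math.inf


def classify(
    x: Sequence[float],
    training: Dict[str, Sequence[Sequence[float]]],
    bandwidth: float = 1.0,
) -> str:
    """Return the phoneme label whose estimated density at `x` is highest."""
    return max(training, key=lambda label: kde_log_likelihood(x, training[label], bandwidth))
```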
  • Patent number: 6353810
    Abstract: A voice signal and an emotion associated therewith are provided. Then, the emotion associated with the voice signal is determined in an automated manner and subsequently stored. Next, a user determined emotion associated with the voice signal is determined by a user and received. The automatically determined emotion and the user determined emotion are then compared.
    Type: Grant
    Filed: August 31, 1999
    Date of Patent: March 5, 2002
    Assignee: Accenture LLP
    Inventor: Valery A. Petrushin
  • Publication number: 20020010581
    Abstract: A voice recognition device, where at least two input signals are routed in parallel via respective, separate channels to a recognition device having a feature extraction device for forming feature vectors, a transformation device for forming transformed feature vectors, and having a subsequent classification unit that classifies the supplied, transformed feature vectors and emits output signals corresponding to the determined classes. A high rate of recognition at a relatively low expenditure for the design and processing are achieved in that the feature extraction device has feature extraction stages separately arranged in the individual channels, the feature extraction stages being connected at their outputs to the shared transformation device.
    Type: Application
    Filed: June 13, 2001
    Publication date: January 24, 2002
    Inventors: Stephan Euler, Andreas Korthauer
  • Patent number: 6341264
    Abstract: Electronic commerce (E-commerce) and Voice commerce (V-commerce) proceeds by having the user speak into the system. The user's speech is converted by speech recognizer into a form required by the transaction processor that effects the electronic commerce operation. A dimensionality reduction processor converts the user's input speech into a reduced dimensionality set of values termed eigenvoice parameters. These parameters are compared with a set of previously stored eigenvoice parameters representing a speaker population (the eigenspace representing speaker space) and the comparison is used by the speech model adaptation system to rapidly adapt the speech recognizer to the user's speech characteristics. The user's eigenvoice parameters are also stored for subsequent use by the speaker verification and speaker identification modules.
    Type: Grant
    Filed: February 25, 1999
    Date of Patent: January 22, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua
  • Publication number: 20010044718
    Abstract: A feature extraction process for use in a wireless communication system provides automatic speech recognition based on both spectral envelope and voicing information. The shape of the spectral envelope is used to determine the LSPs of the incoming bitstream and the adaptive gain coefficients and fixed gain coefficients are used to generate the “voiced” and “unvoiced” feature parameter information.
    Type: Application
    Filed: December 5, 2000
    Publication date: November 22, 2001
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 6314392
    Abstract: In a computerized method a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.
    Type: Grant
    Filed: September 20, 1996
    Date of Patent: November 6, 2001
    Assignee: Digital Equipment Corporation
    Inventors: Brian S. Eberman, William D. Goldenthal
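The iterative merging in the abstract above can be sketched as follows. The abstract does not name the statistical distance, so the absolute difference of set means is used here purely as a placeholder; the function name `segment` is also hypothetical.

```python
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def segment(samples: Sequence[float], frame_len: int, threshold: float) -> List[List[float]]:
    """Group samples into fixed-size frames, then merge adjacent sets whose
    distance is below `threshold` until no further merge is possible."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    sets = [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]
    merged = True
    while merged:
        merged = False
        i = 0
        while i < len(sets) - 1:
            # Placeholder distance: absolute difference of the two set means.
            if abs(_mean(sets[i]) - _mean(sets[i + 1])) < threshold:
                sets[i] = sets[i] + sets.pop(i + 1)
                merged = True
            else:
                i += 1
    return sets
```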
  • Patent number: 6301562
    Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: October 9, 2001
    Assignee: New Transducers Limited
    Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
  • Patent number: 6292776
    Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: September 18, 2001
    Assignee: Lucent Technologies Inc.
    Inventor: Rathinavelu Chengalvarayan
  • Patent number: 6263308
    Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.
    Type: Grant
    Filed: March 20, 2000
    Date of Patent: July 17, 2001
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
  • Patent number: 6263309
    Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
    Type: Grant
    Filed: April 30, 1998
    Date of Patent: July 17, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua
  • Patent number: 6260012
    Abstract: An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values.
    Type: Grant
    Filed: March 1, 1999
    Date of Patent: July 10, 2001
    Assignee: Samsung Electronics Co., LTD
    Inventor: Joung-Kyou Park
  • Patent number: 6253181
    Abstract: The recognizer tests input utterances using a confidence measure to select words of high recognition confidence for use in the adaptation process. Adaptation is performed rapidly using a priori knowledge about the class of speakers who will be using the system. This a priori knowledge can be expressed using eigenvoice basis vectors that capture information about the entire targeted user population. The dialogue system may also use the confidence measure to output a pronunciation example to the user, based on the confidence that the system has in the results of recognition, given the different possibilities that can be recognized. The dialogue system may also provide voiced prompts that teach the user how to correctly pronounce words.
    Type: Grant
    Filed: January 22, 1999
    Date of Patent: June 26, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Jean-Claude Junqua
  • Patent number: 6246978
    Abstract: A system that provides measurements of speech distortion that correspond closely to user perceptions of speech distortion is provided. The system calculates and analyzes first and second discrete derivatives to detect and determine the incidence of change in the voice waveform that would not have been made by human articulation because natural voice signals change at a limited rate. Statistical analysis is performed of both the first and second discrete derivatives to detect speech distortion by looking at the distribution of the signals. For example, the kurtosis of the signals is analyzed as well as the number of times these values exceed a predetermined threshold. Additionally, the number of times the first derivative data is less than a predetermined low value is analyzed to provide a level of speech distortion and clipping of the signal due to lost data packets.
    Type: Grant
    Filed: May 18, 1999
    Date of Patent: June 12, 2001
    Assignee: MCI WorldCom, Inc.
    Inventor: William C. Hardy
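The statistics named in the abstract above (discrete derivatives, kurtosis, and threshold counts) can be computed as below. This is a sketch under assumptions: the thresholds `high` and `low` are hypothetical parameters, magnitudes are compared rather than signed values, and no mapping from these features to a distortion level is attempted.

```python
from typing import Dict, List, Sequence


def diff(x: Sequence[float]) -> List[float]:
    """First discrete derivative: differences of consecutive samples."""
    return [b - a for a, b in zip(x, x[1:])]


def kurtosis(x: Sequence[float]) -> float:
    """Non-excess kurtosis (about 3 for Gaussian data); 0.0 if undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) if m2 > 0.0 else 0.0


def distortion_features(signal: Sequence[float], high: float, low: float) -> Dict[str, float]:
    """Summary statistics of the first and second discrete derivatives."""
    d1 = diff(signal)
    d2 = diff(d1)
    return {
        "kurtosis_d1": kurtosis(d1),
        "kurtosis_d2": kurtosis(d2),
        "d1_exceed": sum(1 for v in d1 if abs(v) > high),  # abrupt changes
        "d2_exceed": sum(1 for v in d2 if abs(v) > high),
        "d1_flat": sum(1 for v in d1 if abs(v) < low),  # near-constant runs
    }
```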
  • Patent number: 6236960
    Abstract: An improved speech coder takes advantage of the fact that any given pulse combination can be uniquely described by the following four properties: number of degenerate pulses, signs of pulses, positions of pulses, and pulse magnitudes. In accordance with the invention, a four stage iterative classification of the pulse combinations, where each stage groups the pulse combinations by one of these four properties, is performed. The process starts with the number of pulses, then determines the total number of possible sign combinations, pulse position combinations, and pulse magnitude combinations. This flexibility allows for the sign combinations to be grouped in the last stage. Since the number of sign combinations is always a power of two, leaving the sign combinations for last along with appropriately ordering the elements in the previous three stages allows the signs to be coded by independent bits, in turn allowing for error protection of those bits.
    Type: Grant
    Filed: August 6, 1999
    Date of Patent: May 22, 2001
    Assignee: Motorola, Inc.
    Inventors: Weimin Peng, Edgardo Manuel Cruz Zeno, James Patrick Ashley
  • Patent number: 6230128
    Abstract: A path link passing speech recognition system and method recognizes input connected speech. The recognition system has a plurality of vocabulary nodes associated with word representation models, at least one of the vocabulary nodes of the network being able to process more than one path link simultaneously, so allowing for more than one recognition result.
    Type: Grant
    Filed: November 21, 1995
    Date of Patent: May 8, 2001
    Assignee: British Telecommunications public limited company
    Inventor: Samuel Gavin Smyth
  • Patent number: 6230126
    Abstract: A device for speech recognition includes a dictionary which stores features of recognition objects. The device further includes a matching unit which compares features of input speech with the features of the recognition objects, and a dictionary updating unit which updates time lengths of phonemics in the dictionary based on the input speech when the matching unit finds substantial similarities between the input speech and one of the recognition objects.
    Type: Grant
    Filed: December 17, 1998
    Date of Patent: May 8, 2001
    Assignee: Ricoh Company, Ltd.
    Inventor: Masaru Kuroda
  • Patent number: 6208972
    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: March 27, 2001
    Inventors: Richard Grant, Pedro E. McGregor
  • Patent number: 6199039
    Abstract: An MPEG-II audio decoder with a synthesis subband filter includes a fast IMDCT (Inverse Modified Discrete Cosine Transform) module and an IPQMF (Inverse Pseudo Quadrature Mirror Filter) module. The fast IMDCT module involves a butterfly stage of input subband samples which requires only about ¼ the amount of multiplier-accumulate computation of the ISO suggested method. The IPQMF module involves an efficient memory configuration which requires only half size of the standard synthesis subband filter bank.
    Type: Grant
    Filed: August 3, 1998
    Date of Patent: March 6, 2001
    Assignee: National Science Council
    Inventors: Liang-Gee Chen, Tsung-Han Tsai, Yuan-Chen Liu
  • Patent number: 6195639
    Abstract: The present invention provides a system and method for improving conventional, isolated word, speech recognition systems. According to exemplary embodiments of the present invention, a pattern matching algorithm is provided that permits an unknown speech signal to be recognized with fewer memory access operations compared to conventional techniques. The pattern matching algorithm performs multiple successive calculations on speech reference data retrieved from memory to thereby reduce the number of times that the same data is retrieved. By reducing the number of memory access operations, the throughput of the speech recognition system can be increased. As an alternative, the pattern matching algorithm allows for an increase in the size of the speech recognition system's vocabulary.
    Type: Grant
    Filed: May 28, 1999
    Date of Patent: February 27, 2001
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Alberto Jimenez Feltström, Jim Rasmusson
  • Patent number: 6192353
    Abstract: An improved method and system for training and classifying using a low complexity and high accuracy multiresolutional polynomial classifier (412) is presented. A method of training a multiresolutional polynomial classifier which reduces the complexity of existing classifiers allows models representing subgroups of classes to easily be created. The models which represent subgroups of classes are applied to an unidentified input to produce a coarse classification of the unidentified input using a low order classifier. Once a coarse classification of the unidentified input is performed, a more detailed classification is performed using another low complexity classifier.
    Type: Grant
    Filed: February 9, 1998
    Date of Patent: February 20, 2001
    Assignee: Motorola, Inc.
    Inventors: Khaled Assaleh, William Michael Campbell, John Eric Kleider
  • Patent number: 6185531
    Abstract: A method for improving the association of articles of information or stories with topics associated with specific subjects (subject topics) and with a general topic of words that are not associated with any subject. The inventive method is trained using Hidden Markov Models (HMM) to represent each story, with each state in the HMM representing a topic. A standard Expectation and Maximization algorithm, as is known in this art field, can be used to maximize the expected likelihood of the method relating the words associated with each topic to that topic. In the method, the probability that each word in a story is related to a subject topic is determined and evaluated, and the subject topics with the lowest probability are discarded. The remaining subject topics are evaluated and a sub-set of subject topics with the highest probabilities over all the words in a story are considered to be the “correct” subject topic set.
    Type: Grant
    Filed: January 9, 1998
    Date of Patent: February 6, 2001
    Assignee: GTE Internetworking Incorporated
    Inventors: Richard M. Schwartz, Toru Imai
  • Patent number: 6182035
    Abstract: A voice activity detector that implements a fast wavelet transformation using filter pairs. A quadrature high pass filter provides an output signal corresponding to the upper half of the Nyquist frequency and a quadrature low pass filter provides an output signal corresponding to the lower half of the Nyquist frequency. The quadrature high pass filter is useful for catching and isolating transients in the input signal and the quadrature low pass filter is useful for fine frequency analysis. The voice activity detector can utilize multiple decomposition levels that are arranged in a pyramid or tree formation to increase the reliability of the voice activity decision. For example, the output of the quadrature low pass filter can be further decomposed using a second pair of filters. The voice activity decision can be generated by comparing a signal power estimate for the output of the filter pairs to threshold levels that are specific for each filter or frequency range.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: January 30, 2001
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Fisseha Mekuria
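The filter-pair decomposition and per-band thresholding can be illustrated with the Haar wavelet, the simplest quadrature filter pair. The choice of Haar filters, the function names, and the threshold layout are all assumptions; the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def haar_split(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One decomposition level: (low-pass, high-pass) outputs, downsampled by 2."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def power(x: Sequence[float]) -> float:
    """Mean squared value, used as the signal power estimate."""
    return sum(v * v for v in x) / len(x) if x else 0.0


def voice_active(frame: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Decide voice activity from band powers of a tree decomposition.

    `thresholds[k]` applies to the high band produced at level k; the last
    entry applies to the final low band. The low band is split again at
    each level, and any band exceeding its threshold signals activity.
    """
    band_powers: List[float] = []
    low = list(frame)
    for _ in range(len(thresholds) - 1):
        low, high = haar_split(low)
        band_powers.append(power(high))
    band_powers.append(power(low))
    return any(p > t for p, t in zip(band_powers, thresholds))
```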
  • Patent number: 6151575
    Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: November 21, 2000
    Assignee: Dragon Systems, Inc.
    Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
  • Patent number: 6134525
    Abstract: A discriminant or identification function is used for pattern recognition in which the highest performance can be offered when adaptation is made. Learning is carried out while a discriminant or identification function is adapted to a learning sample. For example, a standard pattern of the character "A" used as an identification function is learned such that when the character "A" slanting in the right or left direction is input, the standard pattern of the character "A" is rotated (adapted) in accordance with the slanting of the input learning sample.
    Type: Grant
    Filed: October 21, 1998
    Date of Patent: October 17, 2000
    Assignee: Sony Corporation
    Inventor: Naoto Iwahashi
  • Patent number: 6125345
    Abstract: A multiple confidence measures subsystem of an automated speech recognition system allows otherwise independent confidence measures to be integrated and used for both training and testing on a consistent basis. Speech to be recognized is input to a speech recognizer and a recognition verifier of the multiple confidence measures subsystem. The speech recognizer generates one or more confidence measures. The speech recognizer preferably generates a misclassification error (MCE) distance as one of the confidence measures. The recognized speech output by the speech recognizer is input to the recognition verifier, which outputs one or more confidence measures. The recognition verifier preferably outputs a misverification error (MVE) distance as one of the confidence measures. The confidence measures output by the speech recognizer and the recognition verifier are normalized and then input to an integrator.
    Type: Grant
    Filed: September 19, 1997
    Date of Patent: September 26, 2000
    Assignee: AT&T Corporation
    Inventors: Piyush C. Modi, Mazin G. Rahim
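The normalize-then-integrate step in the abstract above can be sketched as follows. This is a minimal illustration, not the patented integrator: the logistic normalization, the equal weights, the sign convention (a larger distance means more confidence), and the names `normalize`, `integrate`, and `accept` are all assumptions. The abstract only states that the measures are normalized and then passed to an integrator.

```python
import math


def normalize(distance, scale=1.0):
    """Map an unbounded distance (e.g. MCE or MVE) into (0, 1).

    Assumed normalization: a logistic function.
    """
    return 1.0 / (1.0 + math.exp(-distance / scale))


def integrate(measures, weights):
    """Hypothetical integrator: weighted average of normalized measures."""
    return sum(w * m for w, m in zip(weights, measures)) / sum(weights)


def accept(mce_distance, mve_distance, threshold=0.5):
    """Accept the recognition result if the integrated confidence is high enough."""
    scores = [normalize(mce_distance), normalize(mve_distance)]
    return integrate(scores, weights=[0.5, 0.5]) >= threshold
```

Putting every measure on the same (0, 1) scale before combining is what lets otherwise unrelated confidence measures share one decision threshold.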
  • Patent number: 6122612
    Abstract: A method and apparatus for matching at least a first input identifier with a reference identifier. A user provides an input identifier into a system, and the system produces a recognized identifier based on the input identifier. The system of the present invention performs a check-sum operation to determine whether the recognized identifier was recognized correctly. If the check-sum operation reveals that the recognized identifier is incorrect, the system of the present invention generates a plurality of substitute identifiers. The substitute identifiers are compared to a set of pre-stored reference identifiers. If a match is found between a reference identifier and a substitute identifier, the matched reference identifier is selected as corresponding to the input identifier provided by the user.
    Type: Grant
    Filed: November 20, 1997
    Date of Patent: September 19, 2000
    Assignee: AT&T Corp
    Inventor: Randy G. Goldberg
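The check-then-substitute flow described in the abstract above can be sketched as follows. The check-sum rule (last digit equals the sum of the other digits modulo 10), the `CONFUSABLE` table, and the function names are hypothetical; the abstract does not specify any of them. The behavior when the check-sum passes is also an assumption.

```python
# Hypothetical confusion table: characters a recognizer might mix up.
CONFUSABLE = {"1": "7", "7": "1", "5": "9", "9": "5"}


def checksum_ok(identifier):
    """Assumed rule: the last digit equals the sum of the other digits mod 10."""
    *body, check = (int(c) for c in identifier)
    return sum(body) % 10 == check


def substitutes(identifier):
    """Yield identifiers that differ from the input by one confusable character."""
    for i, ch in enumerate(identifier):
        for alt in CONFUSABLE.get(ch, ""):
            yield identifier[:i] + alt + identifier[i + 1:]


def match(recognized, references):
    """Return the reference identifier selected for the recognized one, or None."""
    if checksum_ok(recognized):
        # Assumption: a passing identifier is looked up directly.
        return recognized if recognized in references else None
    for candidate in substitutes(recognized):
        if candidate in references:
            return candidate
    return None
```

For example, with `references = {"1236"}`, the misrecognized identifier `"7236"` fails the assumed check-sum and is mapped back to `"1236"` through the `7`/`1` substitution.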
  • Patent number: 6078883
    Abstract: For training a speech recognition system on a multi-item repertoire, the following steps are executed: a speech item is presented by a user, and the distinctivity thereof in the repertoire is asserted. Under control of the distinctivity found, the speech item is inserted into the repertoire. These steps are repeated until reaching repertoire sufficiency. In particular, the asserting determines a likeness among the actually presented speech item and all items already in the repertoire, wherein undue likeness with one particular stored item creates a contingency procedure. This implies offering to the user a choice between ignoring the actually presented speech item and alternatively inserting the actually presented speech item at a price of deleting the particular stored item.
    Type: Grant
    Filed: December 17, 1997
    Date of Patent: June 20, 2000
    Assignee: U.S. Philips Corporation
    Inventors: Benoit Guilhaumon, Gilles Miet
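The insertion loop with its contingency procedure, as described in the abstract above, might look like the sketch below. It assumes each speech item is reduced to a fixed-length feature tuple and that likeness is measured by Euclidean distance against a threshold; the names `present_item`, `min_distance`, and `replace_on_conflict` (standing in for the user's choice) are hypothetical.

```python
import math


def present_item(repertoire, name, features, min_distance, replace_on_conflict):
    """Try to add one speech item to the repertoire (a dict of name -> features).

    Returns "inserted", "ignored", or "replaced <old name>".
    """
    # Find the stored item most similar to the presented one, if any.
    closest = min(
        repertoire,
        key=lambda n: math.dist(repertoire[n], features),
        default=None,
    )
    if closest is None or math.dist(repertoire[closest], features) >= min_distance:
        repertoire[name] = features  # distinct enough: insert directly
        return "inserted"
    if replace_on_conflict:
        # Contingency: insert at the price of deleting the similar stored item.
        del repertoire[closest]
        repertoire[name] = features
        return "replaced " + closest
    return "ignored"
```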
  • Patent number: 6076053
    Abstract: A speech recognition method comprises the steps of using given speech data and the N-best algorithm to generate alternative pronunciations and then merging the obtained pronunciations into a pronunciation networks structure; using additional parameters to characterize a pronunciation network for a particular word; optimizing the parameters of the pronunciation networks using a minimum classification error criterion that maximizes a discrimination between different pronunciation networks; and adapting parameters of the pronunciation networks by, first, adjusting probabilities of the possible pronunciations that may be generated by the pronunciation network for a word claimed to be a true one and, second, to correct weights for all of the pronunciation networks by using the adjusted probabilities.
    Type: Grant
    Filed: May 21, 1998
    Date of Patent: June 13, 2000
    Assignee: Lucent Technologies Inc.
    Inventors: Biing-Hwang Juang, Filipp E. Korkmazskiy
  • Patent number: 6044343
    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequences O output data to train a hidden Markov model (HMM) processes $\lambda_j$ and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities $PR(O \mid \lambda_j)$ for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.
    Type: Grant
    Filed: June 27, 1997
    Date of Patent: March 28, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lin Cong, Safdar M. Asghar
  • Patent number: 6023673
    Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset.
    Type: Grant
    Filed: June 4, 1997
    Date of Patent: February 8, 2000
    Assignee: International Business Machines Corporation
    Inventors: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy
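The saving from a prototype hierarchy, as described in the abstract above, comes from scoring a feature vector against a small upper level first and then only against the lower-level prototypes associated with the best upper-level matches. A minimal two-level sketch, assuming Euclidean distance as the match score and hypothetical names (`rank`, `hierarchical_match`, `children`, `keep`):

```python
import math


def rank(vector, prototypes):
    """Return (distance, name) pairs sorted from best to worst match."""
    return sorted((math.dist(vector, p), name) for name, p in prototypes.items())


def hierarchical_match(vector, coarse, fine, children, keep=1):
    """Score `vector` against the coarse level, then only against the fine
    prototypes associated with the `keep` best coarse prototypes."""
    top = [name for _, name in rank(vector, coarse)[:keep]]
    candidates = {c: fine[c] for parent in top for c in children[parent]}
    return rank(vector, candidates)
```

With two coarse prototypes that each have two fine prototypes and `keep=1`, a vector is compared against four prototypes in total, and the number of fine-level comparisons stays fixed however many other coarse groups are added.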
  • Patent number: 6009391
    Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented in a matrix by a vector f of line spectral pair frequencies and energy coefficients and are fuzzy matrix quantized to respective vector f entries of a matrix codeword in a codebook of the FMQ. The energy coefficients include the original energy and the first and second derivatives of the original energy which increase recognition accuracy by, for example, being generally distinctive speech input signal parameters and providing noise signal suppression especially when the noise signal has a relatively constant energy over at least two time frame intervals. To reduce data while maintaining sufficient resolution, the energy coefficients may be normalized and logarithmically represented. A distance measure between f and f, d(f, f), is defined as ##EQU1## where the constants $\alpha_1$, …
    Type: Grant
    Filed: August 6, 1997
    Date of Patent: December 28, 1999
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Safdar M. Asghar, Lin Cong
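The noise-suppression remark in the abstract above can be illustrated numerically: if a noise source adds the same energy to consecutive frames, that constant cancels in the frame-to-frame differences. The sketch below approximates the derivatives by first and second differences and shows one possible normalized logarithmic representation; both choices, and all function names, are assumptions rather than the patent's definitions.

```python
import math


def frame_energy(frame):
    """Energy of one frame of samples: sum of squared amplitudes."""
    return sum(s * s for s in frame)


def delta(values):
    """First difference, used here as a discrete stand-in for a derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def log_normalized(energies):
    """Assumed representation: log10 of each energy relative to the peak.

    Requires strictly positive energies.
    """
    peak = max(energies)
    return [math.log10(e / peak) for e in energies]


clean = [1.0, 4.0, 9.0, 16.0]       # per-frame speech energy (toy values)
noisy = [e + 5.0 for e in clean]    # constant noise energy added to every frame

# The constant offset disappears from the first and second differences.
assert delta(clean) == delta(noisy)
assert delta(delta(clean)) == delta(delta(noisy))
```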
  • Patent number: 6003002
    Abstract: The method and system of adapting speech recognition models to a speaker environment may comprise receiving a spoken password (52) and getting a set of speaker independent (SI) speech recognition models (54). A mapping sequence may be determined for the spoken password (56). Using the mapping sequence, a speaker ID may be identified (58). A transform may be determined (66) between the SI speech recognition models and the spoken password using the mapping sequence. Speaker adapted (SA) speech recognition models may be generated (68) by applying the transform to SI speech recognition models. A speech input may be recognized (70) by applying the SA speech recognition models.
    Type: Grant
    Filed: December 29, 1997
    Date of Patent: December 14, 1999
    Assignee: Texas Instruments Incorporated
    Inventor: Lorin P. Netsch
  • Patent number: 5987405
    Abstract: A method of transmitting speech signals with reduced bandwidth requirements. With this invention an original speech signal is first converted to a textual representation, and a facsimile of the original speech is determined from the textual representation. Then a minimum error turn is derived from the difference between the original speech signal and the facsimile of the original speech signal. The minimum error turn is then compressed, and it is this compressed minimum error turn, along with the textual representation, that is transmitted on the communications medium. At the receiving end, the textual representation and the difference representation are split through a demultiplexer. The textual representation is then passed through a synthesizer while the difference representation is passed through a mapper.
    Type: Grant
    Filed: June 24, 1997
    Date of Patent: November 16, 1999
    Assignee: International Business Machines Corporation
    Inventors: David Frederick Bantz, Robert Joseph Zavrel, Jr.
  • Patent number: 5983176
    Abstract: A method and apparatus for searching for multimedia files in a distributed database and for displaying results of the search based on the context and content of the multimedia files.
    Type: Grant
    Filed: April 30, 1997
    Date of Patent: November 9, 1999
    Assignee: Magnifi, Inc.
    Inventors: Eric M. Hoffert, Karl Cremin, Leo Degen
  • Patent number: 5970239
    Abstract: Method for performing acoustic model estimation to optimize classification accuracy on speaker derived feature vectors with respect to a plurality of classes corresponding to phones to which a plurality of acoustic models respectively correspond comprises: (a) initializing an acoustic model for each phone; (b) evaluating the merit of the acoustic model initialized for each phone utilizing an objective function having a two component discriminant measure capable of characterizing each phone whereby a first component is defined as a probability that the model for the phone assigns to feature vectors from the phone and a second component is defined as a probability that the model for the phone assigns to feature vectors from other phones; (c) adapting the model for selected phones so as to increase the first component for the phone or decrease the second component for the phone, the adapting step yielding a new model for each selected phone; (d) evaluating the merit of the new models for each phone adapted in st
    Type: Grant
    Filed: August 11, 1997
    Date of Patent: October 19, 1999
    Assignee: International Business Machines Corporation
    Inventors: Lalit Rai Bahl, Mukund Padmanabhan
  • Patent number: 5963906
    Abstract: A method and system performs speech recognition training using Hidden Markov Models. Initially, preprocessed speech signals that include a plurality of observations are stored by the system. Initial Hidden Markov Model (HMM) parameters are then assigned. Summations are then calculated using modified equations derived substantially from the following equations, wherein $u \le v < w$: $P(x_u^w) = P(x_u^v)\,P(x_{v+1}^w)$ and $\Omega_{ij}(x_u^w) = \Omega_{ij}(x_u^v)\,P(x_{v+1}^w) + P(x_u^v)\,\Omega_{ij}(x_{v+1}^w)$. The calculated summations are then used to perform HMM parameter reestimation. The system then determines whether the HMM parameters have converged. If they have, the HMM parameters are then stored. However, if the HMM parameters have not converged, the system again calculates summations, performs HMM parameter reestimation using the summations, and determines whether the parameters have converged.
    Type: Grant
    Filed: May 20, 1997
    Date of Patent: October 5, 1999
    Assignee: AT & T Corp
    Inventor: William Turin
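The two recursions in the abstract above split an observation sequence at a point v: the first is a plain matrix product, the second a product rule. The sketch below applies them by divide and conquer on a toy two-state HMM. The numbers in `A` and `B` are made up, and the single-observation definitions `p1` and `omega1` are assumptions chosen so that the accumulated sums resemble Baum-Welch transition counts; the abstract does not define them.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy HMM with hypothetical parameters.
A = [[0.9, 0.1], [0.2, 0.8]]   # A[i][j]: transition probability i -> j
B = [[0.7, 0.3], [0.1, 0.9]]   # B[j][x]: probability of symbol x in state j


def p1(x):
    """Assumed matrix probability of one observation: A[i][j] * B[j][x]."""
    n = len(A)
    return [[A[i][j] * B[j][x] for j in range(n)] for i in range(n)]


def omega1(x, i, j):
    """Assumed single-observation Omega_ij: entry (i, j) of p1(x), zero elsewhere."""
    n = len(A)
    m = [[0.0] * n for _ in range(n)]
    m[i][j] = A[i][j] * B[j][x]
    return m


def p_and_omega(xs, i, j):
    """Return (P, Omega_ij) for the sequence xs by splitting it in half."""
    if len(xs) == 1:
        return p1(xs[0]), omega1(xs[0], i, j)
    mid = len(xs) // 2
    p_left, om_left = p_and_omega(xs[:mid], i, j)
    p_right, om_right = p_and_omega(xs[mid:], i, j)
    p = matmul(p_left, p_right)                                        # first recursion
    om = matadd(matmul(om_left, p_right), matmul(p_left, om_right))    # second recursion
    return p, om
```

Because the recursions hold for any split point, results for subsequences can be combined in any order, which is what makes this formulation convenient for reusing partial sums.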
  • Patent number: 5960392
    Abstract: A method and an arrangement for adapting data models in adaptive speaker verification systems or similar adaptive systems using models based on data collected from a person, system or process during a certain time period. A plurality of different model units are used in the same speaker verification system. The verification system is put into operation using a simple model unit requiring a small amount of speech data. During the use, more speech data is collected continuously. This speech data is used to further train either (1) only more complex model units, or (2) both the simple model unit already in operation and the more complex model units. At suitable intervals, a comparison is made of the performance capacities of the model units. Once a more complex model unit yields a more reliable verification result, the more complex model unit is assigned to take over the verification in the operating situation. The more complex model unit may be put into operation either instantaneously or gradually, e.g.
    Type: Grant
    Filed: August 15, 1997
    Date of Patent: September 28, 1999
    Assignee: Telia Research AB
    Inventors: Erik Sundberg, Hakan Melin
  • Patent number: 5956679
    Abstract: A speech processing apparatus includes a noise model production device for extracting a noise-speech interval from input speech data and producing a noise model by using the data of the extracted interval. The apparatus also includes a composite distribution production device for dividing the distributions of a speech model into a plurality of groups, producing a composite distribution of each group, and determining the positional relationship of each distribution within each group. In addition, the apparatus includes a memory for storing each composite distribution and the positional relationship of each distribution within the group, and a PMC conversion device for PMC-converting each produced composite distribution. Also provided is a noise-adaptive speech model production device for producing a noise-adaptive speech model on the basis of the composite distribution which is PMC-converted by the PMC conversion device and the positional relationship stored by the memory.
    Type: Grant
    Filed: December 2, 1997
    Date of Patent: September 21, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Hiroki Yamamoto
  • Patent number: 5946655
    Abstract: When a language model is to be used for the recognition of a speech signal and the vocabulary is composed as a tree, the language model value cannot be taken into account before the word end. Customarily, after each word end the comparison with a tree root is started anew, be it with a score which has been increased by the language model value so that the threshold value for the scores at which hypotheses are terminated must be high and hence many, even unattractive hypotheses remain active for a prolonged period of time. In order to avoid this, in accordance with the invention a correction value is added to the score for at least a part of the nodes of the vocabulary tree; the sum of the correction values on the path to a word then may not be greater than the language model value for the relevant word. As a result, for each test signal the scores of all hypotheses are of a comparable order of magnitude.
    Type: Grant
    Filed: March 29, 1995
    Date of Patent: August 31, 1999
    Assignee: U.S. Philips Corporation
    Inventors: Volker Steinbiss, Bach-Hiep Tran, Hermann Ney
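One way to choose correction values that satisfy the constraint in the abstract above (their sum along the path to a word never exceeds that word's language model value) is to give each tree node the smallest language model value of any word below it and to add only the increase relative to the parent node. The sketch below assumes language model values are non-negative costs such as negative log probabilities, uses letter strings as stand-ins for phoneme sequences, and identifies each tree node by its prefix; the function names are hypothetical, and this is not necessarily the scheme claimed in the patent.

```python
def correction_values(lm_cost):
    """Map each tree node (identified by its prefix) to a correction value.

    lm_cost: word -> language model cost (assumed non-negative).
    """
    best = {}  # prefix -> smallest LM cost of any word passing through this node
    for word, cost in lm_cost.items():
        for end in range(1, len(word) + 1):
            prefix = word[:end]
            best[prefix] = min(best.get(prefix, cost), cost)
    # Each node adds only the increase over its parent (the root counts as 0).
    return {p: v - best.get(p[:-1], 0.0) for p, v in best.items()}


def path_sum(word, corrections):
    """Total correction accumulated on the path from the root to the word end."""
    return sum(corrections[word[:end]] for end in range(1, len(word) + 1))
```

With `{"cat": 2.0, "car": 3.0, "dog": 1.5}`, the node `c` already receives 2.0 and the node `d` receives 1.5 at the first arc, so hypotheses in different branches carry comparable scores long before a word end is reached.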