Similarity Patents (Class 704/239)
  • Patent number: 6470314
    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
    Type: Grant
    Filed: April 6, 2000
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Mukund Padmanabhan
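The four steps in the abstract of 6470314 resemble per-dimension distribution matching. The following is a minimal sketch under that reading, not the patented implementation: it assumes empirical CDFs built from sorted samples, and every function name (`empirical_cdf`, `inverse_cdf`, `build_mapping`, `adapt`) is hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of samples in sorted_values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def inverse_cdf(sorted_values, p):
    """Smallest sample whose empirical CDF value is at least p."""
    n = len(sorted_values)
    index = min(n - 1, max(0, math.ceil(p * n) - 1))
    return sorted_values[index]


def build_mapping(train_column, test_column):
    """Nonlinear map for one dimension: test value -> training value
    sitting at the same cumulative probability (assumed reading of step iii)."""
    train_sorted = sorted(train_column)
    test_sorted = sorted(test_column)

    def transform(x):
        return inverse_cdf(train_sorted, empirical_cdf(test_sorted, x))

    return transform


def adapt(train_vectors, test_vectors):
    """Apply an independent mapping to each dimension of the test vectors."""
    dims = len(train_vectors[0])
    mappings = [
        build_mapping([v[d] for v in train_vectors],
                      [v[d] for v in test_vectors])
        for d in range(dims)
    ]
    return [[mappings[d](v[d]) for d in range(dims)] for v in test_vectors]
```

With training values 0..3 and test values 10, 12, 14, 16 in a single dimension, `adapt` maps the test values back onto 0..3, which is the sense in which transformed test vectors become similar to the training vectors.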
  • Patent number: 6427134
    Abstract: A voice activity detector suitable for deployment in a mobile phone apparatus is disclosed. An advantage of the voice activity detector is that it is better able to provide a decision (79) as to whether an input signal (19) consists of noise (which it is not desired to transmit) or comprises speech or information tones (which are required to be transmitted), especially in noisy environments. The voice activity detector includes a number of components, in particular an auxiliary voice activity detector (3). The auxiliary voice activity detector (3) distinguishes between noise and speech on the basis that the spectrum of speech changes more rapidly than that of noise. This results in the auxiliary detector (3) rarely mistaking a speech signal to be a noise signal. Hence, a very reliable noise template (421) is obtained. For this reason, the auxiliary detector (3) is also useful in noise reduction applications. The voice activity detector also uses a neural net classifier (7).
    Type: Grant
    Filed: September 26, 1998
    Date of Patent: July 30, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Neil Robert Garner, Paul Alexander Barrett
  • Patent number: 6421640
    Abstract: The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure and a plurality of second confidence measures determined for a recognition result is automatically combined for determining the first confidence measure. To reduce the resultant error rate in the assessment of the correctness of a recognition result, the method is characterized in that the determination of the parameters weighting the combination of the second confidence measures is based on a minimization of a cross-entropy-error measure. A further improvement is achieved by means of a post-processing operation based on the maximization of the Gardner-Derrida error function.
    Type: Grant
    Filed: September 13, 1999
    Date of Patent: July 16, 2002
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Jannes G. A. Dolfing, Andreas Wendemuth
  • Patent number: 6411929
    Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
    Type: Grant
    Filed: July 26, 2000
    Date of Patent: June 25, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
  • Patent number: 6408271
    Abstract: The invention relates to a method and apparatus for generating phrasal transcriptions. The invention provides generating a group of word transcriptions for each vocabulary item in an orthographic phrase. According to a first embodiment, the invention further provides permuting the word transcriptions to generate a plurality of phrasal transcriptions and computing a score for each phrasal transcription in the plurality of phrasal transcriptions. The set of phrasal transcriptions is then selected from the plurality of phrasal transcriptions at least in part on a basis of the score data elements and stored in a format suitable for use by a speech recognition dictionary. As a variant, the phrasal transcriptions may be released in a format suitable for use by a speech synthesizer.
    Type: Grant
    Filed: September 24, 1999
    Date of Patent: June 18, 2002
    Assignee: Nortel Networks Limited
    Inventors: Kenneth W. Smith, Michael G. Sabourin
  • Patent number: 6404925
    Abstract: Methods for segmenting audio-video recording of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order.
    Type: Grant
    Filed: March 11, 1999
    Date of Patent: June 11, 2002
    Assignees: Fuji Xerox Co., Ltd., Xerox Corporation
    Inventors: Jonathan T. Foote, Lynn Wilcox
  • Patent number: 6397086
    Abstract: The disclosure relates to an infrared controlled hands-free operator for cellular phones housed in a vehicle. More particularly, it is concerned with a hands-free operator which can automatically transmit an infrared signal similar to a control signal of a vehicle's audio stereo system to turn the stereo system off or mute it when incoming signals are received by a cellular phone mounted to a vehicle. The hands-free operator thus can automatically cut off the noise sound of an audio stereo system in operation so as to make the operation of a cellular phone free of interference in a vehicle on the move.
    Type: Grant
    Filed: June 22, 1999
    Date of Patent: May 28, 2002
    Assignee: E-Lead Electronic Co., Ltd.
    Inventor: Tonny Chen
  • Patent number: 6393397
    Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: May 21, 2002
    Assignee: Motorola, Inc.
    Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
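The cohort selection rule in the abstract of 6393397 can be read as a greedy loop. The sketch below is only an illustration: it assumes speaker models are plain numeric vectors, uses Euclidean distance as the dissimilarity (and its negative as the similarity), and combines the two criteria by simple addition. None of those choices come from the patent, and `select_cohorts` is a hypothetical name.

```python
import math


def select_cohorts(target, pool, count):
    """Greedily pick `count` cohort names from `pool` (name -> vector).

    Each round favors models close to the target speaker and far from
    the cohorts already chosen (assumed scoring: sum of both terms).
    """
    chosen = []
    remaining = dict(pool)
    while remaining and len(chosen) < count:

        def score(name):
            model = remaining[name]
            similarity = -math.dist(model, target)
            if not chosen:
                return similarity
            # Distance to the nearest previously selected cohort.
            dissimilarity = min(math.dist(model, pool[c]) for c in chosen)
            return similarity + dissimilarity

        best = max(remaining, key=score)
        chosen.append(best)
        del remaining[best]
    return chosen
```

With a target at the origin and candidates `a=(1, 0)`, `b=(1.1, 0)`, `c=(-1.5, 0)`, the first pick is `a` (closest to the target) and the second is `c` rather than `b`, because `b` is nearly identical to the cohort already chosen.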
  • Patent number: 6349148
    Abstract: The invention relates to a device for the verification of time-dependent, user-specific signals which includes means for generating a set of feature vectors which serve to provide an approximative description of an input signal and are associated with selectable sampling intervals of the signal; means for preparing an HMM model for the signal; means for determining a first probability value which describes the probability of occurrence of the set of feature vectors, given the HMM model, and a threshold decider for comparing the first probability value with a threshold value and for deciding on the verification of the signal.
    Type: Grant
    Filed: May 26, 1999
    Date of Patent: February 19, 2002
    Assignee: U.S. Philips Corporation
    Inventor: Jannes G.A. Dolfing
  • Patent number: 6341263
    Abstract: A voice recognition system, method and storage medium is provided. The system includes a plurality of storage sections, a selection section, an adaptation section, a plurality of calculation sections, an adaptation section, a normalization section and a decision section. The method includes the steps for performing the functions associated with the sections.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: January 22, 2002
    Assignee: NEC Corporation
    Inventors: Eiko Yamada, Hiroaki Hattori
  • Publication number: 20010047257
    Abstract: A method of assigning a similarity score representative of a similarity between a first speech signal and a second speech signal. The method includes generating a signal transformation responsive to both the first and second signals, determining a transformation score based on at least one characteristic of the generated transformation and calculating the similarity score as a function of the transformation score.
    Type: Application
    Filed: January 24, 2001
    Publication date: November 29, 2001
    Inventors: Gabriel Artzi, Yaron Paz, Yehuda Hershkovits
  • Patent number: 6314392
    Abstract: In a computerized method a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets are determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.
    Type: Grant
    Filed: September 20, 1996
    Date of Patent: November 6, 2001
    Assignee: Digital Equipment Corporation
    Inventors: Brian S. Eberman, William D. Goldenthal
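The iterative merging described in the abstract of 6314392 can be sketched as follows. This is a rough illustration, not the patented method: the statistical distance is replaced by an assumed stand-in (absolute difference of segment means), the closest adjacent pair is merged first, and the names `frame_signal`, `mean_distance`, and `segment` are hypothetical.

```python
from statistics import fmean


def frame_signal(samples, frame_size):
    """Group samples into disjoint frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def mean_distance(a, b):
    """Assumed stand-in for the statistical distance between two sets."""
    return abs(fmean(a) - fmean(b))


def segment(samples, frame_size, threshold):
    """Merge adjacent sets while the closest pair is below the threshold."""
    segments = frame_signal(samples, frame_size)
    merged = True
    while merged and len(segments) > 1:
        merged = False
        distances = [mean_distance(segments[i], segments[i + 1])
                     for i in range(len(segments) - 1)]
        i = min(range(len(distances)), key=distances.__getitem__)
        if distances[i] < threshold:
            # Replace the two adjacent sets with their concatenation.
            segments[i:i + 2] = [segments[i] + segments[i + 1]]
            merged = True
    return segments
```

For a signal of six zeros followed by six fives, `segment(samples, 2, 1.0)` ends with two segments, one per stationary region.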
  • Patent number: 6301559
    Abstract: To realize a speech recognition method and speech recognition device that reduce erroneous recognition for words that are not to be recognized and ambient sounds, and to improve the recognition capability, characteristic parameters of words to be recognized and characteristic parameters of words that are not to be recognized and ambient sounds are previously entered in a vocabulary template 40. Degrees of similarity are obtained in a speech recognition section 30 between characteristic parameters for input words (or sounds) and all the characteristic parameters stored in the vocabulary template 40. Information indicating one characteristic parameter, from among the characteristic parameters stored in the vocabulary template 40, that is the closest approximation to the characteristic parameter of the input word (or sound), is generated as a recognition result.
    Type: Grant
    Filed: November 16, 1998
    Date of Patent: October 9, 2001
    Assignee: Oki Electric Industry Co., Ltd.
    Inventors: Hiroshi Shinotsuka, Noritoshi Hino
  • Patent number: 6292776
    Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: September 18, 2001
    Assignee: Lucent Technologies Inc.
    Inventor: Rathinavelu Chengalvarayan
  • Patent number: 6269335
    Abstract: A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.
    Type: Grant
    Filed: August 14, 1998
    Date of Patent: July 31, 2001
    Assignee: International Business Machines Corporation
    Inventors: Abraham Ittycheriah, Stephane Herman Maes, Michael Daniel Monkowski, Jeffrey Scott Sorensen
  • Publication number: 20010010039
    Abstract: Apparatus for Mandarin Chinese speech recognition by using initial/final phoneme similarity vector, for improving the Chinese speech recognition accuracy and downsizing the needed memory is provided.
    Type: Application
    Filed: December 8, 2000
    Publication date: July 26, 2001
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventor: Chung-Ho Yang
  • Patent number: 6263309
    Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
    Type: Grant
    Filed: April 30, 1998
    Date of Patent: July 17, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua
  • Patent number: 6260012
    Abstract: An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values.
    Type: Grant
    Filed: March 1, 1999
    Date of Patent: July 10, 2001
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Joung-Kyou Park
  • Patent number: 6256630
    Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: July 3, 2001
    Assignee: Phonetic Systems Ltd.
    Inventors: Atzmon Gilai, Hezi Resnekov
  • Patent number: 6246982
    Abstract: A method for computing a distance between collections of distributions or finite mixture models of features. Data is processed so as to define at least first and second collections of distributions of features. For each distribution of the first collection, the distance to each distribution of the second collection is measured to determine which distribution of the second collection is the closest (most similar). The same procedure is performed for the distributions of the second collection. Based on the closest distance measures, a final distance is computed representing the distance between the first and second collections. This final distance may be a weighted sum of the closest distances. The distance measure may be used in a number of applications such as [speaker classification,] speaker recognition and audio segmentation.
    Type: Grant
    Filed: January 26, 1999
    Date of Patent: June 12, 2001
    Assignee: International Business Machines Corporation
    Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
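The procedure in the abstract of 6246982 (closest match in each direction, then a weighted sum) can be sketched as below. The abstract does not name the per-distribution distance, so this example assumes one-dimensional Gaussians given as `(mean, variance)` pairs and a symmetric Kullback-Leibler divergence; the weights default to uniform and the result is normalized by the total weight. All names are hypothetical.

```python
def symmetric_kl(p, q):
    """Symmetric KL divergence between two 1-D Gaussians (mean, variance)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    d2 = (mean_p - mean_q) ** 2
    return (var_p + d2) / (2 * var_q) + (var_q + d2) / (2 * var_p) - 1.0


def closest_distances(source, target):
    """For each distribution in source, the distance to its nearest one in target."""
    return [min(symmetric_kl(s, t) for t in target) for s in source]


def collection_distance(first, second, first_weights=None, second_weights=None):
    """Weighted sum of closest distances taken in both directions."""
    first_weights = first_weights or [1.0] * len(first)
    second_weights = second_weights or [1.0] * len(second)
    forward = closest_distances(first, second)
    backward = closest_distances(second, first)
    total = sum(w * d for w, d in zip(first_weights, forward))
    total += sum(w * d for w, d in zip(second_weights, backward))
    return total / (sum(first_weights) + sum(second_weights))
```

Running the search in both directions keeps the measure symmetric: swapping `first` and `second` (along with their weights) returns the same value.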
  • Publication number: 20010003173
    Abstract: A method for increasing voice recognition rate in a voice recognition system comprising the steps of: establishing a reference model for user voices subjected to recognition; receiving the user voices for voice recognition commands; detecting the range and characteristics of the received voice data; comparing the range and characteristics of the detected voice data with the characteristics of the previously obtained reference voice model to retrieve a word having the largest similarity; comparing the similarity of the retrieved word with the similarity reference value to report a voice recognition failure when the compared result is below the reference value, and to report a voice recognition success and perform the command corresponding to the recognized word when the compared result is at least the reference value; and modifying the characteristics of the voice data which succeeded in the voice recognition into the reference voice model which was used in the corresponding voice recognition.
    Type: Application
    Filed: December 6, 2000
    Publication date: June 7, 2001
    Applicant: LG Electronics Inc.
    Inventor: Keun Ok Lim
  • Patent number: 6236964
    Abstract: A speech recognition method and apparatus in which a speech section is sliced by the unit of a word by spotting and candidate words are selected. Next, in a second stage, matching is conducted by the unit of a phoneme. Consequently, selection of the candidate words and slicing of the speech section can be performed concurrently. Furthermore, narrowing of the candidate words is facilitated. Furthermore, since reference phoneme patterns under a plurality of environments are prepared, recognition of an input speech under a larger number of conditions is possible using a smaller amount of data when compared with the case in which reference word patterns under a plurality of environments are prepared.
    Type: Grant
    Filed: February 14, 1994
    Date of Patent: May 22, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Junichi Tamura, Tetsuo Kosaka, Atsushi Sakurai
  • Patent number: 6211876
    Abstract: Methods and apparatus are provided for accessing an experience journal which includes unstructured text items relating to a topic, such as a medical condition. The method is implemented in a computer system including a processor, a storage device, a video display unit having a display screen, and a user interface. The unstructured text items are stored in the storage device. Similarities among the unstructured text items are determined, and icons, one corresponding to each of the unstructured text items, are displayed on the display screen. The icons are positioned on the display screen relative to each other, such that the distances between icons are representative of the determined similarities among the unstructured text items. In response to user selection of one of the icons, the corresponding unstructured text item is displayed on the display screen.
    Type: Grant
    Filed: June 22, 1998
    Date of Patent: April 3, 2001
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Edith Ackermann, Dennis Nathan Bromley, David Ray DeMaso, Sara Frances Frisken Gibson, Joseph Gonzalez-Heydrich, Judith Galler Karlin, Joseph Marks, Chia Shen, Carol Strohecker
  • Patent number: 6195634
    Abstract: Assessing decoys for use in an audio recognition process for identifying predetermined sounds in an unknown input audio signal, involves a test recognition process for matching known training audio signals to models representing the predetermined sounds and the decoys and determining for each of the decoys, from the results of the test recognition process, a score representing the effect of the respective decoy in the recognition of any of the known training audio signals. An advantage arising from generating scores for decoys is that the chance of a poor selection of decoys can be reduced. Thus the possibility of poor recognition performance arising from poorly selected decoys can be reduced. Furthermore, the requirement for expert input into the decoy creation process, which may be time consuming, can be reduced. This can make it easier, or quicker, or less expensive to install.
    Type: Grant
    Filed: December 24, 1997
    Date of Patent: February 27, 2001
    Assignee: Nortel Networks Corporation
    Inventors: Martin Dudemaine, Claude Pelletier
  • Patent number: 6192337
    Abstract: A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within the threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within a threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
    Type: Grant
    Filed: August 14, 1998
    Date of Patent: February 20, 2001
    Assignee: International Business Machines Corporation
    Inventors: Abraham Ittycheriah, Stephane H. Maes
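The add-or-refrain decision in the abstract of 6192337 can be illustrated with a small sketch. The acoustic similarity measure is not specified in the abstract, so this example substitutes an edit distance over phoneme sequences as a stand-in, and treats a distance at or below `max_confusable_distance` as "within the threshold range". Both the measure and the names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def try_add_word(vocabulary, word, phonemes, max_confusable_distance=1):
    """Add `word` unless an existing word is acoustically too close.

    Returns the list of confusable existing words; an empty list means
    the word was added to `vocabulary` (word -> phoneme tuple).
    """
    confusable = [existing for existing, existing_phonemes in vocabulary.items()
                  if edit_distance(phonemes, existing_phonemes) <= max_confusable_distance]
    if confusable:
        return confusable  # refrain from adding automatically
    vocabulary[word] = phonemes
    return []
```

Returning the confusable words rather than a bare boolean leaves room for the caller to show them to the user, in the spirit of the related homophone patent 6269335.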
  • Patent number: 6182039
    Abstract: The speech recognizer incorporates a language model that reduces the number of acoustic pattern matching sequences that must be performed by the recognizer. The language model is based on knowledge of a pre-defined set of syntactically defined content and includes a data structure that organizes the content according to acoustic confusability. A spelled name recognition system based on the recognizer employs a language model based on classes of letters that the recognizer frequently confuses for one another. The language model data structure is optionally an N-gram data structure, a tree data structure, or an incrementally configured network that is built during a training sequence. The incrementally configured network has nodes that are selected based on acoustic distance from a predetermined lexicon.
    Type: Grant
    Filed: March 24, 1998
    Date of Patent: January 30, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, Jean-Claude Junqua, Michael Galler
  • Patent number: 6157911
    Abstract: A method and a system substantially eliminate erroneous voice recognition of repetitive elements in word spotting. One preferred embodiment according to the current invention eliminates erroneous voice recognition of repetitive elements by selectively prolonging a response time of words containing repetitive elements. In order to substantially eliminate the errors, in another preferred embodiment according to the current invention, words containing repetitive elements are marked by a silent key word.
    Type: Grant
    Filed: March 27, 1998
    Date of Patent: December 5, 2000
    Assignee: Ricoh Company, Ltd.
    Inventor: Masaru Kuroda
  • Patent number: 6151575
    Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: November 21, 2000
    Assignee: Dragon Systems, Inc.
    Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
  • Patent number: 6151576
    Abstract: Methods and apparatus of processing, storing and transmitting an original data stream of digitized speech samples. The method converts a stream of digitized speech samples to a stream of text and associated reliability measures. A mixed-media data stream is created with the stream of text as a text component and selected portions of the digitized stream of speech as a speech component. The selected portions are those whose corresponding reliability measures fall below a threshold. The threshold can be changed to change the amount of storage or bandwidth used by the mixed-media data stream. The mixed-media data stream can be searched and the results can be spoken as synthetic speech derived from the text component or as speech samples taken from the digitized speech component.
    Type: Grant
    Filed: August 11, 1998
    Date of Patent: November 21, 2000
    Assignee: Adobe Systems Incorporated
    Inventors: John E. Warnock, T. V. Raman
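The threshold mechanism in the abstract of 6151576 can be sketched as a simple filter. This is an illustration under stated assumptions, not the patented format: recognition output is modeled as a list of segments that each carry text, a reliability score in [0, 1], and raw audio bytes, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str           # recognized text for this stretch of speech
    reliability: float  # assumed to lie in [0, 1]
    audio: bytes        # original digitized speech samples


@dataclass
class MixedSegment:
    text: str
    audio: Optional[bytes]  # kept only where the text is unreliable


def build_mixed_stream(segments, threshold):
    """Keep the audio only for segments whose reliability is below threshold."""
    return [MixedSegment(s.text, s.audio if s.reliability < threshold else None)
            for s in segments]


def audio_bytes(stream):
    """Total audio payload retained in the mixed-media stream."""
    return sum(len(m.audio) for m in stream if m.audio is not None)
```

Raising the threshold retains more audio and therefore more storage or bandwidth; lowering it toward zero leaves a text-only stream.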
  • Patent number: 6134527
    Abstract: A method of testing a new vocabulary word is performed using any set of enrollment utterances provided by the user or from an available database. The present method preferably does not use separate training and similarity test utterances. This allows any or all available repetitions of a vocabulary word being enrolled to be used for training (204), therefore improving the robustness of the trained models. Likewise, any or all training repetitions can also be utilized for similarity analysis (212), providing additional test samples which should further improve the detection of acoustically similar words. Additionally, the similarity analysis progresses incrementally and does not need to continue if a confusable word is found. Finally, first and second thresholds could be employed (212, 302) to provide greater flexibility for a user training a speech recognition system.
    Type: Grant
    Filed: January 30, 1998
    Date of Patent: October 17, 2000
    Assignee: Motorola, Inc.
    Inventors: Jeffrey Arthur Meunier, Edward Srenger, Steven Albrecht
  • Patent number: 6094632
    Abstract: A speaker recognition device for judging whether or not an unknown speaker is an authentic registered speaker himself/herself executes `text verification using speaker independent speech recognition` and `speaker verification by comparison with a reference pattern of a password of a registered speaker`. A presentation section instructs the unknown speaker to input an ID and utter a specified text designated by a text generation section and a password. The `text verification` of the specified text is executed by a text verification section, and the `speaker verification` of the password is executed by a similarity calculation section. The judgment section judges that the unknown speaker is the authentic registered speaker himself/herself if both the results of the `text verification` and the `speaker verification` are affirmative.
    Type: Grant
    Filed: January 29, 1998
    Date of Patent: July 25, 2000
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
  • Patent number: 6052662
    Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
    Type: Grant
    Filed: January 29, 1998
    Date of Patent: April 18, 2000
    Assignee: Regents of the University of California
    Inventor: John E. Hogden
  • Patent number: 6018736
    Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.
    Type: Grant
    Filed: November 20, 1996
    Date of Patent: January 25, 2000
    Assignee: Phonetic Systems Ltd.
    Inventors: Atzmon Gilai, Hezi Resnekov
  • Patent number: 6006182
    Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.
    Type: Grant
    Filed: September 22, 1997
    Date of Patent: December 21, 1999
    Assignee: Northern Telecom Limited
    Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
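    Illustration for patent 6006182: a minimal sketch of a weighted feature function used for a rejection decision. The linear form, the fixed threshold, and the names `rejection_decision_value` and `accept_result` are assumptions made for this example; the abstract only states that the features are weighted according to their recognition accuracy.

```python
from typing import Sequence


def rejection_decision_value(feature_values: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Combine rejection features into one value (assumed: weighted sum)."""
    if len(feature_values) != len(weights):
        raise ValueError("one weight is required per rejection feature")
    return sum(w * f for w, f in zip(weights, feature_values))


def accept_result(feature_values: Sequence[float],
                  weights: Sequence[float],
                  threshold: float) -> bool:
    """Accept the recognition result when the decision value reaches the
    threshold (the comparison direction is an assumption)."""
    return rejection_decision_value(feature_values, weights) >= threshold
```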
  • Patent number: 5999902
    Abstract: A recognizer is provided with a priori probability values (e.g., from some previous recognition) indicating how likely the various words of the recognizer's vocabulary are to occur in the particular context, and recognition "scores" are weighted by these values before a result (or results) is chosen. The recognizer also employs "pruning" whereby low-scoring partial results are discarded, so as to speed the recognition process. To avoid premature pruning of the more likely words, probability values are applied before the pruning decisions are made. A method of applying these probability values is described.
    Type: Grant
    Filed: July 16, 1997
    Date of Patent: December 7, 1999
    Assignee: British Telecommunications public Limited Company
    Inventors: Francis James Scahill, Alison Diane Simons, Steven John Whittaker
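    Illustration for patent 5999902: a minimal sketch of applying a priori probabilities before the pruning decision. It assumes log-domain scores and a simple beam around the best weighted score; `prune_with_priors` and `beam` are hypothetical names, and the patent's own method of applying the probabilities may differ.

```python
import math
from typing import Dict


def prune_with_priors(scores: Dict[str, float],
                      priors: Dict[str, float],
                      beam: float) -> Dict[str, float]:
    """Weight each partial result by its prior, then prune.

    scores: log-domain acoustic scores per word (higher is better).
    priors: a priori probability per word, each in (0, 1].
    beam:   results more than `beam` below the best weighted score are dropped.
    """
    # Apply the prior first, so a likely word is not pruned prematurely.
    weighted = {w: s + math.log(priors[w]) for w, s in scores.items()}
    best = max(weighted.values())
    return {w: s for w, s in weighted.items() if s >= best - beam}
```

    With scores of -10.0, -9.0 and -9.5 for "yes", "no" and "maybe" and priors of 0.8, 0.01 and 0.19, a beam of 1.0 keeps "yes" and "maybe" and discards "no", even though "no" has the best unweighted score.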
  • Patent number: 5991721
    Abstract: An apparatus and a method for processing a natural language arranged so as to improve the speech recognition rate. In an example search section, the degree of similarity between each of a plurality of examples of the actual use of the language stored in an example data base and each of a plurality of probable recognition results output from a recognition section is calculated, and one of the examples corresponding to the highest degree of similarity is selected. A final speech recognition result is obtained by using the selected example. The example search section calculates the degree of similarity by weighting the degree of similarity on the basis of a context according to at least one of the examples previously selected.
    Type: Grant
    Filed: May 29, 1996
    Date of Patent: November 23, 1999
    Assignee: Sony Corporation
    Inventors: Yasuharu Asano, Masao Watari, Makoto Akabane, Tetsuya Kagami, Kazuo Ishii, Miyuki Tanaka, Yasuhiko Kato, Hiroshi Kakuda, Hiroaki Ogawa
  • Patent number: 5987411
    Abstract: Methods and systems consistent with the present invention enroll a candidate phrase uttered by a user in a dictionary having at least one previously enrolled phrase. The system receives utterances of the candidate phrase and determines whether the first utterance is confusingly similar to a previously enrolled phrase and whether the utterances are consistent with each other. The system then enrolls the candidate phrase in the dictionary according to these determinations.
    Type: Grant
    Filed: December 17, 1997
    Date of Patent: November 16, 1999
    Assignee: Northern Telecom Limited
    Inventors: Marco Petroni, Hung S. Ma
  • Patent number: 5970450
    Abstract: A speech recognition system, in which partial reference patterns, and cumulative similarities of these patterns, are stored in a temporary pattern memory. The partial reference patterns are to be used as subjects of a similarity computation with an input speech pattern that has its feature quantities extracted by a speech analyzing unit. A counting unit counts partial reference patterns having corresponding cumulative similarities that are higher than a threshold value stored in a threshold memory. A threshold computing unit computes a threshold of pruning from a correspondence relation between the number of partial reference patterns that have corresponding cumulative similarities that exceed the threshold, and the threshold. A similarity computing unit computes a similarity, with respect to the feature quantities, of partial reference patterns with corresponding cumulative similarities that are greater than the threshold of pruning.
    Type: Grant
    Filed: November 24, 1997
    Date of Patent: October 19, 1999
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
  • Patent number: 5953699
    Abstract: A speech recognition apparatus has an analysis section that outputs features of input speech as a time sequence of feature vectors defined for discrete time points corresponding to a processed speech frame. Reference paradigm utterances are converted into a time sequence of standard (reference) feature vectors. The possible continuous variation of standard feature vectors at each point in time is expressed by a line segment, or set of line segments, connecting the feature vectors for the two end points of the "movable" range within which the feature can change, rather than using a larger set of reference vectors as in a conventional multitemplate approach to speech recognition. For example, the continuous range of possible background noise levels in input speech defines a line segment connecting the two feature vectors at the two SNR value limits.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: September 14, 1999
    Assignee: NEC Corporation
    Inventor: Keizaburo Takagi
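    Illustration for patent 5953699: a minimal sketch of matching an input feature vector against a reference expressed as a line segment between two end-point feature vectors. Euclidean distance and the name `distance_to_segment` are assumptions made for this example; the abstract does not specify the distance measure.

```python
import math
from typing import Sequence


def distance_to_segment(x: Sequence[float],
                        a: Sequence[float],
                        b: Sequence[float]) -> float:
    """Euclidean distance from x to the segment joining a and b.

    a and b stand for the reference feature vectors at the two ends of the
    movable range (for example, the two SNR limits).
    """
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(d * d for d in ab)
    # Position of the orthogonal projection along the segment, clamped to [0, 1].
    t = 0.0 if denom == 0 else sum(p * q for p, q in zip(ax, ab)) / denom
    t = max(0.0, min(1.0, t))
    closest = [ai + t * d for ai, d in zip(a, ab)]
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, closest)))
```

    A single segment covers every intermediate condition between its two end points, which is why it can stand in for a larger set of discrete templates.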
  • Patent number: 5878390
    Abstract: A speech recognition apparatus which includes a speech recognition section for performing a speech recognition process on an uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability of the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to the calculated functional value.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: March 2, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Jun Kawai, Yumi Wakita
  • Patent number: 5848389
    Abstract: In a speech recognizing apparatus, a grammatical qualification of a proposed speech recognition result candidate is judged without using a grammatical rule. The speech recognizing apparatus for performing sentence/speech recognition is comprised of an analyzing unit for acoustically analyzing speech inputted therein to extract a feature parameter of the inputted speech; a recognizing unit for recognizing the inputted speech based upon the feature parameter outputted from said analyzing unit to thereby produce a plurality of proposed recognition result candidates; an example data base for storing therein a plurality of examples; and an example retrieving unit for calculating a resemblance degree between each of said plurality of proposed recognition result candidates and each of the plural examples stored in the example data base and for obtaining the speech recognition result based on said calculated resemblance degree.
    Type: Grant
    Filed: April 5, 1996
    Date of Patent: December 8, 1998
    Assignee: Sony Corporation
    Inventors: Yasuharu Asano, Hiroaki Ogawa, Yasuhiko Kato, Tetsuya Kagami, Masao Watari, Makoto Akabane, Kazuo Ishii, Miyuki Tanaka, Hiroshi Kakuda
  • Patent number: 5848388
    Abstract: A recognition system includes a speech recognition processing unit for processing input speech signals to indicate similarity to predetermined patterns to be recognized. The recognition processing unit is arranged to repeatedly partition the input speech signal into a pattern-containing portion and, preceding and following the pattern-containing portion, noise or silence portions, and to identify a pattern corresponding to the pattern-containing portion. An output supplies a recognition signal indicating recognition of one of the patterns. A pause detector detects the noise or silence portion which follows the pattern-containing portion. In response to its detection, a signal identifying the pattern currently corresponding to the pattern-containing portion is supplied to the output. Also provided are similarly operating rejection portions.
    Type: Grant
    Filed: December 19, 1995
    Date of Patent: December 8, 1998
    Assignee: British Telecommunications plc
    Inventors: Kevin Joseph Power, Stephen Howard Johnson, Francis James Scahill, Simon Patrick Ringland, John Edward Talintyre
  • Patent number: 5842161
    Abstract: A recognition criterion or set of recognition criteria is updated automatically, over time, in accordance with the speech input of the user(s). Each input utterance is compared to one or more models of speech to determine a similarity metric for each such comparison. A model of speech which most closely matches the utterance is determined based on the one or more similarity metrics. The similarity metric corresponding to the most closely matching model of speech is analyzed to determine whether the similarity metric satisfies the selected set of recognition criteria. The recognition criteria are automatically altered during use or "on-the-fly", so that more appropriate criteria (and associated thresholds) may be used to either increase the probability of recognition or decrease the incidence of false positive results. Illustratively, if a voice sample results in a near miss of a template, a more liberal criterion is thereafter employed to increase the probability of recognition for subsequent input.
    Type: Grant
    Filed: June 25, 1996
    Date of Patent: November 24, 1998
    Assignee: Lucent Technologies Inc.
    Inventors: Paul Wesley Cohrs, Mitra P. Deldar, Donald Marion Keen, Ellen Anne Keen
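    Illustration for patent 5842161: a minimal sketch of a recognition criterion that is relaxed after a near miss. The single similarity threshold, the fixed relaxation step, and the names `AdaptiveCriterion`, `near_miss_margin`, and `step` are assumptions made for this example and are not taken from the patent.

```python
class AdaptiveCriterion:
    """Similarity threshold that becomes more liberal after a near miss."""

    def __init__(self, threshold: float, near_miss_margin: float, step: float):
        self.threshold = threshold              # current acceptance threshold
        self.near_miss_margin = near_miss_margin  # how close counts as a near miss
        self.step = step                        # how much to relax per near miss

    def check(self, similarity: float) -> bool:
        """Return True if the utterance is recognized under the current criterion."""
        if similarity >= self.threshold:
            return True
        # Near miss: reject this input, but lower the threshold for later input.
        if similarity >= self.threshold - self.near_miss_margin:
            self.threshold -= self.step
        return False
```

    An utterance scoring 0.78 against a threshold of 0.80 would be rejected on the first attempt, but with a step of 0.05 the same score would be accepted on the next attempt.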
  • Patent number: 5828998
    Abstract: A discriminant or identification function is used for pattern recognition in which the highest performance can be offered when adaptation is made. Learning is carried out while a discriminant or identification function is adapted to a learning sample. For example, a standard pattern of the character "A" used as an identification function is learned such that when the character "A" slanting in the right or left direction is input, the standard pattern of the character "A" is rotated (adapted) in accordance with the slanting of the input learning sample.
    Type: Grant
    Filed: September 24, 1996
    Date of Patent: October 27, 1998
    Assignee: Sony Corporation
    Inventor: Naoto Iwahashi
  • Patent number: 5822728
    Abstract: The multistage word recognizer uses a word reference representation based on reliably detected peaks of phoneme similarity values. The word reference representation captures the basic features of the words by targets that describe the location and shape of stable peaks of phoneme similarity values. The first stage of the word hypothesizer represents each reference word with statistical information on the number of high similarity regions over a predefined number of time intervals. The second stage represents each word by a prototype that consists of a series of phoneme targets and global statistics, namely the average word duration and average match rate. These represent the degree of fit of the word prototype to its training data. Word recognition scores generated in the two stages are converted to dimensionless normalized values and combined by averaging for use in selecting the most probable word candidates.
    Type: Grant
    Filed: September 8, 1995
    Date of Patent: October 13, 1998
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Ted H. Applebaum, Philippe R. Morin
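    Illustration for patent 5822728: a minimal sketch of converting the scores of two recognition stages to dimensionless values and averaging them. Z-score normalization over the candidate set is an assumption (the abstract does not state how the normalization is done), and `normalize` and `combine` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map raw scores to dimensionless z-scores (assumed normalization)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {w: 0.0 for w in scores}
    return {w: (s - mu) / sigma for w, s in scores.items()}


def combine(stage1: Dict[str, float], stage2: Dict[str, float]) -> Dict[str, float]:
    """Average the normalized scores of both stages for each candidate word."""
    n1, n2 = normalize(stage1), normalize(stage2)
    return {w: (n1[w] + n2[w]) / 2 for w in stage1}
```

    Normalizing first lets two stages with different score scales contribute equally to the ranking of word candidates.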
  • Patent number: 5819219
    Abstract: A digital signal processor employable for speech processing or for some other pattern recognition overcomes the weaknesses of digital signal processors in the subtraction with following amount formation that must often be implemented in these applications. An auxiliary hardware is provided that contains, in a separate memory, the feature vector that is to be compared to reference feature vectors from the dictionary. The calculating work is thereby implemented by a separate arithmetic unit that provides a separate difference-forming and amount-forming unit for each feature comparison. The number of clock cycles of the digital signal processor required per comparison can be dramatically reduced by the invention. A suitable addressing method thereby assures that it is always corresponding features of the individual feature vectors that are compared to one another.
    Type: Grant
    Filed: December 11, 1996
    Date of Patent: October 6, 1998
    Assignee: Siemens Aktiengesellschaft
    Inventors: Luc De Vos, Daniel Goryn
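    Illustration for patent 5819219: a minimal software sketch of the comparison the auxiliary hardware is described as accelerating. It assumes that "amount formation" means taking the absolute value of each difference, so that the comparison is a sum of absolute differences (city-block distance); that reading, and the names `city_block_distance` and `nearest_reference`, are assumptions made for this example.

```python
from typing import Dict, Sequence


def city_block_distance(x: Sequence[float], ref: Sequence[float]) -> float:
    """Sum over features of |x_i - ref_i|: one subtraction and one
    absolute value per pair of corresponding features."""
    if len(x) != len(ref):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, ref))


def nearest_reference(x: Sequence[float],
                      dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary key whose reference vector is closest to x."""
    return min(dictionary, key=lambda k: city_block_distance(x, dictionary[k]))
```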
  • Patent number: 5809461
    Abstract: A speech recognition apparatus using a neural network is provided. A neuron-like element stores a value of its internal status. The neuron-like element also updates the value of its internal status on the basis of an output from the neuron-like element itself, outputs from other neuron-like elements and an external input. The neuron-like element also converts the value of its internal status into an external output. Accordingly, the neuron-like element itself can retain the history of input data. This enables time series data, such as speech, to be processed without providing any special devices in the neural network.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: September 15, 1998
    Assignee: Seiko Epson Corporation
    Inventor: Mitsuhiro Inazumi
  • Patent number: 5806024
    Abstract: Harmonics coefficients are estimated in primary coefficients of an orthogonal transform of a speech or a music input signal by using a pitch frequency extracted from the input signal and are quantized into a harmonics code vector. Residue coefficients are calculated by removing the harmonics coefficients from the primary coefficients and quantized into residue code vectors and gain code vectors. It is possible to search harmonics excitation pulses at the harmonics locations for harmonics quantization into the harmonics code vector. On the other hand, it is possible to estimate the harmonics coefficients or excitation pulses by using quantized LSP parameters and to calculate secondary coefficients for use in weighting the harmonics quantization and residue quantization and, if applicable, in excitation pulse search.
    Type: Grant
    Filed: December 23, 1996
    Date of Patent: September 8, 1998
    Assignee: NEC Corporation
    Inventor: Kazunori Ozawa
  • Patent number: 5806028
    Abstract: A method and device for determining quality of speech. The speech to be evaluated is listened to by a person who reproduces the speech. The ends of vowel sounds in the produced and reproduced speech, respectively, are determined. The difference between the ends of the vowel sounds is registered. From the obtained time differences an average value is determined. The average value indicates the quality of the produced speech. The invention can be used for evaluation of different speech sources.
    Type: Grant
    Filed: February 14, 1996
    Date of Patent: September 8, 1998
    Assignee: Telia AB
    Inventor: Bertil Lyberg
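    Illustration for patent 5806028: a minimal sketch of the averaging step, assuming the vowel end times have already been detected and paired one-to-one. The function name `average_vowel_end_delay` and the use of seconds as the unit are assumptions made for this example.

```python
from typing import Sequence


def average_vowel_end_delay(produced_ends: Sequence[float],
                            reproduced_ends: Sequence[float]) -> float:
    """Mean time difference between paired vowel ends.

    produced_ends:   vowel end times in the speech under evaluation (seconds).
    reproduced_ends: vowel end times in the listener's reproduction (seconds).
    """
    if not produced_ends or len(produced_ends) != len(reproduced_ends):
        raise ValueError("need the same non-zero number of vowel ends in both")
    diffs = [r - p for p, r in zip(produced_ends, reproduced_ends)]
    return sum(diffs) / len(diffs)
```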
  • Patent number: 5799274
    Abstract: A speech recognition system and method having an increased recognition accuracy for a compound word composed of a first word and a second word. Standard information corresponding to each of the registered words is stored. The standard information includes predetermined feature information and time information with respect to each of the registered words. The time information represents a continuous time length for pronouncing each of the registered words at a normal speed. Feature information extracted from an input word is compared with the standard information to obtain a similarity between the feature information and the standard information corresponding to one of the registered words. A determination time is set to determine a result of recognition when the compound word is input and when a first degree of similarity is obtained from the first word at a first time and a maximum degree of similarity is obtained from one of the second word and the compound word at a second time.
    Type: Grant
    Filed: September 18, 1996
    Date of Patent: August 25, 1998
    Assignee: Ricoh Company, Ltd.
    Inventor: Masaru Kuroda