Similarity Patents (Class 704/239)
  • Publication number: 20030154077
    Abstract: When a user-issued voice command does not match grammars registered in advance, the voice command is identified as a sentence (step S305). This sentence is compared with the registered grammars to calculate a similarity (step S307). When the similarity is higher than a first threshold value (TH1), the voice command is executed (step S315). When the similarity is equal to or lower than the first threshold value (TH1) and higher than a second threshold value (TH2), command choices are displayed for the user and the user is permitted to select a command to be executed (step S319). When the similarity is equal to or lower than the second threshold value (TH2), the command is not executed (step S321). Furthermore, once a command has been executed, it is added as a grammar so that it can be identified the next time it is used.
    Type: Application
    Filed: February 10, 2003
    Publication date: August 14, 2003
    Applicant: International Business Machines Corporation
    Inventors: Yoshinori Tahara, Daisuke Tomoda, Kikuo Mitsubo, Yoshinori Atake
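The two-threshold decision in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the threshold values, the function names `classify` and `handle`, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, since the abstract does not say how similarity is computed.

```python
from difflib import SequenceMatcher

# Hypothetical threshold values; the abstract only requires TH2 < TH1.
TH1, TH2 = 0.8, 0.5


def classify(similarity, th1=TH1, th2=TH2):
    """Map a similarity score to one of the three outcomes in the abstract."""
    if th2 >= th1:
        raise ValueError("th2 must be lower than th1")
    if similarity > th1:
        return "execute"        # step S315
    if similarity > th2:
        return "offer_choices"  # step S319
    return "reject"             # step S321


def handle(sentence, grammars, th1=TH1, th2=TH2):
    """Return (action, candidate grammars) for a sentence with no exact match."""
    # Assumed similarity measure: character-level ratio from difflib.
    scored = sorted(
        ((SequenceMatcher(None, sentence, g).ratio(), g) for g in grammars),
        reverse=True,
    )
    if not scored:
        return "reject", []
    best_score, best = scored[0]
    action = classify(best_score, th1, th2)
    if action == "execute":
        # An executed command is registered so it matches directly next time.
        if sentence not in grammars:
            grammars.append(sentence)
        return action, [best]
    if action == "offer_choices":
        return action, [g for score, g in scored if score > th2]
    return action, []
```

In a fuller version, a command the user picks from the offered choices would also be added to the grammar list after it runs; that step is left out here for brevity.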
  • Patent number: 6594392
    Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: July 15, 2003
    Assignee: Intel Corporation
    Inventor: Umberto Santoni
  • Patent number: 6577996
    Abstract: A method and apparatus for objectively evaluating sound quality of a signal processor or transmission channel. The present invention analyzes the distortion in a series of test sound frames compared to a series of sample sound frames. The invention detects sequences of test sound frames having distortion levels that are greater than a temporal distortion threshold and calculates an average length and a maximum length of these sequences. The present invention also detects individual test sound frames having distortion levels that are greater than an outlier distortion threshold and calculates a percentage of these frames present in the series of test sound frames. Further, the present invention calculates the average distortion level in the series of test sound frames and a variance of the distortion level in the test sound frames.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: June 10, 2003
    Assignee: Cisco Technology, Inc.
    Inventor: Ramanathan T. Jagadeesan
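The statistics listed in the abstract above can be sketched as below, assuming one distortion value per test frame has already been computed. The function name `distortion_summary`, the dictionary keys, and the choice of population variance are illustrative assumptions; the abstract does not say how per-frame distortion is measured.

```python
from statistics import mean, pvariance


def distortion_summary(distortions, temporal_threshold, outlier_threshold):
    """Summarize per-frame distortion values (one number per test frame)."""
    if not distortions:
        raise ValueError("at least one frame is required")

    # Lengths of consecutive runs of frames above the temporal threshold.
    runs, current = [], 0
    for d in distortions:
        if d > temporal_threshold:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)

    # Individual frames above the (typically higher) outlier threshold.
    outliers = sum(1 for d in distortions if d > outlier_threshold)

    return {
        "avg_run_length": mean(runs) if runs else 0.0,
        "max_run_length": max(runs, default=0),
        "outlier_percent": 100.0 * outliers / len(distortions),
        "mean_distortion": mean(distortions),
        "distortion_variance": pvariance(distortions),
    }
```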
  • Patent number: 6560575
    Abstract: An apparatus is provided for checking the consistency between two training words which can be used in, for example, a speech recognition or verification system. Two training examples are aligned using a dynamic programming alignment process and an average frame score is calculated from the alignment results together with the worst score in a number of consecutive frames. These values are then compared with similar values obtained from training examples which are known to be consistent to determine if the training examples are consistent.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: May 6, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Robert Alexander Keiller
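A rough sketch of the consistency check described above, under several assumptions: frames are reduced to single numbers, the frame score is an absolute difference, the alignment is plain dynamic time warping, and the limits `avg_limit` and `worst_limit` stand in for values that would be derived from training examples known to be consistent. All names are hypothetical.

```python
def alignment_scores(a, b):
    """Align two scalar feature sequences by dynamic programming and
    return the per-frame scores along the best path."""
    if not a or not b:
        raise ValueError("both examples must contain at least one frame")
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the frame scores on the best path.
    i, j, steps = n, m, []
    while i > 0 and j > 0:
        steps.append(abs(a[i - 1] - b[j - 1]))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    steps.reverse()
    return steps


def consistency_scores(a, b, window=3):
    """Return (average frame score, worst average over `window` consecutive frames)."""
    steps = alignment_scores(a, b)
    avg = sum(steps) / len(steps)
    w = min(window, len(steps))
    worst = max(sum(steps[k:k + w]) / w for k in range(len(steps) - w + 1))
    return avg, worst


def is_consistent(a, b, avg_limit, worst_limit, window=3):
    """Compare both scores against limits taken from known-consistent pairs."""
    avg, worst = consistency_scores(a, b, window)
    return avg <= avg_limit and worst <= worst_limit
```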
  • Publication number: 20030065510
    Abstract: A similarity evaluation program capable of determining similarity between probability models at a high speed (with little calculation) is disclosed. The similarity evaluation program is implemented on an apparatus such as a computer for evaluating similarity between a pair of probability model information each including a plurality of probability information constituted by a plurality of types of data, and this apparatus is provided with a dynamic programming operation unit for performing arithmetic processing based on dynamic programming techniques using a similarity value indicating similarity between probability information included in one of the pair probability model information and probability information included in the other of the pair of probability model information as an indicator for selecting a path.
    Type: Application
    Filed: March 28, 2002
    Publication date: April 3, 2003
    Applicant: Fujitsu Limited
    Inventor: Makihiko Sato
  • Patent number: 6539351
    Abstract: A method is provided for generating a high dimensional density model within an acoustic model for one of a speech and a speaker recognition system. Acoustic data obtained from at least one speaker is transformed into high dimensional feature vectors. The density model is formed to model the feature vectors by a mixture of compound Gaussians with a linear transform, wherein each compound Gaussian is associated with a compound Gaussian prior and models each coordinate of each component of the density model independently by a univariate Gaussian mixture comprising a univariate Gaussian prior, variance, and mean. An iterative expectation maximization (EM) method is applied to the feature vectors. The EM method includes the step of computing an auxiliary function Q of the EM method.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: March 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: Scott Shaobing Chen, Ramesh Ambat Gopinath
  • Patent number: 6535850
    Abstract: In a speech training and recognition system, the current invention detects and warns the user about the similar sounding entries to vocabulary and permits entry of such confusingly similar terms which are marked along with the stored similar terms to identify the similar words. In addition, the states in similar words are weighted to apply more emphasis to the differences between similar words than the similarities of such words. Another aspect of the current invention is to use modified scoring algorithm to improve the recognition performance in the case where confusing entries were made to the vocabulary despite the warning. Yet another aspect of the current invention is to detect and warn the user about potential problems with new entries such as short words and two or more word entries with long silence periods in between words. Finally, the current invention also includes alerting the user about the dissimilarity of the multiple tokens of the same vocabulary item in the case of multiple-token training.
    Type: Grant
    Filed: March 9, 2000
    Date of Patent: March 18, 2003
    Assignee: Conexant Systems, Inc.
    Inventor: Aruna Bayya
  • Patent number: 6507815
    Abstract: A group of words to be registered in a word dictionary are sorted in order of sound models to produce a word list. A tree-structure word dictionary in which sound models at head part of the words are shared among the words, is prepared using this word list. Each node having a different set of reachable words from a parent node holds word information including a minimum out of word IDs of words reachable from that node, and the number of words reachable from that node. For searching for a word matching with speech input, language likelihoods are looked ahead using this word information. The word matching with the speech input can be recognized efficiently, using such a tree-structure word dictionary and a look-ahead method of language likelihood.
    Type: Grant
    Filed: March 29, 2000
    Date of Patent: January 14, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hiroki Yamamoto
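The dictionary structure described above can be illustrated with a small prefix tree. Because the word list is sorted before IDs are assigned, all words sharing a prefix get consecutive IDs, so a node only needs the minimum ID and the count to describe its reachable set. In this sketch the class and function names are hypothetical, single characters stand in for sound models, and every node stores the word information (the abstract stores it only where the reachable set differs from the parent's).

```python
class Node:
    def __init__(self):
        self.children = {}
        self.min_id = None  # smallest word ID reachable from this node
        self.count = 0      # number of words reachable from this node


def _touch(node, word_id):
    if node.min_id is None:
        node.min_id = word_id
    node.count += 1


def build_tree(words):
    """Sort the words, assign IDs in sorted order, and build the shared-prefix tree."""
    ordered = sorted(set(words))
    root = Node()
    for word_id, units in enumerate(ordered):
        node = root
        _touch(node, word_id)
        for unit in units:
            node = node.children.setdefault(unit, Node())
            _touch(node, word_id)
    return root, ordered


def reachable_ids(root, prefix):
    """IDs of all words that start with `prefix`, as a contiguous range."""
    node = root
    for unit in prefix:
        node = node.children.get(unit)
        if node is None:
            return range(0)
    if node.count == 0:
        return range(0)
    return range(node.min_id, node.min_id + node.count)
```

During search, a look-ahead language score for a partial hypothesis could then be taken over just the ID range returned by `reachable_ids`, for example the maximum language likelihood among those words.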
  • Patent number: 6496800
    Abstract: A speaker verification system using the voice of a user uttering a continuous, random length digit string is provided. The speaker verification system includes a random digit generator for generating a continuous, random length digit string; a user interface for providing the continuous, random length digit string; a feature extractor for extracting voice features from the user's voice uttering the continuous, random length digit string; a digit voice verification unit for comparing the voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and for determining whether the derived digit string is identical to the digit string provided to the user via the user interface; and a speaker verification unit for comparing the voice features with a speaker-dependent model of the user to measure the similarity between them.
    Type: Grant
    Filed: May 1, 2000
    Date of Patent: December 17, 2002
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Byung-goo Kong, Sang-ryong Kim
  • Patent number: 6490559
    Abstract: The distance computation represents a central, constantly recurrent task in sample and speech recognition. It is used in speech recognition as a degree of similarity between a part of a speech utterance and a speech reference. In picture processing and sample recognition, it is used for data compression. The distance computation requires the longest computation time so that a reduction of the computation time results in a considerable efficiency improvement. A reduction of the computation time is achieved by the integration of the distance computation in a memory module in which particularly the reference data are stored. Due to this integration, the other components of the overall system are relieved of this constantly recurrent task and are available for more complex processes in this period of time. This integration makes the distance computation essentially shorter because the communication between memory sections and computation unit takes place directly without utilizing a bus system.
    Type: Grant
    Filed: October 13, 1998
    Date of Patent: December 3, 2002
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Wolfgang O. Budde, Volker Steinbiss
  • Patent number: 6470314
    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
    Type: Grant
    Filed: April 6, 2000
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Mukund Padmanabhan
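Steps (i) to (iv) above amount to matching cumulative distribution functions per vector dimension. A minimal sketch for one dimension follows, assuming empirical CDFs built from sorted samples; the function names and the nearest-rank inverse are illustrative choices, not details taken from the patent.

```python
from bisect import bisect_right


def make_cdf_mapping(train_values, test_values):
    """Build a mapping x -> F_train^{-1}(F_test(x)) for one feature dimension."""
    if not train_values or not test_values:
        raise ValueError("both data sets must be non-empty")
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)

    def transform(x):
        # Empirical CDF of the test data evaluated at x.
        p = bisect_right(test_sorted, x) / len(test_sorted)
        # Nearest-rank inverse of the empirical training CDF.
        idx = int(p * len(train_sorted)) - 1
        idx = min(len(train_sorted) - 1, max(0, idx))
        return train_sorted[idx]

    return transform


def transform_vector(vector, mappings):
    """Apply one mapping per dimension to a test speech vector before recognition."""
    return [mapping(x) for mapping, x in zip(mappings, vector)]
```

With training values `[1, 2, 3, 4]` and test values shifted to `[11, 12, 13, 14]`, the mapping sends each test value back to the training value of the same rank, which is the sense in which the transformed vectors become similar to the training data.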
  • Patent number: 6427134
    Abstract: A voice activity detector suitable for deployment in a mobile phone apparatus is disclosed. An advantage of the voice activity detector is that it is better able to provide a decision (79) as to whether an input signal (19) consists of noise (which it is not desired to transmit) or comprises speech or information tones (which are required to be transmitted), especially in noisy environments. The voice activity detector includes a number of components, in particular an auxiliary voice activity detector (3). The auxiliary voice activity detector (3) distinguishes between noise and speech on the basis that the spectrum of speech changes more rapidly than that of noise. This results in the auxiliary detector (3) rarely mistaking a speech signal to be a noise signal. Hence, a very reliable noise template (421) is obtained. For this reason, the auxiliary detector (3) is also useful in noise reduction applications. The voice activity detector also uses a neural net classifier (7).
    Type: Grant
    Filed: September 26, 1998
    Date of Patent: July 30, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Neil Robert Garner, Paul Alexander Barrett
  • Patent number: 6421640
    Abstract: The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure and a plurality of second confidence measures determined for a recognition result is automatically combined for determining the first confidence measure. To reduce the resultant error rate in the assessment of the correctness of a recognition result, the method is characterized in that the determination of the parameters weighting the combination of the second confidence measures is based on a minimization of a cross-entropy-error measure. A further improvement is achieved by means of a post-processing operation based on the maximization of the Gardner-Derrida error function.
    Type: Grant
    Filed: September 13, 1999
    Date of Patent: July 16, 2002
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Jannes G. A. Dolfing, Andreas Wendemuth
  • Patent number: 6411929
    Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
    Type: Grant
    Filed: July 26, 2000
    Date of Patent: June 25, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
  • Patent number: 6408271
    Abstract: The invention relates to a method and apparatus for generating phrasal transcriptions. The invention provides generating a group of word transcriptions for each vocabulary item in an orthographic phrase. According to a first embodiment, the invention further provides permuting the word transcriptions to generate a plurality of phrasal transcriptions and computing a score for each phrasal transcription in the plurality of phrasal transcriptions. The set of phrasal transcriptions is then selected from the plurality of phrasal transcriptions at least in part on a basis of the score data elements and stored in a format suitable for use by a speech recognition dictionary. As a variant, the phrasal transcriptions may be released in a format suitable for use by a speech synthesizer.
    Type: Grant
    Filed: September 24, 1999
    Date of Patent: June 18, 2002
    Assignee: Nortel Networks Limited
    Inventors: Kenneth W. Smith, Michael G. Sabourin
  • Patent number: 6404925
    Abstract: Methods for segmenting audio-video recording of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order.
    Type: Grant
    Filed: March 11, 1999
    Date of Patent: June 11, 2002
    Assignees: Fuji Xerox Co., Ltd., Xerox Corporation
    Inventors: Jonathan T. Foote, Lynn Wilcox
  • Patent number: 6397086
    Abstract: The disclosure relates to an infrared controlled hands-free operator for cellular phones housed in a vehicle. More particularly, it is concerned with a hands-free operator which can automatically transmit an infrared signal similar to a control signal of a vehicle's audio stereo system to turn the stereo system off or mute it when incoming signals are received by a cellular phone mounted to a vehicle. The hands-free operator thus can automatically cut off the noise sound of an audio stereo system in operation so as to make the operation of a cellular phone free of interference in a vehicle on the move.
    Type: Grant
    Filed: June 22, 1999
    Date of Patent: May 28, 2002
    Assignee: E-Lead Electronic Co., Ltd.
    Inventor: Tonny Chen
  • Patent number: 6393397
    Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: May 21, 2002
    Assignee: Motorola, Inc.
    Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
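The selection rule above can be sketched as a greedy loop. The abstract does not say how the similarity and dissimilarity values are combined, so this example assumes a simple score: similarity to the target minus the highest similarity to any cohort model already chosen. Speaker models are reduced to plain coordinate tuples, and `similarity` and `select_cohort` are hypothetical names.

```python
import math


def similarity(a, b):
    """Assumed similarity between two models given as coordinate tuples."""
    return 1.0 / (1.0 + math.dist(a, b))


def select_cohort(target, pool, size):
    """Greedily pick `size` model names from `pool` (name -> model)."""
    remaining = dict(pool)
    chosen = []
    while remaining and len(chosen) < size:

        def score(name):
            model = remaining[name]
            closeness = similarity(model, target)
            # Penalize candidates that resemble an already chosen cohort model.
            redundancy = max(
                (similarity(model, pool[c]) for c in chosen), default=0.0
            )
            return closeness - redundancy

        best = max(remaining, key=score)
        chosen.append(best)
        del remaining[best]
    return chosen
```

In the usage below, model `b` is closer to the target than `c`, but it is nearly identical to the first pick `a`, so `c` is selected second.

```python
pool = {"a": (1.0, 0.0), "b": (1.1, 0.0), "c": (-1.5, 0.0), "d": (10.0, 10.0)}
print(select_cohort((0.0, 0.0), pool, 2))
```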
  • Patent number: 6349148
    Abstract: The invention relates to a device for the verification of time-dependent, user-specific signals which includes means for generating a set of feature vectors which serve to provide an approximative description of an input signal and are associated with selectable sampling intervals of the signal; means for preparing an HMM model for the signal; means for determining a first probability value which describes the probability of occurrence of the set of feature vectors, given the HMM model, and a threshold decider for comparing the first probability value with a threshold value and for deciding on the verification of the signal.
    Type: Grant
    Filed: May 26, 1999
    Date of Patent: February 19, 2002
    Assignee: U.S. Philips Corporation
    Inventor: Jannes G. A. Dolfing
  • Patent number: 6341263
    Abstract: A voice recognition system, method and storage medium is provided. The system includes a plurality of storage sections, a selection section, an adaptation section, a plurality of calculation sections, an adaptation section, a normalization section and a decision section. The method includes the steps for performing the functions associated with the sections.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: January 22, 2002
    Assignee: NEC Corporation
    Inventors: Eiko Yamada, Hiroaki Hattori
  • Publication number: 20010047257
    Abstract: A method of assigning a similarity score representative of a similarity between a first speech signal and a second speech signal. The method includes generating a signal transformation responsive to both the first and second signals, determining a transformation score based on at least one characteristic of the generated transformation and calculating the similarity score as a function of the transformation score.
    Type: Application
    Filed: January 24, 2001
    Publication date: November 29, 2001
    Inventors: Gabriel Artzi, Yaron Paz, Yehuda Hershkovits
  • Patent number: 6314392
    Abstract: In a computerized method, a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.
    Type: Grant
    Filed: September 20, 1996
    Date of Patent: November 6, 2001
    Assignee: Digital Equipment Corporation
    Inventors: Brian S. Eberman, William D. Goldenthal
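The merge loop described above can be sketched as follows. The statistical distance is not specified in the abstract, so this example assumes the absolute difference between set means; the function names are hypothetical.

```python
from statistics import mean


def make_frames(samples, frame_size):
    """Group adjacent samples into disjoint fixed-size frames."""
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def distance(a, b):
    """Assumed statistical distance: absolute difference of the set means."""
    return abs(mean(a) - mean(b))


def segment(samples, frame_size, threshold):
    """Merge adjacent sets until no neighboring pair is closer than `threshold`."""
    sets = make_frames(samples, frame_size)
    merged = True
    while merged and len(sets) > 1:
        merged = False
        result = [sets[0]]
        for nxt in sets[1:]:
            if distance(result[-1], nxt) < threshold:
                result[-1] = result[-1] + nxt  # grow the current cluster
                merged = True
            else:
                result.append(nxt)             # start a new segment
        sets = result
    return sets
```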
  • Patent number: 6301559
    Abstract: To realize a speech recognition method and speech recognition device that reduce erroneous recognition for words that are not to be recognized and ambient sounds, and to improve the recognition capability, characteristic parameters of words to be recognized and characteristic parameters of words that are not to be recognized and ambient sounds are previously entered in a vocabulary template 40. Degrees of similarity are obtained in a speech recognition section 30 between characteristic parameters for input words (or sounds) and all the characteristic parameters stored in the vocabulary template 40. Information indicating one characteristic parameter, from among the characteristic parameters stored in the vocabulary template 40, that is the closest approximation to the characteristic parameter of the input word (or sound), is generated as a recognition result.
    Type: Grant
    Filed: November 16, 1998
    Date of Patent: October 9, 2001
    Assignee: Oki Electric Industry Co., Ltd.
    Inventors: Hiroshi Shinotsuka, Noritoshi Hino
  • Patent number: 6292776
    Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: September 18, 2001
    Assignee: Lucent Technologies Inc.
    Inventor: Rathinavelu Chengalvarayan
  • Patent number: 6269335
    Abstract: A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.
    Type: Grant
    Filed: August 14, 1998
    Date of Patent: July 31, 2001
    Assignee: International Business Machines Corporation
    Inventors: Abraham Ittycheriah, Stephane Herman Maes, Michael Daniel Monkowski, Jeffrey Scott Sorensen
  • Publication number: 20010010039
    Abstract: Apparatus for Mandarin Chinese speech recognition by using initial/final phoneme similarity vector, for improving the Chinese speech recognition accuracy and downsizing the needed memory is provided.
    Type: Application
    Filed: December 8, 2000
    Publication date: July 26, 2001
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventor: Chung-Ho Yang
  • Patent number: 6263309
    Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
    Type: Grant
    Filed: April 30, 1998
    Date of Patent: July 17, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua
  • Patent number: 6260012
    Abstract: An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values.
    Type: Grant
    Filed: March 1, 1999
    Date of Patent: July 10, 2001
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Joung-Kyou Park
  • Patent number: 6256630
    Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: July 3, 2001
    Assignee: Phonetic Systems Ltd.
    Inventors: Atzmon Gilai, Hezi Resnekov
  • Patent number: 6246982
    Abstract: A method for computing a distance between collections of distributions or finite mixture models of features. Data is processed so as to define at least first and second collections of distributions of features. For each distribution of the first collection, the distance to each distribution of the second collection is measured to determine which distribution of the second collection is the closest (most similar). The same procedure is performed for the distributions of the second collection. Based on the closest distance measures, a final distance is computed representing the distance between the first and second collections. This final distance may be a weighted sum of the closest distances. The distance measure may be used in a number of applications such as speaker classification, speaker recognition and audio segmentation.
    Type: Grant
    Filed: January 26, 1999
    Date of Patent: June 12, 2001
    Assignee: International Business Machines Corporation
    Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
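The closest-distance procedure above can be sketched as below. The abstract leaves the per-distribution distance and the weights open, so this example assumes one-dimensional Gaussians given as `(mean, variance)` pairs, a squared mean difference normalized by the average variance, and uniform weights by default. All names are hypothetical.

```python
def gaussian_distance(p, q):
    """Assumed distance between two 1-D Gaussians given as (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return (m1 - m2) ** 2 / ((v1 + v2) / 2.0)


def _one_sided(source, other, weights):
    # For each distribution in `source`, take the distance to its closest
    # distribution in `other`, then form the weighted sum.
    return sum(
        w * min(gaussian_distance(p, q) for q in other)
        for w, p in zip(weights, source)
    )


def collection_distance(first, second, weights_first=None, weights_second=None):
    """Weighted sum of closest distances, computed in both directions."""
    if not first or not second:
        raise ValueError("both collections must be non-empty")
    wf = weights_first or [1.0 / len(first)] * len(first)
    ws = weights_second or [1.0 / len(second)] * len(second)
    return _one_sided(first, second, wf) + _one_sided(second, first, ws)
```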
  • Publication number: 20010003173
    Abstract: A method for increasing voice recognition rate in a voice recognition system comprising the steps of: establishing a reference model for user voices subjected to recognition; receiving the user voices for voice recognition commands; detecting the range and characteristics of the received voice data; comparing the range and characteristics of the detected voice data with the characteristics of the previously obtained reference voice model to retrieve a word having the largest similarity; comparing the similarity of the retrieved word with the similarity reference value to report a voice recognition failure when the compared result is below the reference value, and to report a voice recognition success and perform the command corresponding to the recognized word when the compared result is at least the reference value; and modifying the characteristics of the voice data which succeeded in the voice recognition into the reference voice model which was used in the corresponding voice recognition.
    Type: Application
    Filed: December 6, 2000
    Publication date: June 7, 2001
    Applicant: LG Electronics Inc.
    Inventor: Keun Ok Lim
  • Patent number: 6236964
    Abstract: A speech recognition method and apparatus in which a speech section is sliced by the unit of a word by spotting and candidate words are selected. Next, in a second stage, matching is conducted by the unit of a phoneme. Consequently, selection of the candidate words and slicing of the speech section can be performed concurrently. Furthermore, narrowing of the candidate words is facilitated. Furthermore, since reference phoneme patterns under a plurality of environments are prepared, recognition of an input speech under a larger number of conditions is possible using a smaller amount of data when compared with the case in which reference word patterns under a plurality of environments are prepared.
    Type: Grant
    Filed: February 14, 1994
    Date of Patent: May 22, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Junichi Tamura, Tetsuo Kosaka, Atsushi Sakurai
  • Patent number: 6211876
    Abstract: Methods and apparatus are provided for accessing an experience journal which includes unstructured text items relating to a topic, such as a medical condition. The method is implemented in a computer system including a processor, a storage device, a video display unit having a display screen, and a user interface. The unstructured text items are stored in the storage device. Similarities among the unstructured text items are determined, and icons, one corresponding to each of the unstructured text items, are displayed on the display screen. The icons are positioned on the display screen relative to each other, such that the distances between icons are representative of the determined similarities among the unstructured text items. In response to user selection of one of the icons, the corresponding unstructured text item is displayed on the display screen.
    Type: Grant
    Filed: June 22, 1998
    Date of Patent: April 3, 2001
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Edith Ackermann, Dennis Nathan Bromley, David Ray DeMaso, Sara Frances Frisken Gibson, Joseph Gonzalez-Heydrich, Judith Galler Karlin, Joseph Marks, Chia Shen, Carol Strohecker
  • Patent number: 6195634
    Abstract: Assessing decoys for use in an audio recognition process for identifying predetermined sounds in an unknown input audio signal, involves a test recognition process for matching known training audio signals to models representing the predetermined sounds and the decoys and determining for each of the decoys, from the results of the test recognition process, a score representing the effect of the respective decoy in the recognition of any of the known training audio signals. An advantage arising from generating scores for decoys is that the chance of a poor selection of decoys can be reduced. Thus the possibility of poor recognition performance arising from poorly selected decoys can be reduced. Furthermore, the requirement for expert input into the decoy creation process, which may be time consuming, can be reduced. This can make it easier, or quicker, or less expensive to install.
    Type: Grant
    Filed: December 24, 1997
    Date of Patent: February 27, 2001
    Assignee: Nortel Networks Corporation
    Inventors: Martin Dudemaine, Claude Pelletier
  • Patent number: 6192337
    Abstract: A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within the threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within a threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
    Type: Grant
    Filed: August 14, 1998
    Date of Patent: February 20, 2001
    Assignee: International Business Machines Corporation
    Inventors: Abraham Ittycheriah, Stephane H. Maes
  • Patent number: 6182039
    Abstract: The speech recognizer incorporates a language model that reduces the number of acoustic pattern matching sequences that must be performed by the recognizer. The language model is based on knowledge of a pre-defined set of syntactically defined content and includes a data structure that organizes the content according to acoustic confusability. A spelled name recognition system based on the recognizer employs a language model based on classes of letters that the recognizer frequently confuses for one another. The language model data structure is optionally an N-gram data structure, a tree data structure, or an incrementally configured network that is built during a training sequence. The incrementally configured network has nodes that are selected based on acoustic distance from a predetermined lexicon.
    Type: Grant
    Filed: March 24, 1998
    Date of Patent: January 30, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, Jean-Claude Junqua, Michael Galler
  • Patent number: 6157911
    Abstract: A method and a system substantially eliminate erroneous voice recognition of repetitive elements in word spotting. One preferred embodiment according to the current invention eliminates erroneous voice recognition of repetitive elements by selectively prolonging a response time of words containing repetitive elements. In order to substantially eliminate the errors, in another preferred embodiment according to the current invention, words containing repetitive elements are marked by a silent key word.
    Type: Grant
    Filed: March 27, 1998
    Date of Patent: December 5, 2000
    Assignee: Ricoh Company, Ltd.
    Inventor: Masaru Kuroda
  • Patent number: 6151576
    Abstract: Methods and apparatus for processing, storing and transmitting an original data stream of digitized speech samples. The method converts a stream of digitized speech samples to a stream of text and associated reliability measures. A mixed-media data stream is created with the stream of text as a text component and selected portions of the digitized stream of speech as a speech component. The selected portions are those whose corresponding reliability measures fall below a threshold. The threshold can be changed to change the amount of storage or bandwidth used by the mixed-media data stream. The mixed-media data stream can be searched and the results can be spoken as synthetic speech derived from the text component or as speech samples taken from the digitized speech component.
    Type: Grant
    Filed: August 11, 1998
    Date of Patent: November 21, 2000
    Assignee: Adobe Systems Incorporated
    Inventors: John E. Warnock, T. V. Raman
  • Patent number: 6151575
    Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: November 21, 2000
    Assignee: Dragon Systems, Inc.
    Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
  • Patent number: 6134527
    Abstract: A method of testing a new vocabulary word is performed using any set of enrollment utterances provided by the user or from an available database. The present method preferably does not use separate training and similarity test utterances. This allows any or all available repetitions of a vocabulary word being enrolled to be used for training (204), therefore improving the robustness of the trained models. Likewise, any or all training repetitions can also be utilized for similarity analysis (212), providing additional test samples which should further improve the detection of acoustically similar words. Additionally, the similarity analysis progresses incrementally and does not need to continue if a confusable word is found. Finally, first and second thresholds could be employed (212, 302) to provide greater flexibility for a user training a speech recognition system.
    Type: Grant
    Filed: January 30, 1998
    Date of Patent: October 17, 2000
    Assignee: Motorola, Inc.
    Inventors: Jeffrey Arthur Meunier, Edward Srenger, Steven Albrecht
  • Patent number: 6094632
    Abstract: A speaker recognition device for judging whether or not an unknown speaker is an authentic registered speaker himself/herself executes "text verification using speaker independent speech recognition" and "speaker verification by comparison with a reference pattern of a password of a registered speaker". A presentation section instructs the unknown speaker to input an ID and utter a specified text designated by a text generation section and a password. The "text verification" of the specified text is executed by a text verification section, and the "speaker verification" of the password is executed by a similarity calculation section. The judgment section judges that the unknown speaker is the authentic registered speaker himself/herself if both the results of the "text verification" and the "speaker verification" are affirmative.
    Type: Grant
    Filed: January 29, 1998
    Date of Patent: July 25, 2000
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
  • Patent number: 6052662
    Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
    Type: Grant
    Filed: January 29, 1998
    Date of Patent: April 18, 2000
    Assignee: Regents of the University of California
    Inventor: John E. Hogden
  • Patent number: 6018736
    Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.
    Type: Grant
    Filed: November 20, 1996
    Date of Patent: January 25, 2000
    Assignee: Phonetic Systems Ltd.
    Inventors: Atzmon Gilai, Hezi Resnekov
  • Patent number: 6006182
    Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.
    Type: Grant
    Filed: September 22, 1997
    Date of Patent: December 21, 1999
    Assignee: Northern Telecom Limited
    Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
  • Patent number: 5999902
    Abstract: A recognizer is provided with a priori probability values (e.g., from some previous recognition) indicating how likely the various words of the recognizer's vocabulary are to occur in the particular context, and recognition "scores" are weighted by these values before a result (or results) is chosen. The recognizer also employs "pruning" whereby low-scoring partial results are discarded, so as to speed the recognition process. To avoid premature pruning of the more likely words, probability values are applied before the pruning decisions are made. A method of applying these probability values is described.
    Type: Grant
    Filed: July 16, 1997
    Date of Patent: December 7, 1999
    Assignee: British Telecommunications public Limited Company
    Inventors: Francis James Scahill, Alison Diane Simons, Steven John Whittaker
  • Patent number: 5991721
    Abstract: An apparatus and a method for processing a natural language arranged so as to improve the speech recognition rate. In an example search section, the degree of similarity is calculated between each of a plurality of examples of the actual use of the language stored in an example data base and each of a plurality of probable recognition results output from a recognition section, and one of the examples corresponding to the highest degree of similarity is selected. A final speech recognition result is obtained by using the selected example. The example search section calculates the degree of similarity by weighting the degree of similarity on the basis of a context according to at least one of the examples previously selected.
    Type: Grant
    Filed: May 29, 1996
    Date of Patent: November 23, 1999
    Assignee: Sony Corporation
    Inventors: Yasuharu Asano, Masao Watari, Makoto Akabane, Tetsuya Kagami, Kazuo Ishii, Miyuki Tanaka, Yasuhiko Kato, Hiroshi Kakuda, Hiroaki Ogawa
  • Patent number: 5987411
    Abstract: Methods and systems consistent with the present invention enroll a candidate phrase uttered by a user in a dictionary having at least one previously enrolled phrase. The system receives utterances of the candidate phrase and determines whether the first utterance is confusingly similar to a previously enrolled phrase and whether the utterances are consistent with each other. The system then enrolls the candidate phrase in the dictionary according to these determinations.
    Type: Grant
    Filed: December 17, 1997
    Date of Patent: November 16, 1999
    Assignee: Northern Telecom Limited
    Inventors: Marco Petroni, Hung S. Ma
  • Patent number: 5970450
    Abstract: A speech recognition system, in which partial reference patterns, and cumulative similarities of these patterns, are stored in a temporary pattern memory. The partial reference patterns are to be used as subjects of a similarity computation with an input speech pattern that has its feature quantities extracted by a speech analyzing unit. A counting unit counts partial reference patterns having corresponding cumulative similarities that are higher than a threshold value stored in a threshold memory. A threshold computing unit computes a threshold of pruning from a correspondence relation between the number of partial reference patterns that have corresponding cumulative similarities that exceed the threshold, and the threshold. A similarity computing unit computes a similarity, with respect to the feature quantities, of partial reference patterns with corresponding cumulative similarities that are greater than the threshold of pruning.
    Type: Grant
    Filed: November 24, 1997
    Date of Patent: October 19, 1999
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
  • Patent number: 5953699
    Abstract: A speech recognition apparatus has an analysis section that outputs features of input speech as a time sequence of feature vectors defined for discrete time points corresponding to a processed speech frame. Reference paradigm utterances are converted into a time sequence of standard (reference) feature vectors. The possible continuous variation of standard feature vectors at each point in time is expressed by a line segment, or set of line segments, connecting the feature vectors for the two end points of the "movable" range within which the feature can change, rather than using a larger set of reference vectors as in a conventional multitemplate approach to speech recognition. For example, the continuous range of possible background noise levels in input speech defines a line segment connecting the two feature vectors at the two SNR value limits.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: September 14, 1999
    Assignee: NEC Corporation
    Inventor: Keizaburo Takagi
  • Patent number: 5878390
    Abstract: A speech recognition apparatus which includes a speech recognition section for performing a speech recognition process on an uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability for the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to the calculated functional value.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: March 2, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Jun Kawai, Yumi Wakita