Similarity Patents (Class 704/239)

Method and apparatus for rapid adapt via cumulative distribution function matching for continuous speech

Patent number: 6470314

Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.

Type: Grant

Filed: April 6, 2000

Date of Patent: October 22, 2002

Assignee: International Business Machines Corporation

Inventors: Satyanarayana Dharanipragada, Mukund Padmanabhan
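The four steps in the abstract above can be illustrated with a minimal sketch. It assumes empirical (sample-based) CDFs and a step-function inverse; the function names `empirical_cdf`, `inverse_cdf` and `cdf_match` are hypothetical and are not taken from the patent.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of samples that are <= x (assumed empirical CDF)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def inverse_cdf(sorted_values, p):
    """Smallest sample whose empirical CDF value is >= p."""
    n = len(sorted_values)
    index = min(n - 1, max(0, math.ceil(p * n) - 1))
    return sorted_values[index]


def cdf_match(train_vectors, test_vectors):
    """Map each dimension of the test vectors onto the training distribution."""
    dims = len(train_vectors[0])
    # Steps (i) and (ii): one CDF per dimension for training and test data.
    train_sorted = [sorted(v[d] for v in train_vectors) for d in range(dims)]
    test_sorted = [sorted(v[d] for v in test_vectors) for d in range(dims)]
    # Steps (iii) and (iv): x -> F_train^{-1}(F_test(x)), applied per dimension.
    return [
        [
            inverse_cdf(train_sorted[d], empirical_cdf(test_sorted[d], v[d]))
            for d in range(dims)
        ]
        for v in test_vectors
    ]
```

Because the mapping is built from rank information only, it is nonlinear in general: test data that is a shifted and scaled copy of the training data is mapped back onto the training values.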
Voice activity detector for calculating spectral irregularity measure on the basis of spectral difference measurements

Patent number: 6427134

Abstract: A voice activity detector suitable for deployment in a mobile phone apparatus is disclosed. An advantage of the voice activity detector is that it is better able to provide a decision (79) as to whether an input signal (19) consists of noise (which it is not desired to transmit) or comprises speech or information tones (which are required to be transmitted), especially in noisy environments. The voice activity detector includes a number of components, in particular an auxiliary voice activity detector (3). The auxiliary voice activity detector (3) distinguishes between noise and speech on the basis that the spectrum of speech changes more rapidly than that of noise. This results in the auxiliary detector (3) rarely mistaking a speech signal to be a noise signal. Hence, a very reliable noise template (421) is obtained. For this reason, the auxiliary detector (3) is also useful in noise reduction applications. The voice activity detector also uses a neural net classifier (7).

Type: Grant

Filed: September 26, 1998

Date of Patent: July 30, 2002

Assignee: British Telecommunications public limited company

Inventors: Neil Robert Garner, Paul Alexander Barrett
Speech recognition method using confidence measure evaluation

Patent number: 6421640

Abstract: The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure, and a plurality of second confidence measures determined for a recognition result are automatically combined to determine the first confidence measure. To reduce the resultant error rate in the assessment of the correctness of a recognition result, the method is characterized in that the determination of the parameters weighting the combination of the second confidence measures is based on a minimization of a cross-entropy-error measure. A further improvement is achieved by means of a post-processing operation based on the maximization of the Gardner-Derrida error function.

Type: Grant

Filed: September 13, 1999

Date of Patent: July 16, 2002

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Jannes G. A. Dolfing, Andreas Wendemuth
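As a rough illustration of weighting several confidence measures by minimizing a cross-entropy error, the sketch below assumes a logistic combination trained with stochastic gradient descent. The combination form, the learning rate and the names `combine` and `train` are assumptions for illustration; the abstract does not specify them, and the Gardner-Derrida post-processing step is not shown.

```python
import math


def combine(weights, bias, measures):
    """Combine second confidence measures into one confidence in (0, 1)."""
    z = bias + sum(w * m for w, m in zip(weights, measures))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit the combination weights by minimizing the cross-entropy error.

    samples: list of confidence-measure vectors, one per recognition result.
    labels:  1 if the recognition result was correct, else 0.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for measures, label in zip(samples, labels):
            # For a logistic output, the cross-entropy gradient with respect
            # to the pre-activation is (prediction - label).
            error = combine(weights, bias, measures) - label
            weights = [w - learning_rate * error * m
                       for w, m in zip(weights, measures)]
            bias -= learning_rate * error
    return weights, bias
```

A recognition result would then be accepted when the combined confidence exceeds a chosen threshold such as 0.5.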
Speech recognition method and system

Patent number: 6411929

Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.

Type: Grant

Filed: July 26, 2000

Date of Patent: June 25, 2002

Assignee: Hitachi, Ltd.

Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
Method and apparatus for generating phrasal transcriptions

Patent number: 6408271

Abstract: The invention relates to a method and apparatus for generating phrasal transcriptions. The invention provides generating a group of word transcriptions for each vocabulary item in an orthographic phrase. According to a first embodiment, the invention further provides permuting the word transcriptions to generate a plurality of phrasal transcriptions and computing a score for each phrasal transcription in the plurality of phrasal transcriptions. The set of phrasal transcriptions is then selected from the plurality of phrasal transcriptions at least in part on a basis of the score data elements and stored in a format suitable for use by a speech recognition dictionary. As a variant, the phrasal transcriptions may be released in a format suitable for use by a speech synthesizer.

Type: Grant

Filed: September 24, 1999

Date of Patent: June 18, 2002

Assignee: Nortel Networks Limited

Inventors: Kenneth W. Smith, Michael G. Sabourin
Methods and apparatuses for segmenting an audio-visual recording using image similarity searching and audio speaker recognition

Patent number: 6404925

Abstract: Methods for segmenting audio-video recordings of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order.

Type: Grant

Filed: March 11, 1999

Date of Patent: June 11, 2002

Assignees: Fuji Xerox Co., Ltd., Xerox Corporation

Inventors: Jonathan T. Foote, Lynn Wilcox
Hand-free operator capable of infrared controlling a vehicle's audio stereo system

Patent number: 6397086

Abstract: The disclosure relates to an infrared controlled hand-free operator for cellular phones housed in a vehicle. More particularly, it is concerned with a hand-free operator which can automatically transmit an infrared signal similar to a control signal of a vehicle's audio stereo system to turn the stereo system off or mute it when incoming signals are received by a cellular phone mounted to a vehicle. The hand-free operator thus can automatically cut off the noise sound of an audio stereo system in operation so as to make the operation of a cellular phone free of interference in a vehicle on the move.

Type: Grant

Filed: June 22, 1999

Date of Patent: May 28, 2002

Assignee: E-Lead Electronic Co., Ltd.

Inventor: Tonny Chen
Cohort model selection apparatus and method

Patent number: 6393397

Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.

Type: Grant

Filed: June 14, 1999

Date of Patent: May 21, 2002

Assignee: Motorola, Inc.

Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
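The selection rule in the abstract above can be sketched as a greedy loop. This is a minimal illustration that assumes speaker models are plain feature vectors, that similarity is negative Euclidean distance, and that the two criteria are combined by subtraction; none of these choices comes from the patent, and the names `similarity` and `select_cohort` are hypothetical.

```python
import math


def similarity(model_a, model_b):
    """Assumed similarity: negative Euclidean distance between model vectors."""
    return -math.dist(model_a, model_b)


def select_cohort(target, pool, size):
    """Greedily pick cohort models close to the target but far from each other.

    target: vector for the target speaker model.
    pool:   dict mapping model name -> vector (the model pool).
    size:   number of cohort models to select.
    """
    remaining = dict(pool)
    cohort = []
    while remaining and len(cohort) < size:

        def score(name):
            to_target = similarity(remaining[name], target)
            if not cohort:
                return to_target
            # Penalize candidates that resemble an already chosen cohort model.
            to_cohort = max(similarity(remaining[name], pool[c]) for c in cohort)
            return to_target - to_cohort

        best = max(remaining, key=score)
        cohort.append(best)
        del remaining[best]
    return cohort
```

With a target at the origin and pool models at 1.0, 1.1 and -1.5 on one axis, the first pick is the nearest model and the second pick is the one on the opposite side, because the model at 1.1 is nearly identical to the first pick.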
Signal verification device

Patent number: 6349148

Abstract: The invention relates to a device for the verification of time-dependent, user-specific signals which includes means for generating a set of feature vectors which serve to provide an approximative description of an input signal and are associated with selectable sampling intervals of the signal; means for preparing an HMM model for the signal; means for determining a first probability value which describes the probability of occurrence of the set of feature vectors, given the HMM model, and a threshold decider for comparing the first probability value with a threshold value and for deciding on the verification of the signal.

Type: Grant

Filed: May 26, 1999

Date of Patent: February 19, 2002

Assignee: U.S. Philips Corporation

Inventor: Jannes G.A. Dolfing
Speech recognition apparatus, method and storage medium thereof

Patent number: 6341263

Abstract: A voice recognition system, method and storage medium are provided. The system includes a plurality of storage sections, a selection section, an adaptation section, a plurality of calculation sections, an adaptation section, a normalization section and a decision section. The method includes the steps for performing the functions associated with the sections.

Type: Grant

Filed: May 17, 1999

Date of Patent: January 22, 2002

Assignee: NEC Corporation

Inventors: Eiko Yamada, Hiroaki Hattori
Noise immune speech recognition method and system

Publication number: 20010047257

Abstract: A method of assigning a similarity score representative of a similarity between a first speech signal and a second speech signal. The method includes generating a signal transformation responsive to both the first and second signals, determining a transformation score based on at least one characteristic of the generated transformation and calculating the similarity score as a function of the transformation score.

Type: Application

Filed: January 24, 2001

Publication date: November 29, 2001

Inventors: Gabriel Artzi, Yaron Paz, Yehuda Hershkovits
Method and apparatus for clustering-based signal segmentation

Patent number: 6314392

Abstract: In a computerized method, a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.

Type: Grant

Filed: September 20, 1996

Date of Patent: November 6, 2001

Assignee: Digital Equipment Corporation

Inventors: Brian S. Eberman, William D. Goldenthal
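The iterative merge described in the abstract above can be sketched as follows. The abstract does not name the statistical distance, so this sketch assumes the absolute difference of frame means as a stand-in; the function name `segment` is hypothetical.

```python
from statistics import fmean


def segment(samples, frame_size, threshold):
    """Merge adjacent frames until no neighboring pair is closer than threshold."""
    # Group fixed numbers of adjacent samples into disjoint frames.
    segments = [samples[i:i + frame_size]
                for i in range(0, len(samples), frame_size)]
    merged = True
    while merged and len(segments) > 1:
        merged = False
        result = [segments[0]]
        for frame in segments[1:]:
            # Assumed distance: absolute difference between the set means.
            if abs(fmean(result[-1]) - fmean(frame)) < threshold:
                result[-1] = result[-1] + frame
                merged = True
            else:
                result.append(frame)
        segments = result
    return segments
```

Each returned segment is a run of samples whose neighboring frames were statistically close under the assumed distance; a real system would likely use a distance that also accounts for variance.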
Speech recognition method and speech recognition device

Patent number: 6301559

Abstract: To realize a speech recognition method and speech recognition device that reduce erroneous recognition for words that are not to be recognized and ambient sounds, and to improve the recognition capability, characteristic parameters of words to be recognized and characteristic parameters of words that are not to be recognized and ambient sounds are previously entered in a vocabulary template 40. Degrees of similarity are obtained in a speech recognition section 30 between characteristic parameters for input words (or sounds) and all the characteristic parameters stored in the vocabulary template 40. Information indicating one characteristic parameter, from among the characteristic parameters stored in the vocabulary template 40, that is the closest approximation to the characteristic parameter of the input word (or sound), is generated as a recognition result.

Type: Grant

Filed: November 16, 1998

Date of Patent: October 9, 2001

Assignee: Oki Electric Industry Co., Ltd.

Inventors: Hiroshi Shinotsuka, Noritoshi Hino
Hierarchial subband linear predictive cepstral features for HMM-based speech recognition

Patent number: 6292776

Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.

Type: Grant

Filed: March 12, 1999

Date of Patent: September 18, 2001

Assignee: Lucent Technologies Inc.

Inventor: Rathinavelu Chengalvarayan
Apparatus and methods for identifying homophones among words in a speech recognition system

Patent number: 6269335

Abstract: A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.

Type: Grant

Filed: August 14, 1998

Date of Patent: July 31, 2001

Assignee: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Stephane Herman Maes, Michael Daniel Monkowski, Jeffrey Scott Sorensen
Method and apparatus for Mandarin Chinese speech recognition by using initial/final phoneme similarity vector

Publication number: 20010010039

Abstract: Apparatus for Mandarin Chinese speech recognition by using initial/final phoneme similarity vector, for improving the Chinese speech recognition accuracy and downsizing the needed memory is provided.

Type: Application

Filed: December 8, 2000

Publication date: July 26, 2001

Applicant: Matsushita Electrical Industrial Co., Ltd.

Inventor: Chung-Ho Yang
Maximum likelihood method for finding an adapted speaker model in eigenvoice space

Patent number: 6263309

Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.

Type: Grant

Filed: April 30, 1998

Date of Patent: July 17, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua
Mobile phone having speaker dependent voice recognition method and apparatus

Patent number: 6260012

Abstract: An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values.

Type: Grant

Filed: March 1, 1999

Date of Patent: July 10, 2001

Assignee: Samsung Electronics Co., LTD

Inventor: Joung-Kyou Park
Word-containing database accessing system for responding to ambiguous queries, including a dictionary of database words, a dictionary searcher and a database searcher

Patent number: 6256630

Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.

Type: Grant

Filed: June 17, 1999

Date of Patent: July 3, 2001

Assignee: Phonetic Systems Ltd.

Inventors: Atzmon Gilai, Hezi Resnekov
Method for measuring distance between collections of distributions

Patent number: 6246982

Abstract: A method for computing a distance between collections of distributions or finite mixture models of features. Data is processed so as to define at least first and second collections of distributions of features. For each distribution of the first collection, the distance to each distribution of the second collection is measured to determine which distribution of the second collection is the closest (most similar). The same procedure is performed for the distributions of the second collection. Based on the closest distance measures, a final distance is computed representing the distance between the first and second collections. This final distance may be a weighted sum of the closest distances. The distance measure may be used in a number of applications such as speaker classification, speaker recognition and audio segmentation.

Type: Grant

Filed: January 26, 1999

Date of Patent: June 12, 2001

Assignee: International Business Machines Corporation

Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
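The closest-distribution procedure in the abstract above can be sketched under some stated assumptions: each distribution is a one-dimensional Gaussian given as (weight, mean, variance), the pairwise distance is the symmetric Kullback-Leibler divergence, and the weighted sum is normalized by the total weight. These choices and the names `gaussian_distance` and `collection_distance` are illustrative, not taken from the patent.

```python
def gaussian_distance(p, q):
    """Symmetric KL divergence between two 1-D Gaussians (assumed distance)."""
    _, mean_p, var_p = p
    _, mean_q, var_q = q
    diff_sq = (mean_p - mean_q) ** 2
    return 0.5 * ((var_p + diff_sq) / var_q + (var_q + diff_sq) / var_p - 2.0)


def collection_distance(first, second):
    """Weighted sum of closest distances, computed in both directions.

    first, second: lists of (weight, mean, variance) tuples.
    """
    total = 0.0
    # For each distribution in one collection, find the closest in the other.
    for source, other in ((first, second), (second, first)):
        for dist in source:
            closest = min(gaussian_distance(dist, cand) for cand in other)
            total += dist[0] * closest
    weight_sum = sum(d[0] for d in first) + sum(d[0] for d in second)
    return total / weight_sum
```

Two identical collections give a distance of zero, and a component with no close counterpart in the other collection raises the distance in proportion to its weight.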
Method for increasing recognition rate in voice recognition system

Publication number: 20010003173

Abstract: A method for increasing voice recognition rate in a voice recognition system comprising the steps of: establishing a reference model for user voices subjected to recognition; receiving the user voices for voice recognition commands; detecting the range and characteristics of the received voice data; comparing the range and characteristics of the detected voice data with the characteristics of the previously obtained reference voice model to retrieve a word having the largest similarity; comparing the similarity of the retrieved word with the similarity reference value to report a voice recognition failure when the compared result is below the reference value, and to report a voice recognition success and perform the command corresponding to the recognized word when the compared result is at least the reference value; and modifying the characteristics of the voice data which succeeded in the voice recognition into the reference voice model which was used in the corresponding voice recognition.

Type: Application

Filed: December 6, 2000

Publication date: June 7, 2001

Applicant: LG Electronics Inc.

Inventor: Keun Ok Lim
Speech recognition apparatus and method for matching inputted speech and a word generated from stored referenced phoneme data

Patent number: 6236964

Abstract: A speech recognition method and apparatus in which a speech section is sliced by the unit of a word by spotting and candidate words are selected. Next, in a second stage, matching is conducted by the unit of a phoneme. Consequently, selection of the candidate words and slicing of the speech section can be performed concurrently. Furthermore, narrowing of the candidate words is facilitated. Furthermore, since reference phoneme patterns under a plurality of environments are prepared, recognition of an input speech under a larger number of conditions is possible using a smaller amount of data when compared with the case in which reference word patterns under a plurality of environments are prepared.

Type: Grant

Filed: February 14, 1994

Date of Patent: May 22, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Junichi Tamura, Tetsuo Kosaka, Atsushi Sakurai
Method and system for displaying icons representing information items stored in a database

Patent number: 6211876

Abstract: Methods and apparatus are provided for accessing an experience journal which includes unstructured text items relating to a topic, such as a medical condition. The method is implemented in a computer system including a processor, a storage device, a video display unit having a display screen, and a user interface. The unstructured text items are stored in the storage device. Similarities among the unstructured text items are determined, and icons, one corresponding to each of the unstructured text items, are displayed on the display screen. The icons are positioned on the display screen relative to each other, such that the distances between icons are representative of the determined similarities among the unstructured text items. In response to user selection of one of the icons, the corresponding unstructured text item is displayed on the display screen.

Type: Grant

Filed: June 22, 1998

Date of Patent: April 3, 2001

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Edith Ackermann, Dennis Nathan Bromley, David Ray DeMaso, Sara Frances Frisken Gibson, Joseph Gonzalez-Heydrich, Judith Galler Karlin, Joseph Marks, Chia Shen, Carol Strohecker
Selection of decoys for non-vocabulary utterances rejection

Patent number: 6195634

Abstract: Assessing decoys for use in an audio recognition process for identifying predetermined sounds in an unknown input audio signal, involves a test recognition process for matching known training audio signals to models representing the predetermined sounds and the decoys and determining for each of the decoys, from the results of the test recognition process, a score representing the effect of the respective decoy in the recognition of any of the known training audio signals. An advantage arising from generating scores for decoys is that the chance of a poor selection of decoys can be reduced. Thus the possibility of poor recognition performance arising from poorly selected decoys can be reduced. Furthermore, the requirement for expert input into the decoy creation process, which may be time consuming, can be reduced. This can make it easier, or quicker, or less expensive to install.

Type: Grant

Filed: December 24, 1997

Date of Patent: February 27, 2001

Assignee: Nortel Networks Corporation

Inventors: Martin Dudemaine, Claude Pelletier
Apparatus and methods for rejecting confusible words during training associated with a speech recognition system

Patent number: 6192337

Abstract: A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within the threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within a threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.

Type: Grant

Filed: August 14, 1998

Date of Patent: February 20, 2001

Assignee: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Stephane H. Maes
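The accept-or-refrain decision in the abstract above can be sketched with a stand-in similarity measure. Here acoustic similarity is approximated by the edit distance between phoneme sequences, which is an assumption made only for illustration; the patent's actual measure is not given in the abstract, and the names `edit_distance` and `try_add_word` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def try_add_word(vocabulary, word, phonemes, threshold=1):
    """Add the word unless an existing word falls within the threshold range.

    vocabulary: dict mapping word -> list of phonemes (modified in place).
    Returns (added, confusable_words).
    """
    confusable = [existing for existing, existing_phonemes in vocabulary.items()
                  if edit_distance(existing_phonemes, phonemes) <= threshold]
    if confusable:
        # Refrain from adding automatically; the caller can prompt the user.
        return False, confusable
    vocabulary[word] = phonemes
    return True, []
```

Returning the list of confusable words lets a caller show the conflict to the user rather than failing silently.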
Method and apparatus using probabilistic language model based on confusable sets for speech recognition

Patent number: 6182039

Abstract: The speech recognizer incorporates a language model that reduces the number of acoustic pattern matching sequences that must be performed by the recognizer. The language model is based on knowledge of a pre-defined set of syntactically defined content and includes a data structure that organizes the content according to acoustic confusability. A spelled name recognition system based on the recognizer employs a language model based on classes of letters that the recognizer frequently confuses for one another. The language model data structure is optionally an N-gram data structure, a tree data structure, or an incrementally configured network that is built during a training sequence. The incrementally configured network has nodes that are selected based on acoustic distance from a predetermined lexicon.

Type: Grant

Filed: March 24, 1998

Date of Patent: January 30, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Luca Rigazio, Jean-Claude Junqua, Michael Galler
Method and a system for substantially eliminating speech recognition error in detecting repetitive sound elements

Patent number: 6157911

Abstract: A method and a system substantially eliminate erroneous voice recognition of repetitive elements in word spotting. One preferred embodiment according to the current invention eliminates erroneous voice recognition of repetitive elements by selectively prolonging a response time of words containing repetitive elements. In order to substantially eliminate the errors, in another preferred embodiment according to the current invention, words containing repetitive elements are marked by a silent key word.

Type: Grant

Filed: March 27, 1998

Date of Patent: December 5, 2000

Assignee: Ricoh Company, Ltd.

Inventor: Masaru Kuroda
Rapid adaptation of speech models

Patent number: 6151575

Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.

Type: Grant

Filed: October 28, 1997

Date of Patent: November 21, 2000

Assignee: Dragon Systems, Inc.

Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
Mixing digitized speech and text using reliability indices

Patent number: 6151576

Abstract: Methods and apparatus for processing, storing and transmitting an original data stream of digitized speech samples. The method converts a stream of digitized speech samples to a stream of text and associated reliability measures. A mixed-media data stream is created with the stream of text as a text component and selected portions of the digitized stream of speech as a speech component. The selected portions are those whose corresponding reliability measures fall below a threshold. The threshold can be changed to change the amount of storage or bandwidth used by the mixed-media data stream. The mixed-media data stream can be searched and the results can be spoken as synthetic speech derived from the text component or as speech samples taken from the digitized speech component.

Type: Grant

Filed: August 11, 1998

Date of Patent: November 21, 2000

Assignee: Adobe Systems Incorporated

Inventors: John E. Warnock, T. V. Raman
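The thresholding rule in the abstract above can be sketched in a few lines. The record layout (text, reliability, audio bytes) and the names `build_mixed_stream` and `stored_audio_bytes` are assumptions for illustration only.

```python
def build_mixed_stream(segments, threshold):
    """Keep all text, but keep audio only where recognition was unreliable.

    segments: list of (text, reliability, audio_bytes) tuples.
    """
    stream = []
    for text, reliability, audio in segments:
        # Audio is retained only when the reliability falls below the threshold.
        keep_audio = reliability < threshold
        stream.append({"text": text, "audio": audio if keep_audio else None})
    return stream


def stored_audio_bytes(stream):
    """Total audio payload, showing how the threshold trades off storage."""
    return sum(len(item["audio"]) for item in stream if item["audio"] is not None)
```

Raising the threshold keeps more of the original audio and therefore uses more storage or bandwidth; lowering it relies more heavily on the recognized text.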
Method of testing a vocabulary word being enrolled in a speech recognition system

Patent number: 6134527

Abstract: A method of testing a new vocabulary word is performed using any set of enrollment utterances provided by the user or from an available database. The present method preferably does not use separate training and similarity test utterances. This allows any or all available repetitions of a vocabulary word being enrolled to be used for training (204), therefore improving the robustness of the trained models. Likewise, any or all training repetitions can also be utilized for similarity analysis (212), providing additional test samples which should further improve the detection of acoustically similar words. Additionally, the similarity analysis progresses incrementally and does not need to continue if a confusable word is found. Finally, first and second thresholds could be employed (212, 302) to provide greater flexibility for a user training a speech recognition system.

Type: Grant

Filed: January 30, 1998

Date of Patent: October 17, 2000

Assignee: Motorola, Inc.

Inventors: Jeffrey Arthur Meunier, Edward Srenger, Steven Albrecht
Speaker recognition device

Patent number: 6094632

Abstract: A speaker recognition device for judging whether or not an unknown speaker is an authentic registered speaker himself/herself executes 'text verification using speaker independent speech recognition' and 'speaker verification by comparison with a reference pattern of a password of a registered speaker'. A presentation section instructs the unknown speaker to input an ID and utter a specified text designated by a text generation section and a password. The 'text verification' of the specified text is executed by a text verification section, and the 'speaker verification' of the password is executed by a similarity calculation section. The judgment section judges that the unknown speaker is the authentic registered speaker himself/herself if both the results of the 'text verification' and the 'speaker verification' are affirmative.

Type: Grant

Filed: January 29, 1998

Date of Patent: July 25, 2000

Assignee: NEC Corporation

Inventor: Hiroaki Hattori
Speech processing using maximum likelihood continuity mapping

Patent number: 6052662

Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. The method for learning the mapping between static speech sounds and pseudo-articulator positions uses a set of training data composed only of speech sounds. This speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Type: Grant

Filed: January 29, 1998

Date of Patent: April 18, 2000

Assignee: Regents of the University of California

Inventor: John E. Hogden
Word-containing database accessing system for responding to ambiguous queries, including a dictionary of database words, a dictionary searcher and a database searcher

Patent number: 6018736

Abstract: A database accessing system for processing a request to access a database including a multiplicity of entries, each entry including at least one word, the request including a sequence of representations of possibly erroneous user inputs, the system including a similar word finder operative, for at least one interpretation of each representation, to find at least one database word which is at least similar to that interpretation, and a database entry evaluator operative, for each database word found by the similar word finder, to assign similarity values for relevant entries in the database, said values representing the degree of similarity between each database entry and the request.

Type: Grant

Filed: November 20, 1996

Date of Patent: January 25, 2000

Assignee: Phonetic Systems Ltd.

Inventors: Atzmon Gilai, Hezi Resnekov
Speech recognition rejection method using generalized additive models

Patent number: 6006182

Abstract: Systems and methods consistent with the present invention determine whether to accept one of a plurality of intermediate recognition results output by a speech recognition system as a final recognition result. The system first combines a plurality of speech rejection features into a feature function in which weights are assigned to each rejection feature in accordance with a recognition accuracy of each rejection feature. Feature values are then calculated for each of the rejection features using the plurality of intermediate recognition results. The system next computes the feature function according to the calculated feature values to determine a rejection decision value. Finally, one of the plurality of intermediate recognition results is accepted as the final recognition result according to the rejection decision value.

Type: Grant

Filed: September 22, 1997

Date of Patent: December 21, 1999

Assignee: Northern Telecom Limited

Inventors: Waleed Fakhr, Serge Robillard, Vishwa Gupta, Real Tremblay, Michael Sabourin, Jean-Francois Crespo
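
The rejection scheme summarized in the abstract of patent 6006182 above (weight each rejection feature by its accuracy, evaluate a combined feature function, then accept or reject) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `feature_weights`, `rejection_decision_value`, `accept`, the normalization of accuracies into weights, and the 0.5 threshold do not come from the patent. The sketch also uses plain linear terms, whereas a generalized additive model would normally apply a smooth function to each feature.

```python
from typing import Sequence


def feature_weights(accuracies: Sequence[float]) -> list[float]:
    """Hypothetical weighting: scale per-feature accuracies so they sum to 1."""
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must have a positive sum")
    return [a / total for a in accuracies]


def rejection_decision_value(values: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Additive feature function: weighted sum of the rejection feature values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def accept(values: Sequence[float], weights: Sequence[float],
           threshold: float = 0.5) -> bool:
    """Accept the recognition result when the decision value reaches the threshold."""
    return rejection_decision_value(values, weights) >= threshold
```

With accuracies of 0.9, 0.6 and 0.5, the first feature alone contributes 0.45 to the decision value, so under the assumed threshold it cannot force acceptance by itself.
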
Speech recognition incorporating a priori probability weighting factors

Patent number: 5999902

Abstract: A recognizer is provided with a priori probability values (e.g., from some previous recognition) indicating how likely the various words of the recognizer's vocabulary are to occur in the particular context, and recognition "scores" are weighted by these values before a result (or results) is chosen. The recognizer also employs "pruning" whereby low-scoring partial results are discarded, so as to speed the recognition process. To avoid premature pruning of the more likely words, probability values are applied before the pruning decisions are made. A method of applying these probability values is described.

Type: Grant

Filed: July 16, 1997

Date of Patent: December 7, 1999

Assignee: British Telecommunications public Limited Company

Inventors: Francis James Scahill, Alison Diane Simons, Steven John Whittaker
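
The ordering described in the abstract of patent 5999902 above (apply a priori probabilities first, prune afterwards) can be sketched as follows. This is only an illustration under stated assumptions: scores are taken to be log-likelihoods, the prior is added in the log domain, and pruning is a simple beam relative to the best weighted score. The function name `prune_with_priors` and the `beam` parameter are hypothetical and not taken from the patent.

```python
import math


def prune_with_priors(scores: dict[str, float],
                      priors: dict[str, float],
                      beam: float) -> dict[str, float]:
    """Weight partial log-scores by log priors, then beam-prune.

    scores: partial log-likelihood per word (assumed representation).
    priors: a priori probability per word; words with no positive prior are dropped.
    beam:   hypotheses more than `beam` below the best weighted score are pruned.
    """
    weighted = {
        word: score + math.log(priors[word])
        for word, score in scores.items()
        if priors.get(word, 0.0) > 0.0
    }
    if not weighted:
        return {}
    best = max(weighted.values())
    return {word: s for word, s in weighted.items() if s >= best - beam}
```

For example, with log-scores `{"yes": -10.0, "no": -9.0, "maybe": -9.5}`, priors `{"yes": 0.8, "no": 0.01, "maybe": 0.19}` and a beam of 3.0, the acoustically best word "no" is pruned because its prior is very low, while "yes" and "maybe" survive.
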
Apparatus and method for processing natural language and apparatus and method for speech recognition

Patent number: 5991721

Abstract: An apparatus and a method for processing a natural language arranged so as to improve the speech recognition rate. In an example search section, the degree of similarity between each of a plurality of examples of the actual use of the language stored in an example data base and each of a plurality of probable recognition results output from a recognition section is calculated, and one of the examples corresponding to the highest degree of similarity is selected. A final speech recognition result is obtained by using the selected example. The example search section calculates the degree of similarity by weighting the degree of similarity on the basis of a context according to at least one of the examples previously selected.

Type: Grant

Filed: May 29, 1996

Date of Patent: November 23, 1999

Assignee: Sony Corporation

Inventors: Yasuharu Asano, Masao Watari, Makoto Akabane, Tetsuya Kagami, Kazuo Ishii, Miyuki Tanaka, Yasuhiko Kato, Hiroshi Kakuda, Hiroaki Ogawa
Recognition system for determining whether speech is confusing or inconsistent

Patent number: 5987411

Abstract: Methods and systems consistent with the present invention enroll a candidate phrase uttered by a user in a dictionary having at least one previously enrolled phrase. The system receives utterances of the candidate phrase and determines whether the first utterance is confusingly similar to a previously enrolled phrase and whether the utterances are consistent with each other. The system then enrolls the candidate phrase in the dictionary according to these determinations.

Type: Grant

Filed: December 17, 1997

Date of Patent: November 16, 1999

Assignee: Northern Telecom Limited

Inventors: Marco Petroni, Hung S. Ma
Speech recognition system using modifiable recognition threshold to reduce the size of the pruning tree

Patent number: 5970450

Abstract: A speech recognition system, in which partial reference patterns, and cumulative similarities of these patterns, are stored in a temporary pattern memory. The partial reference patterns are to be used as subjects of a similarity computation with an input speech pattern that has its feature quantities extracted by a speech analyzing unit. A counting unit counts partial reference patterns having corresponding cumulative similarities that are higher than a threshold value stored in a threshold memory. A threshold computing unit computes a threshold of pruning from a correspondence relation between the number of partial reference patterns that have corresponding cumulative similarities that exceed the threshold, and the threshold. A similarity computing unit computes a similarity, with respect to the feature quantities, of partial reference patterns with corresponding cumulative similarities that are greater than the threshold of pruning.

Type: Grant

Filed: November 24, 1997

Date of Patent: October 19, 1999

Assignee: NEC Corporation

Inventor: Hiroaki Hattori
Speech recognition using distance between feature vector of one sequence and line segment connecting feature-variation-end-point vectors in another sequence

Patent number: 5953699

Abstract: A speech recognition apparatus has an analysis section that outputs features of input speech as a time sequence of feature vectors defined for discrete time points corresponding to a processed speech frame. Reference paradigm utterances are converted into a time sequence of standard (reference) feature vectors. The possible continuous variation of standard feature vectors at each point in time is expressed by a line segment, or set of line segments, connecting the feature vectors for the two end points of the "movable" range within which the feature can change, rather than using a larger set of reference vectors as in a conventional multitemplate approach to speech recognition. For example, the continuous range of possible background noise levels in input speech defines a line segment connecting the two feature vectors at the two SNR value limits.

Type: Grant

Filed: October 28, 1997

Date of Patent: September 14, 1999

Assignee: NEC Corporation

Inventor: Keizaburo Takagi
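
The geometric idea in the abstract of patent 5953699 above, measuring how far an input feature vector lies from a line segment joining two end-point reference vectors, can be illustrated with a short sketch. The function name `point_to_segment_distance` and the choice of Euclidean distance are assumptions for illustration; the patent abstract does not specify the metric.

```python
import math
from typing import Sequence


def point_to_segment_distance(x: Sequence[float],
                              a: Sequence[float],
                              b: Sequence[float]) -> float:
    """Euclidean distance from vector x to the segment with end points a and b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project x onto the line through a and b, then clamp to the segment.
        t = max(0.0, min(1.0, sum(p * q for p, q in zip(ax, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, closest)))
```

If `a` and `b` stand for the reference vectors at the two SNR limits, an input vector recorded at any intermediate noise level should lie close to the segment, which is why a single segment can stand in for many fixed templates.
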
Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition

Patent number: 5878390

Abstract: A speech recognition apparatus which includes a speech recognition section for performing a speech recognition process on an uttered speech with reference to a predetermined statistical language model, based on a series of speech signals of the uttered speech sentence composed of a series of input words. The speech recognition section calculates a functional value of a predetermined erroneous sentence judging function with respect to speech recognition candidates, where the erroneous sentence judging function represents a degree of unsuitability for the speech recognition candidates. When the calculated functional value exceeds a predetermined threshold value, the speech recognition section performs the speech recognition process by eliminating the speech recognition candidate corresponding to that calculated functional value.

Type: Grant

Filed: June 23, 1997

Date of Patent: March 2, 1999

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Jun Kawai, Yumi Wakita
Speech recognizing method and apparatus, and speech translating system

Patent number: 5848389

Abstract: In a speech recognizing apparatus, a grammatical qualification of a proposed speech recognition result candidate is judged without using a grammatical rule. The speech recognizing apparatus for performing sentence/speech recognition is comprised of an analyzing unit for acoustically analyzing speech inputted therein to extract a feature parameter of the inputted speech; a recognizing unit for recognizing the inputted speech based upon the feature parameter outputted from said analyzing unit to thereby output a plurality of proposed recognition result candidates; an example data base for storing therein a plurality of examples; and an example retrieving unit for calculating a resemblance degree between each of said plurality of proposed recognition result candidates and each of the plural examples stored in the example data base and for obtaining the speech recognition result based on said calculated resemblance degree.

Type: Grant

Filed: April 5, 1996

Date of Patent: December 8, 1998

Assignee: Sony Corporation

Inventors: Yasuharu Asano, Hiroaki Ogawa, Yasuhiko Kato, Tetsuya Kagami, Masao Watari, Makoto Akabane, Kazuo Ishii, Miyuki Tanaka, Hiroshi Kakuda
Speech recognition with sequence parsing, rejection and pause detection options

Patent number: 5848388

Abstract: A recognition system includes a speech recognition processing unit for processing input speech signals to indicate similarity to predetermined patterns to be recognized. The recognition processing unit is arranged to repeatedly partition the input speech signal into a pattern-containing portion and, preceding and following the pattern-containing portions, noise or silence portions, and to identify a pattern corresponding to the pattern containing portion. An output supplies a recognition signal indicating recognition of one of the patterns. A pause detector detects the noise or silence portion which follows the pattern-containing portion. In response to its detection, a signal identifying the pattern currently corresponding to the pattern portion is supplied to the output. Also provided are similarly operating rejection portions.

Type: Grant

Filed: December 19, 1995

Date of Patent: December 8, 1998

Assignee: British Telecommunications plc

Inventors: Kevin Joseph Power, Stephen Howard Johnson, Francis James Scahill, Simon Patrick Ringland, John Edward Talintyre
Telecommunications instrument employing variable criteria speech recognition

Patent number: 5842161

Abstract: A recognition criterion or set of recognition criteria is updated automatically, over time, in accordance with the speech input of the user(s). Each input utterance is compared to one or more models of speech to determine a similarity metric for each such comparison. A model of speech which most closely matches the utterance is determined based on the one or more similarity metrics. The similarity metric corresponding to the most closely matching model of speech is analyzed to determine whether the similarity metric satisfies the selected set of recognition criteria. The recognition criteria are automatically altered during use or "on-the-fly", so that more appropriate criteria (and associated thresholds) may be used to either increase the probability of recognition or decrease the incidence of false positive results. Illustratively, if a voice sample results in a near miss of a template, a more liberal criterion is thereafter employed to increase the probability of recognition for subsequent input.

Type: Grant

Filed: June 25, 1996

Date of Patent: November 24, 1998

Assignee: Lucent Technologies Inc.

Inventors: Paul Wesley Cohrs, Mitra P. Deldar, Donald Marion Keen, Ellen Anne Keen
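
The near-miss behaviour described in the abstract of patent 5842161 above can be sketched as a recognizer whose acceptance threshold relaxes after a close but failed match. This is a rough illustration only: the class name `AdaptiveRecognizer`, the numeric defaults, the fixed relaxation step, and the lower bound `floor` are all assumptions, and the patent may adjust its criteria quite differently.

```python
class AdaptiveRecognizer:
    """Accepts a similarity score against a threshold that relaxes on near misses."""

    def __init__(self, threshold: float = 0.8, near_miss_margin: float = 0.05,
                 step: float = 0.02, floor: float = 0.6) -> None:
        self.threshold = threshold                # current acceptance threshold
        self.near_miss_margin = near_miss_margin  # how close counts as a near miss
        self.step = step                          # amount to relax per near miss
        self.floor = floor                        # threshold never drops below this

    def decide(self, similarity: float) -> bool:
        """Return True if the best-matching model is accepted."""
        if similarity >= self.threshold:
            return True
        if similarity >= self.threshold - self.near_miss_margin:
            # Near miss: use a more liberal criterion for subsequent input.
            self.threshold = max(self.floor, self.threshold - self.step)
        return False
```

A score of 0.78 is rejected on the first attempt but lowers the threshold, so a following score of 0.79 is accepted; a score far below the threshold leaves the criterion unchanged.
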
Identification-function calculator, identification-function calculating method, identification unit, identification method, and speech recognition system

Patent number: 5828998

Abstract: A discriminant or identification function is used for pattern recognition in which the highest performance can be offered when adaptation is made. Learning is carried out while a discriminant or identification function is adapted to a learning sample. For example, a standard pattern of the character "A" used as an identification function is learned such that when the character "A" slanting in the right or left direction is input, the standard pattern of the character "A" is rotated (adapted) in accordance with the slanting of the input learning sample.

Type: Grant

Filed: September 24, 1996

Date of Patent: October 27, 1998

Assignee: Sony Corporation

Inventor: Naoto Iwahashi
Multistage word recognizer based on reliably detected phoneme similarity regions

Patent number: 5822728

Abstract: The multistage word recognizer uses a word reference representation based on reliably detected peaks of phoneme similarity values. The word reference representation captures the basic features of the words by targets that describe the location and shape of stable peaks of phoneme similarity values. The first stage of the word hypothesizer represents each reference word with statistical information on the number of high similarity regions over a predefined number of time intervals. The second stage represents each word by a prototype that consists of a series of phoneme targets and global statistics, namely the average word duration and average match rate. These represent the degree of fit of the word prototype to its training data. Word recognition scores generated in the two stages are converted to dimensionless normalized values and combined by averaging for use in selecting the most probable word candidates.

Type: Grant

Filed: September 8, 1995

Date of Patent: October 13, 1998

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Ted H. Applebaum, Philippe R. Morin
Digital signal processor arrangement and method for comparing feature vectors

Patent number: 5819219

Abstract: A digital signal processor arrangement usable for speech processing or other pattern recognition overcomes a weakness of digital signal processors, namely the subtraction followed by absolute-value formation that must often be implemented in these applications. Auxiliary hardware is provided that holds, in a separate memory, the feature vector that is to be compared to reference feature vectors from the dictionary. The calculation is carried out by a separate arithmetic unit that provides a separate difference-forming and absolute-value-forming unit for each feature comparison. The number of clock cycles of the digital signal processor required per comparison can be dramatically reduced by the invention. A suitable addressing method assures that it is always corresponding features of the individual feature vectors that are compared to one another.

Type: Grant

Filed: December 11, 1996

Date of Patent: October 6, 1998

Assignee: Siemens Aktiengesellschaft

Inventors: Luc De Vos, Daniel Goryn
Speech recognition apparatus using neural network and learning method therefor

Patent number: 5809461

Abstract: A speech recognition apparatus using a neural network is provided. A neuron-like element stores a value of its internal status. The neuron-like element also updates the value of its internal status on the basis of an output from the neuron-like element itself, outputs from other neuron-like elements and an external input. The neuron-like element also converts the value of its internal status into an external output. Accordingly, the neuron-like element itself can retain the history of input data. This enables time series data, such as speech, to be processed without providing any special devices in the neural network.

Type: Grant

Filed: June 7, 1995

Date of Patent: September 15, 1998

Assignee: Seiko Epson Corporation

Inventor: Mitsuhiro Inazumi
Coding of a speech or music signal with quantization of harmonics components specifically and then residue components

Patent number: 5806024

Abstract: Harmonics coefficients are estimated in primary coefficients of an orthogonal transform of a speech or a music input signal by using a pitch frequency extracted from the input signal and are quantized into a harmonics code vector. Residue coefficients are calculated by removing the harmonics coefficients from the primary coefficients and quantized into residue code vectors and gain code vectors. It is possible to search harmonics excitation pulses at the harmonics locations for harmonics quantization into the harmonics code vector. On the other hand, it is possible to estimate the harmonics coefficients or excitation pulses by using quantized LSP parameters and to calculate secondary coefficients for use in weighting the harmonics quantization and residue quantization and, if applicable, in excitation pulse search.

Type: Grant

Filed: December 23, 1996

Date of Patent: September 8, 1998

Assignee: NEC Corporation

Inventor: Kazunori Ozawa
Method and device for rating of speech quality by calculating time delays from onset of vowel sounds

Patent number: 5806028

Abstract: A method and device for determining quality of speech. The speech to be evaluated is listened to by a person who reproduces the speech. The start of vowel sounds in the produced and reproduced speech, respectively, is determined. The difference between the starts of the vowel sounds is registered. From the obtained time differences an average value is determined. The average value indicates the quality of the produced speech. The invention can be used for evaluation of different speech sources.

Type: Grant

Filed: February 14, 1996

Date of Patent: September 8, 1998

Assignee: Telia AB

Inventor: Bertil Lyberg
Speech recognition system and method for properly recognizing a compound word composed of a plurality of words

Patent number: 5799274

Abstract: A speech recognition system and method having an increased recognition accuracy for a compound word composed of a first word and a second word. Standard information corresponding to each of the registered words is stored. The standard information includes predetermined feature information and time information with respect to each of the registered words. The time information represents a continuous time length for pronouncing each of the registered words at a normal speed. Feature information extracted from an input word is compared with the standard information to obtain a similarity between the feature information and the standard information corresponding to one of the registered words. A determination time is set to determine a result of recognition when the compound word is input and when a first degree of similarity is obtained from the first word at a first time and a maximum degree of similarity is obtained from one of the second word and the compound word at a second time.

Type: Grant

Filed: September 18, 1996

Date of Patent: August 25, 1998

Assignee: Ricoh Company, Ltd.

Inventor: Masaru Kuroda
