Distance Patents (Class 704/238)

Speech recognition support method and apparatus

Patent number: 6718304

Abstract: A speech recognition support method in a system to retrieve a map in response to a user's input speech. The user's speech is recognized and a recognition result is obtained. If the recognition result represents a point on the map, a distance between the point and a base point on the map is calculated. It is then decided whether the distance is above a threshold. If the distance is above the threshold, an inquiry to confirm whether the recognition result is correct is output to the user.

Type: Grant

Filed: June 29, 2000

Date of Patent: April 6, 2004

Assignee: Kabushiki Kaisha Toshiba

Inventors: Mitsuyoshi Tachimori, Hiroshi Kanazawa
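The threshold check described in the abstract of patent 6718304 can be illustrated with a minimal sketch. It assumes planar map coordinates and a Euclidean distance; the function name `needs_confirmation` and its parameters are hypothetical and do not come from the patent.

```python
import math

def needs_confirmation(point, base_point, threshold):
    """Return True when the recognized point is farther than `threshold`
    from the base point, i.e. when the user should be asked to confirm."""
    # Assumption: both points are (x, y) pairs in the same planar units.
    distance = math.dist(point, base_point)
    return distance > threshold
```

For example, `needs_confirmation((0, 0), (3, 4), 4.0)` triggers a confirmation because the two points are 5 units apart.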
Speech recognizing apparatus

Patent number: 6701292

Abstract: A speech-recognizing apparatus for recognizing input speech comprises an analysis unit for computing a characteristic vector for each of frames of the input speech, a correction-value storage unit for storing a correction distance in advance, a vector-to-vector-distance-computing unit for computing a vector-to-vector distance between the characteristic vector and the phoneme characteristic vector, an average-value-computing unit for computing an average value of vector-to-vector distances for one of the frames, a correction unit for computing a corrected vector-to-vector distance as a value of an expression of (the vector-to-vector distance - the average value + the correction distance), and a recognition unit for cumulating corrected vector-to-vector distances into a cumulative vector-to-vector distance and comparing the cumulative vector-to-vector distance with the word standard pattern in order to recognize the input speech.

Type: Grant

Filed: October 30, 2000

Date of Patent: March 2, 2004

Assignee: Fujitsu Limited

Inventors: Chiharu Kawai, Hiroshi Katayama, Takehiro Nakai
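The correction expression in the abstract of patent 6701292 (distance minus frame average plus correction distance) can be sketched as follows. This is only an illustration of the arithmetic for one frame; the function name and the list-based representation are assumptions.

```python
def corrected_distances(frame_distances, correction_distance):
    """Apply (distance - frame average + correction distance) to every
    vector-to-vector distance computed for a single frame."""
    # Average of all vector-to-vector distances for this frame.
    average = sum(frame_distances) / len(frame_distances)
    return [d - average + correction_distance for d in frame_distances]
```

With `frame_distances = [1.0, 2.0, 3.0]` and a correction distance of `0.5`, the frame average is `2.0`, so each distance is shifted by `-1.5`.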
Dynamic time warping of speech

Publication number: 20030220789

Abstract: A method includes (i) measuring first distances between (a) vectors belonging to a set of vectors that represent an utterance and (b) vectors belonging to a set of vectors that represent a template, the measuring being done in accordance with a first order of the utterance vectors and a first order of the template vectors, and (ii) measuring second distances between (a) individual vectors belonging to the set of vectors that represent the utterance and (b) individual vectors belonging to the set of vectors that represent the template, the measuring being done in accordance with a second order of the utterance vectors and a second order of the template vectors, and (iii) in which the first template vector order and the second template vector order are different and/or the first utterance vector order and the second utterance vector order are different.

Type: Application

Filed: May 21, 2002

Publication date: November 27, 2003

Inventor: Veton K. Kepuska
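For orientation, the sketch below shows textbook dynamic time warping between an utterance and a template. It is not the method claimed in application 20030220789 (it measures distances in a single vector order only); the Euclidean local distance and all names are assumptions.

```python
import math

def dtw_distance(utterance, template):
    """Textbook DTW: cumulative cost of the best monotonic alignment
    between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(utterance), len(template)
    # cost[i][j] = best cumulative distance aligning the first i utterance
    # vectors with the first j template vectors.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(utterance[i - 1], template[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # utterance advances
                                     cost[i][j - 1],      # template advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A template that only repeats a frame of the utterance aligns at zero cost, which is the time-warping property the technique is named for.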
Pattern matching for large vocabulary speech recognition systems

Publication number: 20030200085

Abstract: A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models. The improved method includes: receiving continuous speech input; generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input; loading a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a memory workspace accessible to a processor; loading an acoustic model from the plurality of acoustic models into the memory workspace; and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the acoustic model. Prior to retrieving another group of acoustic feature vectors, similarity measures are computed for the first group of acoustic feature vectors in relation to each of the acoustic models employed by the speech recognition system.

Type: Application

Filed: April 22, 2002

Publication date: October 23, 2003

Inventors: Patrick Nguyen, Luca Rigazio
Speech recognizing apparatus and speech recognizing method

Publication number: 20030125943

Abstract: A recognizing target vocabulary comparing unit calculates a compared likelihood of a recognizing target vocabulary, i.e., a compared likelihood of a registered vocabulary, by using the time series of the amount of characteristics of an input speech. An environment adaptive noise model comparing unit calculates a compared likelihood of a noise model adaptive to a noise environment, i.e., a compared likelihood of environmental noise. A rejection determining unit compares the likelihood of the registered vocabulary with the likelihood of the environmental noise, and determines whether or not the input speech is the noise. When it is determined that the input speech is the noise, a noise model adapting unit adaptively updates an environment adaptive noise model by using the input speech. Thus, the environment adaptive noise model matches to a real environment and the rejection determination can be performed for a noise input with high accuracy.

Type: Application

Filed: December 27, 2002

Publication date: July 3, 2003

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventor: Ryosuke Koshiba
Phonetic distance calculation method for similarity comparison between phonetic transcriptions of foreign words

Patent number: 6581034

Abstract: A phonetic distance calculation method for similarity comparison between phonetic transcriptions of foreign words. A system manager defines character element transformation patterns occurrable between phonetic transcriptions derived from the same foreign language. A system generates new phonetic transcriptions according to the defined character element transformation patterns and assigns a demerit mark to each of the generated phonetic transcriptions according to a phonetic distance. A minimum phonetic distance between each of the generated phonetic transcriptions and a given phonetic transcription is calculated on the basis of a minimum edit distance calculation method. Any one of the generated phonetic transcriptions with a smallest one of the calculated minimum phonetic distances is determined to be most similar to the given phonetic transcription.

Type: Grant

Filed: January 17, 2000

Date of Patent: June 17, 2003

Assignee: Korea Advanced Institute of Science and Technology

Inventors: Key-Sun Choi, Byung-ju Kang
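The abstract of patent 6581034 builds on a minimum edit distance calculation. A minimal sketch of the standard unit-cost edit distance is given below; the patent's transformation patterns and demerit marks are not modeled, and the function name is an assumption.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance with unit costs for insertion,
    deletion, and substitution."""
    # previous[j] holds the distance between the current prefix of a
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(min(previous[j] + 1,      # delete ch_a
                               current[j - 1] + 1,   # insert ch_b
                               substitution))
        previous = current
    return previous[-1]
```

Under this unit-cost assumption, the candidate transcription with the smallest distance to the given transcription would be taken as the most similar.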
System and method for compressing biometric models

Patent number: 6580814

Abstract: A system and method for building compressed biometric models and performing biometric identification using such models. The use of the compressed biometric models results in a significant decrease in the storage requirements for biometric models in conventional biometric systems. A given number of L reference biometric models are built. The L reference models are randomly divided into M subsets. During user enrollment, distance measurements between a temporary biometric model and each of the reference models in the M subsets are computed.

Type: Grant

Filed: July 31, 1998

Date of Patent: June 17, 2003

Assignee: International Business Machines Corporation

Inventors: Abraham P. Ittycheriah, Stephane H. Maes
Voice recognition rejection scheme

Patent number: 6574596

Abstract: A voice recognition rejection scheme for capturing an utterance includes the steps of accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results.

Type: Grant

Filed: February 8, 1999

Date of Patent: June 3, 2003

Assignee: Qualcomm Incorporated

Inventors: Ning Bi, Chienchung Chang, Harinath Garudadri, Andrew P. Dejaco
Concatenative speech synthesis using a finite-state transducer

Publication number: 20030055641

Abstract: A method for concatenative speech synthesis includes a processing stage that selects segments based on their symbolic labeling in an efficient graph-based search, which uses a finite-state transducer formalism. This graph-based search uses a representation of concatenation constraints and costs that does not necessarily grow with the size of the source corpus, thereby limiting the increase in computation required for the search as the size of the source corpus increases. In one application of this method, multiple alternative segment sequences are generated and a best segment sequence is then selected using characteristics that depend on specific signal characteristics of the segments.

Type: Application

Filed: September 17, 2001

Publication date: March 20, 2003

Inventors: Jon Rong-Wei Yi, James Robert Glass, Irvine Lee Hetherington
METHOD AND APPARATUS FOR SPEAKER RECOGNITION USING A HIERARCHICAL SPEAKER MODEL TREE

Publication number: 20030014250

Abstract: A method for generating a hierarchical speaker model tree. In an illustrative embodiment, a speaker model is generated for each of a number of speakers from which speech samples have been obtained. Each speaker model contains a collection of distributions of audio feature data derived from the speech sample of the associated speaker. The hierarchical speaker model tree is created by merging similar speaker models on a layer by layer basis. Each time two or more speaker models are merged, a corresponding parent speaker model is created in the next higher layer of the tree. The tree is useful in applications such as speaker verification and speaker identification. A speaker verification method is disclosed in which a claimed ID from a claimant is received, where the claimed ID represents a speaker corresponding to a particular one of the speaker models. A cohort set of similar speaker models associated with the particular speaker model is established.

Type: Application

Filed: January 26, 1999

Publication date: January 16, 2003

Inventors: HOMAYOON S. M. BEIGI, STEPHANE H. MAES, JEFFREY S. SORENSEN
Method and apparatus for determining a measure of similarity between natural language sentences

Publication number: 20030004716

Abstract: Systems and methods for classifying natural language (NL) sentences using a combination of NL algorithms or techniques are disclosed. Each NL algorithm or technique may identify a different similarity trait between two or more sentences, and each may help compare the meaning of the sentences. By combining the various similarity factors, preferably by various weighting factors, a distance metric can be computed. The distance metric provides a measure of the overall similarity between sentences, and can be used to assign a sentence to an appropriate sentence category.

Type: Application

Filed: June 29, 2001

Publication date: January 2, 2003

Inventors: Karen Z. Haigh, Kevin M. Kramer
Memory having speech recognition support, by integrated local distance-computation/reference-vector-storage, for applications with general-purpose microprocessor systems

Patent number: 6490559

Abstract: The distance computation represents a central, constantly recurrent task in sample and speech recognition. It is used in speech recognition as a degree of similarity between a part of a speech utterance and a speech reference. In picture processing and sample recognition, it is used for data compression. The distance computation requires the longest computation time so that a reduction of the computation time results in a considerable efficiency improvement. A reduction of the computation time is achieved by the integration of the distance computation in a memory module in which particularly the reference data are stored. Due to this integration, the other components of the overall system are relieved of this constantly recurrent task and are available for more complex processes in this period of time. This integration makes the distance computation essentially shorter because the communication between memory sections and computation unit takes place directly without utilizing a bus system.

Type: Grant

Filed: October 13, 1998

Date of Patent: December 3, 2002

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Wolfgang O. Budde, Volker Steinbiss
Speech recognition device

Publication number: 20020169607

Abstract: The speech recognition device, which can realize speech recognition with a small-scaled circuit, has been disclosed. The speech recognition device comprises the similarity circuit, which receives speech input signals and puts out characteristics based on the self-organizing algorithm, and the matrix circuit that performs the matrix operations of the output signal, wherein: the similarity circuit comprises a circuit that calculates distances between plural multi-dimensional input vectors and the pattern vectors prepared in advance, calculates a value corresponding to one dimension using a pair of neuron MOSFETS, and forms a voltage signal in accordance with the degree of similarity by summing up the current that flows in each neuron MOSFET; and the matrix circuit, in which capacitors corresponding to weighting operations are arranged in matrix, receives a voltage signal in accordance with the degree of similarity and outputs what is most similar, to the patterns prepared in advance.

Type: Application

Filed: December 27, 2001

Publication date: November 14, 2002

Inventors: Yoshikazu Miyanaga, Masayuki Kabasawa
Face synthesis system and methodology

Patent number: 6449595

Abstract: A system and method for synthesizing a facial image compares a speech frame from an incoming speech signal with acoustic features stored within visually similar entries in an audio-visual codebook to produce a set of weights. The audio-visual codebook also stores visual features corresponding to the acoustic features. A composite visual feature is generated as a weighted sum of the corresponding visual features, from which the facial image is synthesized. The audio-visual codebook may include multiple samples of the acoustic and visual features for each entry, which corresponds to a sequence of one or more phonemes.

Type: Grant

Filed: March 11, 1999

Date of Patent: September 10, 2002

Assignee: Microsoft Corporation

Inventors: Levent Mustafa Arslan, David Thieme Talkin
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413092

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: June 5, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413093

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413095

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413096

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413094

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413098

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Method and device for enhancing the recognition of speech among speech-impaired individuals

Patent number: 6413097

Abstract: A method and a system is disclosed that provide means to enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes method for and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. Finally, the disclosure includes method for and computer software to train temporal processing abilities, specifically speed and precision of temporal integration, sequencing and serial memory.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 2, 2002

Assignees: The Regents of the University of California, Rutgers, The State University of New Jersey

Inventors: Paula Anne Tallal, Michael Mathias Merzenich, William Michael Jenkins, Steven Lamont Miller, Christoph E. Schreiner
Cohort model selection apparatus and method

Patent number: 6393397

Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.

Type: Grant

Filed: June 14, 1999

Date of Patent: May 21, 2002

Assignee: Motorola, Inc.

Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
Method and apparatus for speaker recognition

Patent number: 6349280

Abstract: A method of recognizing a speaker of an input speech according to the distance between an input speech pattern, obtained by converting the input speech to a feature parameter series, and a reference pattern preliminarily registered as feature parameter series for each speaker is provided. Contents of the input and reference speech patterns are obtained by recognition. An identical section, in which the contents of the input and reference speech patterns are identical, is determined. The distance between the input and reference speech patterns in the determined identical content section is calculated. The speaker of the input speech is recognized on the basis of the calculated distance.

Type: Grant

Filed: March 4, 1999

Date of Patent: February 19, 2002

Assignee: NEC Corporation

Inventor: Hiroaki Hattori
INTEGRATED DISTANCE-COMPUTATION/REFERENCE-VECTOR-MEMORY MODULE FOR MICROCOMPUTER SPEECH RECOGNIZER

Publication number: 20020007274

Abstract: The distance computation represents a central, constantly recurrent task in sample and speech recognition. It is used in speech recognition as a degree of similarity between a part of a speech utterance and a speech reference. In picture processing and sample recognition, it is used for data compression (MPEG). The distance computation requires the longest computation time so that a reduction of the computation time results in a considerable efficiency improvement. A reduction of the computation time is achieved by the integration of the distance computation in a memory module (1) in which particularly the reference data are stored. Due to this integration, the other components (2, 3, 4) of the overall system are relieved of this constantly recurrent task and are available for more complex processes in this period of time.

Type: Application

Filed: October 13, 1998

Publication date: January 17, 2002

Inventors: WOLFGANG O. BUDDE, VOLKER STEINBISS
Method and apparatus for clustering-based signal segmentation

Patent number: 6314392

Abstract: In a computerized method, a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.

Type: Grant

Filed: September 20, 1996

Date of Patent: November 6, 2001

Assignee: Digital Equipment Corporation

Inventors: Brian S. Eberman, William D. Goldenthal
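The iterative merging described in the abstract of patent 6314392 can be sketched as below. The abstract does not say which statistical distance is used, so the absolute difference of frame means stands in for it here purely as an assumption; the function name is likewise hypothetical.

```python
from statistics import fmean

def segment(frames, threshold):
    """Merge adjacent sets of samples while their distance stays below
    `threshold`; returns the resulting list of merged segments."""
    segments = [list(frame) for frame in frames]
    merged = True
    while merged:  # repeat passes until a full pass merges nothing
        merged = False
        i = 0
        while i < len(segments) - 1:
            # Assumed stand-in for the statistical distance: difference of means.
            distance = abs(fmean(segments[i]) - fmean(segments[i + 1]))
            if distance < threshold:
                segments[i] = segments[i] + segments.pop(i + 1)
                merged = True
            else:
                i += 1
    return segments
```

Calling `segment([[0, 0], [0, 1], [5, 5], [5, 6]], 1.0)` yields two segments, one around the low values and one around the high values.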
Method and apparatus of speech recognition and speech control system using the speech recognition method

Patent number: 6308152

Abstract: A string of acoustic feature parameters of each of recognition-desired words and a string of acoustic feature parameters of each of reception words are registered in advance. When an uttered word is received, a string of acoustic feature parameters is extracted from the uttered word, the string of acoustic feature parameters of the uttered word is compared with the string of acoustic feature parameters of each recognition-desired word, and a recognition-desired word recognition score indicating a similarity degree between the uttered word and each recognition-desired word is calculated. Also, a reception word recognition score indicating a similarity degree between the uttered word and each reception word is calculated.

Type: Grant

Filed: June 22, 1999

Date of Patent: October 23, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Tomohiro Konuma, Hiroyasu Kuwano
Inter-pattern distance calculation method and apparatus thereof, and pattern recognition method and apparatus thereof

Publication number: 20010014858

Abstract: A pattern dissimilarity calculator according to the present invention, which calculates a pattern dissimilarity between first and second sequence feature patterns using either the DP matching approach or the HMM (Hidden Markov Model) approach, comprises cumulative distance calculator 1 for calculating the distance between frame i of said first sequence feature pattern and each of the frames of said second sequence feature pattern, and obtaining a current cumulative distance by adding it to a cumulative distance obtained in terms of frame i−1, which is decoded in cumulative decoder 4; and cumulative distance encoder 2 for encoding the cumulative distance calculated by the cumulative distance calculator 1. The cumulative distance decoder 4 decodes cumulative distances encoded by the encoding means.

Type: Application

Filed: February 23, 2001

Publication date: August 16, 2001

Applicant: NEC Corporation

Inventor: Hiroshi Hirayama
Search and rescoring method for a speech recognition system

Patent number: 6253178

Abstract: Speech recognition systems and methods consistent with the present invention process input speech signals organized into a series of frames. The input speech signal is decimated to select K frames out of every L frames of the input speech signal according to a decimation rate K/L. A first set of model distances is then calculated for each of the K selected frames of the input speech signal, and a Hidden Markov Model (HMM) topology of a first set of models is reduced according to the decimation rate K/L. The system then selects a reduced set of model distances from the computed first set of model distances according to the reduced HMM topology and selects a first plurality of candidate choices for recognition according to the reduced set of model distances. A second set of model distances is computed, using a second set of models, for a second plurality of candidate choices, wherein the second plurality of candidate choices correspond to at least a subset of the first plurality of candidate choices.

Type: Grant

Filed: September 22, 1997

Date of Patent: June 26, 2001

Assignee: Nortel Networks Limited

Inventors: Serge Robillard, Nadia Girolamo, Andre Gillet, Waleed Fakhr
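
The decimation step in the abstract above (keeping K frames out of every L) can be illustrated with a short sketch. The abstract does not say which K frames of each group are kept; keeping the first K is an assumption made here for simplicity, and the function and parameter names are hypothetical.

```python
def decimate(frames, keep, out_of):
    """Keep `keep` frames out of every `out_of` frames (decimation rate K/L).

    Assumption: the first `keep` frames of each group are the ones retained.
    """
    if not 0 < keep <= out_of:
        raise ValueError("need 0 < keep <= out_of")
    return [f for i, f in enumerate(frames) if i % out_of < keep]
```

For example, `decimate(list(range(10)), 2, 5)` keeps frames 0, 1, 5 and 6, so model distances need to be computed for only 40% of the input frames.
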
Method for measuring distance between collections of distributions

Patent number: 6246982

Abstract: A method for computing a distance between collections of distributions or finite mixture models of features. Data is processed so as to define at least first and second collections of distributions of features. For each distribution of the first collection, the distance to each distribution of the second collection is measured to determine which distribution of the second collection is the closest (most similar). The same procedure is performed for the distributions of the second collection. Based on the closest distance measures, a final distance is computed representing the distance between the first and second collections. This final distance may be a weighted sum of the closest distances. The distance measure may be used in a number of applications such as [speaker classification,] speaker recognition and audio segmentation.

Type: Grant

Filed: January 26, 1999

Date of Patent: June 12, 2001

Assignee: International Business Machines Corporation

Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
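
The procedure in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented method: each distribution is reduced to a hypothetical `(weight, mean)` pair, the distance between two distributions is taken to be the Euclidean distance between their means, and the final distance is the average of the two weighted directed sums. All three choices are assumptions.

```python
import math


def directed_distance(source, target):
    """Weighted sum of the distance from each source distribution
    to its closest distribution in the target collection."""
    total_weight = sum(w for w, _ in source)
    closest_sum = sum(
        w * min(math.dist(mean, other) for _, other in target)
        for w, mean in source
    )
    return closest_sum / total_weight


def collection_distance(first, second):
    """Symmetric distance between two collections of (weight, mean) pairs."""
    return 0.5 * (directed_distance(first, second)
                  + directed_distance(second, first))
```

Running the closest-match search in both directions, as the abstract describes, makes the result symmetric: swapping the two collections gives the same distance.
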
Speech recognition apparatus and method for matching inputted speech and a word generated from stored referenced phoneme data

Patent number: 6236964

Abstract: A speech recognition method and apparatus in which a speech section is sliced by the unit of a word by spotting and candidate words are selected. Next, in a second stage, matching is conducted by the unit of a phoneme. Consequently, selection of the candidate words and slicing of the speech section can be performed concurrently. Furthermore, narrowing of the candidate words is facilitated. Furthermore, since reference phoneme patterns under a plurality of environments are prepared, recognition of an input speech under a larger number of conditions is possible using a smaller amount of data when compared with the case in which reference word patterns under a plurality of environments are prepared.

Type: Grant

Filed: February 14, 1994

Date of Patent: May 22, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Junichi Tamura, Tetsuo Kosaka, Atsushi Sakurai
DP Pattern matching which determines current path propagation using the amount of path overlap to the subsequent time point

Patent number: 6226610

Abstract: A method and apparatus for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal using a dynamic programming matching technique is described. The second signal patterns which are at the end of a dynamic programming path for a current first signal pattern are listed in an active list 201. The dynamic programming paths are propagated by processing the second signal patterns on the active list, and a new active list 205 is generated for the succeeding input pattern. In order to propagate each path, the system determines how many second signal patterns lie within an overlap region in which a comparison has to be made, and processes each path in dependence upon the determined amount of overlap.

Type: Grant

Filed: February 8, 1999

Date of Patent: May 1, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward
Method for reducing search complexity in a speech recognition system

Patent number: 6178401

Abstract: A method is provided for reducing search complexity in a speech recognition system having a fast match, a detailed match, and a language model. Based on at least one predetermined variable, the fast match is optionally employed to generate candidate words and acoustic scores corresponding to the candidate words. The language model is employed to generate language model scores. The acoustic scores are combined with the language model scores and the combined scores are ranked to determine top ranking candidate words to be later processed by the detailed match, when the fast match is employed. The detailed match is employed to generate detailed match scores for the top ranking candidate words.

Type: Grant

Filed: August 28, 1998

Date of Patent: January 23, 2001

Assignee: International Business Machines Corporation

Inventors: Martin Franz, Miroslav Novak
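
The score combination and ranking described in the abstract above can be sketched as follows. The sketch assumes that both scores are log-probabilities (so they combine by addition) and introduces a hypothetical `lm_weight` parameter; neither detail comes from the abstract.

```python
def shortlist(acoustic_scores, lm_scores, top_n, lm_weight=1.0):
    """Rank candidate words by combined fast-match and language-model score
    and return the top_n words to pass on to the detailed match."""
    combined = {
        word: acoustic_scores[word] + lm_weight * lm_scores[word]
        for word in acoustic_scores
    }
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:top_n]
```

Only the words returned by `shortlist` would then be scored by the more expensive detailed match, which is where the reduction in search complexity comes from.
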
Speaker verification and speaker identification based on eigenvoices

Patent number: 6141644

Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.

Type: Grant

Filed: September 4, 1998

Date of Patent: October 31, 2000

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Speech recognition method and system in which said method is implemented

Patent number: 6138094

Abstract: In a speech recognition method and system a spoken word to be recognized is broken down into input vectors (K2a), ambient noise is evaluated (K1), a recognized word is chosen from a dictionary having associated reference vectors which are separated from the input vectors by a shortest distance (K3), and the recognized word is validated by a comparison of this distance with a threshold value which is derived as a function of the result of the evaluation of ambient noise. The ambient noise evaluation may be carried out at an instant when the speaker is silent, i.e. either before or after the speaker speaks the word to be recognized.

Type: Grant

Filed: January 27, 1998

Date of Patent: October 24, 2000

Assignee: U.S. Philips Corporation

Inventors: Gilles Miet, Benoit Guilhaumon
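
A minimal sketch of the validation step in the abstract above: pick the dictionary word at the shortest distance, then accept it only if that distance is within a threshold that depends on the measured ambient noise. The linear threshold function and the parameters `base_threshold` and `noise_gain` are assumptions; the abstract only says that the threshold is a function of the noise evaluation.

```python
import math


def recognize(input_vectors, dictionary, noise_level,
              base_threshold=0.8, noise_gain=0.5):
    """Return the closest dictionary word, or None if it fails validation."""
    best_word, best_distance = None, math.inf
    for word, reference_vectors in dictionary.items():
        distance = sum(
            math.dist(x, r) for x, r in zip(input_vectors, reference_vectors)
        )
        if distance < best_distance:
            best_word, best_distance = word, distance

    # Assumed form: the acceptance threshold relaxes as noise increases.
    threshold = base_threshold + noise_gain * noise_level
    return best_word if best_distance <= threshold else None
```
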
Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model

Patent number: 6108628

Abstract: A high-speed speech recognition method with a high recognition rate, utilizing speaker models, includes the steps of executing an acoustic process on the input speech, calculating a coarse output probability utilizing an unspecified speaker model, and calculating a fine output probability utilizing an unspecified speaker model and clustered speaker models, for the states estimated, by the result of coarse calculation, to contribute to the results of recognition. Candidates of recognition are then extracted by a common language search based on the obtained result, and a fine language search is conducted on the thus extracted candidates to determine the result of recognition.

Type: Grant

Filed: September 16, 1997

Date of Patent: August 22, 2000

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Tetsuo Kosaka, Masayuki Yamada
Speech processing using maximum likelihood continuity mapping

Patent number: 6052662

Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Type: Grant

Filed: January 29, 1998

Date of Patent: April 18, 2000

Assignee: Regents of the University of California

Inventor: John E. Hogden
Distance measure in a speech recognition system for speech recognition using frequency shifting factors to compensate for input signal frequency shifts

Patent number: 6032116

Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented by a vector f of line spectral pair frequencies and are fuzzy matrix quantized to respective vector f̂ entries in a codebook of the FMQ. A distance measure between f and f̂, d(f, f̂), is defined as ##EQU1## where the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and eᵢ is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal. The speech recognition system may also include hidden Markov models and neural networks, such as a multilevel perceptron neural network, speech classifiers.

Type: Grant

Filed: June 27, 1997

Date of Patent: February 29, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Safdar M. Asghar, Lin Cong
Method for coding human speech by joining source frames and an apparatus for reproducing human speech so coded

Patent number: 6009384

Abstract: For coding human speech for subsequent audio reproduction thereof, a plurality of speech segments is derived from speech received, and systematically stored in a data base for later concatenated readout. After the deriving, respective speech segments are fragmented into temporally consecutive source frames, similar source frames as governed by a predetermined similarity measure thereamongst that is based on an underlying parameter set are joined, and joined source frames are collectively mapped onto a single storage frame. Respective segments are stored as containing sequenced referrals to storage frames for therefrom reconstituting the segment in question.

Type: Grant

Filed: May 20, 1997

Date of Patent: December 28, 1999

Assignee: U.S. Philips Corporation

Inventors: Raymond N. J. Veldhuis, Paul A. P. Kaufholz
Line spectral frequencies and energy features in a robust signal recognition system

Patent number: 6009391

Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented in a matrix by a vector f of line spectral pair frequencies and energy coefficients and are fuzzy matrix quantized to respective vector f̂ entries of a matrix codeword in a codebook of the FMQ. The energy coefficients include the original energy and the first and second derivatives of the original energy which increase recognition accuracy by, for example, being generally distinctive speech input signal parameters and providing noise signal suppression especially when the noise signal has a relatively constant energy over at least two time frame intervals. To reduce data while maintaining sufficient resolution, the energy coefficients may be normalized and logarithmically represented. A distance measure between f and f̂, d(f, f̂), is defined as ##EQU1## where the constants α₁, α…

Type: Grant

Filed: August 6, 1997

Date of Patent: December 28, 1999

Assignee: Advanced Micro Devices, Inc.

Inventors: Safdar M. Asghar, Lin Cong
Recognition system for determining whether speech is confusing or inconsistent

Patent number: 5987411

Abstract: Methods and systems consistent with the present invention enroll a candidate phrase uttered by a user in a dictionary having at least one previously enrolled phrase. The system receives utterances of the candidate phrase and determines whether the first utterance is confusingly similar to a previously enrolled phrase and whether they are consistent with each other. The system then enrolls the candidate phrase in the dictionary according to these determinations.

Type: Grant

Filed: December 17, 1997

Date of Patent: November 16, 1999

Assignee: Northern Telecom Limited

Inventors: Marco Petroni, Hung S. Ma
Speech recognition system using modifiable recognition threshold to reduce the size of the pruning tree

Patent number: 5970450

Abstract: A speech recognition system, in which partial reference patterns, and cumulative similarities of these patterns, are stored in a temporary pattern memory. The partial reference patterns are to be used as subjects of a similarity computation with an input speech pattern that has its feature quantities extracted by a speech analyzing unit. A counting unit counts partial reference patterns having corresponding cumulative similarities that are higher than a threshold value stored in a threshold memory. A threshold computing unit computes a threshold of pruning from a correspondence relation between the number of partial reference patterns that have corresponding cumulative similarities that exceed the threshold, and the threshold. A similarity computing unit computes a similarity, with respect to the feature quantities, of partial reference patterns with corresponding cumulative similarities that are greater than the threshold of pruning.

Type: Grant

Filed: November 24, 1997

Date of Patent: October 19, 1999

Assignee: NEC Corporation

Inventor: Hiroaki Hattori
Speech recognition using distance between feature vector of one sequence and line segment connecting feature-variation-end-point vectors in another sequence

Patent number: 5953699

Abstract: A speech recognition apparatus has an analysis section that outputs features of input speech as a time sequence of feature vectors defined for discrete time points corresponding to a processed speech frame. Reference paradigm utterances are converted into a time sequence of standard (reference) feature vectors. The possible continuous variation of standard feature vectors at each point in time is expressed by a line segment, or set of line segments, connecting the feature vectors for the two end points of the "movable" range within which the feature can change, rather than using a larger set of reference vectors as in a conventional multitemplate approach to speech recognition. For example, the continuous range of possible background noise levels in input speech defines a line segment connecting the two feature vectors at the two SNR value limits.

Type: Grant

Filed: October 28, 1997

Date of Patent: September 14, 1999

Assignee: NEC Corporation

Inventor: Keizaburo Takagi
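
The core geometric operation in the abstract above, the distance from an input feature vector to a line segment joining two reference vectors, can be sketched as follows. The Euclidean metric is an assumption, and the function name is hypothetical.

```python
import math


def point_to_segment_distance(x, a, b):
    """Euclidean distance from vector x to the segment with end points a, b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    length_squared = sum(c * c for c in ab)
    if length_squared == 0:
        t = 0.0  # degenerate segment: both end points coincide
    else:
        # Projection of x onto the line through a and b, clamped to [0, 1]
        # so that the closest point stays on the segment.
        t = sum(p * q for p, q in zip(ax, ab)) / length_squared
        t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(x, closest)
```

If `a` and `b` are the reference vectors at the two SNR limits, a single segment stands in for every intermediate noise level, instead of storing one template per level.
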
Speech recognition system

Patent number: 5909665

Abstract: To construct an inexpensive speech recognition system, a speech recognition system includes an analyzing unit for extracting a sound, sequentially dividing the sound into a plurality of frames, converting each of the frames sequentially to first data, and sequentially storing the first data to an input pattern memory, a distance calculating unit for reading a predetermined number of the first data from the input pattern memory, reading one of second data from a standard pattern memory, calculating first distances between each of the predetermined number of the first data and the one of the second data, and a judging unit for judging a word representing the sound based on the first distances.

Type: Grant

Filed: May 29, 1997

Date of Patent: June 1, 1999

Assignee: NEC Corporation

Inventor: Yasuko Kato
Method and apparatus for adapting the language model's size in a speech recognition system

Patent number: 5899973

Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.

Type: Grant

Filed: September 25, 1997

Date of Patent: May 4, 1999

Assignee: International Business Machines Corporation

Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
Speech recognition method and apparatus for recognizing phonemes using a plurality of speech analyzing and recognizing methods for each kind of phoneme

Patent number: 5893058

Abstract: A method and apparatus for recognizing speech employing a word dictionary in which the phonemes of words are stored and for recognizing speech based on the recognition of the phonemes. The method and apparatus recognize phonemes and produce data associated with each phoneme according to different speech analyzing and recognizing methods for each kind of phoneme, normalize the produced data, and match the recognized phonemes with words in the word dictionary by means of dynamic programming based on the normalized data.

Type: Grant

Filed: November 14, 1994

Date of Patent: April 6, 1999

Assignee: Canon Kabushiki Kaisha

Inventor: Tetsuo Kosaka
Digital signal processor arrangement and method for comparing feature vectors

Patent number: 5819219

Abstract: A digital signal processor employable for speech processing or for some other pattern recognition overcomes the weaknesses of digital signal processors given the subtraction with following amount formation that must often be implemented in these applications. An auxiliary hardware is provided that contains the feature vector that is to be compared to reference feature vectors from the dictionary in a separate memory. The calculating work is thereby implemented by a separate arithmetic unit that provides a separate difference-forming and amount-forming unit for each feature comparison. The number of clock cycles of the digital signal processor required per comparison can be dramatically reduced by the invention. A suitable addressing method thereby assures that it is always corresponding features of the individual feature vectors that can be compared to one another.

Type: Grant

Filed: December 11, 1996

Date of Patent: October 6, 1998

Assignee: Siemens Aktiengesellschaft

Inventors: Luc De Vos, Daniel Goryn
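
The comparison that the abstract above moves into auxiliary hardware can be written in software as a sum of absolute differences. This sketch reads "subtraction with following amount formation" as subtracting corresponding features and taking the absolute value, which is an interpretation of the translated wording; the function names are hypothetical.

```python
def city_block_distance(feature, reference):
    """Sum of absolute differences between corresponding features."""
    return sum(abs(f - r) for f, r in zip(feature, reference))


def nearest_reference(feature, references):
    """Index of the dictionary reference vector closest to `feature`."""
    return min(
        range(len(references)),
        key=lambda i: city_block_distance(feature, references[i]),
    )
```

Each term of the sum needs one subtraction and one absolute value, which is the per-feature work that the separate difference-forming and amount-forming units in the abstract carry out.
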
Speech recognition apparatus using neural network and learning method therefor

Patent number: 5809461

Abstract: A speech recognition apparatus using a neural network is provided. A neuron-like element stores a value of its inner conditions. The neuron-like element also updates a value of its internal status on the basis of an output from the neuron-like element itself, outputs from other neuron-like elements and an external input outside. The neuron-like element also converts a value of its internal status into an external output. Accordingly, the neuron-like element itself can retain the history of input data. This enables time series data, such as speech, to be processed without providing any special devices in the neural network.

Type: Grant

Filed: June 7, 1995

Date of Patent: September 15, 1998

Assignee: Seiko Epson Corporation

Inventor: Mitsuhiro Inazumi
Speaker independent speech recognition method utilizing multiple training iterations

Patent number: 5806034

Abstract: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data; providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM; and parameterizing speech utterances according to one of linear prediction parameters or Mel-scale filter bank parameters. The method further includes selecting a frame period for accommodating the parameters and generating HMMs and decoding to specified speech utterances by causing the user to utter predefined training speech utterances for each HMM. The method then statistically computes the generated HMMs with the prototype HMM to provide a set of fully trained HMMs for each utterance indicative of the speaker.

Type: Grant

Filed: August 2, 1995

Date of Patent: September 8, 1998

Assignee: ITT Corporation

Inventors: Joe A. Naylor, William Y. Huang, Lawrence G. Bahler
Speech recognizing device and method assuming a current frame is an end point of a current reference pattern

Patent number: 5799275

Abstract: A speech recognition system automatically designates a scope of a partial reference pattern. Plural reference patterns, each of which ends in each of composing frames and starts from a preceding frame, are supposed and cumulative distances at every frame are calculated. A partial reference pattern that has a minimal distance value as compared with all other partial reference patterns is taken as a partial input speech recognizing result.

Type: Grant

Filed: June 18, 1996

Date of Patent: August 25, 1998

Assignees: The Japan Iron and Steel Federation, Sharp Kabushiki Kaisha, Real World Computing Partnership

Inventors: Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka
State-dependent speaker clustering for speaker adaptation

Patent number: 5787394

Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.

Type: Grant

Filed: December 13, 1995

Date of Patent: July 28, 1998

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, Ponani Gopalakrishnan, David Nahamoo, Mukund Padmanabhan
