Specialized Models Patents (Class 704/250)
  • Publication number: 20040015355
    Abstract: The invention relates to a telecommunications network (1) comprising a plurality of network elements (2 to 7, 17) and a plurality of subscriber connections (8 to 10). Said network is characterised in that at least one of the network elements (2 to 7, 17) comprises a voice recognition module (18) for recognising biometric voice parameters.
    Type: Application
    Filed: June 5, 2003
    Publication date: January 22, 2004
    Inventors: Marian Trinkel, Franz Steimer, Christel Mueller
  • Publication number: 20040015356
    Abstract: The invention aims at providing a voice recognition apparatus which can perform training without the speaker being conscious of it, by utilizing the fact that the name of the distant party is frequently uttered at the beginning of a telephone conversation, and at increasing the recognition ratio and recognition speed of the speaker-dependent system as the speaker uses the voice recognition apparatus. The invention includes a voice recognition processor of the speaker-independent system for comparing acoustic data obtained by splitting an input sound signal with a plurality of word acoustic data and detecting word acoustic data matching the split acoustic data, wherein the voice recognition processor sequentially compares word acoustic data generated from a phoneme model with acoustic data generated from a name uttered by the speaker, and stores the acoustic data identifier corresponding to the generated acoustic data, which matches the word acoustic data, as a training signal.
    Type: Application
    Filed: July 16, 2003
    Publication date: January 22, 2004
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kenji Nakamura, Hiroshi Harada, Yoshiyuki Ogata, Masakazu Tachiyama, Tatsuhiro Goto, Yasuyuki Nishioka, Yoshiaki Kuroki
  • Publication number: 20030216916
    Abstract: In detection systems, such as speaker verification systems, for a given operating point range, with an associated detection “cost”, the detection cost is preferably reduced by essentially trading off the system error in the area of interest with areas essentially “outside” that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is preferably derived, such that its minimization provably leads to detection cost reduction in the area of interest. The criterion allows for selective access to the slope and offset of the DET curve (a line in case of normally distributed detection scores, a curve approximated by mixture of Gaussians in case of other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
    Type: Application
    Filed: May 19, 2002
    Publication date: November 20, 2003
    Applicant: IBM Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy
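The DET curve and detection cost mentioned in the abstract above can be computed empirically from score lists; the sketch below is a generic illustration of that measurement, not the patented optimization criterion, and the cost weights are conventional placeholder values.

```python
# Illustrative sketch: empirical Detection Error Tradeoff (DET) points
# -- miss rate vs. false-alarm rate -- from genuine-speaker and impostor
# score lists, plus a weighted detection cost for one operating point.

def det_points(genuine, impostor):
    """Return (threshold, miss_rate, false_alarm_rate) triples, one per
    candidate threshold taken from the pooled scores."""
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        miss = sum(1 for s in genuine if s < t) / len(genuine)
        false_alarm = sum(1 for s in impostor if s >= t) / len(impostor)
        points.append((t, miss, false_alarm))
    return points

def detection_cost(miss, false_alarm, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """NIST-style weighted detection cost at one operating point."""
    return c_miss * miss * p_target + c_fa * false_alarm * (1.0 - p_target)
```

Sweeping the threshold trades misses against false alarms; the patent's criterion amounts to reshaping this curve so the cost in a chosen operating region drops.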
  • Patent number: 6618702
    Abstract: A language-independent speaker-recognition system based on parallel cumulative differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from many phone recognizers to perform text independent speaker recognition. A digitized speech signal from a speaker is converted to a sequence of phones by each phone recognizer. Each phone sequence is then modified based on the energy in the signal. The modified phone sequences are tokenized to produce phone n-grams that are compared against a speaker and a background model for each phone recognizer to produce log-likelihood ratio scores. The log-likelihood ratio scores from each phone recognizer are fused to produce a final recognition score for each speaker model. The recognition score for each speaker model is then evaluated to determine which of the modeled speakers, if any, produced the digitized speech signal.
    Type: Grant
    Filed: June 14, 2002
    Date of Patent: September 9, 2003
    Inventors: Mary Antoinette Kohler, Walter Doyle Andrews, III, Joseph Paul Campbell, Jr.
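The n-gram scoring and fusion described in the abstract above follows a familiar pattern; this is a hedged, generic sketch (bigram tokenization, log-likelihood-ratio scoring, equal-weight fusion), not the patented scoring, and the model dictionaries are hypothetical.

```python
import math

def bigrams(phones):
    """Tokenize a phone sequence into adjacent bigrams."""
    return list(zip(phones, phones[1:]))

def llr_score(phones, speaker_model, background_model, floor=1e-6):
    """Sum over bigrams of log P(bigram|speaker) - log P(bigram|background).
    Unseen bigrams are floored to avoid log(0)."""
    score = 0.0
    for bg in bigrams(phones):
        p_spk = speaker_model.get(bg, floor)
        p_bkg = background_model.get(bg, floor)
        score += math.log(p_spk) - math.log(p_bkg)
    return score

def fuse(scores):
    """Fuse per-recognizer LLR scores; equal-weight averaging stands in
    for whatever fusion rule a real system would train."""
    return sum(scores) / len(scores)
```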
  • Publication number: 20030167167
    Abstract: An intelligent social agent is an animated computer interface agent with social intelligence that has been developed for a given application or type of applications and a particular user population. The social intelligence of the agent comes from the ability of the agent to be appealing, affective, adaptive, and appropriate when interacting with the user. An intelligent personal assistant is an implementation of an intelligent social agent that assists a user in operating a computing device and using application programs on a computing device.
    Type: Application
    Filed: May 31, 2002
    Publication date: September 4, 2003
    Inventor: Li Gong
  • Publication number: 20030163311
    Abstract: An intelligent social agent is an animated computer interface agent with social intelligence that has been developed for a given application or type of applications and a particular user population. The social intelligence of the agent comes from the ability of the agent to be appealing, affective, adaptive, and appropriate when interacting with the user.
    Type: Application
    Filed: April 30, 2002
    Publication date: August 28, 2003
    Inventor: Li Gong
  • Patent number: 6606594
    Abstract: A speech recognition system recognizes an input utterance of spoken words. The system includes a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word; a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone; and a recognition engine for processing the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.
    Type: Grant
    Filed: September 29, 1999
    Date of Patent: August 12, 2003
    Assignee: ScanSoft, Inc.
    Inventors: Vladimir Sejnoha, Tom Lynch, Ramesh Sarukkai
  • Publication number: 20030149565
    Abstract: A system, method and computer program product are provided for recognizing utterances. Initially, an utterance is received. Thereafter, it is determined whether the utterance can be recognized utilizing speech recognition. If the utterance cannot be recognized utilizing speech recognition, spelling recognition is used to recognize the utterance.
    Type: Application
    Filed: March 9, 2001
    Publication date: August 7, 2003
    Inventors: Steve S. Chang, Bertrand A. Damiba
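The fallback logic in the abstract above reduces to a short control flow; this is a minimal sketch only, in which the recognizer callables and the 0.5 confidence threshold are made-up assumptions.

```python
def recognize_utterance(audio, speech_rec, spelling_rec, threshold=0.5):
    """Try whole-word speech recognition first; fall back to letter-by-letter
    spelling recognition when the best hypothesis is not confident enough."""
    word, confidence = speech_rec(audio)
    if confidence >= threshold:
        return word
    letters, _ = spelling_rec(audio)
    return "".join(letters)
```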
  • Publication number: 20030135371
    Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
    Type: Application
    Filed: January 15, 2002
    Publication date: July 17, 2003
    Inventors: Chienchung Chang, Narendranath Malayath
  • Patent number: 6577997
    Abstract: A noise-dependent classifier for a speech recognition system includes a recognizer (15) that provides scores and score differences of the two closest in-vocabulary words for a received utterance. A noise detector (17) detects the noise level of a pre-speech portion of the utterance. A classifier (19), responsive to the detected noise level, the scores, and a noise-dependent model, decides whether to accept or reject the utterance as a recognized word depending on the noise-dependent model and the scores.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: June 10, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
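A toy version of the noise-dependent accept/reject rule above: the margin required between the two best in-vocabulary scores grows with the measured pre-speech noise level. The linear noise model and its coefficients are illustrative assumptions, not the patent's trained model.

```python
def accept_utterance(best_score_margin, noise_level, base=0.2, slope=0.05):
    """Accept the top hypothesis only if its margin over the runner-up
    exceeds a threshold that rises with the detected noise level."""
    required = base + slope * noise_level
    return best_score_margin >= required
```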
  • Patent number: 6571208
    Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation. In another embodiment maximum likelihood estimation techniques are used to develop common decision tree frameworks that may be shared across all speakers when constructing the eigenvoice representation of speaker space.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: May 27, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
  • Patent number: 6567777
    Abstract: Methods and systems for efficient magnitude approximation, for example, for approximation of magnitudes of complex rectilinear Fourier transform coefficients in portable or other low-power speech recognition equipment, and for cartesian-to-polar coordinate transforms.
    Type: Grant
    Filed: August 2, 2000
    Date of Patent: May 20, 2003
    Assignee: Motorola, Inc.
    Inventor: Manjirnath Chatterjee
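A classic low-cost magnitude approximation in the same spirit as the patent is the "alpha-max plus beta-min" trick, which avoids the square root entirely. The coefficients below are a common textbook choice (worst-case error about 12%), not necessarily the patented ones.

```python
def approx_magnitude(re, im, alpha=1.0, beta=0.5):
    """Approximate |re + j*im| as alpha*max(|re|,|im|) + beta*min(|re|,|im|),
    replacing the square root with compares, adds, and a shift-friendly scale."""
    a, b = abs(re), abs(im)
    return alpha * max(a, b) + beta * min(a, b)
```

With alpha=1 and beta=1/2 the result is exact when one component is zero and overestimates by at most about 11.8% otherwise, which is often acceptable for spectral magnitudes feeding a recognizer front end.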
  • Patent number: 6567776
    Abstract: In speaker-independent speech recognition, between-speaker variability is one of the major sources of recognition errors. A speaker cluster model is used to manage recognition problems caused by between-speaker variability. In the training phase, the score function is used as a discriminative function. The parameters of at least two cluster-dependent models are adjusted through a discriminative training method to improve performance of the speech recognition.
    Type: Grant
    Filed: April 4, 2000
    Date of Patent: May 20, 2003
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chieh Chien, Chung-Mou Penwu
  • Patent number: 6563911
    Abstract: The present invention is a speech-enabled automatic telephone dialer device, system, and method using a spoken name corresponding to name-telephone number data of computer-based address book programs. The invention includes user telephones connected to a PBX-type telephony mechanism, which is connected to a telephony board of a name dialer device. User computer workstations containing loaded address book programs with name-telephone number data are connected to the name dialer device. The name dialer device includes a host computer in a network; a telephony board for controlling the PBX for dialing; memory within the host computer for storing software and name-telephone number data; and software to access computer-based address book programs, to receive voice inputs from the PBX-type telephony mechanism, and to create converted phonemes from names to match voice inputs with specific name-telephone number data from the computer-based address book programs for initiating automatic dialing.
    Type: Grant
    Filed: January 23, 2001
    Date of Patent: May 13, 2003
    Assignee: iVoice, Inc.
    Inventor: Jerome R. Mahoney
  • Publication number: 20030061040
    Abstract: A method and apparatus using a probabilistic network to estimate probability values each representing a probability that at least part of a signal represents content, such as voice activity, and to combine the probability values into an overall probability value. The invention may conform itself to particular system and/or signal characteristics by using some probability estimates and discarding other probability estimates.
    Type: Application
    Filed: September 25, 2001
    Publication date: March 27, 2003
    Inventors: Maxim Likhachev, Murat Eren
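The combination step above (use some probability estimates, discard others) can be sketched simply; the log-odds averaging used here is one common choice, not necessarily the patented combination rule, and the reliability flags are hypothetical inputs.

```python
import math

def combine_probabilities(estimates):
    """estimates: list of (probability, reliable_flag) pairs, each an
    estimate that part of the signal contains voice activity. Returns the
    overall probability from the reliable estimates, or 0.5 if none survive."""
    usable = [p for p, ok in estimates if ok]
    if not usable:
        return 0.5
    # Average in log-odds space, then map back through the logistic function.
    log_odds = sum(math.log(p / (1.0 - p)) for p in usable) / len(usable)
    return 1.0 / (1.0 + math.exp(-log_odds))
```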
  • Patent number: 6539352
    Abstract: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of concepts such as multiple-classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
    Type: Grant
    Filed: November 21, 1997
    Date of Patent: March 25, 2003
    Inventors: Manish Sharma, Xiaoyu Zhang, Richard J. Mammone
  • Patent number: 6535851
    Abstract: Phonetic units are identified in a body of utterance data according to a novel segmentation approach. A body of received utterance data is processed and a set of candidate phonetic unit boundaries is determined that defines a set of candidate phonetic units. The set of candidate phonetic unit boundaries is determined based upon changes in Cepstral coefficient values, changes in utterance energy, changes in phonetic classification, broad category analysis (retroflex, back vowels, front vowels) and sonorant onset detection. The set of candidate phonetic unit boundaries is filtered by priority and proximity to other candidate phonetic units and by silence regions. The set of candidate phonetic units is filtered using no-cross region analysis to generate a set of filtered candidate phonetic units. No-cross region analysis generally involves discarding candidate phonetic units that completely span an energy up, energy down, dip or broad category type no-cross region.
    Type: Grant
    Filed: March 24, 2000
    Date of Patent: March 18, 2003
    Assignee: SpeechWorks, International, Inc.
    Inventors: Mark Fanty, Michael S. Phillips
  • Patent number: 6529872
    Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tend to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: March 4, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
  • Publication number: 20030036904
    Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of different classification schemes, including, e.g., speaker identification and speaker verification.
    Type: Application
    Filed: August 16, 2001
    Publication date: February 20, 2003
    Applicant: IBM Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Publication number: 20030036905
    Abstract: A process of identifying a speaker in coded speech data and a process of searching for the speaker are efficiently performed with fewer computations and with a smaller storage capacity. In an information search apparatus, an LSP decoding section extracts and decodes only LSP information from coded speech data which is read for each block. An LPC conversion section converts the LSP information into LPC information. A Cepstrum conversion section converts the obtained LPC information into an LPC Cepstrum which represents features of speech. A vector quantization section performs vector quantization on the LPC Cepstrum. A speaker identification section identifies a speaker on the basis of the result of the vector quantization. Furthermore, the identified speaker is compared with a search condition in a condition comparison section, and based on the result, the search result is output.
    Type: Application
    Filed: July 23, 2002
    Publication date: February 20, 2003
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
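The LPC-to-cepstrum conversion mentioned in the abstract above follows a standard textbook recursion; this is a generic sketch of that recursion, not the apparatus itself. The convention assumed is A(z) = 1 + a[0]z^-1 + ... + a[p-1]z^-p with H(z) = 1/A(z).

```python
def lpc_to_cepstrum(a, n_ceps):
    """Return cepstral coefficients c[1..n_ceps] of H(z) = 1/A(z) via the
    recursion c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = sum(k * c[k - 1] * a[n - k - 1]
                  for k in range(1, n) if n - k <= p)
        c.append(-a_n - acc / n)
    return c
```

For a single-pole model A(z) = 1 - r·z^-1 the recursion reproduces the known closed form c_n = r^n / n, which makes it easy to sanity-check.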
  • Patent number: 6519561
    Abstract: The model adaptation system of the present invention is a speaker verification system that embodies the capability to adapt models learned during the enrollment component to track aging of a user's voice. The system has the advantage of only requiring a single enrollment for the user. The model adaptation system and methods can be applied to several types of speaker recognition models including neural tree networks (NTN), Gaussian Mixture Models (GMMs), and dynamic time warping (DTW) or to multiple models (i.e., combinations of NTNs, GMMs and DTW). Moreover, the present invention can be applied to text-dependent or text-independent systems.
    Type: Grant
    Filed: November 3, 1998
    Date of Patent: February 11, 2003
    Assignee: T-Netix, Inc.
    Inventors: Kevin Farrell, William Mistretta
  • Patent number: 6519563
    Abstract: A speaker verification method and apparatus which advantageously minimizes the constraints on the customer and simplifies the system architecture by using a speaker dependent, rather than a speaker independent, background model, thereby obtaining many of the advantages of using a background model in a speaker verification process without many of the disadvantages thereof. In particular, no training data (e.g. speech) from anyone other than the customer is required, no speaker independent models need to be produced, no a priori knowledge of acoustic rules are required, and, no multi-lingual phone models, dictionaries, or letter-to-sound rules are needed. Nonetheless, in accordance with an illustrative embodiment of the present invention, the customer is free to select any password phrase in any language.
    Type: Grant
    Filed: November 22, 1999
    Date of Patent: February 11, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Olivier Siohan, Arun Chandrasekaran Surendran
  • Patent number: 6505155
    Abstract: In a computer speech user interface, a method and computer apparatus for automatically adjusting the content of feedback in a responsive prompt based upon predicted recognition accuracy by a speech recognizer. The method includes the steps of receiving a user voice command from the speech recognizer; calculating present speech recognition accuracy based upon the received user voice command; predicting future recognition accuracy based upon the calculated present speech recognition accuracy; and, generating feedback in a responsive prompt responsive to the predicted recognition accuracy. For predicting future poor recognition accuracy based upon poor present recognition accuracy, the calculating step can include monitoring the received user voice command; detecting a reduced accuracy condition in the monitored user voice command; and, determining poor present recognition accuracy if the reduced accuracy condition is detected in the detecting step.
    Type: Grant
    Filed: May 6, 1999
    Date of Patent: January 7, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ronald Vanbuskirk, Huifang Wang, Kerry A. Ortega, Catherine G. Wolf
  • Patent number: 6499012
    Abstract: A method and apparatus for generating a pair of data elements is provided suitable for use in a speaker verification system. The pair includes a first element representative of a speaker independent template and a second element representative of an extended speaker specific speech pattern. An audio signal forming enrollment data associated with a given speaker is received and processed to derive a speaker independent template and a speaker specific speech pattern. The speaker specific speech pattern is then processed to derive an extended speaker specific speech pattern. The extended speaker specific speech pattern includes a set of expanded speech models, each expanded speech model including a plurality of groups of states, the groups of states being linked to one another by inter-group transitions. Optionally, the expanded speech models are processed on the basis of the enrollment data to condition at least one of the plurality of inter-group transitions.
    Type: Grant
    Filed: December 23, 1999
    Date of Patent: December 24, 2002
    Assignee: Nortel Networks Limited
    Inventors: Stephen Douglas Peters, Matthieu Hebert, Daniel Boies
  • Patent number: 6490560
    Abstract: A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.
    Type: Grant
    Filed: March 1, 2000
    Date of Patent: December 3, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ganesh N. Ramaswamy, Upendra V. Chaudhari
  • Patent number: 6477493
    Abstract: A method and system for use with a computer recognition system to enroll a user. The method involves a series of steps. The invention provides a user with an enrollment script. The invention then receives a recording made with a transcription device of a dictation session in which the user has dictated at least a portion of the enrollment script. Additionally, the invention can enroll the user in the speech recognition system by decoding the recording and training the speech recognition system.
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: November 5, 2002
    Assignee: International Business Machines Corporation
    Inventors: Brian S. Brooks, Waltraud Brunner, Carmi Gazit, Arthur Keller, Antonio R. Lee, Thomas Netousek, Kerry A. Ortega
  • Patent number: 6471420
    Abstract: A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input means; an image input section for optically detecting a movement of the lips of the operator, for converting the detected movement of lips into a second electric signal, and for outputting the second electric signal; a speech period detection section for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition means and the period obtained by the speech period detection section.
    Type: Grant
    Filed: May 4, 1995
    Date of Patent: October 29, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hidetsugu Maekawa, Tatsumi Watanabe, Kazuaki Obara, Kazuhiro Kayashima, Kenji Matsui, Yoshihiko Matsukawa
  • Publication number: 20020123893
    Abstract: A method and system for processing speech misrecognitions. The system can include an embedded speech recognition system having at least one acoustic model and at least one active grammar, wherein the embedded speech recognition system is configured to convert speech audio to text using the at least one acoustic model and the at least one active grammar; a remote training system for modifying the at least one acoustic model based on corrections to speech misrecognitions detected in the embedded speech recognition system; and, a communications link for communicatively linking the embedded speech recognition system to the remote training system. The embedded speech recognition system can further include a user interface for presenting a dialog for correcting the speech misrecognitions detected in the embedded speech recognition system. Notably, the user interface can be a visual display. Alternatively, the user interface can be an audio user interface.
    Type: Application
    Filed: March 1, 2001
    Publication date: September 5, 2002
    Applicant: International Business Machines Corporation
    Inventor: Steven G. Woodward
  • Patent number: 6446039
    Abstract: This invention concerns obtaining high recognition capability despite severe limits on the memory capacity and processing ability of the CPU. When several words are selected as registration words from among a plurality of recognizable words, a recognition target speaker speaks the respective registration words, and registration word data for the respective registration words is created from the sound data and saved in a RAM. When the recognition target speaker speaks a registration word, the sound is recognized using the registration word data, and when recognizable words other than the registration words are recognized, the sound is recognized using specific speaker group sound model data. Furthermore, speaker learning processing is performed using the registration word data and the specific speaker group sound model data, and when recognizable words other than the registration words are recognized, the sound is recognized using post-speaker-learning data for speaker adaptation.
    Type: Grant
    Filed: August 23, 1999
    Date of Patent: September 3, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Yasunaga Miyazawa, Mitsuhiro Inazumi, Hiroshi Hasegawa, Masahisa Ikejiri
  • Publication number: 20020116190
    Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
    Type: Application
    Filed: December 22, 2000
    Publication date: August 22, 2002
    Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
  • Publication number: 20020111805
    Abstract: To increase the recognition rate in processes for recognizing speech of a given target language (TL) which is spoken by a speaker with a different source language (SL) as a mother language, it is suggested to use pronunciation variants for said target language (TL) which are derived from said source language (SL) without using non-native speech in said target language (TL).
    Type: Application
    Filed: February 12, 2002
    Publication date: August 15, 2002
    Inventors: Silke Goronzy, Ralf Kompe
  • Publication number: 20020095287
    Abstract: Described here is a method of determining an eigenspace for representing a plurality of training speakers, in which speaker-dependent sets of models are first formed for the individual training speakers using their training speech data, and the models (SD) of a set of models are each described by a plurality of model parameters. For each speaker a combined model is then represented in a high-dimensional model space by concatenation of the model parameters of the models of the individual training speakers into a respective coherent super vector. Subsequently, a dimension-reducing transformation is carried out to obtain eigenspace basis vectors (Ee); this transformation utilizes a reduction criterion based on the variability of the vectors to be transformed. The high-dimensional model space is then first reduced, by a change of basis, to a speaker subspace in which all the training speakers are represented.
    Type: Application
    Filed: September 24, 2001
    Publication date: July 18, 2002
    Inventor: Henrik Botterweck
  • Patent number: 6421641
    Abstract: A method of performing speaker adaptation of acoustic models in a band-quantized speech recognition system, wherein the system includes one or more acoustic models represented by a feature space of multi-dimensional Gaussians, whose dimensions are partitioned into bands, and the Gaussian means and covariances within each band are quantized into atoms, comprises the following steps. A decoded segment of a speech signal associated with a particular speaker is obtained. Then, at least one adaptation mapping based on the decoded segment is computed. Lastly, the at least one adaptation mapping is applied to the atoms of the acoustic models to generate one or more acoustic models adapted to the particular speaker. Accordingly, a fast speaker adaptation methodology is provided for use in real-time applications.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: July 16, 2002
    Assignee: International Business Machines Corporation
    Inventors: Jing Huang, Mukund Padmanabhan
  • Publication number: 20020082832
    Abstract: A voice input section receives voice of the user designating a name etc. and outputs a voice signal to a voice recognition section. The voice recognition section analyzes and recognizes the voice signal and thereby obtains voice data. The voice data is compared with voice patterns that have been registered in the mobile communications terminal corresponding to individuals etc. and thereby a voice pattern that most matches the voice data is searched for and retrieved. If the retrieval of a matching voice pattern succeeded, a memory search processing section refers to a voice-data correspondence table and thereby calls up a telephone directory that has been registered corresponding to the retrieved voice pattern. In each telephone directory, various types of data (telephone number, mail address, URL, etc.) of an individual etc. to be used for starting communication have been registered previously. The type of data to be called up is designated by button operation etc.
    Type: Application
    Filed: December 17, 2001
    Publication date: June 27, 2002
    Applicant: NEC CORPORATION
    Inventor: Yoshihisa Nagashima
  • Patent number: 6393397
    Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: May 21, 2002
    Assignee: Motorola, Inc.
    Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
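    The cohort-selection entry above describes a greedy trade-off: each new cohort model should be similar to the target but dissimilar to cohorts already chosen. A hedged sketch of that loop, with mean-vector models and negative Euclidean distance as the similarity measure (both illustrative assumptions, not the patent's definitions):

    ```python
    import numpy as np

    def select_cohorts(target, pool, k):
        """Greedily pick k cohort models from `pool`: each pick maximizes
        similarity to the target minus similarity to cohorts already chosen."""
        def sim(a, b):
            return -np.linalg.norm(a - b)  # higher = more similar

        chosen = []
        remaining = list(range(len(pool)))
        for _ in range(k):
            def score(i):
                s = sim(pool[i], target)
                if chosen:  # penalize closeness to already-selected cohorts
                    s -= max(sim(pool[i], pool[j]) for j in chosen)
                return s
            best = max(remaining, key=score)
            chosen.append(best)
            remaining.remove(best)
        return chosen

    target = np.array([0.0, 0.0])
    pool = [np.array([0.1, 0.0]), np.array([0.11, 0.01]), np.array([5.0, 5.0])]
    cohorts = select_cohorts(target, pool, 2)
    ```

    In the example, the second pick skips the near-duplicate of the first cohort in favor of a more distinct model, which is the diversity the patent is after.
    
    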
  • Patent number: 6389392
    Abstract: A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data, and comparison results are generated for each segment. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest-matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for that pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus, the effect of a mismatch between the input signal and any single instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: May 14, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Mark Pawlewski, Aladdin Mohammad Ariyaeeinia, Perasiriyan Sivakumaran
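    The per-segment best-instance scoring described above can be sketched in a few lines. This is a toy rendering, assuming frame-aligned scalar "segments" and absolute difference as the comparison measure (neither is specified by the patent):

    ```python
    def composite_score(input_segs, instances):
        """For each input segment, take the closest-matching corresponding
        segment across all reference instances of one pattern, then sum the
        per-segment results into a composite comparison result (lower = better).
        """
        total = 0.0
        for t, seg in enumerate(input_segs):
            # Best (smallest-distance) instance for this segment only.
            total += min(abs(seg - inst[t]) for inst in instances)
        return total
    ```

    With two instances `[1, 5, 3]` and `[4, 2, 9]`, the input `[1, 2, 3]` scores a perfect 0.0 even though it matches neither instance in full, illustrating how mixing segments from different instances reduces mismatch.
    
    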
  • Patent number: 6381572
    Abstract: A method of modifying feature parameters for speech recognition is provided. The method comprises: extracting a feature parameter from input speech in a real environment; reading, from a first memory device, a first speech transfer characteristic corresponding to the environment in which a reference pattern for the speech recognition was generated; reading, from a second memory device, a second speech transfer characteristic corresponding to the real environment; and modifying the extracted feature parameter according to the first and second speech transfer characteristics, thereby converting the extracted feature parameter corresponding to the real environment into a modified feature parameter corresponding to the environment in which the reference pattern was generated.
    Type: Grant
    Filed: April 9, 1999
    Date of Patent: April 30, 2002
    Assignee: Pioneer Electronic Corporation
    Inventors: Shunsuke Ishimitsu, Ikuo Fujita
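    One common way to realize the environment mapping described above is to work in a cepstral (log-spectral) domain, where a channel's transfer characteristic becomes an additive bias. The sketch below assumes that domain and a simple subtract-and-add correction; the patent's exact modification may differ:

    ```python
    import numpy as np

    def compensate(feature, h_real, h_ref):
        """Map a feature extracted in the real environment toward the
        reference-pattern environment. In a cepstral-like domain the channel
        is additive, so we remove the real environment's characteristic
        (h_real) and impose the reference environment's (h_ref)."""
        return feature - h_real + h_ref
    ```

    The design choice is that a convolutional channel distortion becomes a constant offset after the log, so environment mismatch reduces to vector arithmetic per frame.
    
    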
  • Publication number: 20020046027
    Abstract: In an apparatus and method of voice recognition, when several spots share the same name, the recognition system creates a keyword for narrowing down the candidate names and queries the user; the user answers with the keyword, and the narrowing process is executed. With this configuration, a single desired spot name can easily be specified.
    Type: Application
    Filed: October 15, 2001
    Publication date: April 18, 2002
    Applicant: Pioneer Corporation
    Inventor: Fumio Tamura
  • Patent number: 6349280
    Abstract: A method is provided for recognizing the speaker of an input speech according to the distance between an input speech pattern, obtained by converting the input speech to a feature parameter series, and a reference pattern preliminarily registered as a feature parameter series for each speaker. The contents of the input and reference speech patterns are obtained by recognition. An identical section, in which the contents of the input and reference speech patterns match, is determined. The distance between the input and reference speech patterns within this identical section is then calculated, and the speaker of the input speech is recognized on the basis of the calculated distance.
    Type: Grant
    Filed: March 4, 1999
    Date of Patent: February 19, 2002
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
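    The identical-section idea above can be sketched directly: compare the two patterns only over frames whose recognized labels agree. This toy version assumes frame-aligned scalar features and word labels, which the patent does not specify:

    ```python
    def identical_section_distance(input_feats, input_words, ref_feats, ref_words):
        """Average per-frame distance between input and reference patterns,
        restricted to frames whose recognized word labels are identical."""
        dists = [abs(a - b)
                 for a, b, wa, wb in zip(input_feats, ref_feats,
                                         input_words, ref_words)
                 if wa == wb]  # keep only the identical-content section
        return sum(dists) / len(dists)
    ```

    Restricting the distance to matching content keeps text mismatch from being mistaken for speaker mismatch.
    
    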
  • Patent number: 6310629
    Abstract: A system and method for providing a controllable virtual environment includes a computer (11) with processor and a display coupled to the processor to display 2-D or 3-D virtual environment objects. Speech grammars are stored as attributes of the virtual environment objects. Voice commands are recognized by a speech recognizer (19) and microphone (20) coupled to the processor whereby the voice commands are used to manipulate the virtual environment objects on the display. The system is further made role-dependent whereby the display of virtual environment objects and grammar is dependent on the role of the user.
    Type: Grant
    Filed: November 9, 1998
    Date of Patent: October 30, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Yeshwant K. Muthusamy, Jonathan D. Courtney, Edwin R. Cole
  • Patent number: 6253180
    Abstract: The invention provides a speech recognition apparatus in which prior learning uses not only the SD-HMMs of a large number of speakers but also those speakers' adaptation utterances, so that the operating conditions of prior learning coincide with those of speaker adaptation. This allows parameters to be learned in advance with high accuracy, and adaptation can be performed accurately even when the number of words in the adaptation utterances is small.
    Type: Grant
    Filed: June 16, 1999
    Date of Patent: June 26, 2001
    Assignee: NEC Corporation
    Inventor: Kenichi Iso
  • Patent number: 6243677
    Abstract: An improved method of providing out-of-vocabulary word rejection is achieved by adapting an initial garbage model (23) using received incoming enrollment speech (21).
    Type: Grant
    Filed: October 23, 1998
    Date of Patent: June 5, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Levent M. Arslan, Lorin P. Netsch, Periagaram K. Rajasekaran
  • Patent number: 6233557
    Abstract: A voice recognition system (204, 206, 207, 208) assigns penalties to recognition scores. The system generates a lower threshold and an upper threshold for the number of frames assigned to at least one state of at least one model. The system assigns an out-of-state transition penalty to an out-of-state transition score in an allocation assignment algorithm if the lower threshold has not been met; this penalty is proportional to the number of frames by which the dwell time falls below the lower threshold. A self-loop penalty is applied to a self-loop score if the upper threshold on frames assigned to a state has been exceeded; this penalty is proportional to the number of frames by which the dwell time exceeds the upper threshold.
    Type: Grant
    Filed: February 23, 1999
    Date of Patent: May 15, 2001
    Assignee: Motorola, Inc.
    Inventor: Daniel C. Poppert
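    The dwell-time penalties described above reduce to two proportional terms. A minimal sketch with a single linear weight (an illustrative assumption; the patent only states proportionality):

    ```python
    def dwell_penalties(frames_in_state, lower, upper, weight=1.0):
        """Return (out_of_state_penalty, self_loop_penalty) for one state.

        Leaving a state before `lower` frames is penalized in proportion to
        the shortfall; staying past `upper` frames is penalized in proportion
        to the excess. Only one penalty can be nonzero at a time."""
        out_penalty = weight * max(0, lower - frames_in_state)
        loop_penalty = weight * max(0, frames_in_state - upper)
        return out_penalty, loop_penalty
    ```

    During decoding, the first term would be added to the out-of-state transition score and the second to the self-loop score, discouraging implausibly short or long state dwell times.
    
    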
  • Patent number: 6233556
    Abstract: A voice processing and verification system accounts for variations dependent upon telephony equipment differences. Models are developed for the various types of telephony equipment from many users speaking on each of the types of equipment. A transformation algorithm is determined for making a transformation between each of the various types of equipment to each of the others. In other words, a model is formed for carbon button telephony equipment from many users. Similarly, a model is formed for electret telephony equipment from many users, and for cellular telephony equipment from many users. During an enrollment, a user speaks to the system. The system forms and stores a model of the user's speech. The type of telephony equipment used in the original enrollment session is also detected and stored along with the enrollment voice model. The system determines the types of telephony equipment being used based upon the spectrum of sound it receives.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: May 15, 2001
    Assignee: Nuance Communications
    Inventors: Remco Teunen, Ben Shahshahani
  • Patent number: 6223157
    Abstract: Digital Cellular telephony requires voice compression designed to minimize the bandwidth required for the digital cellular channel. The features used in speech recognition have similar components to those used in the vocoding process. The present invention provides a system that bypasses the de-compression or decoding phase of the vocoding and converts the digital cellular parameters directly into features that can be processed by a recognition engine. More specifically, the present invention provides a system and method for mapping a vocoded representation of parameters defining speech components, which in turn define a particular waveform, into a base feature type representation of parameters defining speech components (e.g. LPC parameters), which in turn define the same digital waveform.
    Type: Grant
    Filed: May 7, 1998
    Date of Patent: April 24, 2001
    Assignee: DSC Telecom, L.P.
    Inventors: Thomas D. Fisher, Jeffery J. Spiess, Dearborn R. Mowry
  • Patent number: 6205424
    Abstract: Speech signals from speakers having known identities are used to create sets of acoustic models. The acoustic models, along with their corresponding identities, are stored in a memory. A plurality of sets of cohort models that characterize the speech signals are selected from the stored sets of acoustic models and linked to the set of acoustic models of each identified speaker. During a testing session, speech signals produced by an unknown speaker having a claimed identity are processed to generate processed speech signals. The processed speech signals are compared to the set of models of the claimed speaker to produce first scores. The processed speech signals are also compared to the sets of cohort models to produce second scores. A subset of scores is dynamically selected from the second scores according to a predetermined criterion.
    Type: Grant
    Filed: July 31, 1996
    Date of Patent: March 20, 2001
    Assignee: Compaq Computer Corporation
    Inventors: William D. Goldenthal, Brian S. Eberman
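    The dynamic cohort scoring above is commonly used to normalize the claimed-speaker score. A hedged sketch, taking "dynamically selected subset" to mean the top-n cohort scores and subtracting their mean (one common criterion, not necessarily the patent's):

    ```python
    def normalized_score(target_score, cohort_scores, top_n=2):
        """Normalize a claimed-speaker score by subtracting the mean of the
        top-n cohort scores selected from this test utterance's second scores."""
        top = sorted(cohort_scores, reverse=True)[:top_n]
        return target_score - sum(top) / len(top)
    ```

    The normalized score, rather than the raw first score, is then compared against a verification threshold, which makes the decision more robust to utterance-level variation.
    
    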
  • Patent number: 6195637
    Abstract: A method for correcting misrecognition errors comprises the steps of: dictating to a speech application; marking misrecognized words during the dictating step; and, after the dictating and marking steps, displaying and correcting the marked misrecognized words, whereby the correcting of the misrecognized words is deferred until after the dictating step is concluded and the dictating step is not significantly interrupted. The displaying and correcting step can be implemented by invoking a correction tool of the speech application, whereby the correcting of the misrecognized words trains the speech application.
    Type: Grant
    Filed: March 25, 1998
    Date of Patent: February 27, 2001
    Assignee: International Business Machines Corp.
    Inventors: Barbara E. Ballard, Kerry A. Ortega
  • Patent number: 6182038
    Abstract: A method and apparatus for generating a context dependent phoneme network as an intermediate step of encoding speech information. The context dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context dependent phoneme network is then transmitted to a first application (52).
    Type: Grant
    Filed: December 1, 1997
    Date of Patent: January 30, 2001
    Assignee: Motorola, Inc.
    Inventors: Sreeram Balakrishnan, Stephen Austin
  • Patent number: 6178401
    Abstract: A method is provided for reducing search complexity in a speech recognition system having a fast match, a detailed match, and a language model. Based on at least one predetermined variable, the fast match is optionally employed to generate candidate words and acoustic scores corresponding to the candidate words. The language model is employed to generate language model scores. The acoustic scores are combined with the language model scores and the combined scores are ranked to determine top ranking candidate words to be later processed by the detailed match, when the fast match is employed. The detailed match is employed to generate detailed match scores for the top ranking candidate words.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: January 23, 2001
    Assignee: International Business Machines Corporation
    Inventors: Martin Franz, Miroslav Novak
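    The fast-match pipeline above (combine acoustic and language-model scores, rank, keep only the top candidates for the detailed match) can be sketched as follows, with illustrative log-probability scores keyed by candidate word:

    ```python
    def rank_candidates(acoustic, lm, top_k=2):
        """Combine fast-match acoustic scores with language-model scores,
        rank the candidates, and return only the top_k words that will be
        passed on to the expensive detailed match."""
        combined = {w: acoustic[w] + lm[w] for w in acoustic}
        return sorted(combined, key=combined.get, reverse=True)[:top_k]
    ```

    Pruning to a short candidate list before the detailed match is what reduces overall search complexity.
    
    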
  • Patent number: 6173260
    Abstract: The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information that is obtained from a speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time. This latter feature is used to distinguish long, smoothly varying sounds from quickly changing sound, which may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as MFCC or LPC. Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of pitch and spectral shape features can be measured and analyzed, to assist in the classification of individual utterances.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: January 9, 2001
    Assignee: Interval Research Corporation
    Inventor: Malcolm Slaney