Specialized Models Patents (Class 704/250)
  • Publication number: 20040015355
    Abstract: The invention relates to a telecommunications network (1) comprising a plurality of network elements (2 to 7, 17) and a plurality of subscriber connections (8 to 10). Said network is characterised in that at least one of the network elements (2 to 7, 17) comprises a voice recognition module (18) for recognising biometric voice parameters.
    Type: Application
    Filed: June 5, 2003
    Publication date: January 22, 2004
    Inventors: Marian Trinkel, Franz Steimer, Christel Mueller
  • Publication number: 20040015356
    Abstract: The invention aims at providing a voice recognition apparatus which can perform training without the speaker being conscious of it, by utilizing the fact that the name of the distant party is frequently uttered at the beginning of a telephone conversation, and at increasing the recognition ratio and recognition speed of the speaker-dependent system as the speaker uses the voice recognition apparatus. The invention includes a voice recognition processor of the speaker-independent system for comparing acoustic data obtained by splitting an input sound signal with a plurality of word acoustic data and detecting word acoustic data matching the split acoustic data, wherein the voice recognition processor sequentially compares word acoustic data generated from a phoneme model with acoustic data generated from a name uttered by the speaker, and stores the acoustic data identifier corresponding to the generated acoustic data, which matches the word acoustic data, as a training signal.
    Type: Application
    Filed: July 16, 2003
    Publication date: January 22, 2004
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kenji Nakamura, Hiroshi Harada, Yoshiyuki Ogata, Masakazu Tachiyama, Tatsuhiro Goto, Yasuyuki Nishioka, Yoshiaki Kuroki
  • Publication number: 20030216916
    Abstract: In detection systems, such as speaker verification systems, for a given operating point range, with an associated detection “cost”, the detection cost is preferably reduced by essentially trading off the system error in the area of interest with areas essentially “outside” that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is preferably derived, such that its minimization provably leads to detection cost reduction in the area of interest. The criterion allows for selective access to the slope and offset of the DET curve (a line in case of normally distributed detection scores, a curve approximated by mixture of Gaussians in case of other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
    Type: Application
    Filed: May 19, 2002
    Publication date: November 20, 2003
    Applicant: IBM Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy
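The DET curve and detection cost mentioned in the abstract above can be computed empirically from score lists; the sketch below is a generic illustration of that measurement, not the patented optimization criterion, and the cost weights are conventional placeholder values.

```python
# Illustrative sketch: empirical Detection Error Tradeoff (DET) points
# -- miss rate vs. false-alarm rate -- from genuine-speaker and impostor
# score lists, plus a weighted detection cost for one operating point.

def det_points(genuine, impostor):
    """Return (threshold, miss_rate, false_alarm_rate) triples, one per
    candidate threshold taken from the pooled scores."""
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        miss = sum(1 for s in genuine if s < t) / len(genuine)
        false_alarm = sum(1 for s in impostor if s >= t) / len(impostor)
        points.append((t, miss, false_alarm))
    return points

def detection_cost(miss, false_alarm, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """NIST-style weighted detection cost at one operating point."""
    return c_miss * miss * p_target + c_fa * false_alarm * (1.0 - p_target)
```

Sweeping the threshold trades misses against false alarms; the patent's criterion amounts to reshaping this curve so the cost in a chosen operating region drops.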
  • Patent number: 6618702
    Abstract: A language-independent speaker-recognition system based on parallel cumulative differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from many phone recognizers to perform text independent speaker recognition. A digitized speech signal from a speaker is converted to a sequence of phones by each phone recognizer. Each phone sequence is then modified based on the energy in the signal. The modified phone sequences are tokenized to produce phone n-grams that are compared against a speaker and a background model for each phone recognizer to produce log-likelihood ratio scores. The log-likelihood ratio scores from each phone recognizer are fused to produce a final recognition score for each speaker model. The recognition score for each speaker model is then evaluated to determine which of the modeled speakers, if any, produced the digitized speech signal.
    Type: Grant
    Filed: June 14, 2002
    Date of Patent: September 9, 2003
    Inventors: Mary Antoinette Kohler, Walter Doyle Andrews, III, Joseph Paul Campbell, Jr.
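The n-gram scoring and fusion described in the abstract above follows a familiar pattern; this is a hedged, generic sketch (bigram tokenization, log-likelihood-ratio scoring, equal-weight fusion), not the patented scoring, and the model dictionaries are hypothetical.

```python
import math

def bigrams(phones):
    """Tokenize a phone sequence into adjacent bigrams."""
    return list(zip(phones, phones[1:]))

def llr_score(phones, speaker_model, background_model, floor=1e-6):
    """Sum over bigrams of log P(bigram|speaker) - log P(bigram|background).
    Unseen bigrams are floored to avoid log(0)."""
    score = 0.0
    for bg in bigrams(phones):
        p_spk = speaker_model.get(bg, floor)
        p_bkg = background_model.get(bg, floor)
        score += math.log(p_spk) - math.log(p_bkg)
    return score

def fuse(scores):
    """Fuse per-recognizer LLR scores; equal-weight averaging stands in
    for whatever fusion rule a real system would train."""
    return sum(scores) / len(scores)
```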
  • Publication number: 20030167167
    Abstract: An intelligent social agent is an animated computer interface agent with social intelligence that has been developed for a given application or type of applications and a particular user population. The social intelligence of the agent comes from the ability of the agent to be appealing, affective, adaptive, and appropriate when interacting with the user. An intelligent personal assistant is an implementation of an intelligent social agent that assists a user in operating a computing device and using application programs on a computing device.
    Type: Application
    Filed: May 31, 2002
    Publication date: September 4, 2003
    Inventor: Li Gong
  • Publication number: 20030163311
    Abstract: An intelligent social agent is an animated computer interface agent with social intelligence that has been developed for a given application or type of applications and a particular user population. The social intelligence of the agent comes from the ability of the agent to be appealing, affective, adaptive, and appropriate when interacting with the user.
    Type: Application
    Filed: April 30, 2002
    Publication date: August 28, 2003
    Inventor: Li Gong
  • Patent number: 6606594
    Abstract: A speech recognition system recognizes an input utterance of spoken words. The system includes a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word; a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone; and a recognition engine for processing the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.
    Type: Grant
    Filed: September 29, 1999
    Date of Patent: August 12, 2003
    Assignee: ScanSoft, Inc.
    Inventors: Vladimir Sejnoha, Tom Lynch, Ramesh Sarukkai
  • Publication number: 20030149565
    Abstract: A system, method and computer program product are provided for recognizing utterances. Initially, an utterance is received. Thereafter, it is determined whether the utterance can be recognized utilizing speech recognition. If the utterance cannot be recognized utilizing speech recognition, spelling recognition is used to recognize the utterance.
    Type: Application
    Filed: March 9, 2001
    Publication date: August 7, 2003
    Inventors: Steve S. Chang, Bertrand A. Damiba
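The fallback logic in the abstract above reduces to a short control flow; this is a minimal sketch only, in which the recognizer callables and the 0.5 confidence threshold are made-up assumptions.

```python
def recognize_utterance(audio, speech_rec, spelling_rec, threshold=0.5):
    """Try whole-word speech recognition first; fall back to letter-by-letter
    spelling recognition when the best hypothesis is not confident enough."""
    word, confidence = speech_rec(audio)
    if confidence >= threshold:
        return word
    letters, _ = spelling_rec(audio)
    return "".join(letters)
```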
  • Publication number: 20030135371
    Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
    Type: Application
    Filed: January 15, 2002
    Publication date: July 17, 2003
    Inventors: Chienchung Chang, Narendranath Malayath
  • Patent number: 6577997
    Abstract: A noise-dependent classifier for a speech recognition system includes a recognizer (15) that provides scores and score differences of the two closest in-vocabulary words for a received utterance. A noise detector (17) detects the noise level of a pre-speech portion of the utterance. A classifier (19), responsive to the detected noise level, the scores, and a noise-dependent model, decides whether to accept or reject the utterance as a recognized word depending on the noise-dependent model and the scores.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: June 10, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
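A toy version of the noise-dependent accept/reject rule above: the margin required between the two best in-vocabulary scores grows with the measured pre-speech noise level. The linear noise model and its coefficients are illustrative assumptions, not the patent's trained model.

```python
def accept_utterance(best_score_margin, noise_level, base=0.2, slope=0.05):
    """Accept the top hypothesis only if its margin over the runner-up
    exceeds a threshold that rises with the detected noise level."""
    required = base + slope * noise_level
    return best_score_margin >= required
```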
  • Patent number: 6571208
    Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation. In another embodiment maximum likelihood estimation techniques are used to develop common decision tree frameworks that may be shared across all speakers when constructing the eigenvoice representation of speaker space.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: May 27, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
  • Patent number: 6567777
    Abstract: Methods and systems for efficient magnitude approximation, for example, for approximation of magnitudes of complex rectilinear Fourier transform coefficients in portable or other low-power speech recognition equipment, and for cartesian-to-polar coordinate transforms.
    Type: Grant
    Filed: August 2, 2000
    Date of Patent: May 20, 2003
    Assignee: Motorola, Inc.
    Inventor: Manjirnath Chatterjee
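A classic low-cost magnitude approximation in the same spirit as the patent is the "alpha-max plus beta-min" trick, which avoids the square root entirely. The coefficients below are a common textbook choice (worst-case error about 12%), not necessarily the patented ones.

```python
def approx_magnitude(re, im, alpha=1.0, beta=0.5):
    """Approximate |re + j*im| as alpha*max(|re|,|im|) + beta*min(|re|,|im|),
    replacing the square root with compares, adds, and a shift-friendly scale."""
    a, b = abs(re), abs(im)
    return alpha * max(a, b) + beta * min(a, b)
```

With alpha=1 and beta=1/2 the result is exact when one component is zero and overestimates by at most about 11.8% otherwise, which is often acceptable for spectral magnitudes feeding a recognizer front end.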
  • Patent number: 6567776
    Abstract: In speaker-independent speech recognition, between-speaker variability is one of the major sources of recognition errors. A speaker cluster model is used to manage recognition problems caused by between-speaker variability. In the training phase, the score function is used as a discriminative function. The parameters of at least two cluster-dependent models are adjusted through a discriminative training method to improve performance of the speech recognition.
    Type: Grant
    Filed: April 4, 2000
    Date of Patent: May 20, 2003
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chieh Chien, Chung-Mou Penwu
  • Patent number: 6563911
    Abstract: The present invention is a speech-enabled automatic telephone dialer device, system, and method using a spoken name corresponding to name-telephone number data of computer-based address book programs. The invention includes user telephones connected to a PBX-type telephony mechanism, which is connected to a telephony board of a name dialer device. User computer workstations containing loaded address book programs with name-telephone number data are connected to the name dialer device. The name dialer device includes a host computer in a network; a telephony board for controlling the PBX for dialing; memory within the host computer for storing software and name-telephone number data; and software to access computer-based address book programs, to receive voice inputs from the PBX-type telephony mechanism, and to create converted phonemes from names to match voice inputs with specific name-telephone number data from the computer-based address book programs for initiating automatic dialing.
    Type: Grant
    Filed: January 23, 2001
    Date of Patent: May 13, 2003
    Assignee: iVoice, Inc.
    Inventor: Jerome R. Mahoney
  • Publication number: 20030061040
    Abstract: A method and apparatus using a probabilistic network to estimate probability values each representing a probability that at least part of a signal represents content, such as voice activity, and to combine the probability values into an overall probability value. The invention may conform itself to particular system and/or signal characteristics by using some probability estimates and discarding other probability estimates.
    Type: Application
    Filed: September 25, 2001
    Publication date: March 27, 2003
    Inventors: Maxim Likhachev, Murat Eren
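The combination step above (use some probability estimates, discard others) can be sketched simply; the log-odds averaging used here is one common choice, not necessarily the patented combination rule, and the reliability flags are hypothetical inputs.

```python
import math

def combine_probabilities(estimates):
    """estimates: list of (probability, reliable_flag) pairs, each an
    estimate that part of the signal contains voice activity. Returns the
    overall probability from the reliable estimates, or 0.5 if none survive."""
    usable = [p for p, ok in estimates if ok]
    if not usable:
        return 0.5
    # Average in log-odds space, then map back through the logistic function.
    log_odds = sum(math.log(p / (1.0 - p)) for p in usable) / len(usable)
    return 1.0 / (1.0 + math.exp(-log_odds))
```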
  • Patent number: 6539352
    Abstract: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of concepts such as multiple-classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
    Type: Grant
    Filed: November 21, 1997
    Date of Patent: March 25, 2003
    Inventors: Manish Sharma, Xiaoyu Zhang, Richard J. Mammone
  • Patent number: 6535851
    Abstract: Phonetic units are identified in a body of utterance data according to a novel segmentation approach. A body of received utterance data is processed and a set of candidate phonetic unit boundaries is determined that defines a set of candidate phonetic units. The set of candidate phonetic unit boundaries is determined based upon changes in Cepstral coefficient values, changes in utterance energy, changes in phonetic classification, broad category analysis (retroflex, back vowels, front vowels) and sonorant onset detection. The set of candidate phonetic unit boundaries is filtered by priority and proximity to other candidate phonetic units and by silence regions. The set of candidate phonetic units is filtered using no-cross region analysis to generate a set of filtered candidate phonetic units. No-cross region analysis generally involves discarding candidate phonetic units that completely span an energy up, energy down, dip or broad category type no-cross region.
    Type: Grant
    Filed: March 24, 2000
    Date of Patent: March 18, 2003
    Assignee: SpeechWorks, International, Inc.
    Inventors: Mark Fanty, Michael S. Phillips
  • Patent number: 6529872
    Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tend to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: March 4, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
  • Publication number: 20030036904
    Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of different classification schemes, including, e.g., speaker identification and speaker verification.
    Type: Application
    Filed: August 16, 2001
    Publication date: February 20, 2003
    Applicant: IBM Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Publication number: 20030036905
    Abstract: A process of identifying a speaker in coded speech data and a process of searching for the speaker are efficiently performed with fewer computations and with a smaller storage capacity. In an information search apparatus, an LSP decoding section extracts and decodes only LSP information from coded speech data which is read for each block. An LPC conversion section converts the LSP information into LPC information. A Cepstrum conversion section converts the obtained LPC information into an LPC Cepstrum which represents features of speech. A vector quantization section performs vector quantization on the LPC Cepstrum. A speaker identification section identifies a speaker on the basis of the result of the vector quantization. Furthermore, the identified speaker is compared with a search condition in a condition comparison section, and based on the result, the search result is output.
    Type: Application
    Filed: July 23, 2002
    Publication date: February 20, 2003
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
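The LPC-to-cepstrum conversion mentioned in the abstract above follows a standard textbook recursion; this is a generic sketch of that recursion, not the apparatus itself. The convention assumed is A(z) = 1 + a[0]z^-1 + ... + a[p-1]z^-p with H(z) = 1/A(z).

```python
def lpc_to_cepstrum(a, n_ceps):
    """Return cepstral coefficients c[1..n_ceps] of H(z) = 1/A(z) via the
    recursion c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = sum(k * c[k - 1] * a[n - k - 1]
                  for k in range(1, n) if n - k <= p)
        c.append(-a_n - acc / n)
    return c
```

For a single-pole model A(z) = 1 - r·z^-1 the recursion reproduces the known closed form c_n = r^n / n, which makes it easy to sanity-check.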
  • Patent number: 6519561
    Abstract: The model adaptation system of the present invention is a speaker verification system that embodies the capability to adapt models learned during the enrollment component to track aging of a user's voice. The system has the advantage of only requiring a single enrollment for the user. The model adaptation system and methods can be applied to several types of speaker recognition models including neural tree networks (NTN), Gaussian Mixture Models (GMMs), and dynamic time warping (DTW) or to multiple models (i.e., combinations of NTNs, GMMs and DTW). Moreover, the present invention can be applied to text-dependent or text-independent systems.
    Type: Grant
    Filed: November 3, 1998
    Date of Patent: February 11, 2003
    Assignee: T-Netix, Inc.
    Inventors: Kevin Farrell, William Mistretta
  • Patent number: 6519563
    Abstract: A speaker verification method and apparatus which advantageously minimizes the constraints on the customer and simplifies the system architecture by using a speaker dependent, rather than a speaker independent, background model, thereby obtaining many of the advantages of using a background model in a speaker verification process without many of the disadvantages thereof. In particular, no training data (e.g. speech) from anyone other than the customer is required, no speaker independent models need to be produced, no a priori knowledge of acoustic rules are required, and, no multi-lingual phone models, dictionaries, or letter-to-sound rules are needed. Nonetheless, in accordance with an illustrative embodiment of the present invention, the customer is free to select any password phrase in any language.
    Type: Grant
    Filed: November 22, 1999
    Date of Patent: February 11, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Olivier Siohan, Arun Chandrasekaran Surendran
  • Patent number: 6505155
    Abstract: In a computer speech user interface, a method and computer apparatus for automatically adjusting the content of feedback in a responsive prompt based upon predicted recognition accuracy by a speech recognizer. The method includes the steps of receiving a user voice command from the speech recognizer; calculating present speech recognition accuracy based upon the received user voice command; predicting future recognition accuracy based upon the calculated present speech recognition accuracy; and, generating feedback in a responsive prompt responsive to the predicted recognition accuracy. For predicting future poor recognition accuracy based upon poor present recognition accuracy, the calculating step can include monitoring the received user voice command; detecting a reduced accuracy condition in the monitored user voice command; and, determining poor present recognition accuracy if the reduced accuracy condition is detected in the detecting step.
    Type: Grant
    Filed: May 6, 1999
    Date of Patent: January 7, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ronald Vanbuskirk, Huifang Wang, Kerry A. Ortega, Catherine G. Wolf
  • Patent number: 6499012
    Abstract: A method and apparatus for generating a pair of data elements is provided suitable for use in a speaker verification system. The pair includes a first element representative of a speaker independent template and a second element representative of an extended speaker specific speech pattern. An audio signal forming enrollment data associated with a given speaker is received and processed to derive a speaker independent template and a speaker specific speech pattern. The speaker specific speech pattern is then processed to derive an extended speaker specific speech pattern. The extended speaker specific speech pattern includes a set of expanded speech models, each expanded speech model including a plurality of groups of states, the groups of states being linked to one another by inter-group transitions. Optionally, the expanded speech models are processed on the basis of the enrollment data to condition at least one of the plurality of inter-group transitions.
    Type: Grant
    Filed: December 23, 1999
    Date of Patent: December 24, 2002
    Assignee: Nortel Networks Limited
    Inventors: Stephen Douglas Peters, Matthieu Hebert, Daniel Boies
  • Patent number: 6490560
    Abstract: A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.
    Type: Grant
    Filed: March 1, 2000
    Date of Patent: December 3, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ganesh N. Ramaswamy, Upendra V. Chaudhari
  • Patent number: 6477493
    Abstract: A method and system for use with a computer recognition system to enroll a user. The method involves a series of steps. The invention provides a user with an enrollment script. The invention then receives a recording made with a transcription device of a dictation session in which the user has dictated at least a portion of the enrollment script. Additionally, the invention can enroll the user in the speech recognition system by decoding the recording and training the speech recognition system.
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: November 5, 2002
    Assignee: International Business Machines Corporation
    Inventors: Brian S. Brooks, Waltraud Brunner, Carmi Gazit, Arthur Keller, Antonio R. Lee, Thomas Netousek, Kerry A. Ortega
  • Patent number: 6471420
    Abstract: A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input means; an image input section for optically detecting a movement of the lips of the operator, for converting the detected movement of lips into a second electric signal, and for outputting the second electric signal; a speech period detection section for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition means and the period obtained by the speech period detection section.
    Type: Grant
    Filed: May 4, 1995
    Date of Patent: October 29, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hidetsugu Maekawa, Tatsumi Watanabe, Kazuaki Obara, Kazuhiro Kayashima, Kenji Matsui, Yoshihiko Matsukawa
  • Publication number: 20020123893
    Abstract: A method and system for processing speech misrecognitions. The system can include an embedded speech recognition system having at least one acoustic model and at least one active grammar, wherein the embedded speech recognition system is configured to convert speech audio to text using the at least one acoustic model and the at least one active grammar; a remote training system for modifying the at least one acoustic model based on corrections to speech misrecognitions detected in the embedded speech recognition system; and, a communications link for communicatively linking the embedded speech recognition system to the remote training system. The embedded speech recognition system can further include a user interface for presenting a dialog for correcting the speech misrecognitions detected in the embedded speech recognition system. Notably, the user interface can be a visual display. Alternatively, the user interface can be an audio user interface.
    Type: Application
    Filed: March 1, 2001
    Publication date: September 5, 2002
    Applicant: International Business Machines Corporation
    Inventor: Steven G. Woodward
  • Patent number: 6446039
    Abstract: This invention concerns obtaining high recognition capability despite severe limits on the memory capacity and processing ability of the CPU. When several words are selected as registration words from among a plurality of recognizable words, a recognition target speaker speaks the respective registration words, and registration word data for the respective registration words is created from the sound data and saved in a RAM. When the recognition target speaker speaks a registration word, the sound is recognized using the registration word data, and when recognizable words other than the registration words are recognized, the sound is recognized using specific speaker group sound model data. Furthermore, speaker learning processing is performed using the registration word data and the specific speaker group sound model data, and when recognizable words other than the registration words are recognized, the sound is recognized using post-speaker-learning data for speaker adaptation.
    Type: Grant
    Filed: August 23, 1999
    Date of Patent: September 3, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Yasunaga Miyazawa, Mitsuhiro Inazumi, Hiroshi Hasegawa, Masahisa Ikejiri
  • Publication number: 20020116190
    Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
    Type: Application
    Filed: December 22, 2000
    Publication date: August 22, 2002
    Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
  • Publication number: 20020111805
    Abstract: To increase the recognition rate in processes for recognizing speech of a given target language (TL) which is spoken by a speaker with a different source language (SL) as a mother language, it is suggested to use pronunciation variants for said target language (TL) which are derived from said source language (SL) without using non-native speech in said target language (TL).
    Type: Application
    Filed: February 12, 2002
    Publication date: August 15, 2002
    Inventors: Silke Goronzy, Ralf Kompe
  • Publication number: 20020095287
    Abstract: Described here is a method of determining an eigenspace for representing a plurality of training speakers, in which speaker-dependent sets of models are first formed for the individual training speakers using their training speech data, and the models (SD) of a set of models are each described by a plurality of model parameters. For each speaker a combined model is then represented in a high-dimensional model space by concatenation of the model parameters of the models of the individual training speakers into a respective coherent super vector. Subsequently, a dimension-reducing transformation is carried out to obtain eigenspace basis vectors (Ee); this transformation utilizes a reduction criterion based on the variability of the vectors to be transformed. The high-dimensional model space is then first reduced, by a change of basis, to a speaker subspace in which all the training speakers are represented.
    Type: Application
    Filed: September 24, 2001
    Publication date: July 18, 2002
    Inventor: Henrik Botterweck
  • Patent number: 6421641
    Abstract: A method of performing speaker adaptation of acoustic models in a band-quantized speech recognition system, wherein the system includes one or more acoustic models represented by a feature space of multi-dimensional Gaussians, whose dimensions are partitioned into bands, and the Gaussian means and covariances within each band are quantized into atoms, comprises the following steps. A decoded segment of a speech signal associated with a particular speaker is obtained. Then, at least one adaptation mapping based on the decoded segment is computed. Lastly, the at least one adaptation mapping is applied to the atoms of the acoustic models to generate one or more acoustic models adapted to the particular speaker. Accordingly, a fast speaker adaptation methodology is provided for use in real-time applications.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: July 16, 2002
    Assignee: International Business Machines Corporation
    Inventors: Jing Huang, Mukund Padmanabhan
  • Publication number: 20020082832
    Abstract: A voice input section receives voice of the user designating a name etc. and outputs a voice signal to a voice recognition section. The voice recognition section analyzes and recognizes the voice signal and thereby obtains voice data. The voice data is compared with voice patterns that have been registered in the mobile communications terminal corresponding to individuals etc. and thereby a voice pattern that most matches the voice data is searched for and retrieved. If the retrieval of a matching voice pattern succeeded, a memory search processing section refers to a voice-data correspondence table and thereby calls up a telephone directory that has been registered corresponding to the retrieved voice pattern. In each telephone directory, various types of data (telephone number, mail address, URL, etc.) of an individual etc. to be used for starting communication have been registered previously. The type of data to be called up is designated by button operation etc.
    Type: Application
    Filed: December 17, 2001
    Publication date: June 27, 2002
    Applicant: NEC CORPORATION
    Inventor: Yoshihisa Nagashima
  • Patent number: 6393397
    Abstract: An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: May 21, 2002
    Assignee: Motorola, Inc.
    Inventors: Ho Chuen Choi, Xiaoyuan Zhu, Jianming Song
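    The cohort-selection entry above describes a greedy trade-off: each new cohort model should be similar to the target but dissimilar to cohorts already chosen. A hedged sketch of that loop, with mean-vector models and negative Euclidean distance as the similarity measure (both illustrative assumptions, not the patent's definitions):

    ```python
    import numpy as np

    def select_cohorts(target, pool, k):
        """Greedily pick k cohort models from `pool`: each pick maximizes
        similarity to the target minus similarity to cohorts already chosen."""
        def sim(a, b):
            return -np.linalg.norm(a - b)  # higher = more similar

        chosen = []
        remaining = list(range(len(pool)))
        for _ in range(k):
            def score(i):
                s = sim(pool[i], target)
                if chosen:  # penalize closeness to already-selected cohorts
                    s -= max(sim(pool[i], pool[j]) for j in chosen)
                return s
            best = max(remaining, key=score)
            chosen.append(best)
            remaining.remove(best)
        return chosen

    target = np.array([0.0, 0.0])
    pool = [np.array([0.1, 0.0]), np.array([0.11, 0.01]), np.array([5.0, 5.0])]
    cohorts = select_cohorts(target, pool, 2)
    ```

    In the example, the second pick skips the near-duplicate of the first cohort in favor of a more distinct model, which is the diversity the patent is after.
    
    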
  • Patent number: 6389392
    Abstract: A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data, and comparison results are generated for each segment. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest-matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for that pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus, the effect of a mismatch between the input signal and any single instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: May 14, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Mark Pawlewski, Aladdin Mohammad Ariyaeeinia, Perasiriyan Sivakumaran
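    The per-segment best-instance scoring described above can be sketched in a few lines. This is a toy rendering, assuming frame-aligned scalar "segments" and absolute difference as the comparison measure (neither is specified by the patent):

    ```python
    def composite_score(input_segs, instances):
        """For each input segment, take the closest-matching corresponding
        segment across all reference instances of one pattern, then sum the
        per-segment results into a composite comparison result (lower = better).
        """
        total = 0.0
        for t, seg in enumerate(input_segs):
            # Best (smallest-distance) instance for this segment only.
            total += min(abs(seg - inst[t]) for inst in instances)
        return total
    ```

    With two instances `[1, 5, 3]` and `[4, 2, 9]`, the input `[1, 2, 3]` scores a perfect 0.0 even though it matches neither instance in full, illustrating how mixing segments from different instances reduces mismatch.
    
    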
  • Patent number: 6381572
    Abstract: A method of modifying feature parameters for speech recognition is provided. The method comprises: extracting a feature parameter from input speech in a real environment; reading, from a first memory device, a first speech transfer characteristic corresponding to the environment in which a reference pattern for the speech recognition was generated; reading, from a second memory device, a second speech transfer characteristic corresponding to the real environment; and modifying the extracted feature parameter according to the first and second speech transfer characteristics, thereby converting the extracted feature parameter corresponding to the real environment into a modified feature parameter corresponding to the environment in which the reference pattern was generated.
    Type: Grant
    Filed: April 9, 1999
    Date of Patent: April 30, 2002
    Assignee: Pioneer Electronic Corporation
    Inventors: Shunsuke Ishimitsu, Ikuo Fujita
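    One common way to realize the environment mapping described above is to work in a cepstral (log-spectral) domain, where a channel's transfer characteristic becomes an additive bias. The sketch below assumes that domain and a simple subtract-and-add correction; the patent's exact modification may differ:

    ```python
    import numpy as np

    def compensate(feature, h_real, h_ref):
        """Map a feature extracted in the real environment toward the
        reference-pattern environment. In a cepstral-like domain the channel
        is additive, so we remove the real environment's characteristic
        (h_real) and impose the reference environment's (h_ref)."""
        return feature - h_real + h_ref
    ```

    The design choice is that a convolutional channel distortion becomes a constant offset after the log, so environment mismatch reduces to vector arithmetic per frame.
    
    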
  • Publication number: 20020046027
    Abstract: In an apparatus and method of voice recognition, when several spots share the same name, the recognition system creates a keyword for narrowing down the candidate names and queries the user; the user answers with the keyword, and the narrowing process is executed. With this configuration, a single desired spot name can easily be specified.
    Type: Application
    Filed: October 15, 2001
    Publication date: April 18, 2002
    Applicant: Pioneer Corporation
    Inventor: Fumio Tamura
  • Patent number: 6349280
    Abstract: A method is provided for recognizing the speaker of an input speech according to the distance between an input speech pattern, obtained by converting the input speech to a feature parameter series, and a reference pattern preliminarily registered as a feature parameter series for each speaker. The contents of the input and reference speech patterns are obtained by recognition. An identical section, in which the contents of the input and reference speech patterns match, is determined. The distance between the input and reference speech patterns within this identical section is then calculated, and the speaker of the input speech is recognized on the basis of the calculated distance.
    Type: Grant
    Filed: March 4, 1999
    Date of Patent: February 19, 2002
    Assignee: NEC Corporation
    Inventor: Hiroaki Hattori
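    The identical-section idea above can be sketched directly: compare the two patterns only over frames whose recognized labels agree. This toy version assumes frame-aligned scalar features and word labels, which the patent does not specify:

    ```python
    def identical_section_distance(input_feats, input_words, ref_feats, ref_words):
        """Average per-frame distance between input and reference patterns,
        restricted to frames whose recognized word labels are identical."""
        dists = [abs(a - b)
                 for a, b, wa, wb in zip(input_feats, ref_feats,
                                         input_words, ref_words)
                 if wa == wb]  # keep only the identical-content section
        return sum(dists) / len(dists)
    ```

    Restricting the distance to matching content keeps text mismatch from being mistaken for speaker mismatch.
    
    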
  • Patent number: 6310629
    Abstract: A system and method for providing a controllable virtual environment includes a computer (11) with processor and a display coupled to the processor to display 2-D or 3-D virtual environment objects. Speech grammars are stored as attributes of the virtual environment objects. Voice commands are recognized by a speech recognizer (19) and microphone (20) coupled to the processor whereby the voice commands are used to manipulate the virtual environment objects on the display. The system is further made role-dependent whereby the display of virtual environment objects and grammar is dependent on the role of the user.
    Type: Grant
    Filed: November 9, 1998
    Date of Patent: October 30, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Yeshwant K. Muthusamy, Jonathan D. Courtney, Edwin R. Cole
  • Patent number: 6253180
    Abstract: The invention provides a speech recognition apparatus in which prior learning uses not only the SD-HMMs of a large number of speakers but also those speakers' adaptation utterances, so that the operating conditions of prior learning coincide with those of speaker adaptation. This allows parameters to be learned in advance with high accuracy, and adaptation can be performed accurately even when the number of words in the adaptation utterances is small.
    Type: Grant
    Filed: June 16, 1999
    Date of Patent: June 26, 2001
    Assignee: NEC Corporation
    Inventor: Kenichi Iso
  • Patent number: 6243677
    Abstract: An improved method of providing out-of-vocabulary word rejection is achieved by adapting an initial garbage model (23) using received incoming enrollment speech (21).
    Type: Grant
    Filed: October 23, 1998
    Date of Patent: June 5, 2001
    Assignee: Texas Instruments Incorporated
    Inventors: Levent M. Arslan, Lorin P. Netsch, Periagaram K. Rajasekaran
  • Patent number: 6233557
    Abstract: A voice recognition system (204, 206, 207, 208) assigns penalties to recognition scores. The system generates a lower threshold and an upper threshold for the number of frames assigned to at least one state of at least one model. The system assigns an out-of-state transition penalty to an out-of-state transition score in an allocation assignment algorithm if the lower threshold has not been met; this penalty is proportional to the number of frames by which the dwell time falls below the lower threshold. A self-loop penalty is applied to a self-loop score if the upper threshold on frames assigned to a state has been exceeded; this penalty is proportional to the number of frames by which the dwell time exceeds the upper threshold.
    Type: Grant
    Filed: February 23, 1999
    Date of Patent: May 15, 2001
    Assignee: Motorola, Inc.
    Inventor: Daniel C. Poppert
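    The dwell-time penalties described above reduce to two proportional terms. A minimal sketch with a single linear weight (an illustrative assumption; the patent only states proportionality):

    ```python
    def dwell_penalties(frames_in_state, lower, upper, weight=1.0):
        """Return (out_of_state_penalty, self_loop_penalty) for one state.

        Leaving a state before `lower` frames is penalized in proportion to
        the shortfall; staying past `upper` frames is penalized in proportion
        to the excess. Only one penalty can be nonzero at a time."""
        out_penalty = weight * max(0, lower - frames_in_state)
        loop_penalty = weight * max(0, frames_in_state - upper)
        return out_penalty, loop_penalty
    ```

    During decoding, the first term would be added to the out-of-state transition score and the second to the self-loop score, discouraging implausibly short or long state dwell times.
    
    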
  • Patent number: 6233556
    Abstract: A voice processing and verification system accounts for variations dependent upon telephony equipment differences. Models are developed for the various types of telephony equipment from many users speaking on each of the types of equipment. A transformation algorithm is determined for making a transformation between each of the various types of equipment to each of the others. In other words, a model is formed for carbon button telephony equipment from many users. Similarly, a model is formed for electret telephony equipment from many users, and for cellular telephony equipment from many users. During an enrollment, a user speaks to the system. The system forms and stores a model of the user's speech. The type of telephony equipment used in the original enrollment session is also detected and stored along with the enrollment voice model. The system determines the types of telephony equipment being used based upon the spectrum of sound it receives.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: May 15, 2001
    Assignee: Nuance Communications
    Inventors: Remco Teunen, Ben Shahshahani
  • Patent number: 6223157
    Abstract: Digital Cellular telephony requires voice compression designed to minimize the bandwidth required for the digital cellular channel. The features used in speech recognition have similar components to those used in the vocoding process. The present invention provides a system that bypasses the de-compression or decoding phase of the vocoding and converts the digital cellular parameters directly into features that can be processed by a recognition engine. More specifically, the present invention provides a system and method for mapping a vocoded representation of parameters defining speech components, which in turn define a particular waveform, into a base feature type representation of parameters defining speech components (e.g. LPC parameters), which in turn define the same digital waveform.
    Type: Grant
    Filed: May 7, 1998
    Date of Patent: April 24, 2001
    Assignee: DSC Telecom, L.P.
    Inventors: Thomas D. Fisher, Jeffery J. Spiess, Dearborn R. Mowry
  • Patent number: 6205424
    Abstract: Speech signals from speakers having known identities are used to create sets of acoustic models. The acoustic models, along with their corresponding identities, are stored in a memory. A plurality of sets of cohort models that characterize the speech signals are selected from the stored sets of acoustic models and linked to the set of acoustic models of each identified speaker. During a testing session, speech signals produced by an unknown speaker having a claimed identity are processed to generate processed speech signals. The processed speech signals are compared to the set of models of the claimed speaker to produce first scores. The processed speech signals are also compared to the sets of cohort models to produce second scores. A subset of scores is dynamically selected from the second scores according to a predetermined criterion.
    Type: Grant
    Filed: July 31, 1996
    Date of Patent: March 20, 2001
    Assignee: Compaq Computer Corporation
    Inventors: William D. Goldenthal, Brian S. Eberman
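    The dynamic cohort scoring above is commonly used to normalize the claimed-speaker score. A hedged sketch, taking "dynamically selected subset" to mean the top-n cohort scores and subtracting their mean (one common criterion, not necessarily the patent's):

    ```python
    def normalized_score(target_score, cohort_scores, top_n=2):
        """Normalize a claimed-speaker score by subtracting the mean of the
        top-n cohort scores selected from this test utterance's second scores."""
        top = sorted(cohort_scores, reverse=True)[:top_n]
        return target_score - sum(top) / len(top)
    ```

    The normalized score, rather than the raw first score, is then compared against a verification threshold, which makes the decision more robust to utterance-level variation.
    
    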
  • Patent number: 6195637
    Abstract: A method for correcting misrecognition errors comprises the steps of: dictating to a speech application; marking misrecognized words during the dictating step; and, after the dictating and marking steps, displaying and correcting the marked misrecognized words, whereby the correcting of the misrecognized words is deferred until after the dictating step is concluded and the dictating step is not significantly interrupted. The displaying and correcting step can be implemented by invoking a correction tool of the speech application, whereby the correcting of the misrecognized words trains the speech application.
    Type: Grant
    Filed: March 25, 1998
    Date of Patent: February 27, 2001
    Assignee: International Business Machines Corp.
    Inventors: Barbara E. Ballard, Kerry A. Ortega
  • Patent number: 6182038
    Abstract: A method and apparatus for generating a context dependent phoneme network as an intermediate step of encoding speech information. The context dependent phoneme network is generated from speech in a phoneme network generator (48) associated with an operating system (44). The context dependent phoneme network is then transmitted to a first application (52).
    Type: Grant
    Filed: December 1, 1997
    Date of Patent: January 30, 2001
    Assignee: Motorola, Inc.
    Inventors: Sreeram Balakrishnan, Stephen Austin
  • Patent number: 6178401
    Abstract: A method is provided for reducing search complexity in a speech recognition system having a fast match, a detailed match, and a language model. Based on at least one predetermined variable, the fast match is optionally employed to generate candidate words and acoustic scores corresponding to the candidate words. The language model is employed to generate language model scores. The acoustic scores are combined with the language model scores and the combined scores are ranked to determine top ranking candidate words to be later processed by the detailed match, when the fast match is employed. The detailed match is employed to generate detailed match scores for the top ranking candidate words.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: January 23, 2001
    Assignee: International Business Machines Corporation
    Inventors: Martin Franz, Miroslav Novak
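    The fast-match pipeline above (combine acoustic and language-model scores, rank, keep only the top candidates for the detailed match) can be sketched as follows, with illustrative log-probability scores keyed by candidate word:

    ```python
    def rank_candidates(acoustic, lm, top_k=2):
        """Combine fast-match acoustic scores with language-model scores,
        rank the candidates, and return only the top_k words that will be
        passed on to the expensive detailed match."""
        combined = {w: acoustic[w] + lm[w] for w in acoustic}
        return sorted(combined, key=combined.get, reverse=True)[:top_k]
    ```

    Pruning to a short candidate list before the detailed match is what reduces overall search complexity.
    
    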
  • Patent number: 6173260
    Abstract: The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information that is obtained from a speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time. This latter feature is used to distinguish long, smoothly varying sounds from quickly changing sound, which may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as MFCC or LPC. Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of pitch and spectral shape features can be measured and analyzed, to assist in the classification of individual utterances.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: January 9, 2001
    Assignee: Interval Research Corporation
    Inventor: Malcolm Slaney