Specialized Models Patents (Class 704/250)
  • Patent number: 7391877
    Abstract: Optimal head-related transfer function spatial configurations designed to maximize speech intelligibility in multi-talker speech displays by spatially separating competing speech channels, combined with a method of normalizing the relative levels of the different talkers that improves overall performance even in conventional multi-talker spatial configurations.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: June 24, 2008
    Assignee: United States of America as represented by the Secretary of the Air Force
    Inventor: Douglas S. Brungart
  • Patent number: 7379869
    Abstract: A method of comparing voice signatures is provided comprising selecting an original performance. The original performance is comprised of an original performance voice signature. A user impersonation of at least a portion of the original performance is recorded and a user impersonation voice signature is established. The user impersonation voice signature is electronically compared to the original performance voice signature. A graduated performance value is generated representative of the similarities between the original voice signature and the user impersonation voice signature. An entertainment application is based on the graduated performance value.
    Type: Grant
    Filed: February 25, 2004
    Date of Patent: May 27, 2008
    Inventor: Kurz Kendra
  • Publication number: 20080103771
    Abstract: A method for the distributed construction of a voice recognition model that is intended to be used by a device comprising a model base and a reference base in which the modeling elements are stored. The method includes the steps of obtaining the entity to be modeled, transmitting data representative of the entity over a communication link to a server, determining a set of modeling parameters indicating the modeling elements, transmitting the modeling parameters to the device, determining the voice recognition model of the entity to be modeled as a function of at least the modeling parameters received and at least one modeling element that is stored in the reference base and indicated in the transmitted parameters, and subsequently saving the voice recognition model in the model base.
    Type: Application
    Filed: October 27, 2005
    Publication date: May 1, 2008
    Applicant: France Telecom
    Inventors: Denis Jouvet, Jean Monne
  • Publication number: 20080082332
    Abstract: An embodiment of the present invention provides a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text. Each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface. A voice profile manager receives the portable voice profiles from other users who have agreed to share their voice profiles. The speech recognition engine includes speaker identification logic to dynamically select a particular portable voice profile, in real-time, from a group of portable voice profiles. The speaker-dependent data included with the portable voice profile enhances the accuracy with which speech recognition engines recognize spoken words in recorded speech from a speaker associated with a portable voice profile.
    Type: Application
    Filed: September 28, 2006
    Publication date: April 3, 2008
    Inventors: Jacqueline Mallett, Sunil Vemuri, N. Rao Machiraju
  • Publication number: 20080082333
    Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
    Type: Application
    Filed: September 29, 2006
    Publication date: April 3, 2008
    Applicant: NOKIA CORPORATION
    Inventors: Jani K. Nurminen, Elina Helander
  • Patent number: 7340396
    Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).
    Type: Grant
    Filed: February 18, 2003
    Date of Patent: March 4, 2008
    Assignee: Motorola, Inc.
    Inventors: Mark Thomson, Julien Epps, Trym Holter
  • Publication number: 20080046241
    Abstract: A method and system for detecting speaker change in a voice transaction are provided. The system analyzes a portion of speech in a speech stream and determines a speech feature set. The system then detects a feature change and determines a speaker change.
    Type: Application
    Filed: February 20, 2007
    Publication date: February 21, 2008
    Inventors: Andrew Osburn, Jeremy Bernard, Mark Boyle
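The abstract leaves the feature comparison open; the idea can be sketched as a sliding mean over a scalar feature stream with an assumed jump threshold (window size and threshold are illustration choices, not the patent's):

```python
def detect_change(features, window=5, threshold=1.0):
    """Flag a speaker change when the mean of a scalar feature jumps
    between adjacent windows (a simple stand-in for the patent's
    feature-set comparison; window and threshold are assumptions)."""
    for i in range(window, len(features) - window + 1):
        prev = sum(features[i - window:i]) / window
        curr = sum(features[i:i + window]) / window
        if abs(curr - prev) > threshold:
            return i  # frame index where the change is detected
    return None
```

In practice the per-frame feature would be a vector (e.g. cepstra) and the comparison a distance between window statistics, but the detection logic is the same.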
  • Patent number: 7308403
    Abstract: A method for objective speech quality assessment that accounts for phonetic contents, speaking styles or individual speaker differences by distorting speech signals under speech quality assessment. By using a distorted version of a speech signal, it is possible to compensate for different phonetic contents, different individual speakers and different speaking styles when assessing speech quality. The degradation introduced into the objective speech quality assessment by distorting the speech signal remains similar across different speech signals, especially when the amount of distortion of the distorted version of the speech signal is severe. Objective speech quality assessments for the distorted speech signal and the original undistorted speech signal are compared to obtain a speech quality assessment compensated for utterance-dependent articulation.
    Type: Grant
    Filed: July 1, 2002
    Date of Patent: December 11, 2007
    Assignee: Lucent Technologies Inc.
    Inventor: Doh-Suk Kim
  • Patent number: 7305339
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N−L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Grant
    Filed: April 1, 2003
    Date of Patent: December 4, 2007
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 7302389
    Abstract: A computer-based system generates alternative phonetic transcriptions for a target word or phrase corresponding to specific phonological processes that replace individual phonemes or clusters of two or more phonemes with replacement phonemes. The system compares a user's speech with a list of possible transcriptions that includes the base (i.e., correct) transcription of the test target as well as the different alternative transcriptions, to identify the transcription that best matches the user's. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech and generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be implemented in other contexts such as foreign language instruction and automated attendant applications to cover a wide variety and range of accents and/or phonological disorders.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: November 27, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Prabhu Raghavan, Chetan Vinchhi
  • Patent number: 7295978
    Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
    Type: Grant
    Filed: September 5, 2000
    Date of Patent: November 13, 2007
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
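One way to read the table-lookup step: quantize the input into an address, fetch a precomputed squared distance, and turn it into a Gaussian log-probability, trading memory for per-frame arithmetic. A one-dimensional sketch (the grid bounds, step size, and Gaussian parameters are all assumptions):

```python
import math

MU, VAR = 0.0, 1.0             # assumed Gaussian parameters
STEP, LO, HI = 0.1, -5.0, 5.0  # assumed quantization grid

# Precomputed table of squared distances, indexed by address.
TABLE = [((LO + i * STEP) - MU) ** 2 / VAR
         for i in range(int((HI - LO) / STEP) + 1)]

def log_prob(x):
    """Address from the input (610), distance from the table (620),
    log-probability from the distance (630)."""
    addr = min(max(int(round((x - LO) / STEP)), 0), len(TABLE) - 1)
    return -0.5 * (TABLE[addr] + math.log(2 * math.pi * VAR))
```

A multi-dimensional Gaussian mixture would use one table per dimension (or per codeword) and sum the retrieved distances before exponentiation.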
  • Patent number: 7254529
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: August 7, 2007
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
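A plausible reading of the weighting step (the abstract does not give the exact weighting function): scale each out-of-domain n-gram count by the ratio of its in-domain to out-of-domain relative frequency, then renormalize into probabilities.

```python
from collections import Counter

def adapt_counts(small_corpus, large_corpus, n=2):
    """Weight large-corpus n-gram counts by the ratio of in-domain to
    out-of-domain relative frequency (one plausible weighting scheme;
    n-grams unseen in-domain get weight zero here, unsmoothed)."""
    def ngrams(tokens):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    small = Counter(g for s in small_corpus for g in ngrams(s))
    large = Counter(g for s in large_corpus for g in ngrams(s))
    s_total, l_total = sum(small.values()), sum(large.values())
    weighted = {}
    for g, c in large.items():
        rel_s = small.get(g, 0) / s_total if s_total else 0.0
        rel_l = c / l_total
        weighted[g] = c * (rel_s / rel_l) if rel_s else 0.0
    z = sum(weighted.values()) or 1.0
    return {g: w / z for g, w in weighted.items()}
```

A real adaptation scheme would smooth the in-domain relative frequencies so out-of-domain-only n-grams are down-weighted rather than eliminated.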
  • Patent number: 7231350
    Abstract: A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of “template” parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: June 12, 2007
    Assignee: The Regents of the University of California
    Inventors: Todd J. Gable, Lawrence C. Ng, John F. Holzrichter, Greg C. Burnett
  • Patent number: 7231019
    Abstract: A method and apparatus are provided for identifying a caller of a call from the caller to a recipient. A voice input is received from the caller, and characteristics of the voice input are applied to a plurality of acoustic models, which include a generic acoustic model and acoustic models of any previously identified callers, to obtain a plurality of respective acoustic scores. The caller is identified as one of the previously identified callers or as a new caller based on the plurality of acoustic scores. If the caller is identified as a new caller, a new acoustic model is generated for the new caller, which is specific to the new caller.
    Type: Grant
    Filed: February 12, 2004
    Date of Patent: June 12, 2007
    Assignee: Microsoft Corporation
    Inventor: Andrei Pascovici
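A minimal sketch of the decision logic, assuming a caller-provided `score` function (higher is better) and using the observed features themselves as a stand-in for a newly trained caller-specific acoustic model:

```python
def identify_caller(features, caller_models, generic_model, score):
    """Score the voice input against a generic model and each known
    caller's model; a win for the generic model means a new caller."""
    best_id, best = None, score(features, generic_model)
    for caller_id, model in caller_models.items():
        s = score(features, model)
        if s > best:
            best_id, best = caller_id, s
    if best_id is None:
        # Generic model scored highest: treat as a new caller and enroll
        # a caller-specific model (here simply the observed features).
        best_id = f"caller-{len(caller_models) + 1}"
        caller_models[best_id] = list(features)
    return best_id
```

In the patent the models are acoustic models and the scores are acoustic likelihoods; this sketch only shows the generic-versus-specific decision.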
  • Patent number: 7228277
    Abstract: A voice input section receives voice of the user designating a name etc. and outputs a voice signal to a speech recognition section. The speech recognition section analyzes and recognizes the voice signal and thereby obtains voice data. The voice data is compared with voice patterns that have been registered in the mobile communications terminal corresponding to individuals etc. and thereby a voice pattern that most matches the voice data is searched for and retrieved. If the retrieval of a matching voice pattern succeeded, a memory search processing section refers to a voice-data correspondence table and thereby calls up a telephone directory that has been registered corresponding to the retrieved voice pattern. In each telephone directory, various types of data (telephone number, mail address, URL, etc.) of an individual etc. to be used for starting communication have been registered previously. The type of data to be called up is designated by button operation etc.
    Type: Grant
    Filed: December 17, 2001
    Date of Patent: June 5, 2007
    Assignee: NEC Corporation
    Inventor: Yoshihisa Nagashima
  • Patent number: 7225132
    Abstract: An identification code is assigned to a user by making a selection from a closed set of possible tokens. The selection is determined algorithmically by user identity data. The format of the identification code may comprise a sequence of natural language words chosen from closed sets and a separator character having a fixed value or a small range of possible values. The closed sets may be programmed in the recognition grammar of a speech interface to secure services such as banking.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: May 29, 2007
    Assignee: British Telecommunications plc
    Inventors: David J Attwater, John S Fisher, Paul F R Marsh
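One such algorithmic selection can be sketched as a hash of the identity data indexing into closed word sets (the token sets, separator, and hash-based mapping here are invented for illustration; the patent leaves the selection function open):

```python
import hashlib

ADJECTIVES = ["red", "calm", "bright", "quiet"]  # closed set 1 (assumed)
NOUNS = ["river", "falcon", "meadow", "harbor"]  # closed set 2 (assumed)
SEPARATOR = "-"                                  # fixed separator value

def identification_code(identity_data: str) -> str:
    """Deterministically map user identity data onto one token from
    each closed set, joined by the fixed separator."""
    digest = hashlib.sha256(identity_data.encode()).digest()
    return SEPARATOR.join([ADJECTIVES[digest[0] % len(ADJECTIVES)],
                           NOUNS[digest[1] % len(NOUNS)]])
```

Because every token comes from a small closed set, the full space of codes can be enumerated into a speech-recognition grammar.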
  • Patent number: 7222072
    Abstract: A speaker identity claim (SIC) utterance is received and recognized. The SIC utterance is compared with a voice profile registered under the SIC, and a first verification decision is based thereon. A first dynamic phrase (FDP) is generated, and a user is prompted to speak same. An FDP utterance is received, and compared with the voice profile registered under the SIC to make a second verification decision. If the second verification decision indicates a high or low confidence level, the speaker identity claim is accepted or rejected, respectively. If the verification decision indicates a medium confidence level, a second dynamic phrase (SDP) is generated, and the user is prompted to speak same. An SDP utterance is received, and compared with the voice profile registered under the SIC to make a third verification decision. The speaker identity claim is accepted or rejected based on the third verification decision.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: May 22, 2007
    Assignee: SBC Properties, L.P.
    Inventor: Hisao M. Chang
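The staged decisions can be sketched as follows, with illustrative numeric thresholds standing in for the patent's high/medium/low confidence levels:

```python
HIGH, LOW = 0.8, 0.4  # assumed confidence thresholds

def verify(fdp_score, sdp_score=None):
    """Second verification decision on the first dynamic phrase (FDP);
    a medium result triggers a second dynamic phrase (SDP) whose
    comparison decides outright."""
    if fdp_score >= HIGH:
        return "accept"
    if fdp_score <= LOW:
        return "reject"
    # Medium confidence: the third verification decision is final.
    return "accept" if sdp_score is not None and sdp_score >= HIGH else "reject"
```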
  • Patent number: 7167545
    Abstract: Information sought, with associated attributes, is stored in the form of data records. Embodiments provide for inquiry of search arguments for several attributes stored in a data record; comparison of the input search arguments with search arguments stored; selection of a number of hits of storage search arguments corresponding to the respective input search argument, for each of the search argument inputs; weighting of the selected search arguments with scores, which weighting indicates the probability with which the respective selected stored search argument agrees with the actually input search argument; selection of suitable data records from the database via the selected number of hits; weighting of the selected data records with overall scores indicating the probability of the respective selected data record agreeing with the actually input search arguments, depending on the scores of the individual selected search arguments; and output of the data record with the highest overall score.
    Type: Grant
    Filed: November 27, 2001
    Date of Patent: January 23, 2007
    Assignee: Varetis Solutions GmbH
    Inventors: Bernd Plannerer, Michael Dahmen, Klaus Heidenfelder, Johannes Wagner
  • Patent number: 7165025
    Abstract: Auditory-articulatory analysis for use in speech quality assessment. Articulatory analysis is based on a comparison between powers associated with articulation and non-articulation frequency ranges of a speech signal. Neither source speech nor an estimate of the source speech is utilized in articulatory analysis. Articulatory analysis comprises the steps of comparing articulation power and non-articulation power of a speech signal, and assessing speech quality based on the comparison, wherein articulation and non-articulation powers are powers associated with articulation and non-articulation frequency ranges of the speech signal.
    Type: Grant
    Filed: July 1, 2002
    Date of Patent: January 16, 2007
    Assignee: Lucent Technologies Inc.
    Inventor: Doh-Suk Kim
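The core comparison can be sketched with a naive DFT over assumed band edges (roughly 300-3400 Hz counted as articulation, everything else as non-articulation; the patent does not fix these bounds):

```python
import math

def band_power(samples, rate, lo, hi):
    """Power in the [lo, hi) Hz band via a naive DFT (fine for short frames)."""
    n = len(samples)
    power = 0.0
    for k in range(n // 2):
        freq = k * rate / n
        if lo <= freq < hi:
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = sum(-s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            power += (re * re + im * im) / n
    return power

def articulation_ratio(samples, rate):
    """Compare articulation power with non-articulation power."""
    art = band_power(samples, rate, 300, 3400)
    non = band_power(samples, rate, 0, 300) + band_power(samples, rate, 3400, rate / 2)
    return art / (non + 1e-12)
```

A higher ratio indicates energy concentrated in the articulation range; the assessment step would map this comparison to a quality score.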
  • Patent number: 7085718
    Abstract: It is proposed to include application speech (AS) in the set of identification speech data (ISD) used for training a speaker-identification process, making it possible to reduce the set of initial identification speech data (IISD) collected during the initial enrolment phase and thus adding convenience for the user to be registered or enrolled.
    Type: Grant
    Filed: May 6, 2002
    Date of Patent: August 1, 2006
    Assignee: Sony Deutschland GmbH
    Inventor: Thomas Kemp
  • Patent number: 7076423
    Abstract: The present invention relates to coding and storage of phonetic features in order to search for strings of characters; it is applied in particular to searching for a variety of names, identifiers, denotations and other character strings in a database. This is achieved by a method and system for coding and storing phonetic information representable as an original character sequence, in which the phonetic information is coded in a bit code that does not comprise any characters. In some embodiments, tables are used which comprise empirically found character groups reflecting the specific phonetics and spelling conventions of the language in use. This enables efficient coding of the phonetic features associated with those groups and allows the coding method of the present invention to be adapted to a plurality of different languages.
    Type: Grant
    Filed: December 21, 2000
    Date of Patent: July 11, 2006
    Assignee: International Business Machines Corporation
    Inventor: Thomas Boehme
  • Patent number: 7043422
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 4, 2001
    Date of Patent: May 9, 2006
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
  • Patent number: 7039587
    Abstract: Methods and arrangements for facilitating speaker identification. At least one N-best list is generated based on input speech, a system output is posited based on the input speech, and a determination is made, via at least one property of the N-best list, as to whether the posited system output is inconclusive.
    Type: Grant
    Filed: January 4, 2002
    Date of Patent: May 2, 2006
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
  • Patent number: 7016839
    Abstract: There is provided a method for extracting feature vectors from a digitized utterance. Spectral envelope estimates are computed from overlapping frames in the digitized utterance based on a Minimum Variance Distortionless Response (MVDR) method. Cepstral feature vectors are generated from the spectral envelope estimates. There is provided a method for generating spectral envelope estimates from a digitized utterance. The spectral envelope estimates are generated from overlapping frames in the digitized utterance based on a harmonic mean of at least two low-to-high resolution spectrum estimates. There is provided a method for reducing variance of a feature stream in a pattern recognition system. The feature stream is temporally or spatially averaged to reduce the variance of the feature stream.
    Type: Grant
    Filed: January 31, 2002
    Date of Patent: March 21, 2006
    Assignee: International Business Machines Corporation
    Inventors: Satayanarayana Dharanipragada, Bhaskar Dharanipragada Rao
  • Patent number: 6999928
    Abstract: Disclosed is a method of automated speaker identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from the sample speech signal; calculating, with a distance metric, all distances between the sample matrix and one or more cepstral covariance signature matrices; and determining if the smallest of the distances is below a predetermined threshold value; wherein the distance metric is selected from d5(S, Σ) = A + 1/H − 2, d6(S, Σ) = (A + 1/H)(G + 1/G) − 4, d7(S, Σ) = (A/2H)(G + 1/G) − 1, d8(S, Σ) = (A + 1/H)/(G + 1/G) − 1, d9(S, Σ) = A/G + G/H − 2, fusion derivatives thereof, and fusion derivatives thereof with d1(S, Σ) = A/H − 1.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: February 14, 2006
    Assignee: International Business Machines Corporation
    Inventors: Zhong-Hua Wang, David Lubensky, Cheng Wu
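Reading A, G and H as (presumably) the arithmetic, geometric and harmonic means of the eigenvalues of S·Σ⁻¹, the listed metrics can be sketched over precomputed eigenvalues; d7's reconstruction from the garbled source is least certain, so it is omitted here:

```python
import math

def mean_distances(eigenvalues):
    """Distance metrics built from the arithmetic (A), geometric (G) and
    harmonic (H) means of the eigenvalues of the sample matrix times the
    inverse signature matrix. All are zero when the matrices are equal,
    since every eigenvalue is then 1 and A = G = H = 1."""
    n = len(eigenvalues)
    A = sum(eigenvalues) / n
    G = math.prod(eigenvalues) ** (1.0 / n)
    H = n / sum(1.0 / e for e in eigenvalues)
    return {
        "d1": A / H - 1,
        "d5": A + 1 / H - 2,
        "d6": (A + 1 / H) * (G + 1 / G) - 4,
        "d8": (A + 1 / H) / (G + 1 / G) - 1,
        "d9": A / G + G / H - 2,
    }
```

By the AM-GM-HM inequality A ≥ G ≥ H for positive eigenvalues, so each metric is non-negative and vanishes only at equality.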
  • Patent number: 6952674
    Abstract: A speech-recognition system learns a speech profile of a user whose speech is to be recognized. The system plays audible speech samples, stored in sound files, so that the file that most resembles the user's speech may be selected. After receiving the selection, the system identifies an acoustic model that is associated with the chosen sound file. The system may also select a subset of sound files based on information indicative of the user's speech. The system may then play a subset of sound files so that the file that most resembles the user's speech may be selected.
    Type: Grant
    Filed: January 7, 2002
    Date of Patent: October 4, 2005
    Assignee: Intel Corporation
    Inventor: Richard A. Forand
  • Patent number: 6934682
    Abstract: A method and system for processing speech misrecognitions. The system can include an embedded speech recognition system having at least one acoustic model and at least one active grammar, wherein the embedded speech recognition system is configured to convert speech audio to text using the at least one acoustic model and the at least one active grammar; a remote training system for modifying the at least one acoustic model based on corrections to speech misrecognitions detected in the embedded speech recognition system; and, a communications link for communicatively linking the embedded speech recognition system to the remote training system. The embedded speech recognition system can further include a user interface for presenting a dialog for correcting the speech misrecognitions detected in the embedded speech recognition system. Notably, the user interface can be a visual display. Alternatively, the user interface can be an audio user interface.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: August 23, 2005
    Assignee: International Business Machines Corporation
    Inventor: Steven G. Woodward
  • Patent number: 6934681
    Abstract: A voice recognition system comprises an analyzer for converting an input voice signal to an input pattern including cepstrum, a memory for storing reference patterns, an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency-axis direction by using the input pattern and the reference patterns, and a recognizing unit for calculating the distances between the converted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as the result of recognition. The elongation/contraction unit estimates the elongation/contraction parameter by using the cepstrum included in the input pattern. The elongation/contraction unit does not hold various candidate values in advance for determining the elongation/contraction parameter, nor does it have to execute distance calculations for such values.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: August 23, 2005
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Koichi Shinoda
  • Patent number: 6915260
    Abstract: Described here is a method of determining an eigenspace for representing a plurality of training speakers, in which speaker-dependent sets of models are first formed for the individual training speakers using their training speech data, each model (SD) in a set being described by a plurality of model parameters. For each speaker, a combined model is then represented in a high-dimensional model space by concatenating the model parameters of that speaker's models into a coherent supervector. Subsequently, a transformation is carried out that reduces the model space dimension to obtain eigenspace basis vectors (Ee); this transformation utilizes a reduction criterion based on the variability of the vectors to be transformed. The high-dimensional model space is first reduced, in a first step, to a speaker subspace by a change of basis, in which subspace all the training speakers are represented.
    Type: Grant
    Filed: September 24, 2001
    Date of Patent: July 5, 2005
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventor: Henrik Botterweck
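The construction can be sketched end-to-end for the first basis vector: concatenate each speaker's model parameters into a supervector, centre the supervectors, and extract the direction of largest variability (power iteration here stands in for the patent's basis change plus dimension reduction):

```python
import math

def principal_direction(supervectors, iters=200):
    """First eigenspace basis vector via power iteration on the scatter
    matrix of mean-centred speaker supervectors (a PCA sketch)."""
    n, d = len(supervectors), len(supervectors[0])
    mean = [sum(v[j] for v in supervectors) / n for j in range(d)]
    X = [[v[j] - mean[j] for j in range(d)] for v in supervectors]
    b = [1.0] * d  # arbitrary start vector
    for _ in range(iters):
        # Multiply by X^T X (the scatter matrix) without forming it.
        proj = [sum(x[j] * b[j] for j in range(d)) for x in X]
        b = [sum(proj[i] * X[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in b)) or 1.0
        b = [c / norm for c in b]
    return b
```

Further basis vectors would be obtained by deflating the scatter matrix and iterating again; a full implementation would use an SVD.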
  • Patent number: 6895376
    Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
    Type: Grant
    Filed: May 4, 2001
    Date of Patent: May 17, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
  • Patent number: 6882971
    Abstract: A method and associated apparatus for indicating the voice of each talker from a plurality of talkers to be heard by a listener. The method uses a signal that is transmitted over a telecommunications system. The method includes projecting the voice from each one of the plurality of talkers to the listener. A talker indicator is provided proximate to the listener. Talker identification information is generated in the talker indicator that can be used to indicate the identity of each talker who is speaking at any given time to the listener. A device is coupled to the talker indicator that can transmit the voice signal from each talker to the listener. In different aspects, the talker identification information can include such varied indicators as audio, video, or an announcement combined with a temporally compressed voice signal. In another aspect, emotographic figures are displayed to the listener, each representing a distinct talker.
    Type: Grant
    Filed: July 18, 2002
    Date of Patent: April 19, 2005
    Assignee: General Instrument Corporation
    Inventor: Michael L. Craner
  • Patent number: 6859773
    Abstract: A method of voice recognition in a noise-ridden acoustic signal comprises a phase of digitizing temporal frames of the noise-ridden acoustic signal, a phase of parametrization of speech-containing temporal frames, a shape-recognition phase in which the parameters are assessed with respect to references pre-recorded in a reference space, a phase of reiterative searching for noise models in the noise-ridden signal frames, a phase of searching for a transition between the new noise model and the old model and, when the noise transition has been detected, a phase of updating the reference space, the parametrization phase including a step of matching the parameters to the new noise model.
    Type: Grant
    Filed: May 9, 2001
    Date of Patent: February 22, 2005
    Assignee: Thales
    Inventor: Pierre-Albert Breton
  • Patent number: 6850888
    Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based likelihood objective function is transformed to allow the parameter estimates to be optimized during the training phase.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: February 1, 2005
    Assignee: International Business Machines Corporation
    Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
  • Publication number: 20040225498
    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
    Type: Application
    Filed: March 26, 2004
    Publication date: November 11, 2004
    Inventor: Ryan Rifkin
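A rough sketch of the density-estimation step, assuming Gaussian Parzen windows over a neighbor set; a brute-force neighbor list stands in for the kd-tree, and all names and data below are invented for illustration:

```python
import math

def parzen_score(query, neighbors, bandwidth=1.0):
    # Gaussian Parzen-window density estimate at `query`, averaged over
    # a speaker's (approximate) nearest-neighbor feature vectors.
    d = len(query)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for n in neighbors:
        sq_dist = sum((q - x) ** 2 for q, x in zip(query, n))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2)) / norm
    return total / len(neighbors)

# Toy 2-D enrollment vectors for two speakers
speaker_a = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.2)]
speaker_b = [(3.0, 3.0), (2.9, 3.1), (3.2, 2.8)]
query = (0.05, 0.0)
# A higher density means the unidentified sample better matches that speaker
```

In a real system the neighbor subset would come from the kd-tree query rather than an exhaustive scan.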
  • Patent number: 6804637
    Abstract: To retrieve an optimum template pattern in response to an input sentence, a set of templates is arranged in a plurality of template blocks containing an arbitrary number of sentence components, including grammatically correct and/or incorrect components. A score is assigned to every word in the set of templates according to its importance. The candidate template patterns and the input sentence are retrieved, the scores of the matched words are calculated, and the total of the scores of the entire paths are calculated. Optimum level comparison values are then calculated using the score of the matching words as the numerator and the total score as the denominator. The candidate template pattern with the largest optimum level comparison value, among those that provide the largest numerator, is selected as the optimum template pattern. The input sentence is then corrected using this optimum template pattern.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: October 12, 2004
    Assignee: Sunflare Co., Ltd.
    Inventors: Naoyuki Tokuda, Hiroyuki Sasai
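The scoring ratio described above (matched-word score as numerator, total template score as denominator) can be sketched as follows. The word weights, templates, and helper name are invented for the example, not taken from the patent:

```python
def optimum_level(template_words, input_words, word_scores):
    # Ratio of the scores of matched words (numerator) to the total
    # score of the template (denominator); unlisted words weigh 1.
    total = sum(word_scores.get(w, 1) for w in template_words)
    matched = sum(word_scores.get(w, 1) for w in template_words
                  if w in set(input_words))
    return matched / total if total else 0.0

word_scores = {"not": 3, "have": 2}  # important words carry more weight
templates = [["i", "have", "not", "seen"], ["i", "did", "see"]]
sentence = ["i", "have", "not", "seen", "it"]
best = max(templates, key=lambda t: optimum_level(t, sentence, word_scores))
```

The template maximizing the ratio is then used to correct the input sentence.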
  • Publication number: 20040181407
    Abstract: A method for generating and/or expanding a vocabulary database of a voice recognition system includes acoustically training the voice recognition system using a computer-based audio module.
    Type: Application
    Filed: March 10, 2004
    Publication date: September 16, 2004
    Applicant: Deutsche Telekom AG
    Inventors: Marian Trinkel, Christel Mueller
  • Patent number: 6789062
    Abstract: A telephone-based interactive speech recognition system is retrained using variable weighting and incremental retraining. Variable weighting involves changing the relative influence of particular measurement data to be reflected in a statistical model. Statistical model data is determined based upon an initial set of measurement data determined from an initial set of speech utterances. When new statistical model data is to be generated to reflect new measurement data determined from new speech utterances, a weighting factor is applied to the new measurement data to generate weighted new measurement data. The new statistical model data is then determined based upon the initial set of measurement data and the weighted new measurement data. Incremental retraining involves generating new statistical model data using prior statistical model data to reduce the amount of prior measurement data that must be maintained and processed.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: September 7, 2004
    Assignee: SpeechWorks International, Inc.
    Inventors: Michael S. Phillips, Krishna K. Govindarajan, Mark Fanty, Etienne Barnard
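Both ideas in the abstract above — a weighting factor on new measurement data, and incremental retraining from retained statistics rather than raw data — can be sketched with a running weighted mean. This is a minimal illustration under assumed statistics, not the patented retraining procedure:

```python
def weighted_mean_update(old_sum, old_count, new_samples, weight):
    # Incremental retraining: keep only sufficient statistics (sum, count)
    # of the prior data, and fold in new measurements scaled by a
    # weighting factor so recent utterances can count more (or less)
    # than original training samples.
    new_sum = old_sum + weight * sum(new_samples)
    new_count = old_count + weight * len(new_samples)
    return new_sum / new_count, new_sum, new_count

# Prior statistics: 100 utterances with mean 0.0; then 10 new samples
# of 1.0, each counted twice as heavily as an original sample.
mean, total, count = weighted_mean_update(0.0, 100, [1.0] * 10, weight=2.0)
```

The returned (sum, count) pair is all that needs to be stored for the next incremental update.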
  • Patent number: 6789063
    Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level, but the at least one node in the second level has a weight computed for it.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: September 7, 2004
    Assignee: Intel Corporation
    Inventor: Yonghong Yan
  • Publication number: 20040122669
    Abstract: A method and apparatus for adapting reference templates is provided. The method includes adapting one or more reference templates using a stored test utterance by replacing data within the reference templates with a weighted interpolation of that data and corresponding data within the test utterance.
    Type: Application
    Filed: December 24, 2002
    Publication date: June 24, 2004
    Inventor: Hagai Aronowitz
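The weighted interpolation in the abstract above reduces to a convex combination of each template value with the corresponding test-utterance value. A minimal sketch, with an assumed interpolation weight and invented data:

```python
def adapt_template(template, utterance, alpha=0.2):
    # Replace each template value with a weighted interpolation of the
    # old value and the corresponding test-utterance value; alpha
    # controls how far the template drifts toward the new utterance.
    return [(1 - alpha) * t + alpha * u for t, u in zip(template, utterance)]

reference = [1.0, 2.0, 3.0]
test_utt = [2.0, 2.0, 2.0]
adapted = adapt_template(reference, test_utt)  # drifts toward the utterance
```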
  • Patent number: 6751590
    Abstract: The present invention uses acoustic feature transformations, referred to as pattern-specific maximum likelihood transformations (PSMLT), to model the voice print of speakers in either a text dependent or independent mode. Each transformation maximizes the likelihood, when restricting to diagonal models, of the speaker training data with respect to the resulting voice-print model in the new feature space. Speakers are recognized (i.e., identified, verified or classified) by appropriate comparison of the likelihood of the testing data in each transformed feature space and/or by directly comparing transformation matrices obtained during enrollment and testing. It is to be appreciated that the principle of pattern-specific maximum likelihood transformations can be extended to a large number of pattern matching problems and, in particular, to other biometrics besides speech.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: June 15, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Ramesh Ambat Gopinath, Stephane Herman Maes
  • Patent number: 6748356
    Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: June 8, 2004
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
  • Patent number: 6732075
    Abstract: In a sound synthesizer, a noise adder generates a noise signal having a frequency band of 3,400 to 4,600 Hz, adjusts the gain of the noise signal, and adds the gain-adjusted noise signal to an excitation source after being filled with zeros by a zero-filling circuit, thereby providing a wide-band excitation source which is rather flat. The signal gain is adjusted by determining the power of the narrow-band excitation source or of the zero-filled wide-band excitation source, and fitting the gain to that power.
    Type: Grant
    Filed: April 20, 2000
    Date of Patent: May 4, 2004
    Assignee: Sony Corporation
    Inventors: Shiro Omori, Masayuki Nishiguchi
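The gain-fitting step above amounts to scaling the noise so its power matches the excitation source's power. The sketch below shows only that step; the 3,400–4,600 Hz band-limiting and zero-filling stages are omitted, and the signals and names are invented for illustration:

```python
import math
import random

def match_gain(noise, reference):
    # Scale a noise signal so its average power equals the reference
    # excitation's average power (the gain-fitting step).
    p_noise = sum(x * x for x in noise) / len(noise)
    p_ref = sum(x * x for x in reference) / len(reference)
    gain = math.sqrt(p_ref / p_noise)
    return [gain * x for x in noise]

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(256)]
narrow_band = [0.5 * math.sin(0.3 * i) for i in range(256)]
scaled = match_gain(noise, narrow_band)
```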
  • Patent number: 6728674
    Abstract: A method and a system for corrective training of speech models includes changing a weight of a data sample whenever a data sample is incorrectly associated with a classifier and retraining each classifier with the weights.
    Type: Grant
    Filed: July 31, 2000
    Date of Patent: April 27, 2004
    Assignee: Intel Corporation
    Inventor: Meir Griniasty
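The corrective step above — change a sample's weight when it is misclassified, then retrain with the new weights — resembles boosting-style reweighting. A minimal sketch with an assumed weight factor and invented data:

```python
def reweight(labels, predictions, weights, factor=2.0):
    # Raise the weight of every sample the classifier got wrong, so the
    # next retraining pass focuses on its mistakes.
    return [w * factor if p != y else w
            for y, p, w in zip(labels, predictions, weights)]

labels      = [0, 1, 1]
predictions = [0, 0, 1]          # the second sample is misclassified
new_weights = reweight(labels, predictions, [1.0, 1.0, 1.0])
```

Retraining would then fit each classifier to the samples under `new_weights` and repeat.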
  • Patent number: 6704707
    Abstract: A method for switching between speech recognition technologies. The method includes reception of an initial recognition request accompanied by control information. Recognition characteristics are determined using the control information and then a switch is configured based upon the particular characteristic. Alternatively, the switch may be configured based upon system load levels and resource constraints.
    Type: Grant
    Filed: March 14, 2001
    Date of Patent: March 9, 2004
    Assignee: Intel Corporation
    Inventors: Andrew V. Anderson, Steven M. Bennett
  • Patent number: 6701293
    Abstract: A method and system for utilizing multiple speech recognizers. The speech system includes a port through which an input audio stream may be received, at least two recognizers that may convert the input stream to text or commands, and a combiner able to combine lists of possible results from each recognizer into a combined list. The method includes receiving an input audio stream, routing the stream to one or more recognizers, receiving a list of possible results from each of the recognizers, combining the lists into a combined list and returning at least a subset of the list to the application.
    Type: Grant
    Filed: June 13, 2001
    Date of Patent: March 2, 2004
    Assignee: Intel Corporation
    Inventors: Steven M. Bennett, Andrew V. Anderson
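The combiner described above merges the per-recognizer lists of possible results into one ranked list. One simple combination rule (assumed here; the patent does not specify it) is to sum the scores each recognizer assigns to the same hypothesis:

```python
def combine_nbest(nbest_lists):
    # Merge n-best lists from several recognizers by summing each
    # hypothesis's scores, then rank the combined hypotheses.
    combined = {}
    for nbest in nbest_lists:
        for text, score in nbest:
            combined[text] = combined.get(text, 0.0) + score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

recognizer_1 = [("call home", 0.8), ("call now", 0.2)]
recognizer_2 = [("call home", 0.6), ("all home", 0.4)]
ranked = combine_nbest([recognizer_1, recognizer_2])
```

The application would then receive `ranked`, or a top-k subset of it.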
  • Patent number: 6697779
    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: February 24, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome Bellegarda, Devang Naik, Matthias Neeracher, Kim Silverman
  • Patent number: 6697778
    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
    Type: Grant
    Filed: July 5, 2000
    Date of Patent: February 24, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
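Comparing test speaker data against client locations in a speaker space can be reduced, in the simplest case, to a nearest-point lookup. This is an illustrative simplification with invented names and coordinates, not the patent's model-generation procedure:

```python
def nearest_speaker(test_point, client_points):
    # Classify a test speaker by the closest enrolled client location
    # in the (low-dimensional) speaker space.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(client_points,
               key=lambda name: sq_dist(test_point, client_points[name]))

clients = {"alice": (0.0, 1.0), "bob": (2.0, -1.0)}
match = nearest_speaker((0.2, 0.8), clients)
```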
  • Patent number: 6691090
    Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: February 10, 2004
    Assignee: Nokia Mobile Phones Limited
    Inventors: Kari Laurila, Jilei Tian
  • Patent number: 6691089
    Abstract: A text-prompted speaker verification system that can be configured by users based on a desired level of security. A user is prompted for a multiple-digit (or multiple-word) password. The number of digits or words used for each password is defined by the system in accordance with a user set preferred level of security. The level of training required by the system is defined by the user in accordance with a preferred level of security. The set of words used to generate passwords can also be user configurable based upon the desired level of security. The level of security associated with the frequency of false accept errors versus false reject errors is user configurable for each particular application.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: February 10, 2004
    Assignee: Mindspeed Technologies Inc.
    Inventors: Huan-yu Su, Khaled Assaleh
  • Patent number: 6683625
    Abstract: A system and method for providing a controllable virtual environment includes a computer (11) with processor and a display coupled to the processor to display 2-D or 3-D virtual environment objects. Speech grammars are stored as attributes of the virtual environment objects. Voice commands are recognized by a speech recognizer (19) and microphone (20) coupled to the processor whereby the voice commands are used to manipulate the virtual environment objects on the display. The system is further made role-dependent whereby the display of virtual environment objects and grammar is dependent on the role of the user.
    Type: Grant
    Filed: August 3, 2001
    Date of Patent: January 27, 2004
    Assignee: Texas Instruments Incorporated
    Inventors: Yeshwant K. Muthusamy, Jonathan D. Courtney, Edwin R. Cole