Specialized Models Patents (Class 704/250)
-
Patent number: 7391877
Abstract: Optimal head related transfer function spatial configurations designed to maximize speech intelligibility in multi-talker speech displays by spatially separating competing speech channels, combined with a method of normalizing the relative levels of the different talkers in a multi-talker speech display that improves overall performance even in conventional multi-talker spatial configurations.
Type: Grant
Filed: March 30, 2007
Date of Patent: June 24, 2008
Assignee: United States of America as represented by the Secretary of the Air Force
Inventor: Douglas S. Brungart
-
Patent number: 7379869
Abstract: A method of comparing voice signatures is provided comprising selecting an original performance. The original performance is comprised of an original performance voice signature. A user impersonation of at least a portion of the original performance is recorded and a user impersonation voice signature is established. The user impersonation voice signature is electronically compared to the original performance voice signature. A graduated performance value is generated representative of the similarities between the original voice signature and the user impersonation voice signature. An entertainment application is based on the graduated performance value.
Type: Grant
Filed: February 25, 2004
Date of Patent: May 27, 2008
Inventor: Kurz Kendra
-
Publication number: 20080103771
Abstract: A method for the distributed construction of a voice recognition model that is intended to be used by a device comprising a model base and a reference base in which the modeling elements are stored. The method includes the steps of obtaining the entity to be modeled, transmitting data representative of the entity over a communication link to a server, determining a set of modeling parameters indicating the modeling elements, transmitting the modeling parameters to the device, determining the voice recognition model of the entity to be modeled as a function of at least the modeling parameters received and at least one modeling element that is stored in the reference base and indicated in the transmitted parameters, and subsequently saving the voice recognition model in the model base.
Type: Application
Filed: October 27, 2005
Publication date: May 1, 2008
Applicant: France Telecom
Inventors: Denis Jouvet, Jean Monne
-
Publication number: 20080082332
Abstract: An embodiment of the present invention provides a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text. Each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface. A voice profile manager receives the portable voice profiles from other users who have agreed to share their voice profiles. The speech recognition engine includes speaker identification logic to dynamically select a particular portable voice profile, in real-time, from a group of portable voice profiles. The speaker-dependent data included with the portable voice profile enhances the accuracy with which speech recognition engines recognize spoken words in recorded speech from a speaker associated with a portable voice profile.
Type: Application
Filed: September 28, 2006
Publication date: April 3, 2008
Inventors: Jacqueline Mallett, Sunil Vemuri, N. Rao Machiraju
-
Publication number: 20080082333
Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
Type: Application
Filed: September 29, 2006
Publication date: April 3, 2008
Applicant: NOKIA CORPORATION
Inventors: Jani K. Nurminen, Elina Helander
-
Patent number: 7340396
Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).
Type: Grant
Filed: February 18, 2003
Date of Patent: March 4, 2008
Assignee: Motorola, Inc.
Inventors: Mark Thomson, Julien Epps, Trym Holter
-
Publication number: 20080046241
Abstract: A method and system for detecting speaker change in a voice transaction are provided. The system analyzes a portion of speech in a speech stream and determines a speech feature set. The system then detects a feature change and determines speaker change.
Type: Application
Filed: February 20, 2007
Publication date: February 21, 2008
Inventors: Andrew Osburn, Jeremy Bernard, Mark Boyle
-
Patent number: 7308403
Abstract: A method for objective speech quality assessment that accounts for phonetic contents, speaking styles or individual speaker differences by distorting speech signals under speech quality assessment. By using a distorted version of a speech signal, it is possible to compensate for different phonetic contents, different individual speakers and different speaking styles when assessing speech quality. The amount of degradation in the objective speech quality assessment by distorting the speech signal is maintained similarly for different speech signals, especially when the amount of distortion of the distorted version of speech signal is severe. Objective speech quality assessment for the distorted speech signal and the original undistorted speech signal are compared to obtain a speech quality assessment compensated for utterance dependent articulation.
Type: Grant
Filed: July 1, 2002
Date of Patent: December 11, 2007
Assignee: Lucent Technologies Inc.
Inventor: Doh-Suk Kim
-
Patent number: 7305339
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Grant
Filed: April 1, 2003
Date of Patent: December 4, 2007
Assignee: International Business Machines Corporation
Inventor: Alexander Sorin
-
Patent number: 7302389
Abstract: A computer-based system generates alternative phonetic transcriptions for a target word or phrase corresponding to specific phonological processes that replace individual phonemes or clusters of two or more phonemes with replacement phonemes. The system compares a user's speech with a list of possible transcriptions that includes the base (i.e., correct) transcription of the test target as well as the different alternative transcriptions, to identify the transcription that best matches the user's speech. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech and generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be implemented in other contexts such as foreign language instruction and automated attendant applications to cover a wide variety and range of accents and/or phonological disorders.
Type: Grant
Filed: August 8, 2003
Date of Patent: November 27, 2007
Assignee: Lucent Technologies Inc.
Inventors: Sunil K. Gupta, Prabhu Raghavan, Chetan Vinchhi
-
Patent number: 7295978
Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
Type: Grant
Filed: September 5, 2000
Date of Patent: November 13, 2007
Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
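The table-lookup scheme in this abstract can be sketched roughly as follows. This is a hypothetical illustration, not the patented implementation: the quantization levels, the diagonal-covariance assumption, and all function names are invented for the example. Per-dimension distance terms are precomputed for each quantization level, so evaluating a Gaussian log-likelihood reduces to table indexing and summation.

```python
import math

# Quantization levels the input features are snapped to (invented for
# illustration; the patent derives table addresses from the input vector).
LEVELS = [-2.0, -1.0, 0.0, 1.0, 2.0]

def build_table(mean, var):
    # Precompute (x - mean)^2 / var for every quantization level.
    return [(x - mean) ** 2 / var for x in LEVELS]

def log_gaussian(indices, tables, variances):
    # Sum the precomputed per-dimension distance terms, then add the
    # diagonal-Gaussian log-normalization constant.
    dist = sum(table[i] for i, table in zip(indices, tables))
    log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in variances)
    return log_norm - 0.5 * dist
```

The benefit is that the expensive squared-distance arithmetic is paid once per model at build time rather than once per frame at decode time.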
-
Patent number: 7254529
Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
Type: Grant
Filed: September 13, 2005
Date of Patent: August 7, 2007
Assignee: Microsoft Corporation
Inventors: Jianfeng Gao, Mingjing Li
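The count-weighting idea can be illustrated for the unigram case. The weighting formula below (the ratio of in-domain to out-of-domain relative frequency, with an invented floor for unseen words) is an assumption for the sketch; the abstract does not give the patent's exact formula.

```python
from collections import Counter

def adapted_unigram_model(task_tokens, general_tokens, floor=0.1):
    # Weight each general-domain count by the ratio of in-domain to
    # general-domain relative frequency (an invented weighting; the
    # patent's exact formula is not given in the abstract).
    task, general = Counter(task_tokens), Counter(general_tokens)
    n_task, n_general = len(task_tokens), len(general_tokens)
    weighted = {}
    for word, count in general.items():
        rf_task = task[word] / n_task
        rf_general = count / n_general
        weight = rf_task / rf_general if rf_task > 0 else floor
        weighted[word] = weight * count
    total = sum(weighted.values())
    # Normalize the weighted counts into probabilities.
    return {word: c / total for word, c in weighted.items()}
```

Words that are relatively more frequent in the task-specific set are boosted in the resulting model, while out-of-domain-only words are down-weighted.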
-
Patent number: 7231350
Abstract: A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of “template” parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.
Type: Grant
Filed: December 21, 2005
Date of Patent: June 12, 2007
Assignee: The Regents of the University of California
Inventors: Todd J. Gable, Lawrence C. Ng, John F. Holzrichter, Greg C. Burnett
-
Patent number: 7231019
Abstract: A method and apparatus are provided for identifying a caller of a call from the caller to a recipient. A voice input is received from the caller, and characteristics of the voice input are applied to a plurality of acoustic models, which include a generic acoustic model and acoustic models of any previously identified callers, to obtain a plurality of respective acoustic scores. The caller is identified as one of the previously identified callers or as a new caller based on the plurality of acoustic scores. If the caller is identified as a new caller, a new acoustic model is generated for the new caller, which is specific to the new caller.
Type: Grant
Filed: February 12, 2004
Date of Patent: June 12, 2007
Assignee: Microsoft Corporation
Inventor: Andrei Pascovici
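A toy version of the decision logic above — score the input against each previously identified caller's model plus a generic score, and declare a new caller when no known caller wins — could look like this. The distance-based "models" are placeholders invented for the sketch, not real acoustic models:

```python
def neg_sq_dist(mean):
    # Toy per-caller "acoustic model": negative squared distance from a
    # stored mean feature vector (a stand-in for a real acoustic score).
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, mean))

def identify_caller(features, caller_models, generic_score):
    # Score the voice input against every previously identified caller.
    # If no caller model beats the generic model, the caller is new (None).
    best_id, best = None, generic_score
    for caller_id, model in caller_models.items():
        score = model(features)
        if score > best:
            best_id, best = caller_id, score
    return best_id
```

A `None` result would then trigger the patent's follow-up step of training a new caller-specific model.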
-
Patent number: 7228277
Abstract: A voice input section receives voice of the user designating a name etc. and outputs a voice signal to a speech recognition section. The speech recognition section analyzes and recognizes the voice signal and thereby obtains voice data. The voice data is compared with voice patterns that have been registered in the mobile communications terminal corresponding to individuals etc. and thereby a voice pattern that most matches the voice data is searched for and retrieved. If the retrieval of a matching voice pattern succeeds, a memory search processing section refers to a voice-data correspondence table and thereby calls up a telephone directory that has been registered corresponding to the retrieved voice pattern. In each telephone directory, various types of data (telephone number, mail address, URL, etc.) of an individual etc. to be used for starting communication have been registered previously. The type of data to be called up is designated by button operation etc.
Type: Grant
Filed: December 17, 2001
Date of Patent: June 5, 2007
Assignee: NEC Corporation
Inventor: Yoshihisa Nagashima
-
Patent number: 7225132
Abstract: An identification code is assigned to a user by making a selection from a closed set of possible tokens. The selection is determined algorithmically by user identity data. The format of the identification code may comprise a sequence of natural language words chosen from closed sets and a separator character having a fixed value or a small range of possible values. The closed sets may be programmed in the recognition grammar of a speech interface to secure services such as banking.
Type: Grant
Filed: March 13, 2001
Date of Patent: May 29, 2007
Assignee: British Telecommunications plc
Inventors: David J Attwater, John S Fisher, Paul F R Marsh
-
Patent number: 7222072
Abstract: A speaker identity claim (SIC) utterance is received and recognized. The SIC utterance is compared with a voice profile registered under the SIC, and a first verification decision is based thereon. A first dynamic phrase (FDP) is generated, and a user is prompted to speak same. An FDP utterance is received, and compared with the voice profile registered under the SIC to make a second verification decision. If the second verification decision indicates a high or low confidence level, the speaker identity claim is accepted or rejected, respectively. If the verification decision indicates a medium confidence level, a second dynamic phrase (SDP) is generated, and the user is prompted to speak same. An SDP utterance is received, and compared with the voice profile registered under the SIC to make a third verification decision. The speaker identity claim is accepted or rejected based on the third verification decision.
Type: Grant
Filed: February 13, 2003
Date of Patent: May 22, 2007
Assignee: SBC Properties, L.P.
Inventor: Hisao M. Chang
-
Patent number: 7167545
Abstract: Information sought, with associated attributes, is stored in the form of data records. Embodiments provide for inquiry of search arguments for several attributes stored in a data record; comparison of the input search arguments with search arguments stored; selection of a number of hits of stored search arguments corresponding to the respective input search argument, for each of the search argument inputs; weighting of the selected search arguments with scores, which weighting indicates the probability with which the respective selected stored search argument agrees with the actually input search argument; selection of suitable data records from the database via the selected number of hits; weighting of the selected data records with overall scores indicating the probability of the respective selected data record agreeing with the actually input search arguments, depending on the scores of the individual selected search arguments; and output of the data record with the highest overall score.
Type: Grant
Filed: November 27, 2001
Date of Patent: January 23, 2007
Assignee: Varetis Solutions GmbH
Inventors: Bernd Plannerer, Michael Dahmen, Klaus Heidenfelder, Johannes Wagner
-
Patent number: 7165025
Abstract: Auditory-articulatory analysis for use in speech quality assessment. Articulatory analysis is based on a comparison between powers associated with articulation and non-articulation frequency ranges of a speech signal. Neither source speech nor an estimate of the source speech is utilized in articulatory analysis. Articulatory analysis comprises the steps of comparing articulation power and non-articulation power of a speech signal, and assessing speech quality based on the comparison, wherein articulation and non-articulation powers are powers associated with articulation and non-articulation frequency ranges of the speech signal.
Type: Grant
Filed: July 1, 2002
Date of Patent: January 16, 2007
Assignee: Lucent Technologies Inc.
Inventor: Doh-Suk Kim
-
Patent number: 7085718
Abstract: It is proposed to include application speech (AS) in the set of identification speech data (ISD) used for training a speaker-identification process, making it possible to reduce the set of initial identification speech data (IISD) that must be collected during the initial enrolment phase and thereby adding convenience for the user being registered or enrolled.
Type: Grant
Filed: May 6, 2002
Date of Patent: August 1, 2006
Assignee: Sony Deutschland GmbH
Inventor: Thomas Kemp
-
Patent number: 7076423
Abstract: The present invention relates to coding and storage of phonetic features in order to search for strings of characters, and is applied in particular to searching for a variety of names, identifiers, denotations and other character strings in a database. This is achieved by a method and system for coding and storing phonetic information representable as an original character sequence, in which the phonetic information is coded in a bit code which does not comprise any characters. In some embodiments, tables are used which comprise character groups that are found empirically and reflect the specific phonetics and method of spelling a name adapted to the actual language in use. This enables efficient coding of phonetic features associated with said groups and provides for adapting the coding method of the present invention to a plurality of different languages.
Type: Grant
Filed: December 21, 2000
Date of Patent: July 11, 2006
Assignee: International Business Machines Corporation
Inventor: Thomas Boehme
-
Patent number: 7043422
Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
Type: Grant
Filed: September 4, 2001
Date of Patent: May 9, 2006
Assignee: Microsoft Corporation
Inventors: Jianfeng Gao, Mingjing Li
-
Patent number: 7039587
Abstract: Methods and arrangements for facilitating speaker identification. At least one N-best list is generated based on input speech, a system output is posited based on the input speech, and a determination is made, via at least one property of the N-best list, as to whether the posited system output is inconclusive.
Type: Grant
Filed: January 4, 2002
Date of Patent: May 2, 2006
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
-
Patent number: 7016839
Abstract: There is provided a method for extracting feature vectors from a digitized utterance. Spectral envelope estimates are computed from overlapping frames in the digitized utterance based on a Minimum Variance Distortionless Response (MVDR) method. Cepstral feature vectors are generated from the spectral envelope estimates. There is provided a method for generating spectral envelope estimates from a digitized utterance. The spectral envelope estimates are generated from overlapping frames in the digitized utterance based on a harmonic mean of at least two low-to-high resolution spectrum estimates. There is provided a method for reducing variance of a feature stream in a pattern recognition system. The feature stream is temporally or spatially averaged to reduce the variance of the feature stream.
Type: Grant
Filed: January 31, 2002
Date of Patent: March 21, 2006
Assignee: International Business Machines Corporation
Inventors: Satayanarayana Dharanipragada, Bhaskar Dharanipragada Rao
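The variance-reduction step mentioned last — temporally averaging a feature stream — is easy to illustrate. The sketch below is a generic centered moving average over frames, not the patent's specific method:

```python
def temporal_average(frames, window=3):
    # Replace each feature frame by the mean of itself and its neighbors
    # within `window`, reducing the variance of the feature stream.
    out = []
    for i in range(len(frames)):
        lo = max(0, i - window // 2)
        hi = min(len(frames), i + window // 2 + 1)
        chunk = frames[lo:hi]
        # Average column-wise across the frames in the window.
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out
```

Edge frames are averaged over a truncated window so the output stream keeps the same length as the input.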
-
Patent number: 6999928
Abstract: Disclosed is a method of automated speaker identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from the sample speech signal; calculating, with a distance metric, all distances between the sample matrix and one or more cepstral covariance signature matrices; and determining if the smallest of the distances is below a predetermined threshold value; wherein the distance metric is selected from
d5(S, Σ) = A + 1/H − 2,
d6(S, Σ) = (A + 1/H)(G + 1/G) − 4,
d7(S, Σ) = (A / 2H)(G + 1/G) − 1,
d8(S, Σ) = (A + 1/H) / (G + 1/G) − 1,
d9(S, Σ) = A/G + G/H − 2,
fusion derivatives thereof, and fusion derivatives thereof with d1(S, Σ) = A/H − 1.
Type: Grant
Filed: August 21, 2001
Date of Patent: February 14, 2006
Assignee: International Business Machines Corporation
Inventors: Zhong-Hua Wang, David Lubensky, Cheng Wu
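Assuming (the abstract does not say) that A, G, and H denote the arithmetic, geometric, and harmonic means of the eigenvalues of S·Σ⁻¹ — a common construction for such covariance-based metrics, under which all of the listed distances vanish when S = Σ — d1 and d5 could be computed as:

```python
import numpy as np

def covariance_distances(S, Sigma):
    # Eigenvalues of S · Sigma^{-1}; A, G, H are taken here to be their
    # arithmetic, geometric, and harmonic means (an assumption -- the
    # abstract does not define these symbols).
    lam = np.linalg.eigvals(S @ np.linalg.inv(Sigma)).real
    A = float(lam.mean())
    G = float(np.prod(lam)) ** (1.0 / len(lam))
    H = len(lam) / float(np.sum(1.0 / lam))
    d1 = A / H - 1.0
    d5 = A + 1.0 / H - 2.0
    return d1, d5
```

By the arithmetic-harmonic mean inequality, both distances are non-negative and equal zero exactly when the eigenvalues all equal one, i.e. when the two covariance matrices coincide.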
-
Patent number: 6952674
Abstract: A speech-recognition system learns a speech profile of a user whose speech is to be recognized. The system plays audible speech samples, stored in sound files, so that the file that most resembles the user's speech may be selected. After receiving the selection, the system identifies an acoustic model that is associated with the chosen sound file. The system may also select a subset of sound files based on information indicative of the user's speech. The system may then play a subset of sound files so that the file that most resembles the user's speech may be selected.
Type: Grant
Filed: January 7, 2002
Date of Patent: October 4, 2005
Assignee: Intel Corporation
Inventor: Richard A. Forand
-
Patent number: 6934682
Abstract: A method and system for processing speech misrecognitions. The system can include an embedded speech recognition system having at least one acoustic model and at least one active grammar, wherein the embedded speech recognition system is configured to convert speech audio to text using the at least one acoustic model and the at least one active grammar; a remote training system for modifying the at least one acoustic model based on corrections to speech misrecognitions detected in the embedded speech recognition system; and, a communications link for communicatively linking the embedded speech recognition system to the remote training system. The embedded speech recognition system can further include a user interface for presenting a dialog for correcting the speech misrecognitions detected in the embedded speech recognition system. Notably, the user interface can be a visual display. Alternatively, the user interface can be an audio user interface.
Type: Grant
Filed: March 1, 2001
Date of Patent: August 23, 2005
Assignee: International Business Machines Corporation
Inventor: Steven G. Woodward
-
Patent number: 6934681
Abstract: A voice recognition system comprises an analyzer for converting an input voice signal to an input pattern including cepstrum, a store for reference patterns, an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns, and a recognizing unit for calculating the distances between the converted input pattern from the analyzer and the reference patterns and outputting the reference pattern corresponding to the shortest distance as the result of recognition. The elongation/contraction unit estimates the elongation/contraction parameter by using the cepstrum included in the input pattern. The elongation/contraction unit neither holds various candidate values in advance for determining the elongation/contraction parameter, nor needs to execute distance calculations over such values.
Type: Grant
Filed: October 25, 2000
Date of Patent: August 23, 2005
Assignee: NEC Corporation
Inventors: Tadashi Emori, Koichi Shinoda
-
Patent number: 6915260
Abstract: Described here is a method of determining an eigenspace for representing a plurality of training speakers, in which speaker-dependent sets of models are first formed for the individual training speakers using their training speech data, and each model (SD) in a set of models is described by a plurality of model parameters. For each speaker a combined model is then displayed in a high-dimensional model space by concatenation of the model parameters of the models of the individual training speakers to a respective coherent super vector. Subsequently, a transformation is carried out while the model space dimension is reduced to obtain eigenspace basis vectors (Ee), which transformation utilizes a reduction criterion based on the variability of the vectors to be transformed. Then the high-dimensional model space is first reduced to a speaker subspace by a change of basis, in which speaker subspace all the training speakers are represented.
Type: Grant
Filed: September 24, 2001
Date of Patent: July 5, 2005
Assignee: Koninklijke Philips Electronics, N.V.
Inventor: Henrik Botterweck
-
Patent number: 6895376
Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
Type: Grant
Filed: May 4, 2001
Date of Patent: May 17, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
-
Patent number: 6882971
Abstract: A method and associated apparatus for indicating the voice of each talker from a plurality of talkers to be heard by a listener. The method uses a signal that is transmitted over a telecommunications system. The method includes projecting the voice from each one of the plurality of talkers to the listener. A talker indicator is provided proximate to the listener. Talker identification information is generated in the talker indicator that can be used to indicate the identity of each talker who is speaking at any given time to the listener. A device is coupled to the talker indicator that can transmit the voice signal from each talker to the listener. In different aspects, the talker identification information can include such varied indicators as audio, video, or an announcement combined with a temporally compressed voice signal. In another aspect, emotographic figures that each represent a distinct talker are displayed to the listener.
Type: Grant
Filed: July 18, 2002
Date of Patent: April 19, 2005
Assignee: General Instrument Corporation
Inventor: Michael L. Craner
-
Patent number: 6859773
Abstract: A method of voice recognition in a noise-ridden acoustic signal comprises a phase of digitizing temporal frames of the noise-ridden acoustic signal, a phase of parametrization of speech-containing temporal frames, a shape-recognition phase in which the parameters are assessed with respect to references pre-recorded in a reference space, a phase of reiterative searching for noise models in the noise-ridden signal frames, a phase of searching for a transition between the new noise model and the old model and, when the noise transition has been detected, a phase of updating the reference space, the parametrization phase including a step of matching the parameters to the new noise model.
Type: Grant
Filed: May 9, 2001
Date of Patent: February 22, 2005
Assignee: Thales
Inventor: Pierre-Albert Breton
-
Patent number: 6850888
Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based likelihood objective function is transformed to allow the parameter estimates to be optimized during the training phase.
Type: Grant
Filed: October 6, 2000
Date of Patent: February 1, 2005
Assignee: International Business Machines Corporation
Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
-
Publication number: 20040225498
Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
Type: Application
Filed: March 26, 2004
Publication date: November 11, 2004
Inventor: Ryan Rifkin
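The density-estimation step can be illustrated with a Gaussian Parzen-window scorer. For brevity the kd-tree approximate-nearest-neighbor search described in the abstract is replaced by an exhaustive pass over each speaker's points, and all names and the bandwidth `h` are invented for the sketch:

```python
import math

def parzen_score(query, points, h=1.0):
    # Gaussian Parzen-window density estimate at `query`, built from one
    # speaker's enrolled feature points (bandwidth h is an assumption).
    d = len(query)
    norm = (2 * math.pi * h * h) ** (d / 2)
    total = sum(
        math.exp(-sum((q - x) ** 2 for q, x in zip(query, p)) / (2 * h * h))
        for p in points
    )
    return total / (len(points) * norm)

def recognize(query, speakers, h=1.0):
    # Pick the enrolled speaker whose points yield the highest density
    # at the unidentified sample's feature vector.
    return max(speakers, key=lambda s: parzen_score(query, speakers[s], h))
```

Restricting the sum to approximate nearest neighbors, as the abstract describes, changes only which `points` are passed in, not the window computation itself.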
-
Patent number: 6804637
Abstract: To retrieve an optimum template pattern in response to an input sentence, a set of templates is arranged in a plurality of template blocks containing an arbitrary number of sentence components, including grammatically correct and/or incorrect components. A score is assigned to every word in the set of templates according to its importance. The candidate template patterns and the input sentence are retrieved, the scores of the matched words are calculated, and the totals of the scores of the entire paths are calculated. Optimum level comparison values are then calculated using the score of the matching words as the numerator and the total score as the denominator. The candidate template pattern having the largest optimum level comparison value among optimum level comparison values that provide the largest numerator is selected as the optimum template pattern. The input sentence is then corrected using this optimum template pattern.
Type: Grant
Filed: June 20, 2000
Date of Patent: October 12, 2004
Assignee: Sunflare Co., Ltd.
Inventors: Naoyuki Tokuda, Hiroyuki Sasai
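The optimum-level comparison value — the score of matched words as numerator over the total template score as denominator — can be sketched as follows. The word scores and helper names are invented for the illustration:

```python
def optimum_level(template, sentence, scores):
    # Numerator: total score of template words matched in the input
    # sentence; denominator: total score of all template words.
    present = set(sentence)
    matched = sum(scores.get(w, 1) for w in template if w in present)
    total = sum(scores.get(w, 1) for w in template)
    return matched / total if total else 0.0

def best_template(templates, sentence, scores):
    # Select the candidate template with the largest comparison value.
    return max(templates, key=lambda t: optimum_level(t, sentence, scores))
```

A template containing every input word with high-importance scores yields a ratio of 1.0, so it dominates templates that match only partially.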
-
Publication number: 20040181407
Abstract: A method for generating and/or expanding a vocabulary database of a voice recognition system includes acoustically training the voice recognition system using a computer-based audio module.
Type: Application
Filed: March 10, 2004
Publication date: September 16, 2004
Applicant: Deutsche Telekom AG
Inventors: Marian Trinkel, Christel Mueller
-
Patent number: 6789062
Abstract: A telephone-based interactive speech recognition system is retrained using variable weighting and incremental retraining. Variable weighting involves changing the relative influence of particular measurement data to be reflected in a statistical model. Statistical model data is determined based upon an initial set of measurement data determined from an initial set of speech utterances. When new statistical model data is to be generated to reflect new measurement data determined from new speech utterances, a weighting factor is applied to the new measurement data to generate weighted new measurement data. The new statistical model data is then determined based upon the initial set of measurement data and the weighted new measurement data. Incremental retraining involves generating new statistical model data using prior statistical model data to reduce the amount of prior measurement data that must be maintained and processed.
Type: Grant
Filed: February 25, 2000
Date of Patent: September 7, 2004
Assignee: SpeechWorks International, Inc.
Inventors: Michael S. Phillips, Krishna K. Govindarajan, Mark Fanty, Etienne Barnard
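A minimal sketch of the two ideas in this abstract, reduced to a single scalar mean: variable weighting scales the influence of new measurement data, and incremental retraining keeps only running sufficient statistics instead of all prior measurement data. The class and its interface are illustrative assumptions.

```python
import numpy as np

class IncrementalMean:
    def __init__(self):
        self.weighted_sum = 0.0    # running weighted sum of samples
        self.weighted_count = 0.0  # running weighted sample count

    def update(self, samples, weight=1.0):
        """Fold in a batch of new measurements with the given weight;
        only the sufficient statistics are retained."""
        samples = np.asarray(samples, dtype=float)
        self.weighted_sum += weight * samples.sum()
        self.weighted_count += weight * samples.size

    @property
    def mean(self):
        return self.weighted_sum / self.weighted_count
```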
-
Patent number: 6789063
Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels, each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level but has its own computed weight.
Type: Grant
Filed: September 1, 2000
Date of Patent: September 7, 2004
Assignee: Intel Corporation
Inventor: Yonghong Yan
-
Publication number: 20040122669
Abstract: A method and apparatus for adapting reference templates is provided. The method includes adapting one or more reference templates using a stored test utterance by replacing data within the reference templates with a weighted interpolation of that data and corresponding data within the test utterance.
Type: Application
Filed: December 24, 2002
Publication date: June 24, 2004
Inventor: Hagai Aronowitz
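The adaptation rule described here can be sketched in one line: template data is replaced by a weighted interpolation of the old template and the corresponding test-utterance data. The interpolation weight `alpha` is an assumed parameter; the abstract does not fix one.

```python
import numpy as np

def adapt_template(reference, test_utterance, alpha=0.2):
    """Blend a reference template toward a stored test utterance."""
    reference = np.asarray(reference, dtype=float)
    test_utterance = np.asarray(test_utterance, dtype=float)
    return (1.0 - alpha) * reference + alpha * test_utterance
```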
-
Patent number: 6751590
Abstract: The present invention uses acoustic feature transformations, referred to as pattern-specific maximum likelihood transformations (PSMLT), to model the voice print of speakers in either a text dependent or independent mode. Each transformation maximizes the likelihood, when restricting to diagonal models, of the speaker training data with respect to the resulting voice-print model in the new feature space. Speakers are recognized (i.e., identified, verified or classified) by appropriate comparison of the likelihood of the testing data in each transformed feature space and/or by directly comparing transformation matrices obtained during enrollment and testing. It is to be appreciated that the principle of pattern-specific maximum likelihood transformations can be extended to a large number of pattern matching problems and, in particular, to other biometrics besides speech.
Type: Grant
Filed: June 13, 2000
Date of Patent: June 15, 2004
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Ramesh Ambat Gopinath, Stephane Herman Maes
-
Patent number: 6748356
Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.
Type: Grant
Filed: June 7, 2000
Date of Patent: June 8, 2004
Assignee: International Business Machines Corporation
Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
-
Patent number: 6732075
Abstract: In a sound synthesizer, a noise adder generates a noise signal having a frequency band of 3,400 to 4,600 Hz, adjusts the gain of the noise signal, and adds the gain-adjusted noise signal to an excitation source after being filled with zeros by a zero-filling circuit, thereby providing a wide-band excitation source which is rather flat. The signal gain is adjusted by determining a narrow-band excitation source or a power of the zero-filled wide-band excitation source and fitting the gain to the narrow-band excitation source or the power.
Type: Grant
Filed: April 20, 2000
Date of Patent: May 4, 2004
Assignee: Sony Corporation
Inventors: Shiro Omori, Masayuki Nishiguchi
-
Patent number: 6728674
Abstract: A method and a system for corrective training of speech models includes changing the weight of a data sample whenever the sample is incorrectly associated with a classifier, and retraining each classifier with the weights.
Type: Grant
Filed: July 31, 2000
Date of Patent: April 27, 2004
Assignee: Intel Corporation
Inventor: Meir Griniasty
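A boosting-style sketch of the corrective-training loop in this abstract: samples the classifier associates with the wrong class get their weight increased before retraining. The doubling factor is an illustrative assumption; the abstract only says the weights change.

```python
def reweight(samples, labels, predict, weights, factor=2.0):
    """Return new per-sample weights, boosting misclassified samples."""
    return [w * factor if predict(x) != y else w
            for x, y, w in zip(samples, labels, weights)]
```

Each retraining pass would then fit every classifier on the reweighted data and repeat until the error stabilizes.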
-
Patent number: 6704707
Abstract: A method for switching between speech recognition technologies. The method includes reception of an initial recognition request accompanied by control information. Recognition characteristics are determined using the control information, and then a switch is configured based upon the particular characteristic. Alternatively, the switch may be configured based upon system load levels and resource constraints.
Type: Grant
Filed: March 14, 2001
Date of Patent: March 9, 2004
Assignee: Intel Corporation
Inventors: Andrew V. Anderson, Steven M. Bennett
-
Patent number: 6701293
Abstract: A method and system for utilizing multiple speech recognizers. The speech system includes a port through which an input audio stream may be received, at least two recognizers that may convert the input stream to text or commands, and a combiner able to combine lists of possible results from each recognizer into a combined list. The method includes receiving an input audio stream, routing the stream to one or more recognizers, receiving a list of possible results from each of the recognizers, combining the lists into a combined list, and returning at least a subset of the list to the application.
Type: Grant
Filed: June 13, 2001
Date of Patent: March 2, 2004
Assignee: Intel Corporation
Inventors: Steven M. Bennett, Andrew V. Anderson
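A hedged sketch of the combiner step: merge the n-best result lists from several recognizers into one ranked list. Summing per-hypothesis scores is an assumed combination rule; the abstract only says the lists are combined.

```python
from collections import defaultdict

def combine_nbest(result_lists):
    """result_lists: one [(hypothesis, score), ...] list per recognizer.
    Returns hypotheses ranked by combined score, best first."""
    totals = defaultdict(float)
    for nbest in result_lists:
        for hypothesis, score in nbest:
            totals[hypothesis] += score
    return sorted(totals, key=totals.get, reverse=True)
```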
-
Patent number: 6697779
Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
Type: Grant
Filed: September 29, 2000
Date of Patent: February 24, 2004
Assignee: Apple Computer, Inc.
Inventors: Jerome Bellegarda, Devang Naik, Matthias Neeracher, Kim Silverman
-
Patent number: 6697778
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
Type: Grant
Filed: July 5, 2000
Date of Patent: February 24, 2004
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
-
Patent number: 6691090
Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
Type: Grant
Filed: October 24, 2000
Date of Patent: February 10, 2004
Assignee: Nokia Mobile Phones Limited
Inventors: Kari Laurila, Jilei Tian
-
Patent number: 6691089
Abstract: A text-prompted speaker verification system that can be configured by users based on a desired level of security. A user is prompted for a multiple-digit (or multiple-word) password. The number of digits or words used for each password is defined by the system in accordance with a user-set preferred level of security. The level of training required by the system is defined by the user in accordance with a preferred level of security. The set of words used to generate passwords can also be user-configurable based upon the desired level of security. The level of security associated with the frequency of false accept errors versus false reject errors is user-configurable for each particular application.
Type: Grant
Filed: September 30, 1999
Date of Patent: February 10, 2004
Assignee: Mindspeed Technologies Inc.
Inventors: Huan-yu Su, Khaled Assaleh
-
Patent number: 6683625
Abstract: A system and method for providing a controllable virtual environment includes a computer (11) with processor and a display coupled to the processor to display 2-D or 3-D virtual environment objects. Speech grammars are stored as attributes of the virtual environment objects. Voice commands are recognized by a speech recognizer (19) and microphone (20) coupled to the processor, whereby the voice commands are used to manipulate the virtual environment objects on the display. The system is further made role-dependent, whereby the display of virtual environment objects and grammar is dependent on the role of the user.
Type: Grant
Filed: August 3, 2001
Date of Patent: January 27, 2004
Assignee: Texas Instruments Incorporated
Inventors: Yeshwant K. Muthusamy, Jonathan D. Courtney, Edwin R. Cole