Specialized Models Patents (Class 704/250)
  • Patent number: 7391877
    Abstract: Optimal head-related transfer function spatial configurations designed to maximize speech intelligibility in multi-talker speech displays by spatially separating competing speech channels, combined with a method of normalizing the relative levels of the different talkers that improves overall performance even in conventional multi-talker spatial configurations.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: June 24, 2008
    Assignee: United States of America as represented by the Secretary of the Air Force
    Inventor: Douglas S. Brungart
  • Patent number: 7379869
    Abstract: A method of comparing voice signatures is provided comprising selecting an original performance. The original performance is comprised of an original performance voice signature. A user impersonation of at least a portion of the original performance is recorded and a user impersonation voice signature is established. The user impersonation voice signature is electronically compared to the original performance voice signature. A graduated performance value is generated representative of the similarities between the original voice signature and the user impersonation voice signature. An entertainment application is based on the graduated performance value.
    Type: Grant
    Filed: February 25, 2004
    Date of Patent: May 27, 2008
    Inventor: Kurz Kendra
  • Publication number: 20080103771
    Abstract: A method for the distributed construction of a voice recognition model that is intended to be used by a device comprising a model base and a reference base in which the modeling elements are stored. The method includes the steps of obtaining the entity to be modeled, transmitting data representative of the entity over a communication link to a server, determining a set of modeling parameters indicating the modeling elements, transmitting the modeling parameters to the device, determining the voice recognition model of the entity to be modeled as a function of at least the modeling parameters received and at least one modeling element that is stored in the reference base and indicated in the transmitted parameters, and subsequently saving the voice recognition model in the model base.
    Type: Application
    Filed: October 27, 2005
    Publication date: May 1, 2008
    Applicant: France Telecom
    Inventors: Denis Jouvet, Jean Monne
  • Publication number: 20080082332
    Abstract: An embodiment of the present invention provides a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text. Each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface. A voice profile manager receives the portable voice profiles from other users who have agreed to share their voice profiles. The speech recognition engine includes speaker identification logic to dynamically select a particular portable voice profile, in real-time, from a group of portable voice profiles. The speaker-dependent data included with the portable voice profile enhances the accuracy with which speech recognition engines recognize spoken words in recorded speech from a speaker associated with a portable voice profile.
    Type: Application
    Filed: September 28, 2006
    Publication date: April 3, 2008
    Inventors: Jacqueline Mallett, Sunil Vemuri, N. Rao Machiraju
  • Publication number: 20080082333
    Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
    Type: Application
    Filed: September 29, 2006
    Publication date: April 3, 2008
    Applicant: NOKIA CORPORATION
    Inventors: Jani K. Nurminen, Elina Helander
  • Patent number: 7340396
    Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).
    Type: Grant
    Filed: February 18, 2003
    Date of Patent: March 4, 2008
    Assignee: Motorola, Inc.
    Inventors: Mark Thomson, Julien Epps, Trym Holter
  • Publication number: 20080046241
    Abstract: A method and system for detecting speaker change in a voice transaction are provided. The system analyzes a portion of speech in a speech stream and determines a speech feature set. The system then detects a feature change and determines a speaker change.
    Type: Application
    Filed: February 20, 2007
    Publication date: February 21, 2008
    Inventors: Andrew Osburn, Jeremy Bernard, Mark Boyle
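The abstract leaves the feature comparison open; the idea can be sketched as a sliding mean over a scalar feature stream with an assumed jump threshold (window size and threshold are illustration choices, not the patent's):

```python
def detect_change(features, window=5, threshold=1.0):
    """Flag a speaker change when the mean of a scalar feature jumps
    between adjacent windows (a simple stand-in for the patent's
    feature-set comparison; window and threshold are assumptions)."""
    for i in range(window, len(features) - window + 1):
        prev = sum(features[i - window:i]) / window
        curr = sum(features[i:i + window]) / window
        if abs(curr - prev) > threshold:
            return i  # frame index where the change is detected
    return None
```

In practice the per-frame feature would be a vector (e.g. cepstra) and the comparison a distance between window statistics, but the detection logic is the same.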
  • Patent number: 7308403
    Abstract: A method for objective speech quality assessment that accounts for phonetic contents, speaking styles or individual speaker differences by distorting speech signals under speech quality assessment. By using a distorted version of a speech signal, it is possible to compensate for different phonetic contents, different individual speakers and different speaking styles when assessing speech quality. The degradation introduced into the objective speech quality assessment by distorting the speech signal remains similar across different speech signals, especially when the amount of distortion of the distorted version of the speech signal is severe. Objective speech quality assessments for the distorted speech signal and the original undistorted speech signal are compared to obtain a speech quality assessment compensated for utterance-dependent articulation.
    Type: Grant
    Filed: July 1, 2002
    Date of Patent: December 11, 2007
    Assignee: Lucent Technologies Inc.
    Inventor: Doh-Suk Kim
  • Patent number: 7305339
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N−L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Grant
    Filed: April 1, 2003
    Date of Patent: December 4, 2007
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 7302389
    Abstract: A computer-based system generates alternative phonetic transcriptions for a target word or phrase corresponding to specific phonological processes that replace individual phonemes or clusters of two or more phonemes with replacement phonemes. The system compares a user's speech with a list of possible transcriptions that includes the base (i.e., correct) transcription of the test target as well as the different alternative transcriptions, to identify the transcription that best matches the user's. In a speech therapy application, the system identifies the phonological process(es), if any, associated with the user's speech and generates statistics over multiple test targets that can be used to diagnose the user's specific phonological disorders. The system can also be implemented in other contexts such as foreign language instruction and automated attendant applications to cover a wide variety and range of accents and/or phonological disorders.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: November 27, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Prabhu Raghavan, Chetan Vinchhi
  • Patent number: 7295978
    Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
    Type: Grant
    Filed: September 5, 2000
    Date of Patent: November 13, 2007
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
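One way to read the table-lookup step: quantize the input into an address, fetch a precomputed squared distance, and turn it into a Gaussian log-probability, trading memory for per-frame arithmetic. A one-dimensional sketch (the grid bounds, step size, and Gaussian parameters are all assumptions):

```python
import math

MU, VAR = 0.0, 1.0             # assumed Gaussian parameters
STEP, LO, HI = 0.1, -5.0, 5.0  # assumed quantization grid

# Precomputed table of squared distances, indexed by address.
TABLE = [((LO + i * STEP) - MU) ** 2 / VAR
         for i in range(int((HI - LO) / STEP) + 1)]

def log_prob(x):
    """Address from the input (610), distance from the table (620),
    log-probability from the distance (630)."""
    addr = min(max(int(round((x - LO) / STEP)), 0), len(TABLE) - 1)
    return -0.5 * (TABLE[addr] + math.log(2 * math.pi * VAR))
```

A multi-dimensional Gaussian mixture would use one table per dimension (or per codeword) and sum the retrieved distances before exponentiation.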
  • Patent number: 7254529
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: August 7, 2007
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
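A plausible reading of the weighting step (the abstract does not give the exact weighting function): scale each out-of-domain n-gram count by the ratio of its in-domain to out-of-domain relative frequency, then renormalize into probabilities.

```python
from collections import Counter

def adapt_counts(small_corpus, large_corpus, n=2):
    """Weight large-corpus n-gram counts by the ratio of in-domain to
    out-of-domain relative frequency (one plausible weighting scheme;
    n-grams unseen in-domain get weight zero here, unsmoothed)."""
    def ngrams(tokens):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    small = Counter(g for s in small_corpus for g in ngrams(s))
    large = Counter(g for s in large_corpus for g in ngrams(s))
    s_total, l_total = sum(small.values()), sum(large.values())
    weighted = {}
    for g, c in large.items():
        rel_s = small.get(g, 0) / s_total if s_total else 0.0
        rel_l = c / l_total
        weighted[g] = c * (rel_s / rel_l) if rel_s else 0.0
    z = sum(weighted.values()) or 1.0
    return {g: w / z for g, w in weighted.items()}
```

A real adaptation scheme would smooth the in-domain relative frequencies so out-of-domain-only n-grams are down-weighted rather than eliminated.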
  • Patent number: 7231350
    Abstract: A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of “template” parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: June 12, 2007
    Assignee: The Regents of the University of California
    Inventors: Todd J. Gable, Lawrence C. Ng, John F. Holzrichter, Greg C. Burnett
  • Patent number: 7231019
    Abstract: A method and apparatus are provided for identifying a caller of a call from the caller to a recipient. A voice input is received from the caller, and characteristics of the voice input are applied to a plurality of acoustic models, which include a generic acoustic model and acoustic models of any previously identified callers, to obtain a plurality of respective acoustic scores. The caller is identified as one of the previously identified callers or as a new caller based on the plurality of acoustic scores. If the caller is identified as a new caller, a new acoustic model is generated for the new caller, which is specific to the new caller.
    Type: Grant
    Filed: February 12, 2004
    Date of Patent: June 12, 2007
    Assignee: Microsoft Corporation
    Inventor: Andrei Pascovici
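A minimal sketch of the decision logic, assuming a caller-provided `score` function (higher is better) and using the observed features themselves as a stand-in for a newly trained caller-specific acoustic model:

```python
def identify_caller(features, caller_models, generic_model, score):
    """Score the voice input against a generic model and each known
    caller's model; a win for the generic model means a new caller."""
    best_id, best = None, score(features, generic_model)
    for caller_id, model in caller_models.items():
        s = score(features, model)
        if s > best:
            best_id, best = caller_id, s
    if best_id is None:
        # Generic model scored highest: treat as a new caller and enroll
        # a caller-specific model (here simply the observed features).
        best_id = f"caller-{len(caller_models) + 1}"
        caller_models[best_id] = list(features)
    return best_id
```

In the patent the models are acoustic models and the scores are acoustic likelihoods; this sketch only shows the generic-versus-specific decision.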
  • Patent number: 7228277
    Abstract: A voice input section receives voice of the user designating a name etc. and outputs a voice signal to a speech recognition section. The speech recognition section analyzes and recognizes the voice signal and thereby obtains voice data. The voice data is compared with voice patterns that have been registered in the mobile communications terminal corresponding to individuals etc. and thereby a voice pattern that most matches the voice data is searched for and retrieved. If the retrieval of a matching voice pattern succeeded, a memory search processing section refers to a voice-data correspondence table and thereby calls up a telephone directory that has been registered corresponding to the retrieved voice pattern. In each telephone directory, various types of data (telephone number, mail address, URL, etc.) of an individual etc. to be used for starting communication have been registered previously. The type of data to be called up is designated by button operation etc.
    Type: Grant
    Filed: December 17, 2001
    Date of Patent: June 5, 2007
    Assignee: NEC Corporation
    Inventor: Yoshihisa Nagashima
  • Patent number: 7225132
    Abstract: An identification code is assigned to a user by making a selection from a closed set of possible tokens. The selection is determined algorithmically by user identity data. The format of the identification code may comprise a sequence of natural language words chosen from closed sets and a separator character having a fixed value or a small range of possible values. The closed sets may be programmed in the recognition grammar of a speech interface to secure services such as banking.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: May 29, 2007
    Assignee: British Telecommunications plc
    Inventors: David J Attwater, John S Fisher, Paul F R Marsh
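One such algorithmic selection can be sketched as a hash of the identity data indexing into closed word sets (the token sets, separator, and hash-based mapping here are invented for illustration; the patent leaves the selection function open):

```python
import hashlib

ADJECTIVES = ["red", "calm", "bright", "quiet"]  # closed set 1 (assumed)
NOUNS = ["river", "falcon", "meadow", "harbor"]  # closed set 2 (assumed)
SEPARATOR = "-"                                  # fixed separator value

def identification_code(identity_data: str) -> str:
    """Deterministically map user identity data onto one token from
    each closed set, joined by the fixed separator."""
    digest = hashlib.sha256(identity_data.encode()).digest()
    return SEPARATOR.join([ADJECTIVES[digest[0] % len(ADJECTIVES)],
                           NOUNS[digest[1] % len(NOUNS)]])
```

Because every token comes from a small closed set, the full space of codes can be enumerated into a speech-recognition grammar.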
  • Patent number: 7222072
    Abstract: A speaker identity claim (SIC) utterance is received and recognized. The SIC utterance is compared with a voice profile registered under the SIC, and a first verification decision is based thereon. A first dynamic phrase (FDP) is generated, and a user is prompted to speak same. An FDP utterance is received, and compared with the voice profile registered under the SIC to make a second verification decision. If the second verification decision indicates a high or low confidence level, the speaker identity claim is accepted or rejected, respectively. If the verification decision indicates a medium confidence level, a second dynamic phrase (SDP) is generated, and the user is prompted to speak same. An SDP utterance is received, and compared with the voice profile registered under the SIC to make a third verification decision. The speaker identity claim is accepted or rejected based on the third verification decision.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: May 22, 2007
    Assignee: SBC Properties, L.P.
    Inventor: Hisao M. Chang
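The staged decisions can be sketched as follows, with illustrative numeric thresholds standing in for the patent's high/medium/low confidence levels:

```python
HIGH, LOW = 0.8, 0.4  # assumed confidence thresholds

def verify(fdp_score, sdp_score=None):
    """Second verification decision on the first dynamic phrase (FDP);
    a medium result triggers a second dynamic phrase (SDP) whose
    comparison decides outright."""
    if fdp_score >= HIGH:
        return "accept"
    if fdp_score <= LOW:
        return "reject"
    # Medium confidence: the third verification decision is final.
    return "accept" if sdp_score is not None and sdp_score >= HIGH else "reject"
```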
  • Patent number: 7167545
    Abstract: Information sought, with associated attributes, is stored in the form of data records. Embodiments provide for inquiry of search arguments for several attributes stored in a data record; comparison of the input search arguments with search arguments stored; selection of a number of hits of storage search arguments corresponding to the respective input search argument, for each of the search argument inputs; weighting of the selected search arguments with scores, which weighting indicates the probability with which the respective selected stored search argument agrees with the actually input search argument; selection of suitable data records from the database via the selected number of hits; weighting of the selected data records with overall scores indicating the probability of the respective selected data record agreeing with the actually input search arguments, depending on the scores of the individual selected search arguments; and output of the data record with the highest overall score.
    Type: Grant
    Filed: November 27, 2001
    Date of Patent: January 23, 2007
    Assignee: Varetis Solutions GmbH
    Inventors: Bernd Plannerer, Michael Dahmen, Klaus Heidenfelder, Johannes Wagner
  • Patent number: 7165025
    Abstract: Auditory-articulatory analysis for use in speech quality assessment. Articulatory analysis is based on a comparison between powers associated with articulation and non-articulation frequency ranges of a speech signal. Neither source speech nor an estimate of the source speech is utilized in articulatory analysis. Articulatory analysis comprises the steps of comparing articulation power and non-articulation power of a speech signal, and assessing speech quality based on the comparison, wherein articulation and non-articulation powers are powers associated with articulation and non-articulation frequency ranges of the speech signal.
    Type: Grant
    Filed: July 1, 2002
    Date of Patent: January 16, 2007
    Assignee: Lucent Technologies Inc.
    Inventor: Doh-Suk Kim
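The core comparison can be sketched with a naive DFT over assumed band edges (roughly 300-3400 Hz counted as articulation, everything else as non-articulation; the patent does not fix these bounds):

```python
import math

def band_power(samples, rate, lo, hi):
    """Power in the [lo, hi) Hz band via a naive DFT (fine for short frames)."""
    n = len(samples)
    power = 0.0
    for k in range(n // 2):
        freq = k * rate / n
        if lo <= freq < hi:
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = sum(-s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            power += (re * re + im * im) / n
    return power

def articulation_ratio(samples, rate):
    """Compare articulation power with non-articulation power."""
    art = band_power(samples, rate, 300, 3400)
    non = band_power(samples, rate, 0, 300) + band_power(samples, rate, 3400, rate / 2)
    return art / (non + 1e-12)
```

A higher ratio indicates energy concentrated in the articulation range; the assessment step would map this comparison to a quality score.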
  • Patent number: 7085718
    Abstract: It is proposed to include application speech (AS) in the set of identification speech data (ISD) used for training a speaker-identification process, making it possible to reduce the set of initial identification speech data (IISD) collected during the initial enrolment phase and thus adding convenience for the user to be registered or enrolled.
    Type: Grant
    Filed: May 6, 2002
    Date of Patent: August 1, 2006
    Assignee: Sony Deutschland GmbH
    Inventor: Thomas Kemp
  • Patent number: 7076423
    Abstract: The present invention relates to coding and storage of phonetic features in order to search for strings of characters; it is applied in particular to searching for a variety of names, identifiers, denotations and other character strings in a database. This is achieved by a method and system for coding and storing phonetic information representable as an original character sequence, in which the phonetic information is coded in a bit code that does not comprise any characters. In some embodiments, tables are used which comprise empirically found character groups reflecting the specific phonetics and spelling conventions of the language in use. This enables efficient coding of the phonetic features associated with those groups and allows the coding method of the present invention to be adapted to a plurality of different languages.
    Type: Grant
    Filed: December 21, 2000
    Date of Patent: July 11, 2006
    Assignee: International Business Machines Corporation
    Inventor: Thomas Boehme
  • Patent number: 7043422
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 4, 2001
    Date of Patent: May 9, 2006
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
  • Patent number: 7039587
    Abstract: Methods and arrangements for facilitating speaker identification. At least one N-best list is generated based on input speech, a system output is posited based on the input speech, and a determination is made, via at least one property of the N-best list, as to whether the posited system output is inconclusive.
    Type: Grant
    Filed: January 4, 2002
    Date of Patent: May 2, 2006
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
  • Patent number: 7016839
    Abstract: There is provided a method for extracting feature vectors from a digitized utterance. Spectral envelope estimates are computed from overlapping frames in the digitized utterance based on a Minimum Variance Distortionless Response (MVDR) method. Cepstral feature vectors are generated from the spectral envelope estimates. There is provided a method for generating spectral envelope estimates from a digitized utterance. The spectral envelope estimates are generated from overlapping frames in the digitized utterance based on a harmonic mean of at least two low-to-high resolution spectrum estimates. There is provided a method for reducing variance of a feature stream in a pattern recognition system. The feature stream is temporally or spatially averaged to reduce the variance of the feature stream.
    Type: Grant
    Filed: January 31, 2002
    Date of Patent: March 21, 2006
    Assignee: International Business Machines Corporation
    Inventors: Satayanarayana Dharanipragada, Bhaskar Dharanipragada Rao
  • Patent number: 6999928
    Abstract: Disclosed is a method of automated speaker identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from the sample speech signal; calculating, with a distance metric, all distances between the sample matrix and one or more cepstral covariance signature matrices; and determining if the smallest of the distances is below a predetermined threshold value; wherein the distance metric is selected from d5(S, Σ) = A + 1/H − 2, d6(S, Σ) = (A + 1/H)(G + 1/G) − 4, d7(S, Σ) = (A/2H)(G + 1/G) − 1, d8(S, Σ) = (A + 1/H)/(G + 1/G) − 1, d9(S, Σ) = A/G + G/H − 2, fusion derivatives thereof, and fusion derivatives thereof with d1(S, Σ) = A/H − 1.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: February 14, 2006
    Assignee: International Business Machines Corporation
    Inventors: Zhong-Hua Wang, David Lubensky, Cheng Wu
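Reading A, G and H as (presumably) the arithmetic, geometric and harmonic means of the eigenvalues of S·Σ⁻¹, the listed metrics can be sketched over precomputed eigenvalues; d7's reconstruction from the garbled source is least certain, so it is omitted here:

```python
import math

def mean_distances(eigenvalues):
    """Distance metrics built from the arithmetic (A), geometric (G) and
    harmonic (H) means of the eigenvalues of the sample matrix times the
    inverse signature matrix. All are zero when the matrices are equal,
    since every eigenvalue is then 1 and A = G = H = 1."""
    n = len(eigenvalues)
    A = sum(eigenvalues) / n
    G = math.prod(eigenvalues) ** (1.0 / n)
    H = n / sum(1.0 / e for e in eigenvalues)
    return {
        "d1": A / H - 1,
        "d5": A + 1 / H - 2,
        "d6": (A + 1 / H) * (G + 1 / G) - 4,
        "d8": (A + 1 / H) / (G + 1 / G) - 1,
        "d9": A / G + G / H - 2,
    }
```

By the AM-GM-HM inequality A ≥ G ≥ H for positive eigenvalues, so each metric is non-negative and vanishes only at equality.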
  • Patent number: 6952674
    Abstract: A speech-recognition system learns a speech profile of a user whose speech is to be recognized. The system plays audible speech samples, stored in sound files, so that the file that most resembles the user's speech may be selected. After receiving the selection, the system identifies an acoustic model that is associated with the chosen sound file. The system may also select a subset of sound files based on information indicative of the user's speech. The system may then play a subset of sound files so that the file that most resembles the user's speech may be selected.
    Type: Grant
    Filed: January 7, 2002
    Date of Patent: October 4, 2005
    Assignee: Intel Corporation
    Inventor: Richard A. Forand
  • Patent number: 6934682
    Abstract: A method and system for processing speech misrecognitions. The system can include an embedded speech recognition system having at least one acoustic model and at least one active grammar, wherein the embedded speech recognition system is configured to convert speech audio to text using the at least one acoustic model and the at least one active grammar; a remote training system for modifying the at least one acoustic model based on corrections to speech misrecognitions detected in the embedded speech recognition system; and, a communications link for communicatively linking the embedded speech recognition system to the remote training system. The embedded speech recognition system can further include a user interface for presenting a dialog for correcting the speech misrecognitions detected in the embedded speech recognition system. Notably, the user interface can be a visual display. Alternatively, the user interface can be an audio user interface.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: August 23, 2005
    Assignee: International Business Machines Corporation
    Inventor: Steven G. Woodward
  • Patent number: 6934681
    Abstract: A voice recognition system comprises an analyzer for converting an input voice signal to an input pattern including cepstrum, a memory for storing reference patterns, an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency-axis direction by using the input pattern and the reference patterns, and a recognizing unit for calculating the distances between the converted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as the result of recognition. The elongation/contraction unit estimates the elongation/contraction parameter by using the cepstrum included in the input pattern. The elongation/contraction unit does not hold various candidate values in advance for determining the elongation/contraction parameter, nor does it have to execute distance calculations for such values.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: August 23, 2005
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Koichi Shinoda
  • Patent number: 6915260
    Abstract: Described here is a method of determining an eigenspace for representing a plurality of training speakers, in which speaker-dependent sets of models are first formed for the individual training speakers using their training speech data, each model (SD) in a set being described by a plurality of model parameters. For each speaker, a combined model is then represented in a high-dimensional model space by concatenating the model parameters of that speaker's models into a coherent supervector. Subsequently, a transformation is carried out that reduces the model space dimension to obtain eigenspace basis vectors (Ee); this transformation utilizes a reduction criterion based on the variability of the vectors to be transformed. The high-dimensional model space is first reduced, in a first step, to a speaker subspace by a change of basis, in which subspace all the training speakers are represented.
    Type: Grant
    Filed: September 24, 2001
    Date of Patent: July 5, 2005
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventor: Henrik Botterweck
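The construction can be sketched end-to-end for the first basis vector: concatenate each speaker's model parameters into a supervector, centre the supervectors, and extract the direction of largest variability (power iteration here stands in for the patent's basis change plus dimension reduction):

```python
import math

def principal_direction(supervectors, iters=200):
    """First eigenspace basis vector via power iteration on the scatter
    matrix of mean-centred speaker supervectors (a PCA sketch)."""
    n, d = len(supervectors), len(supervectors[0])
    mean = [sum(v[j] for v in supervectors) / n for j in range(d)]
    X = [[v[j] - mean[j] for j in range(d)] for v in supervectors]
    b = [1.0] * d  # arbitrary start vector
    for _ in range(iters):
        # Multiply by X^T X (the scatter matrix) without forming it.
        proj = [sum(x[j] * b[j] for j in range(d)) for x in X]
        b = [sum(proj[i] * X[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in b)) or 1.0
        b = [c / norm for c in b]
    return b
```

Further basis vectors would be obtained by deflating the scatter matrix and iterating again; a full implementation would use an SVD.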
  • Patent number: 6895376
    Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
    Type: Grant
    Filed: May 4, 2001
    Date of Patent: May 17, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
  • Patent number: 6882971
    Abstract: A method and associated apparatus for indicating the voice of each talker from a plurality of talkers to be heard by a listener. The method uses a signal that is transmitted over a telecommunications system. The method includes projecting the voice from each one of the plurality of talkers to the listener. A talker indicator is provided proximate to the listener. Talker identification information is generated in the talker indicator that can be used to indicate the identity of each talker who is speaking at any given time to the listener. A device is coupled to the talker indicator that can transmit the voice signal from each talker to the listener. In different aspects, the talker identification information can include such varied indicators as audio, video, or an announcement combined with a temporally compressed voice signal. In another aspect, emotographic figures are displayed to the listener, each representing a distinct talker.
    Type: Grant
    Filed: July 18, 2002
    Date of Patent: April 19, 2005
    Assignee: General Instrument Corporation
    Inventor: Michael L. Craner
  • Patent number: 6859773
    Abstract: A method of voice recognition in a noise-ridden acoustic signal comprises a phase of digitizing temporal frames of the noise-ridden acoustic signal, a phase of parametrization of speech-containing temporal frames, a shape-recognition phase in which the parameters are assessed with respect to references pre-recorded in a reference space, a phase of reiterative searching for noise models in the noise-ridden signal frames, a phase of searching for a transition between the new noise model and the old model and, when the noise transition has been detected, a phase of updating the reference space, the parametrization phase including a step of matching the parameters to the new noise model.
    Type: Grant
    Filed: May 9, 2001
    Date of Patent: February 22, 2005
    Assignee: Thales
    Inventor: Pierre-Albert Breton
  • Patent number: 6850888
    Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based likelihood objective function is transformed to allow the parameter estimates to be optimized during the training phase.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: February 1, 2005
    Assignee: International Business Machines Corporation
    Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
  • Publication number: 20040225498
    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
    Type: Application
    Filed: March 26, 2004
    Publication date: November 11, 2004
    Inventor: Ryan Rifkin
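A rough sketch of the density-estimation step, assuming Gaussian Parzen windows over a neighbor set; a brute-force neighbor list stands in for the kd-tree, and all names and data below are invented for illustration:

```python
import math

def parzen_score(query, neighbors, bandwidth=1.0):
    # Gaussian Parzen-window density estimate at `query`, averaged over
    # a speaker's (approximate) nearest-neighbor feature vectors.
    d = len(query)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for n in neighbors:
        sq_dist = sum((q - x) ** 2 for q, x in zip(query, n))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2)) / norm
    return total / len(neighbors)

# Toy 2-D enrollment vectors for two speakers
speaker_a = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.2)]
speaker_b = [(3.0, 3.0), (2.9, 3.1), (3.2, 2.8)]
query = (0.05, 0.0)
# A higher density means the unidentified sample better matches that speaker
```

In a real system the neighbor subset would come from the kd-tree query rather than an exhaustive scan.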
  • Patent number: 6804637
    Abstract: To retrieve an optimum template pattern in response to an input sentence, a set of templates is arranged in a plurality of template blocks containing an arbitrary number of sentence components, including grammatically correct and/or incorrect components. A score is assigned to every word in the set of templates according to its importance. The candidate template patterns and the input sentence are retrieved, the scores of the matched words are calculated, and the total of the scores of the entire paths are calculated. Optimum level comparison values are then calculated using the score of the matching words as the numerator and the total score as the denominator. The candidate template pattern with the largest optimum level comparison value, among those that provide the largest numerator, is selected as the optimum template pattern. The input sentence is then corrected using this optimum template pattern.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: October 12, 2004
    Assignee: Sunflare Co., Ltd.
    Inventors: Naoyuki Tokuda, Hiroyuki Sasai
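The scoring ratio described above (matched-word score as numerator, total template score as denominator) can be sketched as follows. The word weights, templates, and helper name are invented for the example, not taken from the patent:

```python
def optimum_level(template_words, input_words, word_scores):
    # Ratio of the scores of matched words (numerator) to the total
    # score of the template (denominator); unlisted words weigh 1.
    total = sum(word_scores.get(w, 1) for w in template_words)
    matched = sum(word_scores.get(w, 1) for w in template_words
                  if w in set(input_words))
    return matched / total if total else 0.0

word_scores = {"not": 3, "have": 2}  # important words carry more weight
templates = [["i", "have", "not", "seen"], ["i", "did", "see"]]
sentence = ["i", "have", "not", "seen", "it"]
best = max(templates, key=lambda t: optimum_level(t, sentence, word_scores))
```

The template maximizing the ratio is then used to correct the input sentence.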
  • Publication number: 20040181407
    Abstract: A method for generating and/or expanding a vocabulary database of a voice recognition system includes acoustically training the voice recognition system using a computer-based audio module.
    Type: Application
    Filed: March 10, 2004
    Publication date: September 16, 2004
    Applicant: Deutsche Telekom AG
    Inventors: Marian Trinkel, Christel Mueller
  • Patent number: 6789062
    Abstract: A telephone-based interactive speech recognition system is retrained using variable weighting and incremental retraining. Variable weighting involves changing the relative influence of particular measurement data to be reflected in a statistical model. Statistical model data is determined based upon an initial set of measurement data determined from an initial set of speech utterances. When new statistical model data is to be generated to reflect new measurement data determined from new speech utterances, a weighting factor is applied to the new measurement data to generate weighted new measurement data. The new statistical model data is then determined based upon the initial set of measurement data and the weighted new measurement data. Incremental retraining involves generating new statistical model data using prior statistical model data to reduce the amount of prior measurement data that must be maintained and processed.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: September 7, 2004
    Assignee: SpeechWorks International, Inc.
    Inventors: Michael S. Phillips, Krishna K. Govindarajan, Mark Fanty, Etienne Barnard
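Both ideas in the abstract above — a weighting factor on new measurement data, and incremental retraining from retained statistics rather than raw data — can be sketched with a running weighted mean. This is a minimal illustration under assumed statistics, not the patented retraining procedure:

```python
def weighted_mean_update(old_sum, old_count, new_samples, weight):
    # Incremental retraining: keep only sufficient statistics (sum, count)
    # of the prior data, and fold in new measurements scaled by a
    # weighting factor so recent utterances can count more (or less)
    # than original training samples.
    new_sum = old_sum + weight * sum(new_samples)
    new_count = old_count + weight * len(new_samples)
    return new_sum / new_count, new_sum, new_count

# Prior statistics: 100 utterances with mean 0.0; then 10 new samples
# of 1.0, each counted twice as heavily as an original sample.
mean, total, count = weighted_mean_update(0.0, 100, [1.0] * 10, weight=2.0)
```

The returned (sum, count) pair is all that needs to be stored for the next incremental update.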
  • Patent number: 6789063
    Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level, but the at least one node in the second level has a weight computed for it.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: September 7, 2004
    Assignee: Intel Corporation
    Inventor: Yonghong Yan
  • Publication number: 20040122669
    Abstract: A method and apparatus for adapting reference templates is provided. The method includes adapting one or more reference templates using a stored test utterance by replacing data within the reference templates with a weighted interpolation of that data and corresponding data within the test utterance.
    Type: Application
    Filed: December 24, 2002
    Publication date: June 24, 2004
    Inventor: Hagai Aronowitz
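The weighted interpolation in the abstract above reduces to a convex combination of each template value with the corresponding test-utterance value. A minimal sketch, with an assumed interpolation weight and invented data:

```python
def adapt_template(template, utterance, alpha=0.2):
    # Replace each template value with a weighted interpolation of the
    # old value and the corresponding test-utterance value; alpha
    # controls how far the template drifts toward the new utterance.
    return [(1 - alpha) * t + alpha * u for t, u in zip(template, utterance)]

reference = [1.0, 2.0, 3.0]
test_utt = [2.0, 2.0, 2.0]
adapted = adapt_template(reference, test_utt)  # drifts toward the utterance
```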
  • Patent number: 6751590
    Abstract: The present invention uses acoustic feature transformations, referred to as pattern-specific maximum likelihood transformations (PSMLT), to model the voice print of speakers in either a text dependent or independent mode. Each transformation maximizes the likelihood, when restricting to diagonal models, of the speaker training data with respect to the resulting voice-print model in the new feature space. Speakers are recognized (i.e., identified, verified or classified) by appropriate comparison of the likelihood of the testing data in each transformed feature space and/or by directly comparing transformation matrices obtained during enrollment and testing. It is to be appreciated that the principle of pattern-specific maximum likelihood transformations can be extended to a large number of pattern matching problems and, in particular, to other biometrics besides speech.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: June 15, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Ramesh Ambat Gopinath, Stephane Herman Maes
  • Patent number: 6748356
    Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: June 8, 2004
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
  • Patent number: 6732075
    Abstract: In a sound synthesizer, a noise adder generates a noise signal having a frequency band of 3,400 to 4,600 Hz, adjusts the gain of the noise signal, and adds the gain-adjusted noise signal to an excitation source after being filled with zeros by a zero-filling circuit, thereby providing a wide-band excitation source which is rather flat. The signal gain is adjusted by determining the power of the narrow-band excitation source or of the zero-filled wide-band excitation source, and fitting the gain to that power.
    Type: Grant
    Filed: April 20, 2000
    Date of Patent: May 4, 2004
    Assignee: Sony Corporation
    Inventors: Shiro Omori, Masayuki Nishiguchi
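The gain-fitting step above amounts to scaling the noise so its power matches the excitation source's power. The sketch below shows only that step; the 3,400–4,600 Hz band-limiting and zero-filling stages are omitted, and the signals and names are invented for illustration:

```python
import math
import random

def match_gain(noise, reference):
    # Scale a noise signal so its average power equals the reference
    # excitation's average power (the gain-fitting step).
    p_noise = sum(x * x for x in noise) / len(noise)
    p_ref = sum(x * x for x in reference) / len(reference)
    gain = math.sqrt(p_ref / p_noise)
    return [gain * x for x in noise]

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(256)]
narrow_band = [0.5 * math.sin(0.3 * i) for i in range(256)]
scaled = match_gain(noise, narrow_band)
```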
  • Patent number: 6728674
    Abstract: A method and a system for corrective training of speech models includes changing a weight of a data sample whenever a data sample is incorrectly associated with a classifier and retraining each classifier with the weights.
    Type: Grant
    Filed: July 31, 2000
    Date of Patent: April 27, 2004
    Assignee: Intel Corporation
    Inventor: Meir Griniasty
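The corrective step above — change a sample's weight when it is misclassified, then retrain with the new weights — resembles boosting-style reweighting. A minimal sketch with an assumed weight factor and invented data:

```python
def reweight(labels, predictions, weights, factor=2.0):
    # Raise the weight of every sample the classifier got wrong, so the
    # next retraining pass focuses on its mistakes.
    return [w * factor if p != y else w
            for y, p, w in zip(labels, predictions, weights)]

labels      = [0, 1, 1]
predictions = [0, 0, 1]          # the second sample is misclassified
new_weights = reweight(labels, predictions, [1.0, 1.0, 1.0])
```

Retraining would then fit each classifier to the samples under `new_weights` and repeat.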
  • Patent number: 6704707
    Abstract: A method for switching between speech recognition technologies. The method includes reception of an initial recognition request accompanied by control information. Recognition characteristics are determined using the control information and then a switch is configured based upon the particular characteristic. Alternatively, the switch may be configured based upon system load levels and resource constraints.
    Type: Grant
    Filed: March 14, 2001
    Date of Patent: March 9, 2004
    Assignee: Intel Corporation
    Inventors: Andrew V. Anderson, Steven M. Bennett
  • Patent number: 6701293
    Abstract: A method and system for utilizing multiple speech recognizers. The speech system includes a port through which an input audio stream may be received, at least two recognizers that may convert the input stream to text or commands, and a combiner able to combine lists of possible results from each recognizer into a combined list. The method includes receiving an input audio stream, routing the stream to one or more recognizers, receiving a list of possible results from each of the recognizers, combining the lists into a combined list and returning at least a subset of the list to the application.
    Type: Grant
    Filed: June 13, 2001
    Date of Patent: March 2, 2004
    Assignee: Intel Corporation
    Inventors: Steven M. Bennett, Andrew V. Anderson
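The combiner described above merges the per-recognizer lists of possible results into one ranked list. One simple combination rule (assumed here; the patent does not specify it) is to sum the scores each recognizer assigns to the same hypothesis:

```python
def combine_nbest(nbest_lists):
    # Merge n-best lists from several recognizers by summing each
    # hypothesis's scores, then rank the combined hypotheses.
    combined = {}
    for nbest in nbest_lists:
        for text, score in nbest:
            combined[text] = combined.get(text, 0.0) + score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

recognizer_1 = [("call home", 0.8), ("call now", 0.2)]
recognizer_2 = [("call home", 0.6), ("all home", 0.4)]
ranked = combine_nbest([recognizer_1, recognizer_2])
```

The application would then receive `ranked`, or a top-k subset of it.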
  • Patent number: 6697779
    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: February 24, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome Bellegarda, Devang Naik, Matthias Neeracher, Kim Silverman
  • Patent number: 6697778
    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
    Type: Grant
    Filed: July 5, 2000
    Date of Patent: February 24, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
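Comparing test speaker data against client locations in a speaker space can be reduced, in the simplest case, to a nearest-point lookup. This is an illustrative simplification with invented names and coordinates, not the patent's model-generation procedure:

```python
def nearest_speaker(test_point, client_points):
    # Classify a test speaker by the closest enrolled client location
    # in the (low-dimensional) speaker space.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(client_points,
               key=lambda name: sq_dist(test_point, client_points[name]))

clients = {"alice": (0.0, 1.0), "bob": (2.0, -1.0)}
match = nearest_speaker((0.2, 0.8), clients)
```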
  • Patent number: 6691090
    Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: February 10, 2004
    Assignee: Nokia Mobile Phones Limited
    Inventors: Kari Laurila, Jilei Tian
  • Patent number: 6691089
    Abstract: A text-prompted speaker verification system that can be configured by users based on a desired level of security. A user is prompted for a multiple-digit (or multiple-word) password. The number of digits or words used for each password is defined by the system in accordance with a user set preferred level of security. The level of training required by the system is defined by the user in accordance with a preferred level of security. The set of words used to generate passwords can also be user configurable based upon the desired level of security. The level of security associated with the frequency of false accept errors versus false reject errors is user configurable for each particular application.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: February 10, 2004
    Assignee: Mindspeed Technologies Inc.
    Inventors: Huan-yu Su, Khaled Assaleh
  • Patent number: 6683625
    Abstract: A system and method for providing a controllable virtual environment includes a computer (11) with processor and a display coupled to the processor to display 2-D or 3-D virtual environment objects. Speech grammars are stored as attributes of the virtual environment objects. Voice commands are recognized by a speech recognizer (19) and microphone (20) coupled to the processor whereby the voice commands are used to manipulate the virtual environment objects on the display. The system is further made role-dependent whereby the display of virtual environment objects and grammar is dependent on the role of the user.
    Type: Grant
    Filed: August 3, 2001
    Date of Patent: January 27, 2004
    Assignee: Texas Instruments Incorporated
    Inventors: Yeshwant K. Muthusamy, Jonathan D. Courtney, Edwin R. Cole