Specialized Models Patents (Class 704/250)
  • Patent number: 7689414
    Abstract: In a speech recognition device (1) for recognizing text information (TI) corresponding to speech information (SI), wherein speech information (SI) can be characterized in respect of language properties, there are firstly provided at least two language-property recognition means (20, 21, 22, 23), each of the language-property recognition means (20, 21, 22, 23) being arranged, by using the speech information (SI), to recognize a language property assigned to said means and to generate property information (ASI, LI, SGI, CI) representing the language property that is recognized, and secondly there are provided speech recognition means (24) that, while continuously taking into account the at least two items of property information (ASI, LI, SGI, CI), are arranged to recognize the text information (TI) corresponding to the speech information (SI).
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: March 30, 2010
    Assignee: Nuance Communications Austria GmbH
    Inventor: Zsolt Saffer
  • Patent number: 7689418
    Abstract: A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.
    Type: Grant
    Filed: September 12, 2002
    Date of Patent: March 30, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Ganesh N. Ramaswamy, Upendra V. Chaudhari
  • Patent number: 7689420
    Abstract: Architecture for integrating and generating back-off grammars (BOG) in a speech recognition application for recognizing out-of-grammar (OOG) utterances and updating the context-free grammars (CFG) with the results. A parsing component identifies keywords and/or slots from user utterances and a grammar generation component adds filler tags before and/or after the keywords and slots to create new grammar rules. The BOG can be generated from these new grammar rules and can be used to process the OOG user utterances. By processing the OOG user utterances through the BOG, the architecture can recognize and perform the intended task on behalf of the user.
    Type: Grant
    Filed: April 6, 2006
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Timothy S. Paek, David M. Chickering, Eric Norman Badger, Qiang Wu
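The back-off idea above can be sketched in a few lines. This is a hypothetical illustration that uses regular expressions with optional filler regions in place of real CFG rules and filler tags; the keywords are invented:

```python
import re

def make_backoff_rule(keyword):
    # Wrap the keyword with optional "filler" regions so that
    # out-of-grammar words before/after it still match.
    return re.compile(r"^(?:\S+\s+)*" + re.escape(keyword) + r"(?:\s+\S+)*$")

def recognize(utterance, keywords):
    # Return the first keyword whose back-off rule matches the utterance.
    for kw in keywords:
        if make_backoff_rule(kw).search(utterance):
            return kw
    return None

# An exact grammar would reject "um please play something", but the
# back-off rule still recovers the intended "play" command.
print(recognize("um please play something", ["play", "stop"]))
```

A real system would fold such rules back into the CFG rather than keep them as regexes; this only shows the filler-before/after mechanism.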
  • Patent number: 7684977
    Abstract: In an interface unit, an input section obtains an input signal of user's speech or the like, and an input processing section processes the input signal and detects information relating to the user. On the basis of the detection result, a response contents determination section determines response contents to the user. Meanwhile, a response manner adjusting section adjusts the response manner to the user, such as speech speed, on the basis of the processing state of the input signal, the information relating to the user detected from the input signal, and the like.
    Type: Grant
    Filed: June 8, 2006
    Date of Patent: March 23, 2010
    Assignee: Panasonic Corporation
    Inventor: Koji Morikawa
  • Publication number: 20100070278
    Abstract: A transformation can be derived which represents the processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model, and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponent p which has a value greater than zero and less than 1.
    Type: Application
    Filed: September 12, 2008
    Publication date: March 18, 2010
    Inventors: Andreas Hagen, Bryan Pellom, Kadri Hacioglu
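As a toy numerical sketch of this scheme (hypothetical three-component "models" and a diagonal transform; the patent's transform is a general matrix over the models' key values):

```python
import numpy as np

# Hypothetical "models": vectors of key acoustic values.
male   = np.array([120.0, 500.0, 1500.0])
female = np.array([210.0, 560.0, 1700.0])

# Diagonal transform that maps the male vector onto the female one.
T = np.diag(female / male)

# Fractional power 0 < p < 1 of the (diagonal) transform matrix.
p = 0.5
T_p = np.diag(np.diag(T) ** p)

# Apply the weakened male->female transform to the *female* model to
# extrapolate a synthetic children's model, as the abstract describes.
child = T_p @ female
print(child)  # each value shifted beyond the female model
```

For a diagonal transform the exponent is just an element-wise power; a full matrix would need an eigendecomposition-based fractional power.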
  • Patent number: 7672844
    Abstract: A voice processing apparatus for performing voiceprint recognition processing with high accuracy even in the case where a plurality of conference participants speak at a time in a conference; wherein a bi-directional telephonic communication portion receives as an input respective voice signals from a plurality of microphones, selects one microphone based on the input voice signals, and outputs a voice signal from the microphone; a voiceprint recognition portion (322) performs voiceprint recognition based on the input voice signal in a voiceprint recognizable period, and stores voiceprint data successively in a buffer; and a CPU takes out voiceprint data successively from the buffer, checks it against voiceprint data stored in a voiceprint register, specifies a speaker, and processes the voice signal output from the bi-directional telephonic communication portion by associating it with the speaker.
    Type: Grant
    Filed: August 3, 2004
    Date of Patent: March 2, 2010
    Assignee: Sony Corporation
    Inventors: Akira Masuda, Yoshitaka Abe, Hideharu Fujiyama
  • Patent number: 7664645
    Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner, or the voice of a famous personality. In this way, mobile terminals in particular can be individualized and text messages can be read out using a specific voice.
    Type: Grant
    Filed: March 11, 2005
    Date of Patent: February 16, 2010
    Assignee: SVOX AG
    Inventors: Horst-Udo Hain, Klaus Lukas
  • Patent number: 7657432
    Abstract: A technique for improved score calculation and normalization in a framework of recognition with phonetically structured speaker models. The technique involves determining, for each frame and each level of phonetic detail of a target speaker model, a non-interpolated likelihood value, and then resolving the at least one likelihood value to obtain a likelihood score.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 2, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Patent number: 7640159
    Abstract: An accent compensative speech recognition system and related methods for use with a signal processor generating one or more feature vectors based upon a voice-induced electrical signal are provided. The system includes a first-language acoustic module that determines a first-language phoneme sequence based upon one or more feature vectors, and a second-language lexicon module that determines a second-language speech segment based upon the first-language phoneme sequence. A method aspect includes the steps of generating a first-language phoneme sequence from at least one feature vector based upon a first-language phoneme model, and determining a second-language speech segment from the first-language phoneme sequence based upon a second-language lexicon model.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: December 29, 2009
    Assignee: Nuance Communications, Inc.
    Inventor: David E. Reich
  • Publication number: 20090313018
    Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
    Type: Application
    Filed: June 17, 2008
    Publication date: December 17, 2009
    Inventors: Yoav Degani, Yishai Zamir
  • Patent number: 7634405
    Abstract: The subject invention leverages spectral “palettes” or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or “trained” utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: December 15, 2009
    Assignee: Microsoft Corporation
    Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
  • Publication number: 20090299744
    Abstract: A voice recognition apparatus determines whether an input sound is a voice segment or a non-voice segment in time series, generates a word model for the voice segment, allocates a predetermined non-voice model for the non-voice segment, connects the word model and the non-voice model in sequence according to the time series of the segments of the input sound corresponding to the respective models and generates a vocalization model, and coordinates the vocalization model with a vocalization ID in one-to-one correspondence, and stores the same.
    Type: Application
    Filed: April 14, 2009
    Publication date: December 3, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Mitsuyoshi TACHIMORI
  • Patent number: 7627473
    Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: December 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
  • Patent number: 7620547
    Abstract: The present invention provides a method for operating and/or controlling a man-machine interface unit (MMI) for a finite user group environment. Utterances from a group of users are repeatedly received. A process of user identification is carried out based on said received utterances. The process of user identification comprises a clustering step so as to enable enrolment-free performance.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: November 17, 2009
    Assignee: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
  • Patent number: 7617102
    Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data being uttered for each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract geometrical parameter.
    Type: Grant
    Filed: September 27, 2006
    Date of Patent: November 10, 2009
    Assignee: Advanced Telecommunications Research Institute International
    Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
  • Publication number: 20090259469
    Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a pass band parameter of each of one or more passbands that are outside the narrow passband is generated for each frame and the one or more band energy parameters are coupled to the speech model.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 15, 2009
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Yuan-Jun Wei
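The wideband test described above amounts to checking for energy outside the narrow passband. A minimal sketch, assuming a 16 kHz sample rate, a 4 kHz narrowband edge, and an illustrative energy-ratio threshold:

```python
import numpy as np

def band_energy(frame, fs, lo, hi):
    # Energy of the frame's spectrum between lo and hi Hz.
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return spec[(freqs >= lo) & (freqs < hi)].sum()

def is_wideband(frame, fs=16000, ratio=1e-3):
    # Heuristic from the abstract: if appreciable energy lies outside
    # the narrow (telephone-like) passband, treat the signal as wideband.
    wide = band_energy(frame, fs, 4000, 8000)
    total = band_energy(frame, fs, 0, 8000) + 1e-12
    return wide / total > ratio

fs = 16000
t = np.arange(1024) / fs
narrow = np.sin(2 * np.pi * 1000 * t)               # 1 kHz tone only
wide = narrow + 0.5 * np.sin(2 * np.pi * 6000 * t)  # adds a 6 kHz component
print(is_wideband(narrow, fs), is_wideband(wide, fs))  # False True
```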
  • Patent number: 7603275
    Abstract: Embodiments of a system, method and computer program product for verifying an identity claimed by a claimant using voiced to unvoiced classifiers are described. In accordance with one embodiment, a speech sample from a claimant claiming an identity may be captured. From the speech sample, a ratio of unvoiced frames to a total number of frames in the speech sample may be calculated. An equal error rate value corresponding to the speech sample can then be determined based on the calculated ratio. The determined equal error rate value corresponding to the speech sample may be compared to an equal error rate value associated with the claimed identity in order to select a decision threshold. A match score may be also be generated based on a comparison of the speech sample to a voice sample associated with the claimed identity. A decision whether to accept the identity claim of the claimant can then be made based on a comparison of the match score to the decision threshold.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: October 13, 2009
    Assignee: Hitachi, Ltd.
    Inventor: Clifford Tavares
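The voiced/unvoiced ratio at the heart of this method can be sketched with a crude energy and zero-crossing-rate frame classifier; the thresholds and the classifier itself are illustrative stand-ins for whatever classifier the system actually uses:

```python
import numpy as np

def unvoiced_ratio(frames, energy_thresh=0.01, zcr_thresh=0.3):
    # Classify each frame as voiced/unvoiced with a simple energy +
    # zero-crossing-rate rule, then return the fraction of unvoiced
    # frames over the total, as in the abstract.
    unvoiced = 0
    for frame in frames:
        energy = np.mean(frame ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        if energy < energy_thresh or zcr > zcr_thresh:
            unvoiced += 1
    return unvoiced / len(frames)

fs = 8000
t = np.arange(200) / fs
voiced_frame = np.sin(2 * np.pi * 150 * t)       # low ZCR, high energy
rng = np.random.default_rng(0)
unvoiced_frame = 0.2 * rng.standard_normal(200)  # noise-like, high ZCR
ratio = unvoiced_ratio([voiced_frame, voiced_frame, unvoiced_frame])
print(ratio)  # 1 unvoiced frame out of 3
```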
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between the test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
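The combined quantity/direction comparison of model variations might look like this sketch (toy 2-D "models"; a real system compares high-dimensional stacked HMM parameter vectors):

```python
import numpy as np

def variation_similarity(var1, var2):
    # Compare two model variations by both their quantity (magnitude)
    # and their direction (cosine), as the abstract's clustering does.
    n1, n2 = np.linalg.norm(var1), np.linalg.norm(var2)
    quantity = min(n1, n2) / max(n1, n2)
    direction = np.dot(var1, var2) / (n1 * n2)
    return quantity * direction  # high only when both agree

# Hypothetical 2-D models; a variation is the difference vector from
# the speaker-independent model to a speaker ML model.
si = np.array([0.0, 0.0])
v1 = np.array([1.0, 1.0]) - si
v2 = np.array([1.1, 0.9]) - si   # varies like v1
v3 = np.array([-1.0, 1.0]) - si  # same magnitude, orthogonal direction
sim_close = variation_similarity(v1, v2)
sim_far = variation_similarity(v1, v3)
print(sim_close > sim_far)  # True
```

Magnitude alone would call v1 and v3 identical; the directional term is what separates them.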
  • Patent number: 7574359
    Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
    Type: Grant
    Filed: October 1, 2004
    Date of Patent: August 11, 2009
    Assignee: Microsoft Corporation
    Inventor: Chao Huang
  • Patent number: 7567903
    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: July 28, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
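Estimating a warp factor against the current best hypothesis is typically done by grid search. A schematic sketch, with a stand-in scoring function in place of a real acoustic-model likelihood and an illustrative warp-factor grid:

```python
import numpy as np

def score(warped_features, hypothesis):
    # Stand-in for acoustic-model likelihood of the hypothesis given
    # warped features; a real system would rescore with the recognizer.
    target = np.full_like(warped_features, hypothesis)
    return -np.sum((warped_features - target) ** 2)

def estimate_vtln_factor(features, hypothesis,
                         factors=np.arange(0.88, 1.13, 0.02)):
    # Grid-search the warp factor that maximizes the likelihood of the
    # current best hypothesis under the warped features.
    return max(factors, key=lambda a: score(features * a, hypothesis))

features = np.array([1.0, 1.0, 1.0]) * 1.04  # speaker "stretched" by ~4%
alpha = estimate_vtln_factor(features, hypothesis=1.0)
print(round(alpha, 2))  # warp factor that undoes the stretch
```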
  • Publication number: 20090182561
    Abstract: A speech recognition device and a method thereof are adapted to recognize a Chinese word. The speech recognition device includes a lexicon model, a language model, a speech recognition module, and a parsing module. The lexicon model keeps a plurality of words. The speech recognition module performs a speech recognition processing on a voice signal conforming to a syntax structure of Chinese word description. The speech recognition processing searches words related to the Chinese word description from the lexicon model according to a feature of the Chinese word description, and produces a literal word series in digital data form by referring to a syntax combination probability. The language model, based on the syntax structure of Chinese word description, provides the syntax combination probability according to combination relations between the searched words. The parsing module analyzes the syntax structure of the literal word series for retrieving the Chinese word.
    Type: Application
    Filed: September 16, 2008
    Publication date: July 16, 2009
    Applicant: DELTA ELECTRONICS, INC.
    Inventors: Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
  • Patent number: 7562015
    Abstract: A distributed pattern recognition training method includes providing data communication between at least one central pattern analysis node and a plurality of peripheral data analysis sites. The method also includes communicating a plurality of kernel-based pattern elements from the at least one central pattern analysis node to the plurality of peripheral data analysis sites. The method further includes performing a plurality of iterations of pattern template training at each of the plurality of peripheral data analysis sites.
    Type: Grant
    Filed: July 14, 2005
    Date of Patent: July 14, 2009
    Assignee: Aurilab, LLC
    Inventor: James K. Baker
  • Publication number: 20090171661
    Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
    Type: Application
    Filed: June 27, 2008
    Publication date: July 2, 2009
    Applicant: International Business Machines Corporation
    Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
  • Patent number: 7552049
    Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of the speech cepstra is subtracted from the generated noise-added speech (step S2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), and the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and in model learning, clustering for many types of noise data and accurate estimation of a speech model sequence can be achieved.
    Type: Grant
    Filed: March 10, 2004
    Date of Patent: June 23, 2009
    Assignees: NTT DoCoMo, Inc., Sadaoki Furui
    Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
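Steps S1 and S2 (noise addition at a target SNR, followed by cepstral mean subtraction) can be sketched as follows; the signals and cepstral frames here are synthetic placeholders:

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    # Scale the noise so the speech-to-noise power ratio equals the
    # requested SNR in dB, then mix (step S1 of the abstract).
    sp = np.mean(speech ** 2)
    npow = np.mean(noise ** 2)
    scale = np.sqrt(sp / (npow * 10 ** (snr_db / 10)))
    return speech + scale * noise

def cepstral_mean_subtraction(cepstra):
    # Subtract the per-coefficient mean over all frames (step S2).
    return cepstra - cepstra.mean(axis=0)

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
noisy = add_noise_at_snr(speech, rng.standard_normal(8000), snr_db=10)
# Verify the achieved SNR:
achieved = 10 * np.log10(np.mean(speech**2) / np.mean((noisy - speech)**2))
print(round(achieved, 1))  # 10.0 dB by construction

cepstra = rng.standard_normal((50, 13))  # hypothetical cepstral frames
normalized = cepstral_mean_subtraction(cepstra)
print(np.allclose(normalized.mean(axis=0), 0))  # True
```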
  • Publication number: 20090144058
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Application
    Filed: December 3, 2007
    Publication date: June 4, 2009
    Inventor: Alexander Sorin
  • Patent number: 7542904
    Abstract: A method for distributing voice-recognition grammars includes receiving match data from a first remote element. The match data includes information associated with an attempt by the remote element to match received audio information to first stored audio data. The method also includes generating a grammar entry based on the match data. The grammar entry includes second stored audio data and a word identifier associated with the second stored audio data. Additionally, the method includes transmitting the grammar entry to a second remote element.
    Type: Grant
    Filed: August 19, 2005
    Date of Patent: June 2, 2009
    Assignee: Cisco Technology, Inc.
    Inventors: Kevin L. Chestnut, Joseph B. Burton
  • Publication number: 20090119103
    Abstract: A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models. The method detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. The method creates a speaker model assigned to a speaker model set when no match occurs based on the input.
    Type: Application
    Filed: October 10, 2008
    Publication date: May 7, 2009
    Inventors: Franz Gerl, Tobias Herbig
  • Patent number: 7529669
    Abstract: A voice based multimodal speaker authentication method and telecommunications application thereof employing a speaker adaptive method for training phoneme specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: May 5, 2009
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
  • Publication number: 20090106025
    Abstract: A speaker recognition system (1) includes a speaker model registration device (10) which registers a speaker model for speaker recognition in the speaker recognition system. The speaker model registration device includes acquisition means (13) for acquiring utterances n+α times (wherein n is an integer not smaller than 2 and α is an integer not smaller than 1); calculation means (20) for calculating a speaker model by using the acquired utterances of n times as utterances for registration; correlation means (30) for correlating the calculated speaker model by using the acquired utterances of α times as correlation utterances; and registration means (40) for registering, as the speaker model for speaker recognition, those of the correlated speaker models whose correlation result satisfies a predetermined reference.
    Type: Application
    Filed: March 16, 2007
    Publication date: April 23, 2009
    Applicant: PIONEER CORPORATION
    Inventor: Soichi Toyama
  • Patent number: 7505906
    Abstract: A method and system for automatic speech recognition are disclosed. The method comprises receiving speech from a user, the speech including at least one speech error, increasing the probabilities of closely related words to the at least one speech error and processing the received speech using the increased probabilities. A corpora of data having common words that are mis-stated is used to identify and increase the probabilities of related words. The method applies to at least the automatic speech recognition module and the spoken language understanding module.
    Type: Grant
    Filed: February 26, 2004
    Date of Patent: March 17, 2009
    Assignee: AT&T Intellectual Property, II
    Inventors: Steven H. Lewis, Kenneth H. Rosen
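The probability-boosting step might be sketched like this, with a hypothetical confusion table standing in for the corpora of commonly mis-stated words:

```python
# Hypothetical confusion data: each error-prone word maps to the words
# speakers commonly intend instead.
CONFUSIONS = {"affect": ["effect"], "ceil": ["seal", "sealing"]}

def boost_related(probs, heard_word, factor=2.0):
    # Increase the probabilities of words closely related to a likely
    # speech error, then renormalize so they still sum to one.
    boosted = dict(probs)
    for related in CONFUSIONS.get(heard_word, []):
        if related in boosted:
            boosted[related] *= factor
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}

probs = {"effect": 0.2, "affect": 0.5, "other": 0.3}
adjusted = boost_related(probs, "affect")
print(adjusted["effect"] > probs["effect"])  # True
```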
  • Publication number: 20090063146
    Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating a similarity of the input sound relative to a male speaker sound model. A female voice index calculator calculates a female voice index indicating a similarity of the input sound relative to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound, which may be either the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound, based on the male voice index and the female voice index, when the first discriminator detects a human voice sound.
    Type: Application
    Filed: August 26, 2008
    Publication date: March 5, 2009
    Applicant: Yamaha Corporation
    Inventor: Yasuo Yoshioka
  • Patent number: 7496510
    Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
    Type: Grant
    Filed: November 30, 2001
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
  • Patent number: 7493258
    Abstract: A method is presented including selecting an initial beam width. The method also includes determining whether a value per frame is changing. A beam width is dynamically adjusted. The method further decodes a speech input with the dynamically adjusted beam width. Also, a device is presented including a processor (420). A speech recognition component (610) is connected to the processor (420). A memory (410) is connected to the processor (420). The speech recognition component (610) dynamically adjusts a beam width to decode a speech input.
    Type: Grant
    Filed: July 3, 2001
    Date of Patent: February 17, 2009
    Assignee: Intel Corporation
    Inventors: Alexandr A. Kibkalo, Vyacheslav A. Barannikov
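Beam pruning with a dynamically adjusted width can be sketched as follows; the adjustment rule (tighten when the best score per frame stops changing, widen otherwise) and all constants are illustrative:

```python
def prune(hypotheses, beam):
    # Keep only hypotheses whose score is within `beam` of the best.
    best = max(h[1] for h in hypotheses)
    return [h for h in hypotheses if h[1] >= best - beam]

def decode_frame(hypotheses, beam, prev_best, tighten=0.8, widen=1.25):
    # If the best score per frame has stopped changing much, tighten
    # the beam to save work; otherwise widen it to stay safe.
    best = max(h[1] for h in hypotheses)
    beam = beam * tighten if abs(best - prev_best) < 1.0 else beam * widen
    return prune(hypotheses, beam), beam, best

hyps = [("cat", -10.0), ("cap", -11.0), ("bat", -25.0)]
kept, beam, best = decode_frame(hyps, beam=10.0, prev_best=-10.5)
print([w for w, _ in kept], round(beam, 1))  # ['cat', 'cap'] 8.0
```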
  • Patent number: 7475006
    Abstract: A method and parser are provided that generate a score for a node identified during a parse of a text segment. The score is based on a mutual information score that measures the mutual information between a phrase level for the node and a word class of at least one word in the text segment.
    Type: Grant
    Filed: July 11, 2001
    Date of Patent: January 6, 2009
    Assignee: Microsoft Corporation
    Inventor: David N. Weise
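A mutual-information score of this kind reduces to pointwise mutual information over corpus counts; a minimal sketch with hypothetical counts:

```python
import math

def pmi(pair_count, x_count, y_count, total):
    # Pointwise mutual information between a phrase level x and a
    # word class y, estimated from corpus counts.
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: an NP node co-occurring with a determiner class.
score = pmi(pair_count=80, x_count=100, y_count=200, total=1000)
print(score)  # 2.0: the pair co-occurs 4x more often than chance
```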
  • Patent number: 7475014
    Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
    Type: Grant
    Filed: July 25, 2005
    Date of Patent: January 6, 2009
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Paris Smaragdis, Petros Boufounos
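Computing wrapped phase differences between all unique sensor pairs, the observations such a wrapped-phase HMM is trained on, can be sketched as (synthetic two-sensor data; a real array has many sensors and a moving source):

```python
import numpy as np
from itertools import combinations

def wrapped_phase_diffs(signals):
    # Per-frequency phase difference for every unique sensor pair,
    # wrapped into [-pi, pi).
    phases = [np.angle(np.fft.rfft(s)) for s in signals]
    diffs = {}
    for i, j in combinations(range(len(signals)), 2):
        d = phases[i] - phases[j]
        diffs[(i, j)] = (d + np.pi) % (2 * np.pi) - np.pi  # wrap
    return diffs

fs = 1000
t = np.arange(200) / fs
s0 = np.sin(2 * np.pi * 50 * t)
s1 = np.sin(2 * np.pi * 50 * t - 0.4)  # phase-delayed copy at sensor 2
diffs = wrapped_phase_diffs([s0, s1])
bin50 = int(50 * 200 / fs)             # FFT bin of the 50 Hz component
print(round(diffs[(0, 1)][bin50], 2))  # 0.4 rad, the injected delay
```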
  • Patent number: 7457753
    Abstract: A system for remote assessment of a user is disclosed. The system comprises application software resident on a server and arranged to interact across a network with a user operating a client device to obtain one or more sample signals of the user's speech. A datastore is arranged to store the user speech samples in association with details of the user. A feature extraction engine is arranged to extract one or more first features from respective speech samples. A comparator is arranged to compare the first features extracted from a speech sample with second features extracted from one or more reference samples and to provide a measure of any differences between the first and second features for assessment of the user.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: November 25, 2008
    Assignee: University College Dublin National University of Ireland
    Inventors: Rosalyn Moran, Richard Reilly, Philip De Chazal, Brian O'Mullane, Peter Lacy
  • Patent number: 7454339
    Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: November 18, 2008
    Assignee: Panasonic Corporation
    Inventors: Chaojun Liu, David Kryze, Luca Rigazio
  • Publication number: 20080281595
    Abstract: According to an embodiment, a voice recognition apparatus includes acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining units, and a voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of times the output probability is calculated is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in the interval from when the standard frame is set to when it is renewed, and storing and using the calculated value as an approximate value of the output probability in subsequent frames.
    Type: Application
    Filed: March 30, 2007
    Publication date: November 13, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masaru Sakai, Shinichi Tanaka
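The saving this abstract describes, computing an output probability once per standard-frame interval and reusing the stored value as an approximation afterward, can be sketched as a cache keyed by transition path. The 1-D Gaussian output density and all names here are illustrative assumptions, not the patented design:

```python
import math

class OutputProbCache:
    """Evaluate each transition path's output probability once per
    standard-frame interval; later frames reuse the stored value
    as an approximation."""

    def __init__(self):
        self._stored = {}
        self.evaluations = 0  # counts actual density evaluations

    def output_prob(self, path_id, state_mean, state_var, frame_value):
        if path_id not in self._stored:
            self.evaluations += 1
            # log of a 1-D Gaussian output density (illustrative model)
            self._stored[path_id] = (
                -0.5 * math.log(2 * math.pi * state_var)
                - (frame_value - state_mean) ** 2 / (2 * state_var))
        return self._stored[path_id]

    def renew_standard_frame(self):
        """Discard the approximations when a new standard frame is set."""
        self._stored.clear()

cache = OutputProbCache()
p_standard = cache.output_prob("s0->s1", 0.0, 1.0, 0.1)  # computed once
p_reused = cache.output_prob("s0->s1", 0.0, 1.0, 0.3)    # approximation
print(p_standard == p_reused, cache.evaluations)  # → True 1
```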
  • Patent number: 7447633
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: November 22, 2004
    Date of Patent: November 4, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
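Pooling Gaussians from several HMM states into one GMM breaks the per-state constraint that mixture weights sum to 1, which is why the abstract pairs the GMM generator with a weight normalizer. A minimal sketch with invented one-dimensional states:

```python
def pool_hmm_states(states):
    """Pool the Gaussians of several HMM states into a single GMM and
    renormalize the mixture weights so they sum to 1 over the pool.
    Each state is a list of (weight, mean, variance) tuples."""
    pooled = [g for state in states for g in state]
    total = sum(weight for weight, _, _ in pooled)
    return [(weight / total, mean, var) for weight, mean, var in pooled]

# Two invented one-dimensional HMM states
state_a = [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)]
state_b = [(1.0, 5.0, 2.0)]
gmm = pool_hmm_states([state_a, state_b])
print([round(w, 2) for w, _, _ in gmm])  # → [0.3, 0.2, 0.5]
```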
  • Publication number: 20080270132
    Abstract: A system and method for identifying an individual includes collecting biometric information for an individual attempting to gain access to a system. The biometric information for the individual is scored against pre-trained imposter models. If a score is greater than a threshold, the individual is identified as an imposter. Other systems and methods are also disclosed.
    Type: Application
    Filed: June 3, 2008
    Publication date: October 30, 2008
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
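The thresholded imposter test can be sketched in a few lines; the scores and the use of the maximum score across imposter models are illustrative assumptions, not the claimed method:

```python
def is_imposter(imposter_model_scores, threshold):
    """Identify the individual as an imposter if any score against a
    pre-trained imposter model exceeds the threshold."""
    return max(imposter_model_scores) > threshold

print(is_imposter([0.2, 0.8, 0.4], threshold=0.7))  # → True
print(is_imposter([0.2, 0.3], threshold=0.7))       # → False
```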
  • Publication number: 20080255843
    Abstract: The invention provides a method of voice recognition comprising the steps of: obtaining current position information; obtaining a current voice model according to the current position information; and performing voice recognition according to the current voice model. In particular, the current position information can be obtained from network address information or by a global positioning system.
    Type: Application
    Filed: April 10, 2008
    Publication date: October 16, 2008
    Inventors: Yu-Chen Sun, Chang-Hung Lee
  • Patent number: 7437289
    Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of classification schemes, including, e.g., speaker identification and speaker verification.
    Type: Grant
    Filed: August 16, 2001
    Date of Patent: October 14, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Publication number: 20080249774
    Abstract: Disclosed is a method for speech speaker recognition in a speech speaker recognition apparatus, the method including: detecting effective speech data from input speech; extracting an acoustic feature from the speech data; generating acoustic feature transformation matrices from the speech data according to each of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), mixing the acoustic feature transformation matrices to construct a hybrid acoustic feature transformation matrix, and multiplying the matrix representing the acoustic feature by the hybrid acoustic feature transformation matrix to generate a final feature vector; and generating a speaker model from the final feature vector, comparing a pre-stored universal speaker model with the generated speaker model to identify the speaker, and verifying the identified speaker.
    Type: Application
    Filed: April 2, 2008
    Publication date: October 9, 2008
    Applicants: SAMSUNG ELECTRONICS CO., LTD., ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hyun-Soo KIM, Myeong-gi Jeong, Hyun-Sik Shim, Young-Hee Park, Ha-Jin Yoo, Guen-Chang Kwak, Hye-Jin Kim, Kyung-Sook Bae
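One plausible reading of the hybrid transform above, with row-concatenation standing in for the unspecified "mixing" step and pure-Python matrix arithmetic (the PCA/LDA rows shown are invented):

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a feature vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def hybrid_transform(pca_rows, lda_rows, feature_vector):
    """Stack PCA and LDA projection rows into one hybrid matrix and
    project the acoustic feature vector through it."""
    return matvec(pca_rows + lda_rows, feature_vector)

pca = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # invented PCA basis rows
lda = [[0.5, 0.5, 0.0]]                   # invented LDA direction
print(hybrid_transform(pca, lda, [2.0, 4.0, 6.0]))  # → [2.0, 4.0, 3.0]
```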
  • Patent number: 7430509
    Abstract: Initially an embedding module (22) determines an embedding of a lattice in a two-dimensional plane. The embedding module (22) then processes the initial embedding to generate a planar graph in which no links cross. The planar graph is then simplified by a link encoding module (24), and data representing the lattice structure is generated by a shape encoding module (26), in which the simplified planar graph is represented by: a shape encoding (42) identifying the numbers of links bounding the areas defined by the planar graph, together with data identifying the locations of those areas within the planar graph; and a link list (43) identifying the modifications made to the lattice structure by the link encoding module (24). These encodings are such that the same substructures within a lattice are represented using the same data and hence are suitable for compression using conventional techniques.
    Type: Grant
    Filed: October 10, 2003
    Date of Patent: September 30, 2008
    Assignee: Canon Kabushiki Kaisha
    Inventors: Uwe Helmut Jost, Michael Richard Atkinson
  • Publication number: 20080235007
    Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.
    Type: Application
    Filed: June 3, 2008
    Publication date: September 25, 2008
    Inventors: Jiri Navratil, Jason Pelecanos, Ganesh N. Ramaswamy
  • Patent number: 7424426
    Abstract: An object of the present invention is to facilitate dealing with noisy speech of varying SNR and to save calculation costs by generating a speech model with a single tree structure and using the model for speech recognition. Every piece of noise data stored in a noise database is used under every SNR condition to calculate the distances between all of the resulting noise models, and the noise-added speech is clustered. Based on the result of the clustering, a single-tree-structure model space into which the noise and SNR are integrated is generated (steps S1 to S5). At a noise extraction step (step S6), the inputted noisy speech to be recognized is analyzed to extract a feature parameter string, and the likelihoods of the HMMs are compared with one another to select an optimum model from the tree-structured noisy-speech model space (step S7). Linear transformation is applied to the selected noisy-speech model so that the likelihood is maximized (step S8).
    Type: Grant
    Filed: August 18, 2004
    Date of Patent: September 9, 2008
    Assignee: Sadaoki Furui and NTT DoCoMo, Inc.
    Inventors: Sadaoki Furui, Zhipeng Zhang, Tsutomu Horikoshi, Toshiaki Sugimura
  • Patent number: 7424425
    Abstract: In detection systems, such as speaker verification systems, for a given operating point range with an associated detection "cost", the detection cost is preferably reduced by trading off the system error in the area of interest against the error in areas outside that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is preferably derived such that its minimization provably leads to detection cost reduction in the area of interest. The criterion allows for selective access to the slope and offset of the DET curve (a line in the case of normally distributed detection scores; a curve approximated by a mixture of Gaussians in the case of other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
    Type: Grant
    Filed: May 19, 2002
    Date of Patent: September 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy
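The parenthetical note that the DET curve is a line for normally distributed scores follows from computing miss and false-alarm rates from two Gaussian score distributions. A sketch of one operating point (the score means and deviations are invented):

```python
import math

def gauss_cdf(x, mean, std):
    """Cumulative distribution of a normal score distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def det_point(threshold, target_mean, target_std, imp_mean, imp_std):
    """One DET operating point under normally distributed scores:
    miss rate (targets scoring below the threshold) and false-alarm
    rate (imposters scoring above it)."""
    p_miss = gauss_cdf(threshold, target_mean, target_std)
    p_fa = 1.0 - gauss_cdf(threshold, imp_mean, imp_std)
    return p_miss, p_fa

p_miss, p_fa = det_point(0.0, 1.0, 1.0, -1.0, 1.0)
print(round(p_miss, 3), round(p_fa, 3))  # → 0.159 0.159
```

Sweeping the threshold and plotting the two rates in normal-deviate coordinates traces out the straight-line DET curve the abstract refers to.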
  • Publication number: 20080208581
    Abstract: A system and method for speaker modelling for speaker recognition, whereby prior speaker information is incorporated into the modelling process by utilising the maximum a posteriori (MAP) algorithm and extending it to include prior Gaussian component correlation information. First, a background model (10) is estimated: pooled acoustic reference data (11) relating to a specific demographic of speakers (the population of interest) from a given total population is trained via the Expectation Maximization (EM) algorithm (12) to produce the background model (13). The background model (13) is then adapted utilising information from a plurality of reference speakers (21) in accordance with the Maximum A Posteriori (MAP) criterion (22). Using the MAP estimation technique, the reference speaker data and prior information obtained from the background model parameters are combined to produce a library of adapted speaker models, namely Gaussian Mixture Models (23).
    Type: Application
    Filed: December 3, 2004
    Publication date: August 28, 2008
    Inventors: Jason Pelecanos, Subramanian Sridharan, Robert Vogt
  • Patent number: 7409343
    Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model, relating to a voice segment spoken by an authorized speaker, and of a rejection voice model. It uses normalization parameters to normalize a speaker verification score that depends on the likelihood ratio between a voice segment to be tested and the acceptance and rejection models. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice-segment test only if the normalized score is above a second threshold.
    Type: Grant
    Filed: July 22, 2003
    Date of Patent: August 5, 2008
    Assignee: France Telecom
    Inventor: Delphine Charlet
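The two-threshold logic of this abstract, where the first threshold gates access and the second gates the normalization-parameter update, can be sketched as follows; the running-mean update rule and all numeric values are illustrative assumptions:

```python
def verify_and_update(raw_score, norm_mean, norm_std,
                      accept_threshold, update_threshold, learn_rate=0.05):
    """Normalize the verification score, grant access if it clears the
    first threshold, and update the normalization mean only when the
    normalized score also exceeds the second threshold."""
    normalized = (raw_score - norm_mean) / norm_std
    accepted = normalized > accept_threshold
    if normalized > update_threshold:
        # simple running-mean update of the normalization parameter
        norm_mean = (1 - learn_rate) * norm_mean + learn_rate * raw_score
    return accepted, normalized, norm_mean

accepted, normalized, new_mean = verify_and_update(
    raw_score=2.4, norm_mean=1.0, norm_std=0.5,
    accept_threshold=1.0, update_threshold=2.0)
print(accepted, round(normalized, 1), round(new_mean, 2))  # → True 2.8 1.07
```

Gating the update on the stricter second threshold keeps borderline (possibly imposter) scores from corrupting the normalization parameters.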
  • Publication number: 20080154599
    Abstract: The present invention discloses a system and a method for authenticating a user based upon a spoken password processed through a standard speech recognition engine lacking specialized speaker identification and verification (SIV) capabilities. It should be noted that the standard speech recognition engine can be capable of acoustically generating speech recognition grammars in accordance with the cross-referenced application indicated herein. The invention can prompt a user for a free-form password and can receive a user utterance in response. The utterance can be processed through a speech recognition engine (e.g., during a grammar enrollment operation) to generate an acoustic baseform. Future user utterances can be matched against the acoustic baseform. Results from the future matches can be used to determine whether to grant the user access to a secure resource.
    Type: Application
    Filed: June 26, 2007
    Publication date: June 26, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brien H. Muschett, Julia A. Parker
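In this patent the matching of future utterances against the stored acoustic baseform happens inside the recognition engine. As a self-contained illustration of the underlying idea of sequence matching, here is a dynamic-time-warping distance over invented 1-D feature sequences (DTW is a classic technique for this, not the patent's mechanism):

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two 1-D feature sequences:
    the minimum summed frame difference over all monotonic alignments."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # shrink seq_b
                                    cost[i][j - 1],      # stretch seq_b
                                    cost[i - 1][j - 1])  # step both
    return cost[rows][cols]

enrolled = [1.0, 2.0, 3.0, 2.0]       # stored form from enrollment
attempt = [1.0, 2.0, 2.0, 3.0, 2.0]   # later utterance, slightly stretched
print(dtw_distance(enrolled, attempt) < dtw_distance(enrolled, [5.0, 5.0]))
# → True
```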