Specialized Models Patents (Class 704/250)
-
Patent number: 7689414
Abstract: In a speech recognition device (1) for recognizing text information (TI) corresponding to speech information (SI), wherein speech information (SI) can be characterized in respect of language properties, there are firstly provided at least two language-property recognition means (20, 21, 22, 23), each of the language-property recognition means (20, 21, 22, 23) being arranged, by using the speech information (SI), to recognize a language property assigned to said means and to generate property information (ASI, LI, SGI, CI) representing the language property that is recognized, and secondly there are provided speech recognition means (24) that, while continuously taking into account the at least two items of property information (ASI, LI, SGI, CI), are arranged to recognize the text information (TI) corresponding to the speech information (SI).
Type: Grant
Filed: October 31, 2003
Date of Patent: March 30, 2010
Assignee: Nuance Communications Austria GmbH
Inventor: Zsolt Saffer
-
Patent number: 7689418
Abstract: A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.
Type: Grant
Filed: September 12, 2002
Date of Patent: March 30, 2010
Assignee: Nuance Communications, Inc.
Inventors: Ganesh N. Ramaswamy, Upendra V. Chaudhari
-
Patent number: 7689420
Abstract: Architecture for integrating and generating back-off grammars (BOG) in a speech recognition application for recognizing out-of-grammar (OOG) utterances and updating the context-free grammars (CFG) with the results. A parsing component identifies keywords and/or slots from user utterances and a grammar generation component adds filler tags before and/or after the keywords and slots to create new grammar rules. The BOG can be generated from these new grammar rules and can be used to process the OOG user utterances. By processing the OOG user utterances through the BOG, the architecture can recognize and perform the intended task on behalf of the user.
Type: Grant
Filed: April 6, 2006
Date of Patent: March 30, 2010
Assignee: Microsoft Corporation
Inventors: Timothy S. Paek, David M. Chickering, Eric Norman Badger, Qiang Wu
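The filler-tag idea in this abstract can be illustrated with a small sketch. The rule notation, tag name, and function below are illustrative assumptions, not the patent's actual grammar format; they only show how wrapping each keyword in optional filler tokens lets an out-of-grammar carrier phrase still match its keywords.

```python
def make_backoff_rule(keywords, filler="<garbage>"):
    """Build a back-off grammar rule (hypothetical SRGS-like notation) by
    wrapping each keyword/slot with optional filler tags."""
    # "?" marks each filler as optional, mirroring how a back-off grammar
    # tolerates unmodeled words before and after every keyword.
    parts = [f"{filler}? {kw} {filler}?" for kw in keywords]
    return " ".join(parts)

# An OOG utterance like "please play some jazz now" would still match
# the keywords "play" and "jazz" under this rule.
rule = make_backoff_rule(["play", "jazz"])
```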
-
Patent number: 7684977
Abstract: In an interface unit, an input section obtains an input signal of user's speech or the like and an input processing section processes the input signal and detects information relating to the user. On the basis of the detection result, a response contents determination section determines response contents to the user. Meanwhile, a response manner adjusting section adjusts a response manner to the user, such as speech speed and the like, on the basis of the processing state of the input signal, the information relating to the user detected from the input signal, and the like.
Type: Grant
Filed: June 8, 2006
Date of Patent: March 23, 2010
Assignee: Panasonic Corporation
Inventor: Koji Morikawa
-
Publication number: 20100070278
Abstract: A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponent p which has a value greater than zero and less than 1.
Type: Application
Filed: September 12, 2008
Publication date: March 18, 2010
Inventors: Andreas Hagen, Bryan Pellom, Kadri Hacioglu
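The vector-and-matrix formulation above lends itself to a toy numeric sketch. The 3-dimensional "model" vectors and the diagonal form of the transform are assumptions for illustration (real speech models are far higher-dimensional and the transform need not be diagonal); the sketch only shows the shape of the idea: derive a male-to-female transform, raise it to a power 0 < p < 1, and apply the result to the female model.

```python
import numpy as np

# Hypothetical 3-dimensional "speech model" vectors of key values
# (e.g., average formant frequencies); purely illustrative numbers.
male = np.array([500.0, 1500.0, 2500.0])
female = np.array([600.0, 1750.0, 2900.0])

# A diagonal transformation mapping the male vector onto the female vector.
T = np.diag(female / male)

# Modify the transformation with an exponent 0 < p < 1; for a diagonal
# matrix, the fractional matrix power is an element-wise power of the diagonal.
p = 0.5
T_mod = np.diag(np.diag(T) ** p)

# Applying the modified transform to the female model pushes the
# male-to-female shift partway further, giving a synthetic child-like model.
child = T_mod @ female
```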
-
Patent number: 7672844
Abstract: A voice processing apparatus for performing voiceprint recognition processing with high accuracy even in the case where a plurality of conference participants speak at a time in a conference; wherein a bi-directional telephonic communication portion receives as an input respective voice signals from a plurality of microphones, selects one microphone based on the input voice signals, and outputs a voice signal from the microphone; a voiceprint recognition portion 322 performs voiceprint recognition based on the input voice signal in a voiceprint recognizable period, and stores voiceprint data successively in a buffer; and a CPU takes out voiceprint data successively from the buffer, checks it against voiceprint data stored in a voiceprint register, specifies a speaker, and processes the voice signal output from the bi-directional telephonic communication portion by associating the same with the speaker.
Type: Grant
Filed: August 3, 2004
Date of Patent: March 2, 2010
Assignee: Sony Corporation
Inventors: Akira Masuda, Yoshitaka Abe, Hideharu Fujiyama
-
Patent number: 7664645
Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner or the voice of a famous personality. In this way mobile terminals in particular can be originally individualized and text messages can be read out using a specific voice.
Type: Grant
Filed: March 11, 2005
Date of Patent: February 16, 2010
Assignee: SVOX AG
Inventors: Horst-Udo Hain, Klaus Lukas
-
Patent number: 7657432
Abstract: A technique for improved score calculation and normalization in a framework of recognition with phonetically structured speaker models. The technique involves determining, for each frame and each level of phonetic detail of a target speaker model, a non-interpolated likelihood value, and then resolving the at least one likelihood value to obtain a likelihood score.
Type: Grant
Filed: October 31, 2007
Date of Patent: February 2, 2010
Assignee: Nuance Communications, Inc.
Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
-
Patent number: 7640159
Abstract: An accent compensative speech recognition system and related methods for use with a signal processor generating one or more feature vectors based upon a voice-induced electrical signal are provided. The system includes a first-language acoustic module that determines a first-language phoneme sequence based upon one or more feature vectors, and a second-language lexicon module that determines a second-language speech segment based upon the first-language phoneme sequence. A method aspect includes the steps of generating a first-language phoneme sequence from at least one feature vector based upon a first-language phoneme model, and determining a second-language speech segment from the first-language phoneme sequence based upon a second-language lexicon model.
Type: Grant
Filed: July 22, 2004
Date of Patent: December 29, 2009
Assignee: Nuance Communications, Inc.
Inventor: David E. Reich
-
Publication number: 20090313018
Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
Type: Application
Filed: June 17, 2008
Publication date: December 17, 2009
Inventors: Yoav Degani, Yishai Zamir
-
Patent number: 7634405
Abstract: The subject invention leverages spectral "palettes" or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or "trained" utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
Type: Grant
Filed: January 24, 2005
Date of Patent: December 15, 2009
Assignee: Microsoft Corporation
Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
-
Publication number: 20090299744
Abstract: A voice recognition apparatus determines whether an input sound is a voice segment or a non-voice segment in time series, generates a word model for the voice segment, allocates a predetermined non-voice model for the non-voice segment, connects the word model and the non-voice model in sequence according to the time series of the segments of the input sound corresponding to the respective models and generates a vocalization model, and coordinates the vocalization model with a vocalization ID in one-to-one correspondence, and stores the same.
Type: Application
Filed: April 14, 2009
Publication date: December 3, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Mitsuyoshi TACHIMORI
-
Patent number: 7627473
Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
Type: Grant
Filed: October 15, 2004
Date of Patent: December 1, 2009
Assignee: Microsoft Corporation
Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
-
Patent number: 7620547
Abstract: The present invention provides a method for operating and/or for controlling a man-machine interface unit (MMI) for a finite user group environment. Utterances out of a group of users are repeatedly received. A process of user identification is carried out based on said received utterances. The process of user identification comprises a clustering step so as to enable enrolment-free performance.
Type: Grant
Filed: January 24, 2005
Date of Patent: November 17, 2009
Assignee: Sony Deutschland GmbH
Inventors: Ralf Kompe, Thomas Kemp
-
Patent number: 7617102
Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data being uttered for each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract ge
Type: Grant
Filed: September 27, 2006
Date of Patent: November 10, 2009
Assignee: Advanced Telecommunications Research Institute International
Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
-
Publication number: 20090259469
Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a pass band parameter of each of one or more passbands that are outside the narrow passband is generated for each frame and the one or more band energy parameters are coupled to the speech model.
Type: Application
Filed: April 14, 2008
Publication date: October 15, 2009
Applicant: MOTOROLA, INC.
Inventors: Changxue Ma, Yuan-Jun Wei
-
Patent number: 7603275
Abstract: Embodiments of a system, method and computer program product for verifying an identity claimed by a claimant using voiced to unvoiced classifiers are described. In accordance with one embodiment, a speech sample from a claimant claiming an identity may be captured. From the speech sample, a ratio of unvoiced frames to a total number of frames in the speech sample may be calculated. An equal error rate value corresponding to the speech sample can then be determined based on the calculated ratio. The determined equal error rate value corresponding to the speech sample may be compared to an equal error rate value associated with the claimed identity in order to select a decision threshold. A match score may also be generated based on a comparison of the speech sample to a voice sample associated with the claimed identity. A decision whether to accept the identity claim of the claimant can then be made based on a comparison of the match score to the decision threshold.
Type: Grant
Filed: October 31, 2005
Date of Patent: October 13, 2009
Assignee: Hitachi, Ltd.
Inventor: Clifford Tavares
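The two computations this abstract chains together, an unvoiced-frame ratio and a ratio-dependent decision threshold, can be sketched briefly. The function names, threshold values, and the rule for mapping EER values to a threshold are illustrative assumptions; the patent does not publish its actual mapping.

```python
def unvoiced_ratio(voiced_flags):
    """Fraction of unvoiced frames in a speech sample; voiced_flags are
    hypothetical per-frame voicing decisions (True = voiced)."""
    unvoiced = sum(1 for voiced in voiced_flags if not voiced)
    return unvoiced / len(voiced_flags)

def accept(match_score, sample_eer, claimed_eer, strict=0.8, lenient=0.6):
    """Accept or reject an identity claim. The threshold choice is an
    assumed rule: be stricter when the sample's estimated equal error
    rate is worse than the one associated with the claimed identity."""
    threshold = strict if sample_eer > claimed_eer else lenient
    return match_score >= threshold
```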
-
Patent number: 7590537
Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model in which the model variation between a test speaker ML model and a speaker group ML model to which the test speaker belongs which is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variation in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm of MLLR and MAP.
Type: Grant
Filed: December 27, 2004
Date of Patent: September 15, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
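The distinction drawn above between the quantity and the direction of a model variation can be made concrete with a small sketch. Representing a model by its mean vector and comparing variations with a cosine plus a magnitude ratio are illustrative assumptions, not the patent's exact similarity measure.

```python
import numpy as np

def variation(model_a, model_b):
    """Split the variation between two model mean vectors into a quantity
    (Euclidean norm) and a direction (unit vector)."""
    diff = model_b - model_a
    quantity = np.linalg.norm(diff)
    direction = diff / quantity
    return quantity, direction

def variation_similarity(var1, var2):
    """Compare two variations on both aspects: direction via cosine
    similarity, quantity via a ratio in (0, 1]."""
    (q1, d1), (q2, d2) = var1, var2
    cosine = float(d1 @ d2)
    ratio = min(q1, q2) / max(q1, q2)
    return cosine, ratio

v = variation(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```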
-
Patent number: 7574359
Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
Type: Grant
Filed: October 1, 2004
Date of Patent: August 11, 2009
Assignee: Microsoft Corporation
Inventor: Chao Huang
-
Patent number: 7567903
Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
Type: Grant
Filed: January 12, 2005
Date of Patent: July 28, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Publication number: 20090182561
Abstract: A speech recognition device and a method thereof are adapted to recognize a Chinese word. The speech recognition device includes a lexicon model, a language model, a speech recognition module, and a parsing module. The lexicon model keeps a plurality of words. The speech recognition module performs a speech recognition processing on a voice signal conforming to a syntax structure of Chinese word description. The speech recognition processing searches words related to the Chinese word description from the lexicon model according to a feature of the Chinese word description, and produces a literal word series in digital data form by referring to a syntax combination probability. The language model based on the syntax structure of Chinese word description provides the syntax combination probability according to combination relations between the searched words. The parsing module analyzes the syntax structure of the literal word series for retrieving the Chinese word.
Type: Application
Filed: September 16, 2008
Publication date: July 16, 2009
Applicant: DELTA ELECTRONICS, INC.
Inventors: Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
-
Patent number: 7562015
Abstract: A distributed pattern recognition training method includes providing data communication between at least one central pattern analysis node and a plurality of peripheral data analysis sites. The method also includes communicating from the at least one central pattern analysis node to the plurality of peripheral data analysis sites a plurality of kernel-based pattern elements. The method further includes performing a plurality of iterations of pattern template training at each of the plurality of peripheral data analysis sites.
Type: Grant
Filed: July 14, 2005
Date of Patent: July 14, 2009
Assignee: Aurilab, LLC
Inventor: James K. Baker
-
Publication number: 20090171661
Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
Type: Application
Filed: June 27, 2008
Publication date: July 2, 2009
Applicant: International Business Machines Corporation
Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
-
Patent number: 7552049
Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of speech cepstral is subtracted from the generated, noise-added speech (step S2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
Type: Grant
Filed: March 10, 2004
Date of Patent: June 23, 2009
Assignees: NTT DoCoMo, Inc., Sadaoki Furui
Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
-
Publication number: 20090144058
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Application
Filed: December 3, 2007
Publication date: June 4, 2009
Inventor: Alexander Sorin
-
Patent number: 7542904
Abstract: A method for distributing voice-recognition grammars includes receiving match data from a first remote element. The match data includes information associated with an attempt by the remote element to match received audio information to first stored audio data. The method also includes generating a grammar entry based on the match data. The grammar entry includes second stored audio data and a word identifier associated with the second stored audio data. Additionally, the method includes transmitting the grammar entry to a second remote element.
Type: Grant
Filed: August 19, 2005
Date of Patent: June 2, 2009
Assignee: Cisco Technology, Inc.
Inventors: Kevin L. Chestnut, Joseph B. Burton
-
Publication number: 20090119103
Abstract: A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models. The method detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. The method creates a speaker model assigned to a speaker model set when no match occurs based on the input.
Type: Application
Filed: October 10, 2008
Publication date: May 7, 2009
Inventors: Franz Gerl, Tobias Herbig
-
Patent number: 7529669
Abstract: A voice based multimodal speaker authentication method and telecommunications application thereof employing a speaker adaptive method for training phoneme specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.
Type: Grant
Filed: June 13, 2007
Date of Patent: May 5, 2009
Assignee: NEC Laboratories America, Inc.
Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
-
Publication number: 20090106025
Abstract: A speaker recognition system (1) includes a speaker model registration device (10) which registers a speaker model for speaker recognition in the speaker recognition system. The speaker model registration device includes acquisition means (13) for acquiring utterances by n+α times (wherein n is an integer not smaller than 2 and α is an integer not smaller than 1); calculation means (20) for calculating a speaker model by using the acquired utterances of n times as utterances for registration; correlation means (30) for correlating the calculated speaker model by using the acquired utterances of α times as correlation utterances; and registration means (40) for registering those having the correlation result satisfying a predetermined reference among the correlated speaker models, as the speaker model for speaker recognition.
Type: Application
Filed: March 16, 2007
Publication date: April 23, 2009
Applicant: PIONEER CORPORATION
Inventor: Soichi Toyama
-
Patent number: 7505906
Abstract: A method and system for automatic speech recognition are disclosed. The method comprises receiving speech from a user, the speech including at least one speech error, increasing the probabilities of closely related words to the at least one speech error and processing the received speech using the increased probabilities. A corpus of data having common words that are mis-stated is used to identify and increase the probabilities of related words. The method applies to at least the automatic speech recognition module and the spoken language understanding module.
Type: Grant
Filed: February 26, 2004
Date of Patent: March 17, 2009
Assignee: AT&T Intellectual Property, II
Inventors: Steven H. Lewis, Kenneth H. Rosen
-
Publication number: 20090063146
Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating a similarity of the input sound relative to a male speaker sound model. A female voice index calculator calculates a female voice index indicating a similarity of the input sound relative to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound which may be either the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound based on the male voice index and the female voice index in case that the first discriminator discriminates the human voice sound.
Type: Application
Filed: August 26, 2008
Publication date: March 5, 2009
Applicant: Yamaha Corporation
Inventor: Yasuo Yoshioka
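The two-stage decision structure described above is easy to sketch. The score ranges, threshold, and function name are illustrative assumptions; the point is only the control flow: the gender discriminator runs solely on input the first discriminator has already classified as human voice.

```python
def classify_sound(male_index, female_index, voice_likelihood,
                   voice_threshold=0.5):
    """Two-stage discrimination sketch (thresholds and score scales are
    assumed, not taken from the publication)."""
    # First discriminator: human voice vs. non-voice.
    if voice_likelihood < voice_threshold:
        return "non-voice"
    # Second discriminator: run only on human voice, deciding gender by
    # whichever speaker-model similarity index is larger.
    return "male" if male_index >= female_index else "female"
```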
-
Patent number: 7496510
Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
Type: Grant
Filed: November 30, 2001
Date of Patent: February 24, 2009
Assignee: International Business Machines Corporation
Inventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
-
Patent number: 7493258
Abstract: A method is presented including selecting an initial beam width. The method also includes determining whether a value per frame is changing. A beam width is dynamically adjusted. The method further decodes a speech input with the dynamically adjusted beam width. Also, a device is presented including a processor (420). A speech recognition component (610) is connected to the processor (420). A memory (410) is connected to the processor (420). The speech recognition component (610) dynamically adjusts a beam width to decode a speech input.
Type: Grant
Filed: July 3, 2001
Date of Patent: February 17, 2009
Assignee: Intel Corporation
Inventors: Alexandr A. Kibkalo, Vyacheslav A. Barannikov
-
Patent number: 7475006
Abstract: A method and parser are provided that generate a score for a node identified during a parse of a text segment. The score is based on a mutual information score that measures the mutual information between a phrase level for the node and a word class of at least one word in the text segment.
Type: Grant
Filed: July 11, 2001
Date of Patent: January 6, 2009
Assignee: Microsoft Corporation, Inc.
Inventor: David N. Weise
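A mutual-information score of the kind this abstract mentions is commonly estimated from co-occurrence counts. The sketch below uses pointwise mutual information over hypothetical counts; the patent does not disclose its exact estimator, so the formula and names are illustrative assumptions.

```python
import math

def mutual_information_score(joint, level_total, class_total, n):
    """Pointwise mutual information log(P(level, class) / (P(level) P(class)))
    between a phrase level and a word class, estimated from hypothetical
    co-occurrence counts over n observed parse events."""
    p_joint = joint / n
    p_level = level_total / n
    p_class = class_total / n
    return math.log(p_joint / (p_level * p_class))
```

A positive score means the phrase level and word class co-occur more often than chance would predict, so the parse node is rewarded; a negative score penalizes it.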
-
Patent number: 7475014
Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
Type: Grant
Filed: July 25, 2005
Date of Patent: January 6, 2009
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Paris Smaragdis, Petros Boufounos
-
Patent number: 7457753
Abstract: A system for remote assessment of a user is disclosed. The system comprises application software resident on a server and arranged to interact across a network with a user operating a client device to obtain one or more sample signals of the user's speech. A datastore is arranged to store the user speech samples in association with details of the user. A feature extraction engine is arranged to extract one or more first features from respective speech samples. A comparator is arranged to compare the first features extracted from a speech sample with second features extracted from one or more reference samples and to provide a measure of any differences between the first and second features for assessment of the user.
Type: Grant
Filed: June 29, 2005
Date of Patent: November 25, 2008
Assignee: University College Dublin National University of Ireland
Inventors: Rosalyn Moran, Richard Reilly, Philip De Chazal, Brian O'Mullane, Peter Lacy
-
Patent number: 7454339
Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems.
Type: Grant
Filed: December 20, 2005
Date of Patent: November 18, 2008
Assignee: Panasonic Corporation
Inventors: Chaojun Liu, David Kryze, Luca Rigazio
-
Publication number: 20080281595
Abstract: According to an embodiment, voice recognition apparatus includes units of: acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, determining whether or not the output probability of a certain path is stored. Number of times of calculation of the output probability is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in an interval from when the standard frame is set to when the standard frame is renewed, and storing and using the thus-calculated value as an approximate value of the output probability in subsequent frames.
Type: Application
Filed: March 30, 2007
Publication date: November 13, 2008
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masaru Sakai, Shinichi Tanaka
-
Patent number: 7447633
Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
Type: Grant
Filed: November 22, 2004
Date of Patent: November 4, 2008
Assignee: International Business Machines Corporation
Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
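The pool-and-normalize step this abstract describes can be sketched in a few lines. Representing each Gaussian as a (weight, mean, variance) tuple and dividing every weight by the pooled total is an assumed simplification (real systems might weight states by occupancy); it shows the core idea of flattening per-state mixtures into one text-independent GMM whose weights sum to one.

```python
def pool_hmm_states(states):
    """Build a GMM by pooling the Gaussians of several HMM states and
    renormalizing mixture weights to sum to one. Each Gaussian is a
    (weight, mean, variance) tuple; weights sum to 1 within each state."""
    pooled = [g for state in states for g in state]
    total = sum(weight for weight, _, _ in pooled)
    return [(weight / total, mean, var) for weight, mean, var in pooled]

states = [
    [(0.4, 0.0, 1.0), (0.6, 1.0, 1.0)],  # HMM state 1 (two-component mixture)
    [(1.0, 2.0, 0.5)],                   # HMM state 2 (single Gaussian)
]
gmm = pool_hmm_states(states)
```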
-
Publication number: 20080270132
Abstract: A system and method for identifying an individual includes collecting biometric information from an individual attempting to gain access to a system. The biometric information is scored against pre-trained imposter models. If a score exceeds a threshold, the individual is identified as an imposter. Other systems and methods are also disclosed.
Type: Application
Filed: June 3, 2008
Publication date: October 30, 2008
Inventors: Jari Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
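The score-against-imposter-models decision can be sketched minimally. The 1-D Gaussian imposter models and average log-likelihood scoring below are assumptions for illustration only:

```python
import math

def score(features, model):
    """Average log-likelihood of features under a 1-D Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in features) / len(features)

def is_imposter(features, imposter_models, threshold):
    """Flag the individual as an imposter if any pre-trained imposter
    model scores the biometric features above the threshold."""
    return any(score(features, m) > threshold for m in imposter_models)
```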
-
Publication number: 20080255843
Abstract: The invention provides a method of voice recognition comprising the steps of: obtaining current position information; obtaining a current voice model according to the current position information; and performing voice recognition according to the current voice model. In particular, the current position information can be obtained from network address information or by a global positioning system.
Type: Application
Filed: April 10, 2008
Publication date: October 16, 2008
Inventors: Yu-Chen Sun, Chang-Hung Lee
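Position-conditioned model selection might be sketched as a lookup over regional bounding boxes; the region representation and model names below are illustrative assumptions:

```python
def select_voice_model(position, regional_models, default_model):
    """Pick the voice model whose region contains the current GPS (or
    network-derived) position; fall back to a default model otherwise.

    regional_models: list of ((lat_min, lat_max, lon_min, lon_max), model)."""
    lat, lon = position
    for (lat_min, lat_max, lon_min, lon_max), model in regional_models:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return model
    return default_model
```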
-
Patent number: 7437289
Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of classification schemes, including, e.g., speaker identification and speaker verification.
Type: Grant
Filed: August 16, 2001
Date of Patent: October 14, 2008
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
-
Publication number: 20080249774
Abstract: Disclosed is a speech speaker recognition method for a speaker recognition apparatus, the method including: detecting effective speech data from input speech; extracting an acoustic feature from the speech data; generating an acoustic feature transformation matrix from the speech data according to each of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA); mixing the acoustic feature transformation matrices to construct a hybrid acoustic feature transformation matrix; multiplying the matrix representing the acoustic feature by the hybrid acoustic feature transformation matrix to generate a final feature vector; generating a speaker model from the final feature vector; comparing a pre-stored universal speaker model with the generated speaker model to identify the speaker; and verifying the identified speaker.
Type: Application
Filed: April 2, 2008
Publication date: October 9, 2008
Applicants: SAMSUNG ELECTRONICS CO., LTD., ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Hyun-Soo Kim, Myeong-gi Jeong, Hyun-Sik Shim, Young-Hee Park, Ha-Jin Yoo, Guen-Chang Kwak, Hye-Jin Kim, Kyung-Sook Bae
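The hybrid PCA/LDA feature transform can be sketched with NumPy. Everything here (scalar-per-dimension features, stacking the two projection matrices side by side as the "mixing" step, and the numbers of dimensions kept) is an illustrative assumption rather than the claimed construction:

```python
import numpy as np

def pca_matrix(X, k):
    """Top-k principal directions of the pooled data (rows = frames)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]          # d x k

def lda_matrix(X, y, k):
    """Top-k discriminant directions from between/within-class scatter."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                   # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)                 # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:k]
    return vecs[:, order].real                          # d x k

def hybrid_transform(X, y, k_pca, k_lda):
    """Stack the PCA and LDA projections into one hybrid transformation
    matrix and apply it to produce the final feature vectors."""
    W = np.hstack([pca_matrix(X, k_pca), lda_matrix(X, y, k_lda)])
    return X @ W
```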
-
Patent number: 7430509
Abstract: Initially an embedding module (22) determines an embedding of a lattice in a two-dimensional plane. The embedding module (22) then processes the initial embedding to generate a planar graph in which no links cross. The planar graph is then simplified by a link encoding module (24), and data representing the lattice structure is generated by a shape encoding module (26), in which the simplified planar graph is represented by a shape encoding (42) identifying the numbers of links bounding areas defined by the planar graph and data identifying the locations of those areas within the planar graph, together with a link list (43) identifying the modifications made to the lattice structure by the link encoding module (24). These encodings represent identical substructures within a lattice using the same data and hence are suitable for compression using conventional techniques.
Type: Grant
Filed: October 10, 2003
Date of Patent: September 30, 2008
Assignee: Canon Kabushiki Kaisha
Inventors: Uwe Helmut Jost, Michael Richard Atkinson
-
Publication number: 20080235007
Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.
Type: Application
Filed: June 3, 2008
Publication date: September 25, 2008
Inventors: Jiri Navratil, Jagon Pelecanos, Ganesh N. Ramaswamy
-
Patent number: 7424426
Abstract: An object of the present invention is to facilitate dealing with noisy speech of varying SNR and to save calculation costs by generating a speech model with a single tree structure and using the model for speech recognition. Every piece of noise data stored in a noise database is used under every SNR condition; the distances between all noise models under those SNR conditions are calculated and the noise-added speech is clustered. Based on the result of the clustering, a single-tree-structure model space into which the noise and SNR are integrated is generated (steps S1 to S5). At a noise extraction step (step S6), the inputted noisy speech to be recognized is analyzed to extract a feature parameter string, and the likelihoods of the HMMs are compared with one another to select an optimum model from the tree-structured noisy speech model space (step S7). A linear transformation is applied to the selected noisy speech model so that the likelihood is maximized (step S8).
Type: Grant
Filed: August 18, 2004
Date of Patent: September 9, 2008
Assignee: Sadaoki Furui and NTT DoCoMo, Inc.
Inventors: Sadaoki Furui, Zhipeng Zhang, Tsutomu Horikoshi, Toshiaki Sugimura
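Selecting an optimum model from a tree-structured model space by likelihood comparison might look like the following sketch. The dict-based tree and 1-D Gaussian leaf models are assumptions, and the likelihood-maximizing linear transformation of step S8 is omitted:

```python
import math

def gauss_loglik(frames, mean, var):
    """Log-likelihood of a feature-parameter string under a 1-D Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in frames)

def select_model(tree, frames):
    """Descend a tree-structured noisy-speech model space, at each level
    following the child whose model gives the higher likelihood for the
    observed features; return the leaf model reached."""
    node = tree
    while node.get("children"):
        node = max(node["children"],
                   key=lambda c: gauss_loglik(frames, *c["model"]))
    return node["model"]
```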
-
Patent number: 7424425
Abstract: In detection systems, such as speaker verification systems, for a given operating-point range with an associated detection "cost", the detection cost is preferably reduced by trading off the system error in the area of interest against areas outside that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is derived such that its minimization provably leads to detection-cost reduction in the area of interest. The criterion allows selective access to the slope and offset of the DET curve (a line in the case of normally distributed detection scores; a curve approximated by a mixture of Gaussians for other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
Type: Grant
Filed: May 19, 2002
Date of Patent: September 9, 2008
Assignee: International Business Machines Corporation
Inventors: Jiri Navratil, Ganesh N. Ramaswamy
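For normally distributed detection scores the DET curve is a straight line in normal-deviate coordinates, which makes the slope and offset the abstract refers to explicit. A small sketch of that special case (the Gaussian score assumption is the patent's own; the function names are mine):

```python
from statistics import NormalDist

def det_line(mu_tar, sd_tar, mu_imp, sd_imp):
    """Slope and offset of the DET curve in normal-deviate coordinates
    for Gaussian target/impostor scores: the curve is the line
    ndev(P_miss) = slope * ndev(P_fa) + offset."""
    slope = -sd_imp / sd_tar
    offset = (mu_imp - mu_tar) / sd_tar
    return slope, offset

def det_point(threshold, mu_tar, sd_tar, mu_imp, sd_imp):
    """(P_fa, P_miss) operating point at a given score threshold:
    a miss is a target score below threshold, a false alarm is an
    impostor score above it."""
    p_miss = NormalDist(mu_tar, sd_tar).cdf(threshold)
    p_fa = 1.0 - NormalDist(mu_imp, sd_imp).cdf(threshold)
    return p_fa, p_miss
```

Sweeping the threshold traces out operating points that all lie on the line returned by `det_line`, so changing the score distributions' standard-deviation ratio directly changes the DET slope.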
-
Publication number: 20080208581
Abstract: A system and method for speaker modelling for speaker recognition, whereby prior speaker information is incorporated into the modelling process by utilising the maximum a posteriori (MAP) algorithm and extending it to contain prior Gaussian component correlation information. First, a background model (10) is estimated: pooled acoustic reference data (11) relating to a specific demographic of speakers (the population of interest) from a given total population is trained via the Expectation Maximization (EM) algorithm (12) to produce the background model (13). The background model (13) is then adapted utilising information from a plurality of reference speakers (21) in accordance with the Maximum A Posteriori (MAP) criterion (22). Using the MAP estimation technique, the reference speaker data and prior information obtained from the background model parameters are combined to produce a library of adapted speaker models, namely Gaussian Mixture Models (23).
Type: Application
Filed: December 3, 2004
Publication date: August 28, 2008
Inventors: Jason Pelecanos, Subramanian Sridharan, Robert Vogt
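Relevance-MAP adaptation of a background GMM's means, as commonly formulated in speaker recognition, can be sketched as follows. This is the standard mean-only, 1-D diagonal simplification with relevance factor `r`, not necessarily the patent's extension with Gaussian component correlations:

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, data, r=16.0):
    """Relevance-MAP adaptation of GMM means: blend the background
    (prior) means with sufficient statistics from the speaker's data."""
    # per-frame mixture responsibilities (1-D diagonal Gaussians)
    ll = (-0.5 * ((data[:, None] - ubm_means) ** 2 / ubm_vars
                  + np.log(2 * np.pi * ubm_vars)) + np.log(ubm_weights))
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                        # soft counts per mixture
    ex = (post * data[:, None]).sum(axis=0) / np.maximum(n, 1e-10)
    alpha = n / (n + r)                         # data-vs-prior balance
    return alpha * ex + (1 - alpha) * ubm_means
```

Components that see little of the speaker's data keep their background means, which is what makes the adapted models robust with short enrollment utterances.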
-
Patent number: 7409343
Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model, relating to a voice segment spoken by an authorized speaker, and a rejection voice model. It uses normalization parameters to normalize a speaker verification score that depends on the likelihood ratio of a voice segment to be tested under the acceptance model and the rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test, but only if the normalized score is above a second threshold.
Type: Grant
Filed: July 22, 2003
Date of Patent: August 5, 2008
Assignee: France Telecom
Inventor: Delphine Charlet
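The threshold-gated update of normalization parameters could be sketched like this; the running mean/variance form, the `alpha` learning rate, and both threshold values are illustrative assumptions:

```python
class ScoreNormalizer:
    """Normalize verification scores with running mean/variance
    parameters, and update those parameters with a new score only when
    the normalized score clears a second, stricter threshold (i.e. the
    test is confidently from the authorized speaker)."""

    def __init__(self, mean=0.0, var=1.0, alpha=0.1,
                 accept_thresh=0.0, update_thresh=1.0):
        self.mean, self.var, self.alpha = mean, var, alpha
        self.accept_thresh = accept_thresh
        self.update_thresh = update_thresh

    def verify(self, raw_score):
        z = (raw_score - self.mean) / self.var ** 0.5
        accepted = z > self.accept_thresh          # first threshold: access
        if z > self.update_thresh:                 # second threshold: adapt
            self.mean += self.alpha * (raw_score - self.mean)
            self.var += self.alpha * ((raw_score - self.mean) ** 2 - self.var)
        return accepted, z
```

Gating the update this way keeps borderline (possibly impostor) scores from drifting the normalization parameters.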
-
Publication number: 20080154599
Abstract: The present invention discloses a system and a method for authenticating a user based upon a spoken password processed through a standard speech recognition engine lacking specialized speaker identification and verification (SIV) capabilities. It should be noted that the standard speech recognition engine can be capable of acoustically generating speech recognition grammars in accordance with the cross-referenced application indicated herein. The invention can prompt a user for a free-form password and can receive a user utterance in response. The utterance can be processed through a speech recognition engine (e.g., during a grammar enrollment operation) to generate an acoustic baseform. Future user utterances can be matched against the acoustic baseform, and the results of those matches can be used to determine whether to grant the user access to a secure resource.
Type: Application
Filed: June 26, 2007
Publication date: June 26, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Brien H. Muschett, Julia A. Parker
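The enroll-then-match flow can be sketched with a stand-in for the recognizer: here phone strings play the role of acoustic baseforms and a string-similarity ratio plays the role of the engine's match score, both of which are assumptions for illustration only:

```python
from difflib import SequenceMatcher

class PasswordVerifier:
    """SIV-free password authentication sketch: enroll the recognizer's
    acoustic baseform for a spoken free-form password, then grant access
    when a later utterance's baseform matches closely enough.
    The phone-string baseforms and threshold are hypothetical."""

    def __init__(self, match_threshold=0.8):
        self.baseform = None
        self.match_threshold = match_threshold

    def enroll(self, utterance_phones):
        """Grammar-enrollment step: store the baseform of the password."""
        self.baseform = utterance_phones

    def authenticate(self, utterance_phones):
        """Match a new utterance against the enrolled baseform."""
        if self.baseform is None:
            return False
        ratio = SequenceMatcher(None, self.baseform,
                                utterance_phones).ratio()
        return ratio >= self.match_threshold
```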