Specialized Models Patents (Class 704/250)
-
Patent number: 7689414
Abstract: In a speech recognition device (1) for recognizing text information (TI) corresponding to speech information (SI), wherein speech information (SI) can be characterized in respect of language properties, there are firstly provided at least two language-property recognition means (20, 21, 22, 23), each of the language-property recognition means (20, 21, 22, 23) being arranged, by using the speech information (SI), to recognize a language property assigned to said means and to generate property information (ASI, LI, SGI, CI) representing the language property that is recognized, and secondly there are provided speech recognition means (24) that, while continuously taking into account the at least two items of property information (ASI, LI, SGI, CI), are arranged to recognize the text information (TI) corresponding to the speech information (SI).
Type: Grant
Filed: October 31, 2003
Date of Patent: March 30, 2010
Assignee: Nuance Communications Austria GmbH
Inventor: Zsolt Saffer
-
Patent number: 7689418
Abstract: A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.
Type: Grant
Filed: September 12, 2002
Date of Patent: March 30, 2010
Assignee: Nuance Communications, Inc.
Inventors: Ganesh N. Ramaswamy, Upendra V. Chaudhari
-
Patent number: 7689420
Abstract: Architecture for integrating and generating back-off grammars (BOG) in a speech recognition application for recognizing out-of-grammar (OOG) utterances and updating the context-free grammars (CFG) with the results. A parsing component identifies keywords and/or slots from user utterances and a grammar generation component adds filler tags before and/or after the keywords and slots to create new grammar rules. The BOG can be generated from these new grammar rules and can be used to process the OOG user utterances. By processing the OOG user utterances through the BOG, the architecture can recognize and perform the intended task on behalf of the user.
Type: Grant
Filed: April 6, 2006
Date of Patent: March 30, 2010
Assignee: Microsoft Corporation
Inventors: Timothy S. Paek, David M. Chickering, Eric Norman Badger, Qiang Wu
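The filler-tag idea in this abstract can be illustrated with a small sketch. The rule notation, tag name, and function below are illustrative assumptions, not the patent's actual grammar format; they only show how wrapping each keyword in optional filler tokens lets an out-of-grammar carrier phrase still match its keywords.

```python
def make_backoff_rule(keywords, filler="<garbage>"):
    """Build a back-off grammar rule (hypothetical SRGS-like notation) by
    wrapping each keyword/slot with optional filler tags."""
    # "?" marks each filler as optional, mirroring how a back-off grammar
    # tolerates unmodeled words before and after every keyword.
    parts = [f"{filler}? {kw} {filler}?" for kw in keywords]
    return " ".join(parts)

# An OOG utterance like "please play some jazz now" would still match
# the keywords "play" and "jazz" under this rule.
rule = make_backoff_rule(["play", "jazz"])
```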
-
Patent number: 7684977
Abstract: In an interface unit, an input section obtains an input signal of user's speech or the like and an input processing section processes the input signal and detects information relating to the user. On the basis of the detection result, a response contents determination section determines response contents to the user. Meanwhile, a response manner adjusting section adjusts a response manner to the user, such as speech speed and the like, on the basis of the processing state of the input signal, the information relating to the user detected from the input signal, and the like.
Type: Grant
Filed: June 8, 2006
Date of Patent: March 23, 2010
Assignee: Panasonic Corporation
Inventor: Koji Morikawa
-
Publication number: 20100070278
Abstract: A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponent p which has a value greater than zero and less than 1.
Type: Application
Filed: September 12, 2008
Publication date: March 18, 2010
Inventors: Andreas Hagen, Bryan Pellom, Kadri Hacioglu
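The vector-and-matrix formulation above lends itself to a toy numeric sketch. The 3-dimensional "model" vectors and the diagonal form of the transform are assumptions for illustration (real speech models are far higher-dimensional and the transform need not be diagonal); the sketch only shows the shape of the idea: derive a male-to-female transform, raise it to a power 0 < p < 1, and apply the result to the female model.

```python
import numpy as np

# Hypothetical 3-dimensional "speech model" vectors of key values
# (e.g., average formant frequencies); purely illustrative numbers.
male = np.array([500.0, 1500.0, 2500.0])
female = np.array([600.0, 1750.0, 2900.0])

# A diagonal transformation mapping the male vector onto the female vector.
T = np.diag(female / male)

# Modify the transformation with an exponent 0 < p < 1; for a diagonal
# matrix, the fractional matrix power is an element-wise power of the diagonal.
p = 0.5
T_mod = np.diag(np.diag(T) ** p)

# Applying the modified transform to the female model pushes the
# male-to-female shift partway further, giving a synthetic child-like model.
child = T_mod @ female
```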
-
Patent number: 7672844
Abstract: A voice processing apparatus for performing voiceprint recognition processing with high accuracy even in the case where a plurality of conference participants speak at a time in a conference; wherein a bi-directional telephonic communication portion receives as an input respective voice signals from a plurality of microphones, selects one microphone based on the input voice signals, and outputs a voice signal from the microphone; a voiceprint recognition portion 322 performs voiceprint recognition based on the input voice signal in a voiceprint recognizable period, and stores voiceprint data successively in a buffer; and a CPU takes out voiceprint data successively from the buffer, checks it against voiceprint data stored in a voiceprint register, specifies a speaker, and processes the voice signal output from the bi-directional telephonic communication portion by associating the same with the speaker.
Type: Grant
Filed: August 3, 2004
Date of Patent: March 2, 2010
Assignee: Sony Corporation
Inventors: Akira Masuda, Yoshitaka Abe, Hideharu Fujiyama
-
Patent number: 7664645
Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner or the voice of a famous personality. In this way mobile terminals in particular can be originally individualized and text messages can be read out using a specific voice.
Type: Grant
Filed: March 11, 2005
Date of Patent: February 16, 2010
Assignee: SVOX AG
Inventors: Horst-Udo Hain, Klaus Lukas
-
Patent number: 7657432
Abstract: A technique for improved score calculation and normalization in a framework of recognition with phonetically structured speaker models. The technique involves determining, for each frame and each level of phonetic detail of a target speaker model, a non-interpolated likelihood value, and then resolving the at least one likelihood value to obtain a likelihood score.
Type: Grant
Filed: October 31, 2007
Date of Patent: February 2, 2010
Assignee: Nuance Communications, Inc.
Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
-
Patent number: 7640159
Abstract: An accent compensative speech recognition system and related methods for use with a signal processor generating one or more feature vectors based upon a voice-induced electrical signal are provided. The system includes a first-language acoustic module that determines a first-language phoneme sequence based upon one or more feature vectors, and a second-language lexicon module that determines a second-language speech segment based upon the first-language phoneme sequence. A method aspect includes the steps of generating a first-language phoneme sequence from at least one feature vector based upon a first-language phoneme model, and determining a second-language speech segment from the first-language phoneme sequence based upon a second-language lexicon model.
Type: Grant
Filed: July 22, 2004
Date of Patent: December 29, 2009
Assignee: Nuance Communications, Inc.
Inventor: David E. Reich
-
Publication number: 20090313018
Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
Type: Application
Filed: June 17, 2008
Publication date: December 17, 2009
Inventors: Yoav Degani, Yishai Zamir
-
Patent number: 7634405
Abstract: The subject invention leverages spectral "palettes" or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or "trained" utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
Type: Grant
Filed: January 24, 2005
Date of Patent: December 15, 2009
Assignee: Microsoft Corporation
Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
-
Publication number: 20090299744
Abstract: A voice recognition apparatus determines whether an input sound is a voice segment or a non-voice segment in time series, generates a word model for the voice segment, allocates a predetermined non-voice model for the non-voice segment, connects the word model and the non-voice model in sequence according to the time series of the segments of the input sound corresponding to the respective models and generates a vocalization model, and coordinates the vocalization model with a vocalization ID in one-to-one correspondence, and stores the same.
Type: Application
Filed: April 14, 2009
Publication date: December 3, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Mitsuyoshi TACHIMORI
-
Patent number: 7627473
Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
Type: Grant
Filed: October 15, 2004
Date of Patent: December 1, 2009
Assignee: Microsoft Corporation
Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
-
Patent number: 7620547
Abstract: The present invention provides a method for operating and/or for controlling a man-machine interface unit (MMI) for a finite user group environment. Utterances out of a group of users are repeatedly received. A process of user identification is carried out based on said received utterances. The process of user identification comprises a clustering step so as to enable enrolment-free performance.
Type: Grant
Filed: January 24, 2005
Date of Patent: November 17, 2009
Assignee: Sony Deutschland GmbH
Inventors: Ralf Kompe, Thomas Kemp
-
Patent number: 7617102
Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data being uttered for each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract ge
Type: Grant
Filed: September 27, 2006
Date of Patent: November 10, 2009
Assignee: Advanced Telecommunications Research Institute International
Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
-
Publication number: 20090259469
Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a pass band parameter of each of one or more passbands that are outside the narrow passband is generated for each frame and the one or more band energy parameters are coupled to the speech model.
Type: Application
Filed: April 14, 2008
Publication date: October 15, 2009
Applicant: MOTOROLA, INC.
Inventors: Changxue Ma, Yuan-Jun Wei
-
Patent number: 7603275
Abstract: Embodiments of a system, method and computer program product for verifying an identity claimed by a claimant using voiced to unvoiced classifiers are described. In accordance with one embodiment, a speech sample from a claimant claiming an identity may be captured. From the speech sample, a ratio of unvoiced frames to a total number of frames in the speech sample may be calculated. An equal error rate value corresponding to the speech sample can then be determined based on the calculated ratio. The determined equal error rate value corresponding to the speech sample may be compared to an equal error rate value associated with the claimed identity in order to select a decision threshold. A match score may also be generated based on a comparison of the speech sample to a voice sample associated with the claimed identity. A decision whether to accept the identity claim of the claimant can then be made based on a comparison of the match score to the decision threshold.
Type: Grant
Filed: October 31, 2005
Date of Patent: October 13, 2009
Assignee: Hitachi, Ltd.
Inventor: Clifford Tavares
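The two computations this abstract chains together, an unvoiced-frame ratio and a ratio-dependent decision threshold, can be sketched briefly. The function names, threshold values, and the rule for mapping EER values to a threshold are illustrative assumptions; the patent does not publish its actual mapping.

```python
def unvoiced_ratio(voiced_flags):
    """Fraction of unvoiced frames in a speech sample; voiced_flags are
    hypothetical per-frame voicing decisions (True = voiced)."""
    unvoiced = sum(1 for voiced in voiced_flags if not voiced)
    return unvoiced / len(voiced_flags)

def accept(match_score, sample_eer, claimed_eer, strict=0.8, lenient=0.6):
    """Accept or reject an identity claim. The threshold choice is an
    assumed rule: be stricter when the sample's estimated equal error
    rate is worse than the one associated with the claimed identity."""
    threshold = strict if sample_eer > claimed_eer else lenient
    return match_score >= threshold
```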
-
Patent number: 7590537
Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model in which the model variation between a test speaker ML model and a speaker group ML model to which the test speaker belongs which is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variation in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm of MLLR and MAP.
Type: Grant
Filed: December 27, 2004
Date of Patent: September 15, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
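The distinction drawn above between the quantity and the direction of a model variation can be made concrete with a small sketch. Representing a model by its mean vector and comparing variations with a cosine plus a magnitude ratio are illustrative assumptions, not the patent's exact similarity measure.

```python
import numpy as np

def variation(model_a, model_b):
    """Split the variation between two model mean vectors into a quantity
    (Euclidean norm) and a direction (unit vector)."""
    diff = model_b - model_a
    quantity = np.linalg.norm(diff)
    direction = diff / quantity
    return quantity, direction

def variation_similarity(var1, var2):
    """Compare two variations on both aspects: direction via cosine
    similarity, quantity via a ratio in (0, 1]."""
    (q1, d1), (q2, d2) = var1, var2
    cosine = float(d1 @ d2)
    ratio = min(q1, q2) / max(q1, q2)
    return cosine, ratio

v = variation(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```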
-
Patent number: 7574359
Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
Type: Grant
Filed: October 1, 2004
Date of Patent: August 11, 2009
Assignee: Microsoft Corporation
Inventor: Chao Huang
-
Patent number: 7567903
Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
Type: Grant
Filed: January 12, 2005
Date of Patent: July 28, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Publication number: 20090182561
Abstract: A speech recognition device and a method thereof are adapted to recognize a Chinese word. The speech recognition device includes a lexicon model, a language model, a speech recognition module, and a parsing module. The lexicon model keeps a plurality of words. The speech recognition module performs a speech recognition processing on a voice signal conforming to a syntax structure of Chinese word description. The speech recognition processing searches words related to the Chinese word description from the lexicon model according to a feature of the Chinese word description, and produces a literal word series in digital data form by referring to a syntax combination probability. The language model based on the syntax structure of Chinese word description provides the syntax combination probability according to combination relations between the searched words. The parsing module analyzes the syntax structure of the literal word series for retrieving the Chinese word.
Type: Application
Filed: September 16, 2008
Publication date: July 16, 2009
Applicant: DELTA ELECTRONICS, INC.
Inventors: Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
-
Patent number: 7562015
Abstract: A distributed pattern recognition training method includes providing data communication between at least one central pattern analysis node and a plurality of peripheral data analysis sites. The method also includes communicating from the at least one central pattern analysis node to the plurality of peripheral data analysis sites a plurality of kernel-based pattern elements. The method further includes performing a plurality of iterations of pattern template training at each of the plurality of peripheral data analysis sites.
Type: Grant
Filed: July 14, 2005
Date of Patent: July 14, 2009
Assignee: Aurilab, LLC
Inventor: James K. Baker
-
Publication number: 20090171661
Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
Type: Application
Filed: June 27, 2008
Publication date: July 2, 2009
Applicant: International Business Machines Corporation
Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
-
Patent number: 7552049
Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of speech cepstral is subtracted from the generated, noise-added speech (step S2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
Type: Grant
Filed: March 10, 2004
Date of Patent: June 23, 2009
Assignees: NTT DoCoMo, Inc., Sadaoki Furui
Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
-
Publication number: 20090144058
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Application
Filed: December 3, 2007
Publication date: June 4, 2009
Inventor: Alexander Sorin
-
Patent number: 7542904
Abstract: A method for distributing voice-recognition grammars includes receiving match data from a first remote element. The match data includes information associated with an attempt by the remote element to match received audio information to first stored audio data. The method also includes generating a grammar entry based on the match data. The grammar entry includes second stored audio data and a word identifier associated with the second stored audio data. Additionally, the method includes transmitting the grammar entry to a second remote element.
Type: Grant
Filed: August 19, 2005
Date of Patent: June 2, 2009
Assignee: Cisco Technology, Inc.
Inventors: Kevin L. Chestnut, Joseph B. Burton
-
Publication number: 20090119103
Abstract: A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models. The method detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. The method creates a speaker model assigned to a speaker model set when no match occurs based on the input.
Type: Application
Filed: October 10, 2008
Publication date: May 7, 2009
Inventors: Franz Gerl, Tobias Herbig
-
Patent number: 7529669
Abstract: A voice based multimodal speaker authentication method and telecommunications application thereof employing a speaker adaptive method for training phoneme specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.
Type: Grant
Filed: June 13, 2007
Date of Patent: May 5, 2009
Assignee: NEC Laboratories America, Inc.
Inventors: Srivaths Ravi, Anand Raghunathan, Srimat Chakradhar, Karthik Nandakumar
-
Publication number: 20090106025
Abstract: A speaker recognition system (1) includes a speaker model registration device (10) which registers a speaker model for speaker recognition in the speaker recognition system. The speaker model registration device includes acquisition means (13) for acquiring utterances by n+α times (wherein n is an integer not smaller than 2 and α is an integer not smaller than 1); calculation means (20) for calculating a speaker model by using the acquired utterances of n times as utterances for registration; correlation means (30) for correlating the calculated speaker model by using the acquired utterances of α times as correlation utterances; and registration means (40) for registering those having the correlation result satisfying a predetermined reference among the correlated speaker models, as the speaker model for speaker recognition.
Type: Application
Filed: March 16, 2007
Publication date: April 23, 2009
Applicant: PIONEER CORPORATION
Inventor: Soichi Toyama
-
Patent number: 7505906
Abstract: A method and system for automatic speech recognition are disclosed. The method comprises receiving speech from a user, the speech including at least one speech error, increasing the probabilities of closely related words to the at least one speech error and processing the received speech using the increased probabilities. A corpus of data having common words that are mis-stated is used to identify and increase the probabilities of related words. The method applies to at least the automatic speech recognition module and the spoken language understanding module.
Type: Grant
Filed: February 26, 2004
Date of Patent: March 17, 2009
Assignee: AT&T Intellectual Property, II
Inventors: Steven H. Lewis, Kenneth H. Rosen
-
Publication number: 20090063146
Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating a similarity of the input sound relative to a male speaker sound model. A female voice index calculator calculates a female voice index indicating a similarity of the input sound relative to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound which may be either the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound based on the male voice index and the female voice index in case that the first discriminator discriminates the human voice sound.
Type: Application
Filed: August 26, 2008
Publication date: March 5, 2009
Applicant: Yamaha Corporation
Inventor: Yasuo Yoshioka
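The two-stage decision structure described above is easy to sketch. The score ranges, threshold, and function name are illustrative assumptions; the point is only the control flow: the gender discriminator runs solely on input the first discriminator has already classified as human voice.

```python
def classify_sound(male_index, female_index, voice_likelihood,
                   voice_threshold=0.5):
    """Two-stage discrimination sketch (thresholds and score scales are
    assumed, not taken from the publication)."""
    # First discriminator: human voice vs. non-voice.
    if voice_likelihood < voice_threshold:
        return "non-voice"
    # Second discriminator: run only on human voice, deciding gender by
    # whichever speaker-model similarity index is larger.
    return "male" if male_index >= female_index else "female"
```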
-
Patent number: 7496510
Abstract: Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
Type: Grant
Filed: November 30, 2001
Date of Patent: February 24, 2009
Assignee: International Business Machines Corporation
Inventors: Joachim Frank, Werner Kriechbaum, Gerhard Stenzel
-
Patent number: 7493258
Abstract: A method is presented including selecting an initial beam width. The method also includes determining whether a value per frame is changing. A beam width is dynamically adjusted. The method further decodes a speech input with the dynamically adjusted beam width. Also, a device is presented including a processor (420). A speech recognition component (610) is connected to the processor (420). A memory (410) is connected to the processor (420). The speech recognition component (610) dynamically adjusts a beam width to decode a speech input.
Type: Grant
Filed: July 3, 2001
Date of Patent: February 17, 2009
Assignee: Intel Corporation
Inventors: Alexandr A. Kibkalo, Vyacheslav A. Barannikov
-
Patent number: 7475006
Abstract: A method and parser are provided that generate a score for a node identified during a parse of a text segment. The score is based on a mutual information score that measures the mutual information between a phrase level for the node and a word class of at least one word in the text segment.
Type: Grant
Filed: July 11, 2001
Date of Patent: January 6, 2009
Assignee: Microsoft Corporation, Inc.
Inventor: David N. Weise
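A mutual-information score of the kind this abstract mentions is commonly estimated from co-occurrence counts. The sketch below uses pointwise mutual information over hypothetical counts; the patent does not disclose its exact estimator, so the formula and names are illustrative assumptions.

```python
import math

def mutual_information_score(joint, level_total, class_total, n):
    """Pointwise mutual information log(P(level, class) / (P(level) P(class)))
    between a phrase level and a word class, estimated from hypothetical
    co-occurrence counts over n observed parse events."""
    p_joint = joint / n
    p_level = level_total / n
    p_class = class_total / n
    return math.log(p_joint / (p_level * p_class))
```

A positive score means the phrase level and word class co-occur more often than chance would predict, so the parse node is rewarded; a negative score penalizes it.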
-
Patent number: 7475014
Abstract: A method models trajectories of a signal source. Training signals generated by a signal source moving along known trajectories are acquired by each sensor in an array of sensors. Phase differences between all unique pairs of the training signals are determined. A wrapped-phase hidden Markov model is constructed from the phase differences. The wrapped-phase hidden Markov model includes multiple Gaussian distributions to model the known trajectories of the signal source.
Type: Grant
Filed: July 25, 2005
Date of Patent: January 6, 2009
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Paris Smaragdis, Petros Boufounos
-
Patent number: 7457753
Abstract: A system for remote assessment of a user is disclosed. The system comprises application software resident on a server and arranged to interact across a network with a user operating a client device to obtain one or more sample signals of the user's speech. A datastore is arranged to store the user speech samples in association with details of the user. A feature extraction engine is arranged to extract one or more first features from respective speech samples. A comparator is arranged to compare the first features extracted from a speech sample with second features extracted from one or more reference samples and to provide a measure of any differences between the first and second features for assessment of the user.
Type: Grant
Filed: June 29, 2005
Date of Patent: November 25, 2008
Assignee: University College Dublin National University of Ireland
Inventors: Rosalyn Moran, Richard Reilly, Philip De Chazal, Brian O'Mullane, Peter Lacy
-
Patent number: 7454339
Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems.
Type: Grant
Filed: December 20, 2005
Date of Patent: November 18, 2008
Assignee: Panasonic Corporation
Inventors: Chaojun Liu, David Kryze, Luca Rigazio
-
Publication number: 20080281595
Abstract: According to an embodiment, voice recognition apparatus includes units of: acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, determining whether or not the output probability of a certain path is stored. Number of times of calculation of the output probability is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in an interval from when the standard frame is set to when the standard frame is renewed, and storing and using the thus-calculated value as an approximate value of the output probability in subsequent frames.
Type: Application
Filed: March 30, 2007
Publication date: November 13, 2008
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masaru Sakai, Shinichi Tanaka
-
Patent number: 7447633
Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
Type: Grant
Filed: November 22, 2004
Date of Patent: November 4, 2008
Assignee: International Business Machines Corporation
Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
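The pool-and-normalize step this abstract describes can be sketched in a few lines. Representing each Gaussian as a (weight, mean, variance) tuple and dividing every weight by the pooled total is an assumed simplification (real systems might weight states by occupancy); it shows the core idea of flattening per-state mixtures into one text-independent GMM whose weights sum to one.

```python
def pool_hmm_states(states):
    """Build a GMM by pooling the Gaussians of several HMM states and
    renormalizing mixture weights to sum to one. Each Gaussian is a
    (weight, mean, variance) tuple; weights sum to 1 within each state."""
    pooled = [g for state in states for g in state]
    total = sum(weight for weight, _, _ in pooled)
    return [(weight / total, mean, var) for weight, mean, var in pooled]

states = [
    [(0.4, 0.0, 1.0), (0.6, 1.0, 1.0)],  # HMM state 1 (two-component mixture)
    [(1.0, 2.0, 0.5)],                   # HMM state 2 (single Gaussian)
]
gmm = pool_hmm_states(states)
```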
-
Publication number: 20080270132
Abstract: A system and method for identifying an individual includes collecting biometric information from an individual attempting to gain access to a system. The biometric information is scored against pre-trained imposter models. If a score exceeds a threshold, the individual is identified as an imposter. Other systems and methods are also disclosed.
Type: Application
Filed: June 3, 2008
Publication date: October 30, 2008
Inventors: Jari Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
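The score-against-imposter-models decision can be sketched minimally. The 1-D Gaussian imposter models and average log-likelihood scoring below are assumptions for illustration only:

```python
import math

def score(features, model):
    """Average log-likelihood of features under a 1-D Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in features) / len(features)

def is_imposter(features, imposter_models, threshold):
    """Flag the individual as an imposter if any pre-trained imposter
    model scores the biometric features above the threshold."""
    return any(score(features, m) > threshold for m in imposter_models)
```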
-
Publication number: 20080255843
Abstract: The invention provides a method of voice recognition comprising the steps of: obtaining current position information; obtaining a current voice model according to the current position information; and performing voice recognition according to the current voice model. In particular, the current position information can be obtained from network address information or by a global positioning system.
Type: Application
Filed: April 10, 2008
Publication date: October 16, 2008
Inventors: Yu-Chen Sun, Chang-Hung Lee
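Position-conditioned model selection might be sketched as a lookup over regional bounding boxes; the region representation and model names below are illustrative assumptions:

```python
def select_voice_model(position, regional_models, default_model):
    """Pick the voice model whose region contains the current GPS (or
    network-derived) position; fall back to a default model otherwise.

    regional_models: list of ((lat_min, lat_max, lon_min, lon_max), model)."""
    lat, lon = position
    for (lat_min, lat_max, lon_min, lon_max), model in regional_models:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return model
    return default_model
```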
-
Patent number: 7437289
Abstract: Methods and apparatus for the rapid adaptation of classification systems using small amounts of adaptation data. Improvements in classification accuracy are attainable when conditions similar to those present during adaptation are observed. The attendant methods and apparatus are suitable for a wide variety of classification schemes, including, e.g., speaker identification and speaker verification.
Type: Grant
Filed: August 16, 2001
Date of Patent: October 14, 2008
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
-
Publication number: 20080249774
Abstract: Disclosed is a speech speaker recognition method for a speaker recognition apparatus, the method including: detecting effective speech data from input speech; extracting an acoustic feature from the speech data; generating an acoustic feature transformation matrix from the speech data according to each of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA); mixing the acoustic feature transformation matrices to construct a hybrid acoustic feature transformation matrix; multiplying the matrix representing the acoustic feature by the hybrid acoustic feature transformation matrix to generate a final feature vector; generating a speaker model from the final feature vector; comparing a pre-stored universal speaker model with the generated speaker model to identify the speaker; and verifying the identified speaker.
Type: Application
Filed: April 2, 2008
Publication date: October 9, 2008
Applicants: SAMSUNG ELECTRONICS CO., LTD., ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Hyun-Soo Kim, Myeong-gi Jeong, Hyun-Sik Shim, Young-Hee Park, Ha-Jin Yoo, Guen-Chang Kwak, Hye-Jin Kim, Kyung-Sook Bae
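The hybrid PCA/LDA feature transform can be sketched with NumPy. Everything here (scalar-per-dimension features, stacking the two projection matrices side by side as the "mixing" step, and the numbers of dimensions kept) is an illustrative assumption rather than the claimed construction:

```python
import numpy as np

def pca_matrix(X, k):
    """Top-k principal directions of the pooled data (rows = frames)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]          # d x k

def lda_matrix(X, y, k):
    """Top-k discriminant directions from between/within-class scatter."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                   # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)                 # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:k]
    return vecs[:, order].real                          # d x k

def hybrid_transform(X, y, k_pca, k_lda):
    """Stack the PCA and LDA projections into one hybrid transformation
    matrix and apply it to produce the final feature vectors."""
    W = np.hstack([pca_matrix(X, k_pca), lda_matrix(X, y, k_lda)])
    return X @ W
```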
-
Patent number: 7430509
Abstract: Initially an embedding module (22) determines an embedding of a lattice in a two-dimensional plane. The embedding module (22) then processes the initial embedding to generate a planar graph in which no links cross. The planar graph is then simplified by a link encoding module (24), and data representing the lattice structure is generated by a shape encoding module (26), in which the simplified planar graph is represented by a shape encoding (42) identifying the numbers of links bounding areas defined by the planar graph and data identifying the locations of those areas within the planar graph, together with a link list (43) identifying the modifications made to the lattice structure by the link encoding module (24). These encodings represent identical substructures within a lattice using the same data and hence are suitable for compression using conventional techniques.
Type: Grant
Filed: October 10, 2003
Date of Patent: September 30, 2008
Assignee: Canon Kabushiki Kaisha
Inventors: Uwe Helmut Jost, Michael Richard Atkinson
-
Publication number: 20080235007
Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.
Type: Application
Filed: June 3, 2008
Publication date: September 25, 2008
Inventors: Jiri Navratil, Jagon Pelecanos, Ganesh N. Ramaswamy
-
Patent number: 7424426
Abstract: An object of the present invention is to facilitate dealing with noisy speech of varying SNR and to save calculation costs by generating a speech model with a single tree structure and using the model for speech recognition. Every piece of noise data stored in a noise database is used under every SNR condition; the distances between all noise models under those SNR conditions are calculated and the noise-added speech is clustered. Based on the result of the clustering, a single-tree-structure model space into which the noise and SNR are integrated is generated (steps S1 to S5). At a noise extraction step (step S6), the inputted noisy speech to be recognized is analyzed to extract a feature parameter string, and the likelihoods of the HMMs are compared with one another to select an optimum model from the tree-structured noisy speech model space (step S7). A linear transformation is applied to the selected noisy speech model so that the likelihood is maximized (step S8).
Type: Grant
Filed: August 18, 2004
Date of Patent: September 9, 2008
Assignee: Sadaoki Furui and NTT DoCoMo, Inc.
Inventors: Sadaoki Furui, Zhipeng Zhang, Tsutomu Horikoshi, Toshiaki Sugimura
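Selecting an optimum model from a tree-structured model space by likelihood comparison might look like the following sketch. The dict-based tree and 1-D Gaussian leaf models are assumptions, and the likelihood-maximizing linear transformation of step S8 is omitted:

```python
import math

def gauss_loglik(frames, mean, var):
    """Log-likelihood of a feature-parameter string under a 1-D Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in frames)

def select_model(tree, frames):
    """Descend a tree-structured noisy-speech model space, at each level
    following the child whose model gives the higher likelihood for the
    observed features; return the leaf model reached."""
    node = tree
    while node.get("children"):
        node = max(node["children"],
                   key=lambda c: gauss_loglik(frames, *c["model"]))
    return node["model"]
```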
-
Patent number: 7424425
Abstract: In detection systems, such as speaker verification systems, for a given operating-point range with an associated detection "cost", the detection cost is preferably reduced by trading off the system error in the area of interest against areas outside that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is derived such that its minimization provably leads to detection-cost reduction in the area of interest. The criterion allows selective access to the slope and offset of the DET curve (a line in the case of normally distributed detection scores; a curve approximated by a mixture of Gaussians for other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
Type: Grant
Filed: May 19, 2002
Date of Patent: September 9, 2008
Assignee: International Business Machines Corporation
Inventors: Jiri Navratil, Ganesh N. Ramaswamy
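For normally distributed detection scores the DET curve is a straight line in normal-deviate coordinates, which makes the slope and offset the abstract refers to explicit. A small sketch of that special case (the Gaussian score assumption is the patent's own; the function names are mine):

```python
from statistics import NormalDist

def det_line(mu_tar, sd_tar, mu_imp, sd_imp):
    """Slope and offset of the DET curve in normal-deviate coordinates
    for Gaussian target/impostor scores: the curve is the line
    ndev(P_miss) = slope * ndev(P_fa) + offset."""
    slope = -sd_imp / sd_tar
    offset = (mu_imp - mu_tar) / sd_tar
    return slope, offset

def det_point(threshold, mu_tar, sd_tar, mu_imp, sd_imp):
    """(P_fa, P_miss) operating point at a given score threshold:
    a miss is a target score below threshold, a false alarm is an
    impostor score above it."""
    p_miss = NormalDist(mu_tar, sd_tar).cdf(threshold)
    p_fa = 1.0 - NormalDist(mu_imp, sd_imp).cdf(threshold)
    return p_fa, p_miss
```

Sweeping the threshold traces out operating points that all lie on the line returned by `det_line`, so changing the score distributions' standard-deviation ratio directly changes the DET slope.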
-
Publication number: 20080208581
Abstract: A system and method for speaker modelling for speaker recognition, whereby prior speaker information is incorporated into the modelling process by utilising the maximum a posteriori (MAP) algorithm and extending it to contain prior Gaussian component correlation information. First, a background model (10) is estimated: pooled acoustic reference data (11) relating to a specific demographic of speakers (the population of interest) from a given total population is trained via the Expectation Maximization (EM) algorithm (12) to produce the background model (13). The background model (13) is then adapted utilising information from a plurality of reference speakers (21) in accordance with the Maximum A Posteriori (MAP) criterion (22). Using the MAP estimation technique, the reference speaker data and prior information obtained from the background model parameters are combined to produce a library of adapted speaker models, namely Gaussian Mixture Models (23).
Type: Application
Filed: December 3, 2004
Publication date: August 28, 2008
Inventors: Jason Pelecanos, Subramanian Sridharan, Robert Vogt
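Relevance-MAP adaptation of a background GMM's means, as commonly formulated in speaker recognition, can be sketched as follows. This is the standard mean-only, 1-D diagonal simplification with relevance factor `r`, not necessarily the patent's extension with Gaussian component correlations:

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, data, r=16.0):
    """Relevance-MAP adaptation of GMM means: blend the background
    (prior) means with sufficient statistics from the speaker's data."""
    # per-frame mixture responsibilities (1-D diagonal Gaussians)
    ll = (-0.5 * ((data[:, None] - ubm_means) ** 2 / ubm_vars
                  + np.log(2 * np.pi * ubm_vars)) + np.log(ubm_weights))
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                        # soft counts per mixture
    ex = (post * data[:, None]).sum(axis=0) / np.maximum(n, 1e-10)
    alpha = n / (n + r)                         # data-vs-prior balance
    return alpha * ex + (1 - alpha) * ubm_means
```

Components that see little of the speaker's data keep their background means, which is what makes the adapted models robust with short enrollment utterances.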
-
Patent number: 7409343
Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model, relating to a voice segment spoken by an authorized speaker, and a rejection voice model. It uses normalization parameters to normalize a speaker verification score that depends on the likelihood ratio of a voice segment to be tested under the acceptance model and the rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test, but only if the normalized score is above a second threshold.
Type: Grant
Filed: July 22, 2003
Date of Patent: August 5, 2008
Assignee: France Telecom
Inventor: Delphine Charlet
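The threshold-gated update of normalization parameters could be sketched like this; the running mean/variance form, the `alpha` learning rate, and both threshold values are illustrative assumptions:

```python
class ScoreNormalizer:
    """Normalize verification scores with running mean/variance
    parameters, and update those parameters with a new score only when
    the normalized score clears a second, stricter threshold (i.e. the
    test is confidently from the authorized speaker)."""

    def __init__(self, mean=0.0, var=1.0, alpha=0.1,
                 accept_thresh=0.0, update_thresh=1.0):
        self.mean, self.var, self.alpha = mean, var, alpha
        self.accept_thresh = accept_thresh
        self.update_thresh = update_thresh

    def verify(self, raw_score):
        z = (raw_score - self.mean) / self.var ** 0.5
        accepted = z > self.accept_thresh          # first threshold: access
        if z > self.update_thresh:                 # second threshold: adapt
            self.mean += self.alpha * (raw_score - self.mean)
            self.var += self.alpha * ((raw_score - self.mean) ** 2 - self.var)
        return accepted, z
```

Gating the update this way keeps borderline (possibly impostor) scores from drifting the normalization parameters.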
-
Publication number: 20080154599
Abstract: The present invention discloses a system and a method for authenticating a user based upon a spoken password processed through a standard speech recognition engine lacking specialized speaker identification and verification (SIV) capabilities. It should be noted that the standard speech recognition engine can be capable of acoustically generating speech recognition grammars in accordance with the cross-referenced application indicated herein. The invention can prompt a user for a free-form password and can receive a user utterance in response. The utterance can be processed through a speech recognition engine (e.g., during a grammar enrollment operation) to generate an acoustic baseform. Future user utterances can be matched against the acoustic baseform, and the results of those matches can be used to determine whether to grant the user access to a secure resource.
Type: Application
Filed: June 26, 2007
Publication date: June 26, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Brien H. Muschett, Julia A. Parker
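The enroll-then-match flow can be sketched with a stand-in for the recognizer: here phone strings play the role of acoustic baseforms and a string-similarity ratio plays the role of the engine's match score, both of which are assumptions for illustration only:

```python
from difflib import SequenceMatcher

class PasswordVerifier:
    """SIV-free password authentication sketch: enroll the recognizer's
    acoustic baseform for a spoken free-form password, then grant access
    when a later utterance's baseform matches closely enough.
    The phone-string baseforms and threshold are hypothetical."""

    def __init__(self, match_threshold=0.8):
        self.baseform = None
        self.match_threshold = match_threshold

    def enroll(self, utterance_phones):
        """Grammar-enrollment step: store the baseform of the password."""
        self.baseform = utterance_phones

    def authenticate(self, utterance_phones):
        """Match a new utterance against the enrolled baseform."""
        if self.baseform is None:
            return False
        ratio = SequenceMatcher(None, self.baseform,
                                utterance_phones).ratio()
        return ratio >= self.match_threshold
```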