Specialized Models Patents (Class 704/250)
-
Patent number: 8447607
Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.
Type: Grant
Filed: June 4, 2012
Date of Patent: May 21, 2013
Assignee: VoiceBox Technologies, Inc.
Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
-
Patent number: 8442187
Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.
Type: Grant
Filed: April 17, 2012
Date of Patent: May 14, 2013
Assignee: International Business Machines Corporation
Inventors: Peeyush Jaiswal, Naveen Narayan
-
Patent number: 8442828
Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
Type: Grant
Filed: March 17, 2006
Date of Patent: May 14, 2013
Assignee: Microsoft Corporation
Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
-
Patent number: 8442824
Abstract: Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score.
Type: Grant
Filed: November 25, 2009
Date of Patent: May 14, 2013
Assignee: Nuance Communications, Inc.
Inventors: Almog Aley-Raz, Nir Moshe Krause, Michael Itzhak Salmon, Ran Yehoshua Gazit
-
Patent number: 8442825
Abstract: A device for voice identification including a receiver, a segmenter, a resolver, two advancers, a buffer, and a plurality of IIR resonator digital filters where each IIR filter comprises a set of memory locations or functional equivalent to hold filter specifications, a memory location or functional equivalent to hold the arithmetic reciprocal of the filter's gain, a five cell controller array, several multipliers, an adder, a subtractor, and a logical non-shift register. Each cell of the five cell controller array has five logical states, each acting as a five-position single-pole rotating switch that operates in unison with the four others. Additionally, the device also includes an artificial neural network and a display means.
Type: Grant
Filed: August 16, 2011
Date of Patent: May 14, 2013
Assignee: The United States of America as Represented by the Director, National Security Agency
Inventor: Michael Sinutko
-
Patent number: 8438030
Abstract: A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion.
Type: Grant
Filed: November 25, 2009
Date of Patent: May 7, 2013
Assignee: General Motors LLC
Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Patent number: 8428269
Abstract: A spatial audio system for implementing a head-related transfer function (HRTF). A first stage implements a lateral HRTF that reproduces the median frequency response for a sound source located at a particular lateral distance from a listener, and second stage implements a vertical HRTF that reproduces the spectral changes when the vertical distance of a sound source changes relative to the listener. The system improves the vertical localization accuracy provided by an arbitrary measured HRTF by introducing an enhancement factor into the second processing stage. The enhancement factor increases the spectral differentiation between simulated sound sources located at different positions within the same “cone of confusion.”
Type: Grant
Filed: May 20, 2010
Date of Patent: April 23, 2013
Assignee: The United States of America as represented by the Secretary of the Air Force
Inventors: Douglas S. Brungart, Griffin D. Romigh
-
Patent number: 8417525
Abstract: A computer-implemented method, system and/or program product update voice prints over time. A receiving computer receives an initial voice print. A determining period of time is calculated for that initial voice print. This determining period of time is a length of time during which an expected degree of change in subsequent voice prints, in comparison to the initial voice print, is predicted to occur. A new voice print is received after the determining period of time has passed, and the new voice print is compared with the initial voice print. In response to a change to the new voice print falling within the expected degree of change in comparison to the initial voice print, a voice print store is updated with the new voice print.
Type: Grant
Filed: February 9, 2010
Date of Patent: April 9, 2013
Assignee: International Business Machines Corporation
Inventors: Sheri Gayle Daye, Peeyush Jaiswal, Fang Wang
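
The update rule in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented method: voice prints are modeled as plain feature vectors, the "degree of change" is taken to be Euclidean distance, and the names `maybe_update_voice_print`, `expected_change` and `determining_period_days` are hypothetical.

```python
import math
from typing import Dict, List


def maybe_update_voice_print(store: Dict[str, List[float]],
                             user: str,
                             new_print: List[float],
                             elapsed_days: float,
                             determining_period_days: float,
                             expected_change: float) -> bool:
    """Replace the stored print only if the new one is plausibly the same voice.

    Assumption: a voice print is a feature vector and "change" is the
    Euclidean distance between the stored and the new vector.
    """
    # The new print is only considered after the determining period has passed.
    if elapsed_days < determining_period_days:
        return False
    change = math.dist(store[user], new_print)
    # Update when the observed change falls within the expected degree of change.
    if change <= expected_change:
        store[user] = list(new_print)
        return True
    return False
```

A change larger than `expected_change` leaves the store untouched, which is the case a real system would presumably treat as suspicious.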
-
Patent number: 8411830
Abstract: A system, method and computer program product for providing targeted messages to a person using telephony services by generating user profile information from telephony data and using the user profile information to retrieve targeted messages.
Type: Grant
Filed: November 18, 2011
Date of Patent: April 2, 2013
Assignee: iCall, Inc.
Inventors: Arlo Christopher Gilbert, Andrew Muldowney
-
Patent number: 8412526
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Grant
Filed: December 3, 2007
Date of Patent: April 2, 2013
Assignee: Nuance Communications, Inc.
Inventor: Alexander Sorin
-
Patent number: 8386253
Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.
Type: Grant
Filed: July 13, 2012
Date of Patent: February 26, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Lee Begeja, Benjamin J. Stern
-
Publication number: 20130046540
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Application
Filed: December 3, 2007
Publication date: February 21, 2013
Inventor: Alexander Sorin
-
Patent number: 8379806
Abstract: A system and method for representing call content in a searchable database includes transcribing call content to text. The call content is projected to vector space, by creating a vector by indexing the call based on the content and determining a similarity of the call to an atomic-class dictionary. The call is classified in a relational database in accordance with the vector.
Type: Grant
Filed: August 22, 2008
Date of Patent: February 19, 2013
Assignee: International Business Machines Corporation
Inventors: Cheng Wu, Andrzej Sakrajda, Hong-Kwang Jeff Kuo, Vaibhava Goel, David Lubensky
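
One way to picture "projecting a call to vector space" is to score a transcript against a small dictionary of classes. The sketch below is an assumption-laden illustration rather than the patent's procedure: it uses bag-of-words counts and cosine similarity, and the names `cosine`, `call_vector` and the example classes are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def call_vector(transcript: str, class_dictionary: Dict[str, str]) -> Dict[str, float]:
    """Map a call transcript to one similarity score per dictionary class.

    Assumption: each class is described by a short string of keywords.
    """
    words = Counter(transcript.lower().split())
    return {name: cosine(words, Counter(text.lower().split()))
            for name, text in class_dictionary.items()}


classes = {"billing": "bill payment invoice", "support": "error crash reset"}
vector = call_vector("my invoice payment failed", classes)
print(max(vector, key=vector.get))
```

The resulting dictionary of scores plays the role of the vector by which the call would be classified and stored.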
-
Patent number: 8374867
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
Type: Grant
Filed: November 13, 2009
Date of Patent: February 12, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
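
The fallback order described in this abstract (supervised, then unsupervised, then generic) can be sketched as a simple lookup chain. The function name `select_speech_model`, the dictionary-based model stores and the string model identifiers are hypothetical stand-ins, not part of the patent.

```python
from typing import Dict


def select_speech_model(user_id: str,
                        supervised: Dict[str, str],
                        unsupervised: Dict[str, str],
                        generic: str) -> str:
    """Return the most specific model available for the user."""
    # 1. Prefer a user-specific supervised model when one exists.
    if user_id in supervised:
        return supervised[user_id]
    # 2. Otherwise fall back to an unsupervised model for the user.
    if user_id in unsupervised:
        return unsupervised[user_id]
    # 3. Otherwise use the generic model.
    return generic


supervised_models = {"alice": "alice-supervised"}
unsupervised_models = {"bob": "bob-unsupervised"}
for user in ("alice", "bob", "carol"):
    print(user, select_speech_model(user, supervised_models, unsupervised_models, "generic"))
```

Recognition would then run with whichever model the chain returns.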
-
Patent number: 8374869
Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
Type: Grant
Filed: August 4, 2009
Date of Patent: February 12, 2013
Assignee: Electronics and Telecommunications Research Institute
Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
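
The two-part acceptance test in this abstract can be illustrated as follows. The abstract does not say how the confidence score is formed or in which direction the distance is compared, so both are assumptions here: the score is the average log-likelihood ratio between the phoneme and anti-phoneme models, and a word passes when its distance does not exceed the mean distance. All names are hypothetical.

```python
from typing import Sequence


def confidence_score(phone_ll: Sequence[float], anti_ll: Sequence[float]) -> float:
    """Average per-phoneme log-likelihood ratio (assumed form of the score)."""
    if not phone_ll or len(phone_ll) != len(anti_ll):
        raise ValueError("need one anti-phoneme log likelihood per phoneme")
    return sum(p - a for p, a in zip(phone_ll, anti_ll)) / len(phone_ll)


def accept_word(phone_ll: Sequence[float],
                anti_ll: Sequence[float],
                distance: float,
                threshold: float,
                mean_distance: float) -> bool:
    """Accept only when both the confidence test and the distance test pass."""
    confident = confidence_score(phone_ll, anti_ll) >= threshold
    close_enough = distance <= mean_distance  # assumed direction of comparison
    return confident and close_enough
```

Requiring both tests to agree makes the verifier stricter than either check alone.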
-
Patent number: 8374868
Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
Type: Grant
Filed: August 21, 2009
Date of Patent: February 12, 2013
Assignee: General Motors LLC
Inventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Patent number: 8355917
Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
Type: Grant
Filed: February 1, 2012
Date of Patent: January 15, 2013
Assignee: Microsoft Corporation
Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
-
Publication number: 20120278077
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Type: Application
Filed: July 11, 2012
Publication date: November 1, 2012
Applicant: MICROSOFT CORPORATION
Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
-
Patent number: 8301450
Abstract: An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.
Type: Grant
Filed: October 30, 2006
Date of Patent: October 30, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jae-won Lee, In-jeong Choi
-
Patent number: 8285546
Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.
Type: Grant
Filed: September 9, 2011
Date of Patent: October 9, 2012
Assignee: Nuance Communications, Inc.
Inventor: David E. Reich
-
Patent number: 8280740
Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
Type: Grant
Filed: April 13, 2009
Date of Patent: October 2, 2012
Assignee: Porticus Technology, Inc.
Inventors: Germano Di Mambro, Bernardas Salna
-
Publication number: 20120239401
Abstract: Provided is a voice recognition system capable of, while suppressing negative influences from sound not to be recognized, correctly estimating utterance sections that are to be recognized. A voice segmenting means calculates voice feature values, and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. Then, the voice segmenting means determines, to be first voice sections, those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, to be second voice sections, sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means determines the first voice sections by using the one of the threshold value and the margin which has been updated by the parameter updating means.
Type: Application
Filed: November 26, 2010
Publication date: September 20, 2012
Applicant: NEC CORPORATION
Inventor: Takayuki Arakawa
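
The "first voice sections" step (threshold comparison followed by a front and rear margin) can be sketched as below. This is a minimal illustration, assuming one scalar feature value per frame and a margin counted in frames; merging sections that overlap after widening is an added assumption, and the name `first_voice_sections` is hypothetical.

```python
from typing import List, Tuple


def first_voice_sections(features: List[float],
                         threshold: float,
                         margin: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges judged to be voice, widened by a margin."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(features):
        if value > threshold and start is None:
            start = i                      # a voice section begins
        elif value <= threshold and start is not None:
            sections.append((start, i))    # the voice section ends
            start = None
    if start is not None:
        sections.append((start, len(features)))

    # Add the margin to the front and rear of each section, merging overlaps.
    widened: List[Tuple[int, int]] = []
    for s, e in sections:
        s, e = max(0, s - margin), min(len(features), e + margin)
        if widened and s <= widened[-1][1]:
            widened[-1] = (widened[-1][0], max(widened[-1][1], e))
        else:
            widened.append((s, e))
    return widened


frames = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.7, 0.1]
print(first_voice_sections(frames, threshold=0.5, margin=1))
```

In the system described above, `threshold` and `margin` would be the parameters that the updating means adjusts between passes.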
-
Patent number: 8271281
Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
Type: Grant
Filed: June 27, 2008
Date of Patent: September 18, 2012
Assignee: Nuance Communications, Inc.
Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
-
Patent number: 8271278
Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
Type: Grant
Filed: April 3, 2010
Date of Patent: September 18, 2012
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
-
Patent number: 8265932
Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed and each of the preceding and reference phrase segments are divided into buffers. The buffers are transformed into discrete fourier transform buffers. One of the discrete fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete fourier transform. Each discrete fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold.
Type: Grant
Filed: October 3, 2011
Date of Patent: September 11, 2012
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
-
Patent number: 8260614
Abstract: A method and system that expands a word graph to a phone graph. An unknown speech signal is received. A word graph is generated based on an application task or based on information extracted from the unknown speech signal. The word graph is expanded into a phone graph. The unknown speech signal is recognized using the phone graph. The phone graph can be based on a cross-word acoustical model to improve continuous speech recognition. By expanding a word graph into a phone graph, the phone graph can consume less memory than a word graph and can reduce greatly the computation cost in the decoding process than that of the word graph thus improving system performance. Furthermore, continuous speech recognition error rate can be reduced by using the phone graph, which provides a more accurate graph for continuous speech recognition.
Type: Grant
Filed: September 28, 2000
Date of Patent: September 4, 2012
Assignee: Intel Corporation
Inventors: Qingwei Zhao, Zhiwei Lin, Yonghong Yan
-
Publication number: 20120221335
Abstract: According to one embodiment, the method may include constructing a first voice tag for registration speech based on Hidden Markov acoustic model (HMM), constructing a second voice tag for the registration speech based on template matching, and combining the first voice tag and the second voice tag to construct voice tag of the registration speech.
Type: Application
Filed: February 24, 2012
Publication date: August 30, 2012
Inventors: Rui Zhao, Lei He
-
Publication number: 20120221336
Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
Type: Application
Filed: May 7, 2012
Publication date: August 30, 2012
Applicant: VOICESENSE LTD.
Inventors: Yoav DEGANI, Yishai ZAMIR
-
Patent number: 8244532
Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.
Type: Grant
Filed: December 23, 2005
Date of Patent: August 14, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Lee Begeja, Benjamin J. Stern
-
Patent number: 8244534
Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.
Type: Grant
Filed: August 20, 2007
Date of Patent: August 14, 2012
Assignee: Microsoft Corporation
Inventors: Yao Qian, Frank Kao-Ping Soong
-
Patent number: 8234113
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Type: Grant
Filed: August 30, 2011
Date of Patent: July 31, 2012
Assignee: Microsoft Corporation
Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
-
Patent number: 8234112
Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for an actually applied environment.
Type: Grant
Filed: April 25, 2008
Date of Patent: July 31, 2012
Assignee: Electronics and Telecommunications Research Institute
Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
-
Patent number: 8229729
Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
Type: Grant
Filed: March 25, 2008
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
-
Patent number: 8219404
Abstract: A method and apparatus for identifying a speaker within a captured audio signal from a collection of known speakers. The method and apparatus receive or generate voice representations for each known speakers and tag the representations according to meta data related to the known speaker or to the voice. The representations are grouped into one or more groups according to the indices. When a voice to be recognized is introduced, characteristics are determined according to which the groups are prioritized, so that the representations participating only in part of the groups are matched against the voice to be identified, thus reducing identification time and improving the statistical significance.
Type: Grant
Filed: August 9, 2007
Date of Patent: July 10, 2012
Assignee: Nice Systems, Ltd.
Inventors: Adam Weinberg, Irit Opher, Eyal Benaroya, Renan Gutman
-
Patent number: 8195462
Abstract: Disclosed herein is a system, method and computer-readable medium storing instructions for controlling a computing device according to the method. The invention relates to a system, method and computer-readable medium storing instructions for controlling a computing device according to the method. As an example embodiment, the method uses a speech recognition decoder that operates or uses fixed point arithmetic. The exemplary method comprises representing arc costs associated with at least one finite state transducer (FST) in fixed point, representing parameters associated with a hidden Markov model (HMM) in fixed point and processing speech data in the speech recognition decoder using fixed point arithmetic for the fixed point FST arc costs and the fixed point HMM parameters. The method may also include computing at the decoder sentence hypothesis probabilities with fixed point arithmetic as type Q-2e numbers.
Type: Grant
Filed: February 16, 2006
Date of Patent: June 5, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Charles Douglas Blewett, Enrico Luigi Bocchieri
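
For readers unfamiliar with fixed point arithmetic, the idea of storing arc costs as scaled integers can be shown with a generic Q-format sketch. The layout used here (8 fractional bits) is an assumption chosen for readability and is not the number format named in the abstract; `to_fixed`, `to_float` and `path_cost` are hypothetical helper names.

```python
from typing import Iterable

FRAC_BITS = 8  # assumed number of fractional bits, for illustration only


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Encode a real value as an integer scaled by 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Decode a fixed point integer back to a real value."""
    return q / (1 << frac_bits)


def path_cost(arc_costs: Iterable[float], frac_bits: int = FRAC_BITS) -> int:
    """Accumulate arc costs along a path using integer additions only."""
    return sum(to_fixed(c, frac_bits) for c in arc_costs)


total = path_cost([1.5, 0.25, 2.0])
print(total, to_float(total))
```

Once costs are encoded, the decoder's inner loop needs only integer adds and compares, which is the practical appeal on hardware without a floating point unit.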
-
Patent number: 8195468
Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.
Type: Grant
Filed: April 11, 2011
Date of Patent: June 5, 2012
Assignee: VoiceBox Technologies, Inc.
Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
-
Patent number: 8194827
Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.
Type: Grant
Filed: April 29, 2008
Date of Patent: June 5, 2012
Assignee: International Business Machines Corporation
Inventors: Peeyush Jaiswal, Naveen Narayan
-
Patent number: 8179289
Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
Type: Grant
Filed: June 19, 2006
Date of Patent: May 15, 2012
Assignee: Research In Motion Limited
Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
-
Patent number: 8180637 Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance. Type: Grant Filed: December 3, 2007 Date of Patent: May 15, 2012 Assignee: Microsoft Corporation Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
-
Patent number: 8180638 Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and emotions other than the neutral emotion are classified using the Gaussian mixture model to which a discriminative weight for minimizing the loss function of a classification error for the feature vector for emotion recognition is applied. The emotion recognition is performed by applying a discriminative weight, evaluated using the Gaussian mixture model based on minimum classification error, to feature vectors of the emotions that are classified with difficulty, thereby enhancing the performance of emotion recognition. Type: Grant Filed: February 23, 2010 Date of Patent: May 15, 2012 Assignee: Korea Institute of Science and Technology Inventors: Hyoung Gon Kim, Ig Jae Kim, Joon-Hyuk Chang, Kye Hwan Lee, Chang Seok Bae
-
Publication number: 20120116763 Abstract: A voice data analyzing device comprises speaker model deriving means, which derives speaker models as models each specifying character of voice of each speaker from voice data including a plurality of utterances to each of which a speaker label as information for identifying a speaker has been assigned, and speaker co-occurrence model deriving means, which derives a speaker co-occurrence model as a model representing the strength of co-occurrence relationship among the speakers from session data obtained by segmenting the voice data in units of sequences of conversation by use of the speaker models derived by the speaker model deriving means. Type: Application Filed: June 3, 2010 Publication date: May 10, 2012 Applicant: NEC CORPORATION Inventor: Takafumi Koshinaka
-
Patent number: 8150690 Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result. Type: Grant Filed: October 1, 2008 Date of Patent: April 3, 2012 Assignee: Industrial Technology Research Institute Inventor: Shih-Ming Huang
-
Patent number: 8145486 Abstract: Acoustic models to provide features to a speech signal are created based on speech features included in regions where similarities of acoustic models created based on speech features in a certain time length are equal to or greater than a predetermined value. Feature vectors acquired by using the acoustic models of the regions and the speech features to provide features to speech signals of second segments are grouped by speaker. Type: Grant Filed: January 9, 2008 Date of Patent: March 27, 2012 Assignee: Kabushiki Kaisha Toshiba Inventor: Makoto Hirohata
-
Patent number: 8126668 Abstract: Disclosed is a method of signal detection. A received input signal is divided into frame units, and the input signal present in each of a first frame and a second frame is transformed into a frequency signal. Then, first power spectrum information and second power spectrum information are computed utilizing the transformed frequency signals, and a delta spectrum entropy value corresponding to a difference of the two computed power spectrum information is obtained. Whether a predetermined input signal is included in a predetermined frame of the input signal is judged by comparing the delta spectrum entropy value with a critical value. A desired signal can be detected in a noisy environment including a noise signal by using the delta spectrum entropy value. Type: Grant Filed: February 29, 2008 Date of Patent: February 28, 2012 Assignee: Sungkyunkwan University Foundation for Corporate Collaboration Inventors: Kwang-Seok Hong, Yong-Wan Roh, Kue-Bum Lee
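The frame-by-frame procedure in this abstract can be sketched as follows. This is one possible reading, not the patented method: the abstract does not say exactly how the "delta spectrum entropy" is formed, so the sketch assumes it is the absolute difference between the spectral entropies of two consecutive frames. The function names, the naive DFT, the non-overlapping framing, and the `threshold` parameter (standing in for the "critical value") are all illustrative assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real-valued frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(power):
    """Shannon entropy of the power spectrum normalized to sum to 1."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = (p / total for p in power)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def delta_spectral_entropy(first_frame, second_frame):
    """Assumed definition: absolute entropy difference between two frames."""
    first = spectral_entropy(power_spectrum(first_frame))
    second = spectral_entropy(power_spectrum(second_frame))
    return abs(second - first)


def detect_frames(signal, frame_len, threshold):
    """Flag each frame (from the second on) whose delta exceeds threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [delta_spectral_entropy(prev, cur) > threshold
            for prev, cur in zip(frames, frames[1:])]
```

A frame with a flat spectrum (such as an impulse) has high entropy and a pure tone has entropy near zero, so a transition between the two yields a large delta, while two identical frames yield zero.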
-
Patent number: 8121839 Abstract: A service is configured to analyze multimedia communications to determine a likelihood that the communication is unsolicited. For example, the service may inspect e-mail messages, instant messaging messages, facsimile transmissions, voice communications, and video telephony, and analyze these forms of communication to determine whether an intended communication is unsolicited. In connection with voice and video telephony, a voice sample may be obtained from the caller and voice recognition may be performed on the sample to determine an identity of the person or the identity of the voice. The voice sample may also be used to determine the type of voice, i.e., whether the voice is live, machine generated, or prerecorded. Where the call is a video telephony call, image recognition may be used to inspect an image of the person. The information obtained from voice recognition, voice type recognition, and image recognition may be used to detect whether the message is from a known source of unsolicited communications. Type: Grant Filed: December 19, 2005 Date of Patent: February 21, 2012 Assignee: Rockstar Bidco, LP Inventors: Samir Srivastava, Francois Audet, Vibhu Vivek
-
Patent number: 8121837 Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates. Type: Grant Filed: April 24, 2008 Date of Patent: February 21, 2012 Assignee: Nuance Communications, Inc. Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Paritosh D. Patel
-
Patent number: 8108212 Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model. Type: Grant Filed: October 30, 2007 Date of Patent: January 31, 2012 Assignee: NEC Corporation Inventor: Shuhei Maegawa
-
Patent number: 8099278 Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers. Type: Grant Filed: December 22, 2010 Date of Patent: January 17, 2012 Assignee: Verizon Patent and Licensing Inc. Inventor: Kevin R. Witzman
-
Patent number: 8099288 Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level. Type: Grant Filed: February 12, 2007 Date of Patent: January 17, 2012 Assignee: Microsoft Corp. Inventors: Zhengyou Zhang, Amarnag Subramaya
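The scoring idea in this abstract, a weighted sum of likelihood ratios over sub-units plus the whole utterance, can be illustrated with a short sketch. The details are assumptions for illustration only: the sketch works in the log domain (so each ratio is a difference of log-likelihoods), the weights and the acceptance threshold are arbitrary, and the names `verification_score` and `accept` are hypothetical. How the patent actually chooses weights or obtains the likelihoods is not stated in the abstract.

```python
def log_likelihood_ratio(speaker_ll, background_ll):
    """Log-likelihood ratio: speaker model versus background model."""
    return speaker_ll - background_ll


def verification_score(subunit_lls, subunit_weights,
                       utterance_lls, utterance_weight):
    """Weighted sum of log-likelihood ratios (illustrative).

    subunit_lls: list of (speaker_ll, background_ll) pairs, one per
        sub-unit (word, tri-phone, or phone).
    utterance_lls: one (speaker_ll, background_ll) pair for the utterance.
    """
    if len(subunit_lls) != len(subunit_weights):
        raise ValueError("one weight is required per sub-unit")
    score = utterance_weight * log_likelihood_ratio(*utterance_lls)
    for (speaker_ll, background_ll), weight in zip(subunit_lls,
                                                   subunit_weights):
        score += weight * log_likelihood_ratio(speaker_ll, background_ll)
    return score


def accept(score, threshold=0.0):
    """Accept the claimed identity when the score reaches the threshold."""
    return score >= threshold
```

For example, with sub-unit pairs `[(-10.0, -12.0), (-8.0, -7.0)]`, weights `[0.5, 0.5]`, an utterance pair `(-18.0, -19.0)`, and utterance weight `1.0`, the score is `1.0 * 1.0 + 0.5 * 2.0 + 0.5 * (-1.0) = 1.5`, which `accept` passes at the default threshold.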
-
Patent number: 8099290 Abstract: A voice recognition unit is constructed in such a way as to create a voice label string for an inputted voice uttered by a user inputted for each language on the basis of a feature vector time series of the inputted voice uttered by the user and data about a sound standard model, and register the voice label string into a voice label memory 2 while automatically switching among languages for a sound standard model memory 1 used to create the voice label string, and automatically switching among the languages for the voice label memory 2 for holding the created voice label string by using a first language switching unit SW1 and a second language switching unit SW2. Type: Grant Filed: October 20, 2009 Date of Patent: January 17, 2012 Assignee: Mitsubishi Electric Corporation Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta