Specialized Models Patents (Class 704/250)
  • Patent number: 8447607
    Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.
    Type: Grant
    Filed: June 4, 2012
    Date of Patent: May 21, 2013
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
  • Patent number: 8442187
    Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.
    Type: Grant
    Filed: April 17, 2012
    Date of Patent: May 14, 2013
    Assignee: International Business Machines Corporation
    Inventors: Peeyush Jaiswal, Naveen Narayan
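As an illustration of the two-stage check described in the abstract of 8442187 above, here is a minimal Python/numpy sketch: the recognized text is compared with the stored passphrase, then the average magnitude spectra of the new speech and the enrolled sample are compared. The FFT size, the cosine-similarity measure, and the 0.85 threshold are assumptions, and the speech-to-text conversion is assumed to have happened upstream.

```python
import numpy as np

def average_spectrum(signal, n_fft=512):
    """Average magnitude spectrum of a mono, fixed-rate audio signal (toy front end)."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::n_fft // 2]
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).mean(axis=0)

def verify_caller(recognized_text, speech_audio, stored_passphrase, enrolled_audio,
                  spectral_threshold=0.85):
    """Stage 1: passphrase text match. Stage 2: spectral similarity to the enrolled sample."""
    if recognized_text.strip().lower() != stored_passphrase.strip().lower():
        return False                          # passphrase text does not match
    a, b = average_spectrum(speech_audio), average_spectrum(enrolled_audio)
    similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return similarity >= spectral_threshold

rng = np.random.default_rng(0)
enrolled = rng.normal(size=8000)
print(verify_caller("open sesame", enrolled + 0.05 * rng.normal(size=8000),
                    "open sesame", enrolled))   # True: text matches, spectra align
```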
  • Patent number: 8442828
    Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
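Since the abstract of 8442828 is terse, here is a minimal illustration of what a linear-chain conditional random field computes for spoken language understanding: P(y|x) is proportional to exp(emission + transition scores), normalized by the forward algorithm. The label set, utterance length, and random weights are placeholders, not a trained slot-filling model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels, n_steps = 4, 6                          # e.g. 4 slot labels, 6-word utterance
emission = rng.normal(size=(n_steps, n_labels))   # feature-weight score per (word, label)
transition = rng.normal(size=(n_labels, n_labels))

def sequence_score(labels):
    """Unnormalized score of one label sequence."""
    s = emission[np.arange(n_steps), labels].sum()
    s += transition[labels[:-1], labels[1:]].sum()
    return s

def log_partition():
    """Forward algorithm: log-sum over all possible label sequences."""
    alpha = emission[0].copy()
    for t in range(1, n_steps):
        alpha = emission[t] + np.logaddexp.reduce(alpha[:, None] + transition, axis=0)
    return np.logaddexp.reduce(alpha)

labels = np.array([0, 1, 1, 2, 3, 0])
print(f"log P(y|x) = {sequence_score(labels) - log_partition():.3f}")
```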
  • Patent number: 8442824
    Abstract: Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score.
    Type: Grant
    Filed: November 25, 2009
    Date of Patent: May 14, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Almog Aley-Raz, Nir Moshe Krause, Michael Itzhak Salmon, Ran Yehoshua Gazit
  • Patent number: 8442825
    Abstract: A device for voice identification including a receiver, a segmenter, a resolver, two advancers, a buffer, and a plurality of IIR resonator digital filters where each IIR filter comprises a set of memory locations or functional equivalent to hold filter specifications, a memory location or functional equivalent to hold the arithmetic reciprocal of the filter's gain, a five cell controller array, several multipliers, an adder, a subtractor, and a logical non-shift register. Each cell of the five cell controller array has five logical states, each acting as a five-position single-pole rotating switch that operates in unison with the four others. Additionally, the device also includes an artificial neural network and a display means.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: May 14, 2013
    Assignee: The United States of America as Represented by the Director, National Security Agency
    Inventor: Michael Sinutko
  • Patent number: 8438030
    Abstract: A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion.
    Type: Grant
    Filed: November 25, 2009
    Date of Patent: May 7, 2013
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
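A rough sketch of the decode and post-process stages of 8438030: one diagonal Gaussian per distortion type stands in for the distortion models, per-frame hypotheses are produced by maximum likelihood, and post-processing keeps the hypothesis supported by the most frames. The model parameters, two-dimensional features, and majority-vote rule are illustrative assumptions.

```python
import numpy as np

DISTORTION_MODELS = {                         # illustrative means / diagonal variances
    "clipping":   (np.array([2.0, 0.5]), np.array([0.4, 0.2])),
    "wind_noise": (np.array([-1.0, 1.5]), np.array([0.5, 0.6])),
    "clean":      (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
}

def log_likelihood(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def decode(features):
    """Per-frame hypotheses: highest-likelihood distortion label for each feature vector."""
    return [max(DISTORTION_MODELS, key=lambda k: log_likelihood(f, *DISTORTION_MODELS[k]))
            for f in features]

def post_process(hypotheses):
    """Keep the distortion hypothesis supported by the most frames."""
    return max(set(hypotheses), key=hypotheses.count)

features = np.array([[1.8, 0.4], [2.2, 0.6], [0.1, -0.2]])
print(post_process(decode(features)))         # -> "clipping"
```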
  • Patent number: 8428269
    Abstract: A spatial audio system for implementing a head-related transfer function (HRTF). A first stage implements a lateral HRTF that reproduces the median frequency response for a sound source located at a particular lateral distance from a listener, and a second stage implements a vertical HRTF that reproduces the spectral changes when the vertical distance of a sound source changes relative to the listener. The system improves the vertical localization accuracy provided by an arbitrary measured HRTF by introducing an enhancement factor into the second processing stage. The enhancement factor increases the spectral differentiation between simulated sound sources located at different positions within the same “cone of confusion.”
    Type: Grant
    Filed: May 20, 2010
    Date of Patent: April 23, 2013
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventors: Douglas S. Brungart, Griffin D. Romigh
  • Patent number: 8417525
    Abstract: A computer-implemented method, system and/or program product update voice prints over time. A receiving computer receives an initial voice print. A determining period of time is calculated for that initial voice print. This determining period of time is a length of time during which an expected degree of change in subsequent voice prints, in comparison to the initial voice print, is predicted to occur. A new voice print is received after the determining period of time has passed, and the new voice print is compared with the initial voice print. In response to a change to the new voice print falling within the expected degree of change in comparison to the initial voice print, a voice print store is updated with the new voice print.
    Type: Grant
    Filed: February 9, 2010
    Date of Patent: April 9, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sheri Gayle Daye, Peeyush Jaiswal, Fang Wang
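The update rule of 8417525 can be sketched as follows: a new voice print received after the determining period is accepted into the store only if its distance from the stored print falls within the expected degree of change. Treating prints as plain vectors, using cosine distance, and letting the allowed change grow linearly with elapsed time are all assumptions of this sketch.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def expected_change(elapsed_days, drift_per_day=0.0005, base=0.05):
    """Assumed model of the expected degree of change over time."""
    return base + drift_per_day * elapsed_days

def maybe_update(store, user, new_print, elapsed_days, determining_period_days=180):
    if elapsed_days < determining_period_days:
        return False                              # too soon to re-evaluate
    change = cosine_distance(store[user], new_print)
    if change <= expected_change(elapsed_days):
        store[user] = new_print                   # drift is within the expected range
        return True
    return False

store = {"alice": np.array([0.2, 0.9, 0.4])}
print(maybe_update(store, "alice", np.array([0.25, 0.88, 0.41]), elapsed_days=200))
```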
  • Patent number: 8411830
    Abstract: A system, method and computer program product for providing targeted messages to a person using telephony services by generating user profile information from telephony data and using the user profile information to retrieve targeted messages.
    Type: Grant
    Filed: November 18, 2011
    Date of Patent: April 2, 2013
    Assignee: iCall, Inc.
    Inventors: Arlo Christopher Gilbert, Andrew Muldowney
  • Patent number: 8412526
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: April 2, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Alexander Sorin
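The abstract of 8412526 (repeated below under publication 20130046540) describes a synthesize-then-reanalyze flow. The skeleton below mirrors only that control flow; `synthesize_frame` and `mfcc` are hypothetical stand-ins for a speech synthesizer and an MFCC front end, so no demo call is shown.

```python
import numpy as np

def estimate_high_order_mfcc(mfcc_loc, pitch_hz, n_total, synthesize_frame, mfcc,
                             init_value=0.0):
    """Initialize the N-L HOC, synthesize a frame, and re-analyze it to N dimensions."""
    L = len(mfcc_loc)
    candidate = np.concatenate([np.asarray(mfcc_loc), np.full(n_total - L, init_value)])
    frame = synthesize_frame(candidate, pitch_hz)   # speech frame from candidate + pitch
    return mfcc(frame, n_total)                     # N-dimensional output MFCC vector
```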
  • Patent number: 8386253
    Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.
    Type: Grant
    Filed: July 13, 2012
    Date of Patent: February 26, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Benjamin J. Stern
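One way to picture the "language pattern" profile in 8386253 is a character n-gram profile compared by cosine overlap, as sketched below. Character trigrams and cosine similarity are assumptions; the abstract does not specify the pattern representation.

```python
from collections import Counter

def trigram_profile(text):
    """Character-trigram counts standing in for the extracted language pattern."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def profile_similarity(p, q):
    """Cosine overlap between two trigram-count profiles."""
    shared = set(p) & set(q)
    dot = sum(p[g] * q[g] for g in shared)
    norm = (sum(v * v for v in p.values()) ** 0.5) * (sum(v * v for v in q.values()) ** 0.5)
    return dot / norm if norm else 0.0

authorized = trigram_profile("hey, running late, see you at the usual place")
sample = trigram_profile("running a bit late, see you at the usual spot")
print(f"similarity to authorized profile: {profile_similarity(authorized, sample):.2f}")
```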
  • Publication number: 20130046540
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Application
    Filed: December 3, 2007
    Publication date: February 21, 2013
    Inventor: Alexander Sorin
  • Patent number: 8379806
    Abstract: A system and method for representing call content in a searchable database includes transcribing call content to text. The call content is projected to vector space, by creating a vector by indexing the call based on the content and determining a similarity of the call to an atomic-class dictionary. The call is classified in a relational database in accordance with the vector.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Cheng Wu, Andrzej Sakrajda, Hong-Kwang Jeff Kuo, Vaibhava Goel, David Lubensky
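A toy version of the indexing step in 8379806: the transcribed call becomes a term-overlap score against per-class "atomic" dictionaries, and the best-scoring class is what would be stored in the relational database. The two-class vocabulary and the length normalization are illustrative.

```python
import math

ATOMIC_CLASSES = {
    "billing": {"invoice", "charge", "refund", "payment"},
    "support": {"error", "crash", "reset", "install"},
}

def classify_call(transcript):
    """Score the transcript against each atomic-class dictionary and pick the best."""
    tokens = transcript.lower().split()
    scores = {}
    for label, vocab in ATOMIC_CLASSES.items():
        hits = sum(tok in vocab for tok in tokens)
        scores[label] = hits / math.sqrt(len(tokens))   # length-normalized overlap
    return max(scores, key=scores.get), scores

label, scores = classify_call("I was charged twice and need a refund on my invoice")
print(label, scores)   # -> "billing", with per-class similarity scores
```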
  • Patent number: 8374867
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: November 13, 2009
    Date of Patent: February 12, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
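The acceptance rule in 8374869 can be summarized as below: a confidence score built from the phoneme/anti-phoneme log-likelihood ratio is compared with a threshold, and the inter-phoneme distance with a mean distance; the word is accepted only when both checks pass. The direction of the distance comparison and the numeric thresholds are assumptions.

```python
import numpy as np

def confidence_score(log_lik_phoneme, log_lik_anti):
    """Mean log-likelihood ratio of context-dependent phoneme vs. anti-phoneme models."""
    return np.mean(np.asarray(log_lik_phoneme) - np.asarray(log_lik_anti))

def accept(log_lik_phoneme, log_lik_anti, phoneme_distance,
           conf_threshold=1.0, mean_distance=0.8):
    conf = confidence_score(log_lik_phoneme, log_lik_anti)
    return conf >= conf_threshold and phoneme_distance >= mean_distance

print(accept([-4.1, -3.8, -4.5], [-6.0, -5.2, -6.3], phoneme_distance=1.1))   # True
```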
  • Patent number: 8374868
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: February 12, 2013
    Assignee: General Motors LLC
    Inventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Patent number: 8355917
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: January 15, 2013
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Publication number: 20120278077
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: July 11, 2012
    Publication date: November 1, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8301450
    Abstract: An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: October 30, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-won Lee, In-jeong Choi
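A compact stand-in for the topic-domain-detection module of 8301450: words in the first-pass lattice vote for domains through a keyword table, weighted by their lattice posteriors, and the winning domain would select the language model for the backward decoding pass. The keyword table, toy lattice, and posterior weighting are assumptions.

```python
from collections import Counter

DOMAIN_KEYWORDS = {
    "weather": {"rain", "sunny", "forecast", "temperature"},
    "banking": {"account", "balance", "transfer", "deposit"},
}

def detect_topic_domain(word_lattice):
    """Accumulate posterior-weighted keyword votes per domain."""
    votes = Counter()
    for word, posterior in word_lattice:              # (word, lattice posterior)
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if word in keywords:
                votes[domain] += posterior
    return votes.most_common(1)[0][0] if votes else "general"

lattice = [("what", 0.9), ("is", 0.9), ("the", 0.8), ("forecast", 0.7),
           ("temperature", 0.4), ("transfer", 0.1)]
print(detect_topic_domain(lattice))   # -> "weather"; its LM would drive the backward pass
```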
  • Patent number: 8285546
    Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.
    Type: Grant
    Filed: September 9, 2011
    Date of Patent: October 9, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: David E. Reich
  • Patent number: 8280740
    Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
    Type: Grant
    Filed: April 13, 2009
    Date of Patent: October 2, 2012
    Assignee: Porticus Technology, Inc.
    Inventors: Germano Di Mambro, Bernardas Salna
  • Publication number: 20120239401
    Abstract: Provided is a voice recognition system capable of, while suppressing negative influences from sound not to be recognized, correctly estimating utterance sections that are to be recognized. A voice segmenting means calculates voice feature values, and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. Then, the voice segmenting means determines, to be first voice sections, those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, to be second voice sections, sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means determines the first voice sections by using the one of the threshold value and the margin which has been updated by the parameter updating means.
    Type: Application
    Filed: November 26, 2010
    Publication date: September 20, 2012
    Applicant: NEC CORPORATION
    Inventor: Takayuki Arakawa
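The voice-segmenting step of publication 20120239401 is sketched below with frame energy as the voice feature value: frames above a threshold form voice sections, and a margin (in frames) is added before and after each section to give the "first voice sections". The likelihood-based search and the parameter-updating step are omitted; the threshold and margin values are illustrative.

```python
import numpy as np

def first_voice_sections(signal, frame_len=160, threshold=0.02, margin=2):
    """Return (start_frame, end_frame) sections where frame energy exceeds the threshold,
    widened by `margin` frames on each side."""
    n = len(signal) // frame_len
    energy = np.array([np.mean(signal[i*frame_len:(i+1)*frame_len] ** 2) for i in range(n)])
    voiced = energy > threshold
    sections, i = [], 0
    while i < n:
        if voiced[i]:
            j = i
            while j < n and voiced[j]:
                j += 1
            sections.append((max(0, i - margin), min(n, j + margin)))   # add margin
            i = j
        else:
            i += 1
    return sections

rng = np.random.default_rng(2)
sig = np.concatenate([0.01 * rng.normal(size=1600), 0.5 * rng.normal(size=1600),
                      0.01 * rng.normal(size=1600)])
print(first_voice_sections(sig))
```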
  • Patent number: 8271281
    Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
    Type: Grant
    Filed: June 27, 2008
    Date of Patent: September 18, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
  • Patent number: 8271278
    Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
    Type: Grant
    Filed: April 3, 2010
    Date of Patent: September 18, 2012
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
  • Patent number: 8265932
    Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed and each of the preceding and reference phrase segments are divided into buffers. The buffers are transformed into discrete fourier transform buffers. One of the discrete fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete fourier transform. Each discrete fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: September 11, 2012
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
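Following the abstract of 8265932, the sketch below splits the preceding and reference-phrase audio into buffers, takes the magnitude of each buffer's DFT, and keeps as the signature the reference buffer least correlated with any preceding buffer. Buffer size, the use of magnitude spectra, and Pearson correlation as the (dis)similarity measure are assumptions.

```python
import numpy as np

def dft_buffers(audio, size=256):
    """Magnitude spectra of consecutive fixed-size buffers."""
    n = len(audio) // size
    return np.abs(np.fft.rfft(audio[:n * size].reshape(n, size), axis=1))

def pick_signature(preceding, reference, size=256):
    """Reference buffer most dissimilar to every preceding buffer."""
    pre, ref = dft_buffers(preceding, size), dft_buffers(reference, size)
    def max_corr(buf):
        return max(np.corrcoef(buf, p)[0, 1] for p in pre)
    dissimilarity = [-max_corr(buf) for buf in ref]
    return ref[int(np.argmax(dissimilarity))]

rng = np.random.default_rng(3)
sig = pick_signature(rng.normal(size=2048), rng.normal(size=2048))
print(sig.shape)   # (129,) magnitude spectrum used as the prompt signature
```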
  • Patent number: 8260614
    Abstract: A method and system that expands a word graph to a phone graph. An unknown speech signal is received. A word graph is generated based on an application task or based on information extracted from the unknown speech signal. The word graph is expanded into a phone graph. The unknown speech signal is recognized using the phone graph. The phone graph can be based on a cross-word acoustical model to improve continuous speech recognition. By expanding a word graph into a phone graph, the phone graph can consume less memory than a word graph and can reduce greatly the computation cost in the decoding process than that of the word graph thus improving system performance. Furthermore, continuous speech recognition error rate can be reduced by using the phone graph, which provides a more accurate graph for continuous speech recognition.
    Type: Grant
    Filed: September 28, 2000
    Date of Patent: September 4, 2012
    Assignee: Intel Corporation
    Inventors: Qingwei Zhao, Zhiwei Lin, Yonghong Yan
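Word-graph expansion as in 8260614 can be pictured as replacing each word arc with a chain of phone arcs from a pronunciation lexicon, which is what the sketch below does on a tiny two-word graph. The adjacency-list representation and the toy lexicon are assumptions; a real decoder would also apply cross-word acoustic context, which is not modelled here.

```python
# Graphs are adjacency lists: node -> list of (label, next_node) arcs.
LEXICON = {"call": ["k", "ao", "l"], "home": ["hh", "ow", "m"]}

def expand_to_phone_graph(word_graph):
    phone_graph, counter = {}, [max(word_graph) + 1]   # fresh node id allocator
    def new_node():
        counter[0] += 1
        return counter[0]
    for node, arcs in word_graph.items():
        phone_graph.setdefault(node, [])
        for word, dest in arcs:
            phones, src = LEXICON[word], node
            for ph in phones[:-1]:                     # chain of intermediate nodes
                nxt = new_node()
                phone_graph.setdefault(src, []).append((ph, nxt))
                src = nxt
            phone_graph.setdefault(src, []).append((phones[-1], dest))
    return phone_graph

word_graph = {0: [("call", 1)], 1: [("home", 2)], 2: []}
print(expand_to_phone_graph(word_graph))
```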
  • Publication number: 20120221335
    Abstract: According to one embodiment, the method may include constructing a first voice tag for registration speech based on Hidden Markov acoustic model (HMM), constructing a second voice tag for the registration speech based on template matching, and combining the first voice tag and the second voice tag to construct voice tag of the registration speech.
    Type: Application
    Filed: February 24, 2012
    Publication date: August 30, 2012
    Inventors: Rui Zhao, Lei He
  • Publication number: 20120221336
    Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Applicant: VOICESENSE LTD.
    Inventors: Yoav DEGANI, Yishai ZAMIR
  • Patent number: 8244532
    Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.
    Type: Grant
    Filed: December 23, 2005
    Date of Patent: August 14, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Benjamin J. Stern
  • Patent number: 8244534
    Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: August 14, 2012
    Assignee: Microsoft Corporation
    Inventors: Yao Qian, Frank Kao-Ping Soong
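The Kullback-Leibler divergence analysis mentioned in 8244534 can be illustrated by mapping a state in one language to the closest state in another, with each HMM state summarized by a diagonal Gaussian. The closed-form diagonal-Gaussian KL divergence below is standard; the state names and parameters are toy values, and the patent's decision-tree mapping is not reproduced.

```python
import numpy as np

def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal Gaussians p and q."""
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def map_state(state_a, states_b):
    """Map a state of language A to the language-B state with minimum KL divergence."""
    mu_a, var_a = state_a
    divergences = {name: kl_diag_gaussian(mu_a, var_a, mu_b, var_b)
                   for name, (mu_b, var_b) in states_b.items()}
    return min(divergences, key=divergences.get), divergences

english_states = {
    "eh_mid": (np.array([1.0, 0.2]), np.array([0.5, 0.4])),
    "iy_mid": (np.array([2.0, 1.0]), np.array([0.3, 0.3])),
}
mandarin_state = (np.array([1.1, 0.3]), np.array([0.6, 0.5]))
print(map_state(mandarin_state, english_states)[0])   # -> "eh_mid"
```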
  • Patent number: 8234113
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8234112
    Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for the actual application environment.
    Type: Grant
    Filed: April 25, 2008
    Date of Patent: July 31, 2012
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
  • Patent number: 8229729
    Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
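A rough picture of the continuous-space step in 8229729: a source-versus-target co-occurrence matrix is factorized with an SVD and word pairs are read off as vectors of continuous real numbers. The SVD is one plausible choice of dimensionality reduction; the toy counts and the concatenated pair vector are assumptions of this sketch.

```python
import numpy as np

source = ["house", "car", "dog"]
target = ["maison", "voiture", "chien"]
cooc = np.array([[9.0, 1.0, 0.0],       # rows: source words, cols: target words
                 [0.0, 8.0, 1.0],
                 [1.0, 0.0, 7.0]])

U, s, Vt = np.linalg.svd(cooc, full_matrices=False)
k = 2                                    # reduced dimensionality
src_vecs = U[:, :k] * s[:k]              # continuous-space source word vectors
tgt_vecs = Vt[:k].T * s[:k]              # continuous-space target word vectors

def pair_vector(i, j):
    """Continuous representation of the word pair (source[i], target[j])."""
    return np.concatenate([src_vecs[i], tgt_vecs[j]])

print(source[0], target[0], pair_vector(0, 0))
```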
  • Patent number: 8219404
    Abstract: A method and apparatus for identifying a speaker within a captured audio signal from a collection of known speakers. The method and apparatus receive or generate voice representations for each known speaker and tag the representations according to meta data related to the known speaker or to the voice. The representations are grouped into one or more groups according to these tags. When a voice to be recognized is introduced, characteristics are determined according to which the groups are prioritized, so that only the representations belonging to a subset of the groups are matched against the voice to be identified, thus reducing identification time and improving the statistical significance.
    Type: Grant
    Filed: August 9, 2007
    Date of Patent: July 10, 2012
    Assignee: Nice Systems, Ltd.
    Inventors: Adam Weinberg, Irit Opher, Eyal Benaroya, Renan Gutman
  • Patent number: 8195462
    Abstract: Disclosed herein is a system, method and computer-readable medium storing instructions for controlling a computing device according to the method. As an example embodiment, the method uses a speech recognition decoder that operates using fixed point arithmetic. The exemplary method comprises representing arc costs associated with at least one finite state transducer (FST) in fixed point, representing parameters associated with a hidden Markov model (HMM) in fixed point and processing speech data in the speech recognition decoder using fixed point arithmetic for the fixed point FST arc costs and the fixed point HMM parameters. The method may also include computing at the decoder sentence hypothesis probabilities with fixed point arithmetic as type Q-2e numbers.
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: June 5, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Charles Douglas Blewett, Enrico Luigi Bocchieri
  • Patent number: 8195468
    Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.
    Type: Grant
    Filed: April 11, 2011
    Date of Patent: June 5, 2012
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
  • Patent number: 8194827
    Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.
    Type: Grant
    Filed: April 29, 2008
    Date of Patent: June 5, 2012
    Assignee: International Business Machines Corporation
    Inventors: Peeyush Jaiswal, Naveen Narayan
  • Patent number: 8179289
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: June 19, 2006
    Date of Patent: May 15, 2012
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
  • Patent number: 8180637
    Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: May 15, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
  • Patent number: 8180638
    Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and the remaining emotions are classified using a GMM to which a discriminative weight, chosen to minimize the loss function of the classification error for the emotion-recognition feature vector, is applied. The emotion recognition is performed by applying a discriminative weight, estimated with the GMM under the minimum classification error criterion, to feature vectors of emotions that are difficult to classify, thereby enhancing recognition performance.
    Type: Grant
    Filed: February 23, 2010
    Date of Patent: May 15, 2012
    Assignee: Korea Institute of Science and Technology
    Inventors: Hyoung Gon Kim, Ig Jae Kim, Joon-Hyuk Chang, Kye Hwan Lee, Chang Seok Bae
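The scoring side of 8180638 might look like the sketch below: each emotion has a Gaussian mixture, and a per-emotion discriminative weight is applied (in the log domain) before the arg-max decision. The single-component mixtures and fixed weights are placeholders; in the patent the weights are trained under a minimum-classification-error criterion, which is not reproduced here.

```python
import numpy as np

EMOTIONS = {     # (mixture weights, component means, diagonal variances) per emotion
    "neutral": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "anger":   (np.array([1.0]), np.array([[2.0, 1.5]]), np.array([[0.8, 0.9]])),
}
DISCRIMINATIVE_WEIGHT = {"neutral": 1.0, "anger": 1.1}

def gmm_log_likelihood(x, weights, means, variances):
    comp = -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances,
                         axis=1)
    return np.logaddexp.reduce(np.log(weights) + comp)

def classify(feature_vector):
    """Weighted GMM log-likelihood per emotion, then arg-max."""
    scores = {emo: np.log(DISCRIMINATIVE_WEIGHT[emo])
                   + gmm_log_likelihood(feature_vector, *params)
              for emo, params in EMOTIONS.items()}
    return max(scores, key=scores.get)

print(classify(np.array([1.9, 1.4])))   # -> "anger"
```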
  • Publication number: 20120116763
    Abstract: A voice data analyzing device comprises speaker model deriving means which derives speaker models as models each specifying character of voice of each speaker from voice data including a plurality of utterances to each of which a speaker label as information for identifying a speaker has been assigned and speaker co-occurrence model deriving means which derives a speaker co-occurrence model as a model representing the strength of co-occurrence relationship among the speakers from session data obtained by segmenting the voice data in units of sequences of conversation by use of the speaker models derived by the speaker model deriving means.
    Type: Application
    Filed: June 3, 2010
    Publication date: May 10, 2012
    Applicant: NEC CORPORATION
    Inventor: Takafumi Koshinaka
  • Patent number: 8150690
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: April 3, 2012
    Assignee: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
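A guess at the shape of the rule in 8150690, modelled on classical spectral subtraction with flooring: a noise cepstrum scaled by a first coefficient is subtracted from each feature vector, and a second coefficient plus a determining condition keep any coefficient from being over-subtracted. The exact claimed rule differs; this is only an illustration of "limited" cepstral subtraction.

```python
import numpy as np

def cepstral_noise_subtraction(cepstra, noise_cepstrum, alpha=0.9, beta=0.1):
    """Subtract alpha-scaled noise cepstrum; fall back to a beta floor on overshoot."""
    cleaned = cepstra - alpha * noise_cepstrum
    floor = beta * cepstra
    overshoot = np.abs(cleaned) < np.abs(floor)     # subtraction went too far
    return np.where(overshoot, floor, cleaned)

frames = np.array([[12.0, -3.0, 1.5], [11.0, -2.5, 1.2]])   # toy cepstral frames
noise = np.array([1.0, -0.4, 1.4])
print(cepstral_noise_subtraction(frames, noise))
```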
  • Patent number: 8145486
    Abstract: Acoustic models to provide features to a speech signal are created based on speech features included in regions where similarities of acoustic models created based on speech features in a certain time length are equal to or greater than a predetermined value. Feature vectors acquired by using the acoustic models of the regions and the speech features to provide features to speech signals of second segments are grouped by speaker.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: March 27, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Makoto Hirohata
  • Patent number: 8126668
    Abstract: Disclosed is a method of signal detection. A received input signal is divided into frames, and the signal in a first frame and a second frame is transformed into the frequency domain. First power spectrum information and second power spectrum information are computed from the transformed signals, and a delta spectrum entropy value corresponding to the difference between the two power spectra is obtained. Whether a target signal is present in a given frame is then judged by comparing the delta spectrum entropy value with a critical value. The desired signal can thus be detected in a noisy environment by using the delta spectrum entropy value.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: February 28, 2012
    Assignee: Sungkyunkwan University Foundation for Corporate Collaboration
    Inventors: Kwang-Seok Hong, Yong-Wan Roh, Kue-Bum Lee
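The detection criterion of 8126668 reduces to the comparison sketched below: compute the spectral entropy of the power spectrum in two frames and declare a signal present when the entropy difference exceeds a critical value. The frame length, the small entropy floor constant, and the critical value are illustrative choices.

```python
import numpy as np

def spectral_entropy(frame):
    """Entropy of the normalized power spectrum of one frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / np.sum(power)
    return -np.sum(p * np.log(p + 1e-12))

def signal_present(frame_a, frame_b, critical_value=0.3):
    delta = abs(spectral_entropy(frame_a) - spectral_entropy(frame_b))
    return delta > critical_value

rng = np.random.default_rng(4)
noise = rng.normal(size=256)                               # broadband noise frame
tone = np.sin(2 * np.pi * 20 * np.arange(256) / 256)       # narrowband "signal" frame
print(signal_present(noise, tone + 0.1 * rng.normal(size=256)))   # -> True
```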
  • Patent number: 8121839
    Abstract: A service is configured to analyze multimedia communications to determine a likelihood that the communication is unsolicited. For example, the service may inspect e-mail messages, instant messaging messages, facsimile transmissions, voice communications, and video telephony, and analyze these forms of communication to determine whether an intended communication is unsolicited. In connection with voice and video telephony, a voice sample may be obtained from the caller and voice recognition may be performed on the sample to determine an identity of the person or the identity of the voice. The voice sample may also be used to determine the type of voice, i.e., whether the voice is live, machine generated, or prerecorded. Where the call is a video telephony call, image recognition may be used to inspect an image of the person. The information obtained from voice recognition, voice type recognition, and image recognition may be used to detect whether the message is from a known source of unsolicited communications.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: February 21, 2012
    Assignee: Rockstar Bidco, LP
    Inventors: Samir Srivastava, Francois Audet, Vibhu Vivek
  • Patent number: 8121837
    Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: February 21, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Paritosh D. Patel
  • Patent number: 8108212
    Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: January 31, 2012
    Assignee: NEC Corporation
    Inventor: Shuhei Maegawa
  • Patent number: 8099278
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: January 17, 2012
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
  • Patent number: 8099288
    Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corp.
    Inventors: Zhengyou Zhang, Amarnag Subramaya
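The scoring idea of 8099288, a weighted sum of sub-unit likelihood ratios rather than a single utterance-level ratio, is sketched below. The per-word weights and decision threshold are illustrative; in practice the log-likelihood ratios would come from the speaker model and the speaker-independent background acoustic model.

```python
import numpy as np

def verify(word_llrs, word_weights, threshold=0.5):
    """Accept the speaker when the weighted mean of per-word LLRs clears the threshold."""
    word_llrs = np.asarray(word_llrs, dtype=float)
    word_weights = np.asarray(word_weights, dtype=float)
    score = np.sum(word_weights * word_llrs) / np.sum(word_weights)
    return score >= threshold, score

# "my voice is my password" -> one LLR per word, longer words weighted higher
accepted, score = verify([0.8, 1.2, 0.3, 1.0, 1.5], [2, 5, 2, 2, 8])
print(accepted, round(score, 3))
```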
  • Patent number: 8099290
    Abstract: A voice recognition unit is constructed in such a way as to create a voice label string for an inputted voice uttered by a user inputted for each language on the basis of a feature vector time series of the inputted voice uttered by the user and data about a sound standard model, and register the voice label string into a voice label memory 2 while automatically switching among languages for a sound standard model memory 1 used to create the voice label string, and automatically switching among the languages for the voice label memory 2 for holding the created voice label string by using a first language switching unit SW1 and a second language switching unit SW2.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: January 17, 2012
    Assignee: Mitsubishi Electric Corporation
    Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta