Using Statistical Models, e.g., Hidden Markov Models (HMMs), etc. (EPO) Patents (Class 704/E15.027)
  • Publication number: 20130132085
    Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Publication number: 20120232901
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that both 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: May 24, 2012
    Publication date: September 13, 2012
    Applicant: Autonomy Corporation Ltd.
    Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
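The scoring step in the abstract above (per-language statistical models over phoneme patterns, with the best-scoring language selected) can be illustrated with a minimal sketch. It assumes add-one-smoothed phoneme bigram models, which the abstract does not specify; the phoneme set, the training sequences, and the names `train_bigram_lm` and `identify_language` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sequences, vocab_size):
    """Return a function scoring a phoneme sequence with an add-one bigram model."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def log_prob(seq):
        padded = ["<s>"] + list(seq)
        return sum(
            math.log((pair_counts[(p, c)] + 1) / (context_counts[p] + vocab_size))
            for p, c in zip(padded, padded[1:])
        )

    return log_prob


def identify_language(phonemes, models):
    """Pick the language whose model assigns the highest log-probability."""
    return max(models, key=lambda lang: models[lang](phonemes))


# Hypothetical universal phoneme set shared by both languages.
universal_phonemes = ["a", "o", "b", "k", "s", "t"]
models = {
    "lang_a": train_bigram_lm([["k", "a", "t"], ["t", "a", "k"], ["k", "a", "k"]],
                              len(universal_phonemes)),
    "lang_b": train_bigram_lm([["s", "o", "b"], ["b", "o", "s"], ["s", "o", "s"]],
                              len(universal_phonemes)),
}
print(identify_language(["k", "a", "t", "a"], models))
```

In this toy setup the decoded phonemes are given directly; a real system would obtain them from the decoder's output on audio.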
  • Publication number: 20120130716
    Abstract: A speech recognition method for a robot. The speech recognition method for the robot includes one fundamental acoustic model. Whenever the noisy environment and the speaker are changed, the speech recognition method generates a plurality of parallel acoustic models in which the characteristic for each noisy environment and the characteristic for each speaker are reflected. As a result, the speech recognition method for the robot can freely recognize one of several acoustic models according to individual environments and speakers, such that it can basically remove mismatch between the model training environment and the test environment, thereby improving speech recognition capabilities.
    Type: Application
    Filed: November 17, 2011
    Publication date: May 24, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Ki Beom KIM
  • Publication number: 20120072215
    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.
    Type: Application
    Filed: September 21, 2010
    Publication date: March 22, 2012
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Abdel-rahman Samir Abdel-rahman Mohamed
  • Publication number: 20110288865
    Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
    Type: Application
    Filed: August 1, 2011
    Publication date: November 24, 2011
    Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
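One plausible reading of the "consistency measure" in the abstract above is the average log-likelihood of the extracted features under the reference GMM. The sketch below assumes that reading and a one-dimensional feature; the model parameters and feature values are made up for illustration, and the mapping from consistency values to a quality score is not shown.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log density of a scalar feature x under a one-dimensional GMM."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def consistency(features, gmm):
    """Average per-frame log-likelihood of the features under the reference GMM."""
    return sum(gmm_log_likelihood(x, *gmm) for x in features) / len(features)


# Hypothetical reference model of a clean-speech feature: (weights, means, variances).
reference_gmm = ([0.6, 0.4], [0.0, 3.0], [1.0, 0.5])
clean_like = [0.1, -0.4, 2.9, 3.2, 0.3]
degraded = [6.0, -5.5, 7.1, 8.0, -6.2]
print(consistency(clean_like, reference_gmm) > consistency(degraded, reference_gmm))
```

Features that sit close to the reference model score a higher consistency than features far from it, which is the property a downstream quality mapping would rely on.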
  • Publication number: 20110202343
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Application
    Filed: April 28, 2011
    Publication date: August 18, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
  • Publication number: 20110137650
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej LJOLJE
  • Publication number: 20110046953
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Applicant: GENERAL MOTORS COMPANY
    Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Publication number: 20100211390
    Abstract: The present invention relates to a method of generating a candidate list from a list of entries in accordance with a string of subword units corresponding to a speech input in a speech recognition system, the list of entries including plural list entries each comprising at least one fragment having one or more subword units. For each list entry, the fragments of the list entry are compared with the string of subword units. A matching score for each of the compared fragments based on the comparison is determined. The matching score for a fragment is further based on a comparison of at least one other fragment of the same list entry with the string of subword units. A total score for each list entry is determined based on the matching scores for the compared fragments of the respective list entry. A candidate list with the best matching entries from the list of entries based on the total scores of the list entries is generated.
    Type: Application
    Filed: February 16, 2010
    Publication date: August 19, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Christian Hillebrecht, Markus Schwarz
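The fragment-matching idea in the abstract above can be sketched as follows. This is a simplified illustration, not the claimed method: each fragment is scored by its smallest edit distance to any contiguous run of the recognised subword units, the total score of an entry is the negated sum of those distances, and the dependence of one fragment's score on the other fragments of the same entry is omitted. The entries and phoneme symbols are hypothetical.

```python
def fragment_distance(fragment, units):
    """Smallest edit distance between the fragment and any contiguous run of units."""
    prev = [0] * (len(units) + 1)  # an empty fragment matches anywhere at no cost
    for i, f in enumerate(fragment, 1):
        cur = [i] + [0] * len(units)
        for j, u in enumerate(units, 1):
            cur[j] = min(
                prev[j] + 1,             # fragment unit missing from the input
                cur[j - 1] + 1,          # extra unit in the input
                prev[j - 1] + (f != u),  # match or substitution
            )
        prev = cur
    return min(prev)


def candidate_list(entries, units, size):
    """Rank list entries by total fragment score and keep the best `size` entries."""
    scored = [
        (name, -sum(fragment_distance(f, units) for f in fragments))
        for name, fragments in entries.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:size]


# Hypothetical list entries, each split into fragments of subword units.
entries = {
    "main street": [("m", "ey", "n"), ("s", "t", "r", "iy", "t")],
    "maple road": [("m", "ey", "p", "ah", "l"), ("r", "ow", "d")],
    "park avenue": [("p", "aa", "r", "k"), ("ae", "v", "ah", "n", "uw")],
}
recognised = ("m", "ey", "n", "s", "t", "r", "iy", "d")
print(candidate_list(entries, recognised, size=2))
```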
  • Publication number: 20100198598
    Abstract: A method for recognizing a speaker of an utterance in a speech recognition system is disclosed. A likelihood score for each of a plurality of speaker models for different speakers is determined. The likelihood score indicates how well the speaker model corresponds to the utterance. For each of the plurality of speaker models, a probability that the utterance originates from that speaker is determined. The probability is determined based on the likelihood score for the speaker model and requires the estimation of a distribution of likelihood scores expected based at least in part on the training state of the speaker.
    Type: Application
    Filed: February 4, 2010
    Publication date: August 5, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Franz Gerl
  • Publication number: 20100185448
    Abstract: In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility.
    Type: Application
    Filed: January 21, 2010
    Publication date: July 22, 2010
    Inventor: William S. Meisel
  • Publication number: 20100125458
    Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
    Type: Application
    Filed: July 13, 2006
    Publication date: May 20, 2010
    Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
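Combining acoustic evidence from a first utterance and its repetition, as the abstract above describes, might be sketched by adding per-hypothesis acoustic log-likelihoods from the two attempts. The additive rule assumes the two utterances are independent given the intended words; that assumption, the `floor` score for hypotheses missing from one list, and the example scores are all illustrative rather than taken from the source.

```python
def combine_evidence(first_scores, second_scores, floor=-50.0):
    """Sum log-likelihoods per hypothesis across both utterances and pick the best."""
    hypotheses = set(first_scores) | set(second_scores)
    combined = {
        h: first_scores.get(h, floor) + second_scores.get(h, floor)
        for h in hypotheses
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical acoustic log-likelihoods for two attempts at the same command.
first_attempt = {"call joan": -9.5, "call john": -10.0}
repeat_attempt = {"call john": -8.0, "call joan": -11.0}

best, combined = combine_evidence(first_attempt, repeat_attempt)
print(best, combined[best])
```

Here the first attempt alone would favour "call joan", while the pooled evidence favours "call john".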
  • Publication number: 20100036663
    Abstract: The method and system disclosed herein reduce the total bandwidth requirement for communication in a voice over Internet protocol application. Sample [101] and convert [102] the analog input audio signal into digital signals and derive sampled frames [103]. Compute spacings of order statistics [104]. Measure the entropy for each of the sampled frames [105]. Set a threshold for entropy [106]. Mark the audio frames as active speech frames or inactive speech frames [107]. Mark an audio frame as an inactive speech frame when the entropy is greater than the threshold, and mark the audio frame as an active speech frame when the entropy is less than the threshold [107]. Transmit only the active speech frames [108].
    Type: Application
    Filed: January 24, 2007
    Publication date: February 11, 2010
    Inventors: Muralishankar Rangarao, Vijay Satyanarayana, Venkatesha Prasad Rangarao, Shankar H. Narasimhiah
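The steps in the abstract above (frame the signal, compute spacings of order statistics, estimate entropy, threshold, keep active frames) can be sketched as below. The abstract does not name an estimator, so this assumes a Vasicek-style m-spacing estimate of differential entropy; the per-frame normalisation, the choice of `m`, the threshold value, and the synthetic noise and tone signals are likewise assumptions made for illustration.

```python
import math
import random


def spacing_entropy(frame, m=None, eps=1e-12):
    """Estimate differential entropy (nats) from m-spacings of the order statistics."""
    n = len(frame)
    if m is None:
        m = max(1, round(math.sqrt(n)))  # common heuristic, an assumption here
    mean = sum(frame) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in frame) / n) or 1.0
    # Normalise to zero mean and unit variance so the estimate reflects the shape
    # of the amplitude distribution rather than the loudness (assumption).
    xs = sorted((x - mean) / std for x in frame)
    total = 0.0
    for i in range(n):
        lo = xs[max(i - m, 0)]
        hi = xs[min(i + m, n - 1)]
        total += math.log(n / (2.0 * m) * (hi - lo) + eps)
    return total / n


def split_frames(samples, frame_len):
    """Cut the sample stream into non-overlapping frames, dropping any remainder."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def active_frames(samples, frame_len, threshold):
    """Keep only frames whose entropy is below the threshold (the active ones)."""
    return [f for f in split_frames(samples, frame_len)
            if spacing_entropy(f) < threshold]


rng = random.Random(0)
noise = [rng.gauss(0.0, 0.01) for _ in range(256)]                # background noise
tone = [math.sin(2 * math.pi * 5 * k / 256) for k in range(256)]  # stand-in for voiced speech
kept = active_frames(noise + tone, frame_len=256, threshold=1.1)
print(len(kept))
```

After normalisation, Gaussian-like noise has a higher entropy than the more structured tone, so only the tone frame falls below the threshold and would be transmitted. An all-constant frame would need separate handling, since its spacings are all zero.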
  • Publication number: 20100023331
    Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data.
    Type: Application
    Filed: July 15, 2009
    Publication date: January 28, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nicolae Duta, Rèal Tremblay, Andy Mauro, Douglas Peters
  • Publication number: 20090306982
    Abstract: Disclosed is an apparatus that includes a text input device that inputs text data provided with confidence measures, as subject for mining, a language processing unit that performs language analysis of the input text data provided with the confidence measures, a confidence measure exploiting characteristic word count unit that counts the characteristic words in the input text to provide a count result and that exploits the statistical information and the confidence measures provided in the input text to correct the count result obtained, a characteristic measure calculation unit that calculates the characteristic measure of each characteristic word from the corrected count result, a mining result output device that outputs the characteristic measure of each characteristic word obtained, a user operation input device for a user to input setting for language processing of the input text and setting for a technique for calculating the characteristic measure being found, a mining process management unit that transmits
    Type: Application
    Filed: July 18, 2007
    Publication date: December 10, 2009
    Inventors: Satoshi Nakazawa, Satoshi Morinaga
  • Publication number: 20090259466
    Abstract: Adjusting confidence score thresholds is described for a speech recognition engine. The speech recognition engine is implemented in multiple computer processes functioning in a computer processor, and is characterized by an associated receiver operating characteristic (ROC) curve. A results confirmation process interprets user confirmation of speech recognition results within a given confidence score threshold to create a confirmed portion of the ROC curve for the speech recognition engine. A curve extension process extends the confirmed portion of the ROC curve by extrapolation of unconfirmed speech recognition results beyond the confidence score threshold to generate an extended ROC curve. A threshold adjustment process adjusts the confidence score threshold based on the extended ROC curve to meet target operating constraints for operating the speech recognition engine to perform automatic speech recognition of user speech inputs.
    Type: Application
    Filed: April 14, 2009
    Publication date: October 15, 2009
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Peter Stubley, Oded Monzon, Alex Gorodetski, Douglas Peters
  • Publication number: 20090055183
    Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\Theta$ and discriminative parameters $\tilde{\Theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\Theta, \tilde{\Theta}) + \sum_{l=1}^{M} \left[ \log p(X_l, Y_l \mid \tilde{\Theta}) - \log p(X_l \mid \tilde{\Theta}) \right] + \sum_{l=1}^{M} \log p(X_l \mid \Theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} \left( em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2 \right)$, where $em_y = 1 - \sum_{x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\Theta^{*k}$, $\tilde{\Theta}$…
    Type: Application
    Filed: August 21, 2008
    Publication date: February 26, 2009
    Applicant: Siemens Medical Solutions USA, Inc.
    Inventors: Oksana Yakhnenko, Romer E. Rosales, Radu Stefan Niculescu, Lucian Vlad Lita
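The penalty term in the abstract above sums, over every state, the squared amount by which each emission and transition distribution (generative and discriminative) fails to sum to one. A minimal sketch of that term and of the penalised objective follows; the state labels, vocabulary, probability values, and function names are hypothetical, and the log-likelihood is passed in as a number rather than computed.

```python
def normalisation_gap(dist):
    """Return 1 minus the total probability mass of one conditional distribution."""
    return 1.0 - sum(dist.values())


def penalty(emission, transition, emission_tilde, transition_tilde):
    """Sum of squared normalisation gaps over all states and all four tables."""
    tables = (emission, transition, emission_tilde, transition_tilde)
    return sum(normalisation_gap(table[y]) ** 2 for y in emission for table in tables)


def objective(log_likelihood, c, penalty_value):
    """Penalised objective: log-likelihood minus C times the penalty."""
    return log_likelihood - c * penalty_value


# Hypothetical two-state model; every table maps a state to a distribution.
emission = {"NAME": {"smith": 0.5, "the": 0.5}, "OTHER": {"smith": 0.25, "the": 0.75}}
transition = {"NAME": {"NAME": 0.5, "OTHER": 0.5}, "OTHER": {"NAME": 0.25, "OTHER": 0.75}}

# A discriminative emission table whose NAME row has drifted to a total of 0.9.
drifted = {"NAME": {"smith": 0.5, "the": 0.4}, "OTHER": {"smith": 0.25, "the": 0.75}}

print(penalty(emission, transition, emission, transition))
print(penalty(emission, transition, drifted, transition))
```

With all tables normalised the penalty vanishes; the drifted row contributes roughly its squared gap, so increasing `c` pushes an optimiser back toward valid probability tables.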