Using Statistical Models, e.g., Hidden Markov Models (HMMs), etc. (EPO) Patents (Class 704/E15.027)

E Subclasses

Hidden Markov Models (HMMs) (EPO) (Class 704/E15.028)

Training of Hidden Markov Models (HMMs) (EPO) (Class 704/E15.029)

With insufficient amount of training data, e.g., state sharing, tying, deleted interpolation, etc. (EPO) (Class 704/E15.03)

Duration modeling in Hidden Markov Models (HMMs), e.g., semi-HMM, segmental models, transition probabilities, etc. (EPO) (Class 704/E15.031)
Hidden Markov Models (HMMs) network (EPO) (Class 704/E15.032)
State emission probabilities (EPO) (Class 704/E15.033)

Non-Hidden Markov Model (EPO) (Class 704/E15.037)

Systems and Methods for Non-Negative Hidden Markov Modeling of Signals

Publication number: 20130132085

Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.

Type: Application

Filed: February 21, 2011

Publication date: May 23, 2013

Inventors: Gautham J. Mysore, Paris Smaragdis
AUTOMATIC SPOKEN LANGUAGE IDENTIFICATION BASED ON PHONEME SEQUENCE PATTERNS

Publication number: 20120232901

Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring each time in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.

Type: Application

Filed: May 24, 2012

Publication date: September 13, 2012

Applicant: Autonomy Corporation Ltd.

Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
SPEECH RECOGNITION METHOD FOR ROBOT

Publication number: 20120130716

Abstract: A speech recognition method for a robot. The speech recognition method for the robot includes one fundamental acoustic model. Whenever the noisy environment and the speaker are changed, the speech recognition method generates a plurality of parallel acoustic models in which the characteristic for each noisy environment and the characteristic for each speaker are reflected. As a result, the speech recognition method for the robot can freely recognize one of several acoustic models according to individual environments and speakers, such that it can basically remove mismatch between the model training environment and the test environment, thereby improving speech recognition capabilities.

Type: Application

Filed: November 17, 2011

Publication date: May 24, 2012

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Ki Beom KIM
FULL-SEQUENCE TRAINING OF DEEP STRUCTURES FOR SPEECH RECOGNITION

Publication number: 20120072215

Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.

Type: Application

Filed: September 21, 2010

Publication date: March 22, 2012

Applicant: Microsoft Corporation

Inventors: Dong Yu, Li Deng, Abdel-rahman Samir Abdel-rahman Mohamed
Single-Sided Speech Quality Measurement

Publication number: 20110288865

Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.

Type: Application

Filed: August 1, 2011

Publication date: November 24, 2011

Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
CONCISE DYNAMIC GRAMMARS USING N-BEST SELECTION

Publication number: 20110202343

Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.

Type: Application

Filed: April 28, 2011

Publication date: August 18, 2011

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
SYSTEM AND METHOD FOR TRAINING ADAPTATION-SPECIFIC ACOUSTIC MODELS FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20110137650

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.

Type: Application

Filed: December 8, 2009

Publication date: June 9, 2011

Applicant: AT&T Intellectual Property I, L.P.

Inventor: Andrej LJOLJE
METHOD OF RECOGNIZING SPEECH

Publication number: 20110046953

Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.

Type: Application

Filed: August 21, 2009

Publication date: February 24, 2011

Applicant: GENERAL MOTORS COMPANY

Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
Speech Recognition of a List Entry

Publication number: 20100211390

Abstract: The present invention relates to a method of generating a candidate list from a list of entries in accordance with a string of subword units corresponding to a speech input in a speech recognition system, the list of entries including plural list entries each comprising at least one fragment having one or more subword units. For each list entry, the fragments of the list entry are compared with the string of subword units. A matching score for each of the compared fragments based on the comparison is determined. The matching score for a fragment is further based on a comparison of at least one other fragment of the same list entry with the string of subword units. A total score for each list entry is determined based on the matching scores for the compared fragments of the respective list entry. A candidate list with the best matching entries from the list of entries based on the total scores of the list entries is generated.

Type: Application

Filed: February 16, 2010

Publication date: August 19, 2010

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Christian Hillebrecht, Markus Schwarz
Speaker Recognition in a Speech Recognition System

Publication number: 20100198598

Abstract: A method for recognizing a speaker of an utterance in a speech recognition system is disclosed. A likelihood score for each of a plurality of speaker models for different speakers is determined, the likelihood score indicating how well the speaker model corresponds to the utterance. For each of the plurality of speaker models, a probability that the utterance originates from that speaker is determined. The probability is determined based on the likelihood score for the speaker model and requires the estimation of a distribution of likelihood scores expected based at least in part on the training state of the speaker.

Type: Application

Filed: February 4, 2010

Publication date: August 5, 2010

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Tobias Herbig, Franz Gerl
DEALING WITH SWITCH LATENCY IN SPEECH RECOGNITION

Publication number: 20100185448

Abstract: In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility.

Type: Application

Filed: January 21, 2010

Publication date: July 22, 2010

Inventor: William S. Meisel
METHOD AND APPARATUS FOR ERROR CORRECTION IN SPEECH RECOGNITION APPLICATIONS

Publication number: 20100125458

Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.

Type: Application

Filed: July 13, 2006

Publication date: May 20, 2010

Inventors: Horacio Franco, Gregory Myers, Jing Zheng, Federico Cesari, Cregg Cowan
Speech Detection Using Order Statistics

Publication number: 20100036663

Abstract: The method and system disclosed herein reduce the total bandwidth requirement for communication in a voice over Internet protocol application. Sample [101] and convert [102] the analog input audio signal into digital signals and derive sampled frames [103]. Compute spacings of order statistics [104]. Measure the entropy for each of the sampled frames [105]. Set a threshold for entropy [106]. Mark the audio frames as active speech frames or inactive speech frames [107]. Mark an audio frame as an inactive speech frame when the entropy is greater than the threshold, and mark the audio frame as an active speech frame when the entropy is less than the threshold [107]. Transmit only the active speech frames [108].

Type: Application

Filed: January 24, 2007

Publication date: February 11, 2010

Inventors: Muralishankar Rangarao, Vijay Satyanarayana, Venkatesha Prasad Rangarao, Shankar H. Narasimhiah
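
The frame-marking steps in the abstract above (publication 20100036663) can be illustrated with a minimal sketch. The abstract does not say which entropy estimator is used, so this example assumes Vasicek's m-spacing estimator, which is built from spacings of order statistics. The function names, the default spacing width `m`, the small `eps` guard, and the fixed threshold are all hypothetical choices made for illustration, not details taken from the patent.

```python
import math


def spacing_entropy(frame, m=None, eps=1e-12):
    """Estimate frame entropy from m-spacings of the sorted samples.

    Assumption: Vasicek's estimator; the abstract only says that entropy
    is measured from spacings of order statistics.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame needs at least two samples")
    if m is None:
        m = max(1, round(math.sqrt(n)))  # hypothetical default window
    xs = sorted(frame)  # order statistics of the frame
    total = 0.0
    for i in range(n):
        lo = xs[max(i - m, 0)]       # clamp at the smallest sample
        hi = xs[min(i + m, n - 1)]   # clamp at the largest sample
        # eps avoids log(0) when many samples are identical
        total += math.log(n / (2 * m) * (hi - lo) + eps)
    return total / n


def split_frames(samples, frame_len):
    """Cut the sample stream into non-overlapping frames of equal length."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def mark_active(samples, frame_len, threshold):
    """Return one flag per frame: True (active) when entropy is below threshold."""
    return [spacing_entropy(f) < threshold for f in split_frames(samples, frame_len)]


if __name__ == "__main__":
    wide = [i / 100 for i in range(100)]      # widely spread samples
    narrow = [i / 10000 for i in range(100)]  # tightly clustered samples
    print(mark_active(wide + narrow, frame_len=100, threshold=-2.0))
```

Only the frames flagged `True` would be transmitted under the scheme the abstract describes. How the threshold is chosen is left open here, since the abstract states only that one is set.
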
Speech recognition semantic classification training

Publication number: 20100023331

Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criterion, the semantic classification system is automatically retrained based on the adaptation training data.

Type: Application

Filed: July 15, 2009

Publication date: January 28, 2010

Applicant: Nuance Communications, Inc.

Inventors: Nicolae Duta, Rèal Tremblay, Andy Mauro, Douglas Peters
APPARATUS, METHOD AND PROGRAM FOR TEXT MINING

Publication number: 20090306982

Abstract: Disclosed is an apparatus that includes a text input device that inputs text data provided with confidence measures, as subject for mining, a language processing unit that performs language analysis of the input text data provided with the confidence measures, a confidence measure exploiting characteristic word count unit that counts the characteristic words in the input text to provide a count result and that exploits the statistical information and the confidence measures provided in the input text to correct the count result obtained, a characteristic measure calculation unit that calculates the characteristic measure of each characteristic word from the corrected count result, a mining result output device that outputs the characteristic measure of each characteristic word obtained, a user operation input device for a user to input setting for language processing of the input text and setting for a technique for calculating the characteristic measure being found, a mining process management unit that transmits

Type: Application

Filed: July 18, 2007

Publication date: December 10, 2009

Inventors: Satoshi Nakazawa, Satoshi Morinaga
Adaptive Confidence Thresholds for Speech Recognition

Publication number: 20090259466

Abstract: Adjusting confidence score thresholds is described for a speech recognition engine. The speech recognition engine is implemented in multiple computer processes functioning in a computer processor, and is characterized by an associated receiver operating characteristic (ROC) curve. A results confirmation process interprets user confirmation of speech recognition results within a given confidence score threshold to create a confirmed portion of the ROC curve for the speech recognition engine. A curve extension process extends the confirmed portion of the ROC curve by extrapolation of unconfirmed speech recognition results beyond the confidence score threshold to generate an extended ROC curve. A threshold adjustment process adjusts the confidence score threshold based on the extended ROC curve to meet target operating constraints for operating the speech recognition engine to perform automatic speech recognition of user speech inputs.

Type: Application

Filed: April 14, 2009

Publication date: October 15, 2009

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Peter Stubley, Oded Monzon, Alex Gorodetski, Douglas Peters
System and Method for Text Tagging and Segmentation Using a Generative/Discriminative Hybrid Hidden Markov Model

Publication number: 20090055183

Abstract: A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters $\theta$ and discriminative parameters $\tilde{\theta}$, providing a functional $LL - C \times \mathrm{Penalty}$, where $LL$ is a log-likelihood function $LL = \log p(\theta, \tilde{\theta}) + \sum_{l=1}^{M} [\log p(X_l, Y_l \mid \tilde{\theta}) - \log p(X_l \mid \tilde{\theta})] + \sum_{l=1}^{M} \log p(X_l \mid \theta)$, $\mathrm{Penalty} = \sum_{y \in V_Y} (em_y^2 + tr_y^2 + e\tilde{m}_y^2 + t\tilde{r}_y^2)$, where $em_y = 1 - \sum_{x_i \in V_X} p(x_i \mid y)$, $e\tilde{m}_y = 1 - \sum_{x_i \in V_X} \tilde{p}(x_i \mid y)$ are emission probability constraints, $tr_y = 1 - \sum_{y_i \in V_Y} p(y_i \mid y)$, $t\tilde{r}_y = 1 - \sum_{y_i \in V_Y} \tilde{p}(y_i \mid y)$ are transition probability constraints, and extracting gradients of $LL - C \times \mathrm{Penalty}$ with respect to the transition and emission probabilities and solving $\theta^{*}_{k}$, …

Type: Application

Filed: August 21, 2008

Publication date: February 26, 2009

Applicant: Siemens Medical Solutions USA, Inc.

Inventors: Oksana Yakhnenko, Romer E. Rosales, Radu Stefan Niculescu, Lucian Vlad Lita
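
The penalty term in the abstract above (publication 20090055183) can be illustrated with a minimal sketch. It assumes that each constraint is one minus the sum of a row of probabilities, so a properly normalized row contributes zero, and that the penalty sums the squared constraints over all states for both the generative and the discriminative tables. The function names and the list-of-rows layout are hypothetical, and the log-likelihood and gradient steps are not shown.

```python
def row_deficit(row):
    """Return 1 minus the sum of a probability row (0 when the row is normalized)."""
    return 1.0 - sum(row)


def penalty(emission, transition, emission_disc, transition_disc):
    """Sum of squared normalization deficits over all states y.

    Each argument is a list indexed by state y (hypothetical layout):
      emission[y]        -> generative p(x | y) over the observation vocabulary
      transition[y]      -> generative p(y' | y) over the state vocabulary
      emission_disc[y]   -> discriminative counterpart of emission[y]
      transition_disc[y] -> discriminative counterpart of transition[y]
    """
    total = 0.0
    for y in range(len(emission)):
        em = row_deficit(emission[y])
        tr = row_deficit(transition[y])
        em_disc = row_deficit(emission_disc[y])
        tr_disc = row_deficit(transition_disc[y])
        total += em ** 2 + tr ** 2 + em_disc ** 2 + tr_disc ** 2
    return total


if __name__ == "__main__":
    normalized = [[0.5, 0.5]]
    deficient = [[0.25, 0.25]]  # row sums to 0.5, so the deficit is 0.5
    print(penalty(normalized, [[1.0]], normalized, [[1.0]]))
    print(penalty(deficient, [[1.0]], normalized, [[1.0]]))
```

In the functional described by the abstract, this value would be scaled by the constant C and subtracted from the log-likelihood, so parameter settings whose rows drift away from summing to one are discouraged.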