Probability Patents (Class 704/240)
-
Publication number: 20110288865
Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
Type: Application
Filed: August 1, 2011
Publication date: November 24, 2011
Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
-
Patent number: 8055503
Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
Type: Grant
Filed: November 1, 2006
Date of Patent: November 8, 2011
Assignee: Siemens Enterprise Communications, Inc.
Inventors: Robert Scarano, Lawrence Mark
-
Patent number: 8050929
Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
Type: Grant
Filed: August 24, 2007
Date of Patent: November 1, 2011
Assignee: Robert Bosch GmbH
Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
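Illustrative sketch (not taken from the patent): the ranking step in the abstract above can be pictured as sorting (prediction, probability) pairs by descending probability and then keeping a variable-size prefix. The cut-off rule used here, keeping the shortest prefix whose cumulative probability reaches `mass`, is an assumption made for illustration only; the function name and the `mass` parameter are hypothetical.

```python
from typing import List, Tuple


def select_predictions(
    predictions: List[str],
    probabilities: List[float],
    mass: float = 0.9,  # hypothetical cut-off, not specified in the abstract
) -> List[Tuple[str, float]]:
    """Rank predictions by descending probability and keep a dynamic prefix."""
    if len(predictions) != len(probabilities):
        raise ValueError("each prediction needs exactly one probability")

    # Ranked predictions: order the pairs by descending probability.
    ranked = sorted(zip(predictions, probabilities), key=lambda p: p[1], reverse=True)

    # Assumed selection rule: stop once the kept items cover `mass` of the probability.
    selected = []
    covered = 0.0
    for prediction, probability in ranked:
        selected.append((prediction, probability))
        covered += probability
        if covered >= mass:
            break
    return selected
```

With this rule, a confident recognizer yields a short list and an uncertain one yields a longer list, which is one way a selection can be "dynamic".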
-
Patent number: 8045646
Abstract: Provided are an apparatus for estimating a phase error and a phase error correcting system using the phase error estimating apparatus. The apparatus includes: a probability value estimating unit for estimating a negative log probability value for each transmission symbol by transforming soft output information transferred from the outside to a log a posteriori probability ratio (LAPPR) value; an APP value calculating unit for calculating an a posteriori probability (APP) value by applying a negative exponential function to the transmission symbol; an average value deciding unit for deciding an average value for each transmission symbol using the probability information entirely, partially, or selectively according to a probability information type; and a symbol phase estimating unit for estimating a phase of a symbol based on the decided average value.
Type: Grant
Filed: September 11, 2006
Date of Patent: October 25, 2011
Assignee: Electronics and Telecommunications Research Institute
Inventors: Pan-Soo Kim, Byoung-Hak Kim, Yun-Jeong Song, Deock-Gil Oh, Ho-Jin Lee, Jun Heo, Joong-Gon Ryoo
-
Patent number: 8024188
Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
Type: Grant
Filed: August 24, 2007
Date of Patent: September 20, 2011
Assignee: Robert Bosch GmbH
Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
-
Publication number: 20110224983
Abstract: Described is a technology by which a probability is estimated for a token in a sequence of tokens based upon a number of zero or more times (actual counts) that the sequence was observed in training data. The token may be a word in a word sequence, and the estimated probability may be used in a statistical language model. A discount parameter is set independently of interpolation parameters. If the sequence was observed at least once in the training data, a discount probability and an interpolation probability are computed and summed to provide the estimated probability. If the sequence was not observed, the probability is estimated by computing a backoff probability. Also described are various ways to obtain the discount parameter and interpolation parameters.
Type: Application
Filed: March 11, 2010
Publication date: September 15, 2011
Applicant: Microsoft Corporation
Inventor: Robert Carter Moore
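Illustrative sketch (not taken from the application): one way to read the abstract above is as a bigram model in which a seen bigram gets a discounted relative frequency plus an interpolated lower-order term, and an unseen bigram gets a backoff weight times the lower-order probability. Everything concrete below is an assumption for illustration: the bigram order, the unigram lower-order model, the fixed `discount` and `interp` values, and the choice to derive the backoff weight from the requirement that probabilities sum to one.

```python
from collections import Counter


def make_bigram_model(tokens, discount=0.5, interp=0.1):
    """Return prob(history, word) for a toy bigram model (illustrative only)."""
    unigrams = Counter(tokens)
    total = sum(unigrams.values())
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_counts = Counter(tokens[:-1])

    def p_lower(word):
        # Lower-order (unigram) relative frequency.
        return unigrams[word] / total

    def prob(history, word):
        c_h = history_counts[history]
        if c_h == 0:
            return p_lower(word)  # unknown history: fall back to unigram
        c_hw = bigrams[(history, word)]
        if c_hw > 0:
            # Observed: discounted term plus interpolated term, summed.
            return (c_hw - discount) / c_h + interp * p_lower(word)
        # Unobserved: backoff weight chosen so the distribution normalizes.
        seen = [w for (h, w) in bigrams if h == history]
        seen_discounted = sum((bigrams[(history, w)] - discount) / c_h for w in seen)
        seen_lower = sum(p_lower(w) for w in seen)
        if seen_lower >= 1.0:
            return 0.0  # no probability mass left for unseen words
        weight = (1.0 - seen_discounted - interp * seen_lower) / (1.0 - seen_lower)
        return weight * p_lower(word)

    return prob
```

Because `discount` and `interp` are separate arguments, the discount can be changed without touching the interpolation weight, which mirrors the independence the abstract mentions.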
-
Publication number: 20110218803
Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
Type: Application
Filed: March 4, 2011
Publication date: September 8, 2011
Applicant: DEUTSCHE TELEKOM AG
Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
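Illustrative sketch (not taken from the application): the entropy step in the abstract above can be shown on a single frame. A peaked phoneme posterior vector has low entropy, a flat one has high entropy. The mapping from entropy to a score in [0, 1] (one minus normalized entropy) is an assumption made here for illustration, and both function names are hypothetical.

```python
import math


def frame_entropy(posteriors):
    """Shannon entropy (in bits) of one frame's phoneme posterior vector."""
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must have positive mass")
    entropy = 0.0
    for p in posteriors:
        q = p / total  # renormalize in case the vector does not sum to 1
        if q > 0:
            entropy -= q * math.log2(q)
    return entropy


def intelligibility(posteriors):
    """Assumed score: 1 for a one-hot posterior, 0 for a uniform one."""
    n = len(posteriors)
    if n < 2:
        raise ValueError("need at least two phoneme classes")
    return 1.0 - frame_entropy(posteriors) / math.log2(n)
```

A per-utterance measure could then be formed by averaging the per-frame scores, though the abstract only requires a measure for at least one frame.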
-
Patent number: 8014999
Abstract: The invention provides a softscaled frequency compensation function that allows the evaluation of a first quality measure indicating a global impact of all distortions in an audio transmission system, including linear frequency response distortions, and a second quality measure that only takes into account the impact of linear frequency response distortions. The softscaled frequency compensation function is derived from a softscaled ratio between a time integrated output and a time integrated input power density functions. The first quality measure is derived from the difference loudness density function as function of time and frequency, using the frequency compensated input loudness density function and the gain compensated output loudness density function both as a function of time and frequency, in the same manner as carried out in ITU standard P.862.
Type: Grant
Filed: September 20, 2005
Date of Patent: September 6, 2011
Assignee: Nederlandse Organisatie voor toegepast-natuurwetenschappelijk Onderzoek TNO
Inventor: John Gerard Beerends
-
Patent number: 8014536
Abstract: Improved audio source separation is provided by providing an audio dictionary for each source to be separated. Thus the invention can be regarded as providing “partially blind” source separation as opposed to the more commonly considered “blind” source separation problem, where no prior information about the sources is given. The audio dictionaries are probabilistic source models, and can be derived from training data from the sources to be separated, or from similar sources. Thus a library of audio dictionaries can be developed to aid in source separation. An unmixing and deconvolutive transformation can be inferred by maximum likelihood (ML) given the received signals and the selected audio dictionaries as input to the ML calculation. Optionally, frequency-domain filtering of the separated signal estimates can be performed prior to reconstructing the time-domain separated signal estimates. Such filtering can be regarded as providing an “audio skin” for a recovered signal.
Type: Grant
Filed: December 1, 2006
Date of Patent: September 6, 2011
Assignee: Golden Metallic, Inc.
Inventor: Hagai Thomas Attias
-
Patent number: 8015007
Abstract: A speech recognition apparatus includes a first grammar storage unit configured to store one or more grammar segments, a second grammar storage unit configured to store one or more grammar segments, a first decoder configured to carry out a decoding process by referring to the grammar segment stored in the second grammar storage unit, a grammar transfer unit configured to transfer a trailing grammar segment from the first grammar storage unit to the second grammar storage unit, a second decoder configured to operate in parallel to the grammar transfer unit and carry out the decoding process by referring to the grammar segment stored in the second grammar storage unit, and a recognition control unit configured to monitor the state of transfer of the trailing grammar segment carried out by the grammar transfer unit and activate both decoders by switching the operation thereof according to the state of transfer of the grammar segment.
Type: Grant
Filed: March 13, 2008
Date of Patent: September 6, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventor: Masaru Sakai
-
Patent number: 8005665
Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.
Type: Grant
Filed: July 23, 2010
Date of Patent: August 23, 2011
Assignee: Schukhaus Group GmbH, LLC
Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
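Illustrative sketch (not taken from the patent): the test described in the abstract above can be written in a few lines. The score used here is simply the word length, and the threshold and minimum count are placeholder values; the abstract says only that the score is based on word length, so all three choices are assumptions, as are the function names.

```python
def is_significant_phrase(words, threshold=5, min_long_words=2):
    """True if enough words in the sequence score above the threshold."""
    scores = [len(word) for word in words]  # assumed score: word length
    return sum(1 for score in scores if score > threshold) >= min_long_words


def extend_abstract(abstract_sentences, sentence, phrase_words):
    """Add the sentence to the abstract if its phrase is significant and it is new."""
    if is_significant_phrase(phrase_words) and sentence not in abstract_sentences:
        abstract_sentences.append(sentence)
    return abstract_sentences
```

The second function mirrors the last steps of the abstract: the containing sentence is added only when it is not already present.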
-
Patent number: 8000962
Abstract: A method and system for using input signal quality in an automatic speech recognition system. The method includes measuring the quality of an input signal into a speech recognition system and varying a rejection threshold of the speech recognition system at runtime in dependence on the measurement of the input signal quality. If the measurement of the input signal quality is low, the rejection threshold is reduced and, if the measurement of the input signal quality is high, the rejection threshold is increased. The measurement of the input signal quality may be based on one or more of the measurements of signal-to-noise ratio, loudness, including clipping, and speech signal duration.
Type: Grant
Filed: May 19, 2006
Date of Patent: August 16, 2011
Assignee: Nuance Communications, Inc.
Inventors: John Doyle, John Brian Pickering
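Illustrative sketch (not taken from the patent): the abstract above states only the direction of the adjustment, lower threshold for poor input and higher threshold for good input. The version below uses signal-to-noise ratio as the sole quality measure and interpolates linearly between two threshold values; the SNR break points, the threshold range, and the function names are all hypothetical.

```python
def rejection_threshold(snr_db, low_snr=5.0, high_snr=25.0, min_thr=0.3, max_thr=0.6):
    """Map a measured SNR (dB) to a rejection threshold on recognizer confidence."""
    if high_snr <= low_snr:
        raise ValueError("high_snr must exceed low_snr")
    # Position of the measurement between the two break points, clamped to [0, 1].
    fraction = (snr_db - low_snr) / (high_snr - low_snr)
    fraction = min(1.0, max(0.0, fraction))
    # Poor quality gives the reduced threshold, good quality the increased one.
    return min_thr + fraction * (max_thr - min_thr)


def accept(confidence, snr_db):
    """Accept a recognition result if its confidence clears the runtime threshold."""
    return confidence >= rejection_threshold(snr_db)
```

The same confidence value can therefore be accepted on a noisy line and rejected on a clean one.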
-
Patent number: 7996214
Abstract: Disclosed are a system and method for exploiting information in an utterance for dialog act tagging. An exemplary method includes receiving a user utterance, computing at periodic intervals at least one parameter in the user utterance, quantizing the at least one parameter at each periodic interval, approximating conditional probabilities using an n-gram over a sliding window over the periodic intervals and tagging the utterance as a dialog act based on the approximated conditional probabilities.
Type: Grant
Filed: November 1, 2007
Date of Patent: August 9, 2011
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
-
Publication number: 20110184735
Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
Type: Application
Filed: January 22, 2010
Publication date: July 28, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
-
Publication number: 20110173000
Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
Type: Application
Filed: December 19, 2008
Publication date: July 14, 2011
Inventors: Hitoshi Yamamoto, Miki Kiyokazu
-
Patent number: 7970613
Abstract: Use of runtime memory may be reduced in a data processing algorithm that uses one or more probability distribution functions. Each probability distribution function may be characterized by one or more uncompressed mean values and one or more variance values. The uncompressed mean and variance values may be represented by α-bit floating point numbers, where α is an integer greater than 1. The probability distribution functions are converted to compressed probability functions having compressed mean and/or variance values represented as β-bit integers, where β is less than α, whereby the compressed mean and/or variance values occupy less memory space than the uncompressed mean and/or variance values. Portions of the data processing algorithm can be performed with the compressed mean and variance values.
Type: Grant
Filed: November 12, 2005
Date of Patent: June 28, 2011
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
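Illustrative sketch (not taken from the patent): the memory saving described in the abstract above comes from storing each mean or variance as a small integer code instead of a floating point number. The uniform min/max quantizer below is an assumption chosen for simplicity; the abstract does not say how the integer codes are derived, and the function names and the 8-bit default are hypothetical.

```python
def compress(values, bits=8):
    """Quantize floats to unsigned integer codes of the given bit width."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1               # largest representable code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale                # lo and scale are stored once per table


def decompress(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + c * scale for c in codes]
```

For example, replacing 32-bit floats with 8-bit codes cuts the table to roughly a quarter of its size, at the cost of a reconstruction error of at most half a quantization step per value.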
-
Patent number: 7970614
Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
Type: Grant
Filed: May 8, 2007
Date of Patent: June 28, 2011
Assignee: Nuance Communications, Inc.
Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
-
Publication number: 20110153326
Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server. The system also includes a module to generate additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and providing the same to the speech server.
Type: Application
Filed: February 9, 2011
Publication date: June 23, 2011
Applicant: QUALCOMM INCORPORATED
Inventors: HARINATH GARUDADRI, HYNEK HERMANSKY, LUKAS BURGET, PRATIBHA JAIN, SACHIN KAJAREKAR, SUNIL SIVADAS, STEPHANE N. DUPONT, MARIA CARMEN BENITEZ ORTUZAR, NELSON H. MORGAN
-
Publication number: 20110144986
Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios. Different calibration models may be used with different usage scenarios, e.g., during different conditions. The calibration model may comprise a maximum entropy classifier with distribution constraints, trained with continuous raw confidence scores and multi-valued word tokens, and/or other distributions and extracted features.
Type: Application
Filed: December 10, 2009
Publication date: June 16, 2011
Applicant: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Jinyu Li
-
Publication number: 20110137651
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Application
Filed: February 14, 2011
Publication date: June 9, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Richard C. ROSE, Sarangarajan PATHASARATHY, Aaron Edward ROSENBERG, Shrikanth Sambasivan NARAYANAN
-
Publication number: 20110131042
Abstract: Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides a likelihood that a speech signal input from a given phoneme sequence will occur. A linguistic likelihood computation means (702) provides a likelihood that a given word sequence will occur. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to provide a word sequence with the maximum likelihood of occurring from a speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods when the speaker who generated the acoustic signal input to the speech recognition means does and does not have the turn to speak.
Type: Application
Filed: May 12, 2009
Publication date: June 2, 2011
Inventor: Kentaro Nagatomo
-
Patent number: 7953598
Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
Type: Grant
Filed: December 17, 2007
Date of Patent: May 31, 2011
Assignee: Verizon Patent and Licensing Inc.
Inventor: Kevin W. Brown
-
Patent number: 7945552
Abstract: A system of the present invention stores: a first index which designates lists of keywords contained in texts from identifications of the respective texts; a second index which designates lists of texts containing keywords from identifications of the respective keywords; and the number of texts containing the respective keywords. Then, upon receiving an input of a text search condition, the system calculates an estimation of search time by the first index and an estimation of search time by the second index, and determines which one of the first and second indexes makes a search faster. Then, by using the index which has been determined to make the search faster, the system searches for keywords which appear in texts satisfying the text search condition with higher frequency.
Type: Grant
Filed: March 26, 2008
Date of Patent: May 17, 2011
Assignee: International Business Machines Corporation
Inventors: Daisuke Takuma, Issei Yoshida, Yuta Tsuboi
-
Patent number: 7941317
Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
Type: Grant
Filed: June 5, 2007
Date of Patent: May 10, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
-
Publication number: 20110099012
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
Type: Application
Filed: October 23, 2009
Publication date: April 28, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Jason WILLIAMS, Suhrid BALAKRISHNAN
-
Patent number: 7933773
Abstract: A natural language understanding monitoring system adapted to conduct an automated dialog with a user. If the system is unable to identify from the automated dialog, to at least a predetermined level of confidence, any one of a plurality of predetermined tasks as being a particular task that the user wants to have performed, the system makes a determination of the value of a probability that further automated dialog will enable the system to identify the particular task, and determines whether or not to conduct further automated dialog with the user, in an attempt to identify the particular task, based on the relative values of the determined probability and a predetermined threshold value. The probability value determination is based on inputs from the user during the automated dialog.
Type: Grant
Filed: March 25, 2009
Date of Patent: April 26, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
-
Patent number: 7930181
Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
Type: Grant
Filed: November 21, 2002
Date of Patent: April 19, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
-
Publication number: 20110087492
Abstract: A speech characteristic-amount calculation circuit 31 calculates an amount of speech characteristics of each phrase in input speech. An estimation process likelihood calculation circuit 33 compares the calculated speech characteristic amount of a phrase with speech pattern sequence information of a plurality of phrases stored in a storage unit 34 to select a plurality of candidates having from a higher likelihood value to a lower likelihood value for the phrases. A recognition filtering device 4 determines whether to reject or not reject the extracted candidates based on the likelihood difference ratio between the difference in likelihood values between the first candidate and the second candidate and the difference in likelihood values between the second candidate and the third candidate.
Type: Application
Filed: May 11, 2009
Publication date: April 14, 2011
Applicant: RayTron, Inc.
Inventors: Mitsuji Yoshida, Kazutaka Hyodo
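Illustrative sketch (not taken from the application): the likelihood difference ratio in the abstract above compares the gap between the first and second candidates with the gap between the second and third. A large ratio means the best candidate stands clearly apart from the rest. The decision direction (reject when the ratio is below `min_ratio`), the default value, and the handling of a zero denominator are assumptions for illustration; the function names are hypothetical.

```python
def likelihood_difference_ratio(likelihoods):
    """(L1 - L2) / (L2 - L3) for the three highest likelihood values."""
    top = sorted(likelihoods, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("need at least three candidates")
    first, second, third = top
    denominator = second - third
    if denominator == 0:
        # Second and third tie: the ratio is unbounded unless the top two tie as well.
        return float("inf") if first > second else 0.0
    return (first - second) / denominator


def should_reject(likelihoods, min_ratio=1.0):
    """Assumed rule: reject when the best candidate is not clearly separated."""
    return likelihood_difference_ratio(likelihoods) < min_ratio
```

Using a ratio of differences rather than the raw top score makes the test insensitive to a constant offset in the likelihood values.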
-
Patent number: 7921012
Abstract: A speech recognition apparatus includes a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment, a second storing unit configured to store a classification model that has shared parameters and non-shared parameters with the first acoustic model to classify second acoustic models, a recognizing unit configured to calculate a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtain calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood, and a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model.
Type: Grant
Filed: September 18, 2007
Date of Patent: April 5, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Hiroshi Fujimura, Takashi Masuko
-
Patent number: 7912717
Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
Type: Grant
Filed: November 18, 2005
Date of Patent: March 22, 2011
Inventor: Albert Galick
-
Patent number: 7912720
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
Type: Grant
Filed: July 20, 2005
Date of Patent: March 22, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Publication number: 20110046953
Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
Type: Application
Filed: August 21, 2009
Publication date: February 24, 2011
Applicant: GENERAL MOTORS COMPANY
Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Patent number: 7895040
Abstract: According to an embodiment, voice recognition apparatus includes units of: acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and voice recognition method includes processes of: selecting a search range on basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, determining whether or not the output probability of a certain path is stored. Number of times of calculation of the output probability is reduced by selecting the search range on basis of the beam search, calculating the output probability of the certain transition path only once in an interval from when the standard frame is set to when the standard frame is renewed, and storing and using thus calculated value as an approximate value of the output probability in subsequent frames.
Type: Grant
Filed: March 30, 2007
Date of Patent: February 22, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Shinichi Tanaka
-
Publication number: 20110040561Abstract: A method for compensating inter-session variability for automatic extraction of information from an input voice signal representing an utterance of a speaker, includes: processing the input voice signal to provide feature vectors each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.Type: ApplicationFiled: May 16, 2006Publication date: February 17, 2011Inventors: Claudio Vair, Daniele Colibro, Pietro Laface
-
Patent number: 7890325Abstract: Speech recognition such as command and control speech recognition generally uses a context free grammar to constrain the decoding process. Word or subword background models are constructed to repopulate dynamic hypothesis space, especially when word sparseness is at issue. The background models can be later used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence. The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.Type: GrantFiled: March 16, 2006Date of Patent: February 15, 2011Assignee: Microsoft CorporationInventors: Peng Liu, Ye Tian, Jian-Lai Zhou, Frank Kao-Ping K. Soong
-
Publication number: 20110035216Abstract: The invention can recognize any several languages at the same time without using samples. The important skill is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words represented by matrices are spread in the 144-dimensional space. The feature of a known word of any language represented by a matrix is simulated by the surrounding unknown words. The invention includes 12 elastic frames of equal length without filter and without overlap to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as a feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any languages without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.Type: ApplicationFiled: August 5, 2009Publication date: February 10, 2011Inventors: Tze Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Patent number: 7877258Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.Type: GrantFiled: March 29, 2007Date of Patent: January 25, 2011Assignee: Google Inc.Inventors: Ciprian Chelba, Thorsten Brants
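The trie idea in the abstract above can be illustrated with a minimal Python sketch. It assumes each n-gram arrives as a tuple of words paired with its probability; the names `TrieNode`, `NGramTrie`, `insert` and `lookup` are hypothetical, and the compact encoding the patent claims is not reproduced here.

```python
from typing import Dict, Optional, Sequence


class TrieNode:
    """One word position in the trie (hypothetical helper, not from the patent)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Probability of the n-gram ending at this node, if one was stored.
        self.prob: Optional[float] = None


class NGramTrie:
    """Stores n-grams so that n-grams sharing a word prefix share trie nodes."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, ngram: Sequence[str], prob: float) -> None:
        node = self.root
        for word in ngram:
            # Reuse the child for this word, or create it on first use.
            node = node.children.setdefault(word, TrieNode())
        node.prob = prob

    def lookup(self, ngram: Sequence[str]) -> Optional[float]:
        node = self.root
        for word in ngram:
            child = node.children.get(word)
            if child is None:
                return None  # n-gram was never inserted
            node = child
        return node.prob  # None when only a longer n-gram passed through here


trie = NGramTrie()
trie.insert(("the", "cat"), 0.02)
trie.insert(("the", "cat", "sat"), 0.005)
print(trie.lookup(("the", "cat")))
```

Because `("the", "cat")` and `("the", "cat", "sat")` share their first two nodes, the shared prefix is stored once, which is the usual motivation for a trie layout.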
-
Publication number: 20110015925Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speeType: ApplicationFiled: March 26, 2010Publication date: January 20, 2011Applicant: Kabushiki Kaisha ToshibaInventors: Haitian Xu, Mark John Francis Gales
-
Patent number: 7870000Abstract: The present disclosure relates to prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played.Type: GrantFiled: March 28, 2007Date of Patent: January 11, 2011Assignee: Nuance Communications, Inc.Inventors: Soonthorn Ativanichayaphong, Gerald M. McCobb, Paritosh D. Patel, Marc White
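The element-level step described above can be sketched in Python as follows. This is a hedged illustration only: the function name `split_by_confidence`, the dictionary layout and the choice to accept a score exactly equal to the threshold are assumptions, and the utterance-level check that precedes this step in the abstract is left out.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    elements: Dict[str, Tuple[str, float]],
    threshold: float,
) -> Tuple[Dict[str, str], List[str]]:
    """Split recognized elements into values to store and fields to re-prompt.

    `elements` maps a data-field name to (recognized value, confidence score).
    Treating a score equal to the threshold as accepted is an assumption;
    the abstract only mentions scores above and below the threshold.
    """
    accepted: Dict[str, str] = {}
    reprompt: List[str] = []
    for field, (value, score) in elements.items():
        if score >= threshold:
            accepted[field] = value  # store the value in its mapped data field
        else:
            reprompt.append(field)  # ask the user for this element again
    return accepted, reprompt


recognized = {
    "city": ("Boston", 0.91),
    "date": ("May 3", 0.42),
    "time": ("9 am", 0.88),
}
stored, ask_again = split_by_confidence(recognized, threshold=0.6)
print(stored)
print(ask_again)
```

In this example `city` and `time` would be stored and only `date` would trigger a follow-up prompt, so the user does not have to repeat the whole utterance.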
-
Publication number: 20100318354Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.Type: ApplicationFiled: June 12, 2009Publication date: December 16, 2010Applicant: Microsoft CorporationInventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Patent number: 7848926Abstract: A speech recognition system is provided where a user may more efficiently and easily correct a recognition error resulting from speech recognition. The system compares multiple inputted words with multiple stored words and determines a most-competitive word candidate. The system selects one or more competitive words that have competitive probabilities close to the competitive probability of the most-competitive word candidate and displays the one or more competitive words adjacent to the most-competitive word candidate. The system selects an appropriate correction word from the one or more competitive words and replaces the most-competitive word candidate with the correction word.Type: GrantFiled: November 18, 2005Date of Patent: December 7, 2010Assignee: National Institute of Advanced Industrial Science and TechnologyInventors: Masataka Goto, Jun Ogata
-
Patent number: 7835909Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.Type: GrantFiled: December 12, 2006Date of Patent: November 16, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
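The backward cumulation step named in the abstract above can be sketched in plain Python. This is an assumption-laden illustration: the equal-width binning, the function name `backward_cumulative` and its parameters are hypothetical, and the final normalization step that uses the resulting function is not shown because the abstract does not spell it out.

```python
from typing import List, Sequence


def backward_cumulative(values: Sequence[float], num_bins: int) -> List[float]:
    """Return the backward cumulative distribution over equal-width bins.

    Entry i is the fraction of values that fall in bin i or any higher bin,
    i.e. the probability mass is accumulated from the largest bin downwards.
    """
    if not values or num_bins <= 0:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width when all values match
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    pdf = [c / total for c in counts]
    backward = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # largest bin first
        running += pdf[i]
        backward[i] = running
    return backward


print(backward_cumulative([0.0, 1.0, 2.0, 3.0], num_bins=4))
```

A conventional (forward) cumulative histogram would run the same loop from the smallest bin upwards; only the direction of accumulation differs.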
-
Patent number: 7822614Abstract: A language analyzer performs speech recognition on a speech input by a speech input unit, specifies a possible word which is represented by the speech, and the score thereof, and supplies word data representing them to an agent processing unit. The agent processing unit stores process item data which defines a data acquisition process to acquire word data or the like, a discrimination process, and an input/output process, and wires or data defining transition from one process to another and giving a weighting factor to the transition, and executes a flow represented generally by the process item data and the wires to thereby control devices belonging to an input/output target device group. To which process in the flow the transition takes place is determined by the weighting factor of each wire, which is determined by the connection relationship between a point where the process has proceeded and the wire, and the score of word data.Type: GrantFiled: December 6, 2004Date of Patent: October 26, 2010Assignee: Kabushikikaisha KenwoodInventor: Rika Koyama
-
Patent number: 7813925Abstract: For adjacent times, or when only a small change in the observation signal is detected, the distribution that maximizes the output probability of a mixture distribution is highly likely to stay the same. By using this fact, when obtaining the output probability of the mixture distribution HMM, the distribution giving the maximum output probability is stored. For adjacent times, or when only a small change in the observation signal is detected, the output probability of the stored distribution serves as the output probability of the mixture distribution. This can reduce the output probability calculation of other distributions when calculating the output probability of the mixture distribution, thereby reducing the calculation amount required for output probabilities.Type: GrantFiled: April 6, 2006Date of Patent: October 12, 2010Assignee: Canon Kabushiki KaishaInventors: Hiroki Yamamoto, Masayuki Yamada
-
Publication number: 20100256977Abstract: Described is a technology by which a maximum entropy (MaxEnt) model, such as used as a classifier or in a conditional random field or hidden conditional random field that embed the maximum entropy model, uses continuous features with continuous weights that are continuous functions of the feature values (instead of single-valued weights). The continuous weights may be approximated by a spline-based solution. In general, this converts the optimization problem into a standard log-linear optimization problem without continuous weights at a higher-dimensional space.Type: ApplicationFiled: April 1, 2009Publication date: October 7, 2010Applicant: Microsoft CorporationInventors: Dong Yu, Li Deng, Alejandro Acero
-
Patent number: 7809570Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.Type: GrantFiled: July 7, 2008Date of Patent: October 5, 2010Assignee: VoiceBox Technologies, Inc.Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
-
Patent number: 7809566Abstract: A method for use in automatic speech recognition corrects erroneous recognition elements within a recognition hypothesis. A user input is recognized as a correction hypothesis which contains various recognition elements. A non-deterministic alignment is performed to align at least a portion of the correction hypothesis with an earlier recognition hypothesis which also contains various recognition elements such that the recognition elements in the aligned portion of the correction hypothesis are determined to most likely correspond to a range of recognition elements in the earlier recognition hypothesis. The recognition elements in the range of recognition elements in the earlier recognition hypothesis are replaced with the recognition elements in the aligned portion of the correction hypothesis.Type: GrantFiled: October 13, 2006Date of Patent: October 5, 2010Assignee: Nuance Communications, Inc.Inventor: Ralf Meermeier
-
Patent number: 7792671Abstract: Outputs of an automatic probabilistic event detection system, such as a fact extraction system, a speech-to-text engine or an automatic character recognition system, are matched with comparable results produced manually or by a different system. This comparison allows statistical modeling of the run-time behavior of the event detection system. This model can subsequently be used to give supplemental or replacement data for an output sequence of the system. In particular, the model can effectively calibrate the system for use with data of a particular statistical nature.Type: GrantFiled: February 5, 2004Date of Patent: September 7, 2010Assignee: Verint Americas Inc.Inventor: Michael Brand
-
Patent number: 7792667Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.Type: GrantFiled: September 26, 2008Date of Patent: September 7, 2010Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
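The significance test and the abstract-building step described above can be sketched in Python. The abstract says only that a word's score is based on its length, so using the character count directly as the score is an assumption, as are the names `word_score`, `is_significant_phrase` and `add_to_abstract`.

```python
from typing import List, Sequence


def word_score(word: str) -> int:
    """Score a word by its length (assumed scoring rule)."""
    return len(word)


def is_significant_phrase(
    words: Sequence[str], threshold_score: int, min_count: int
) -> bool:
    """True when at least `min_count` words score strictly above the threshold."""
    high_scoring = sum(1 for w in words if word_score(w) > threshold_score)
    return high_scoring >= min_count


def add_to_abstract(abstract: List[str], sentence: str) -> None:
    """Append the sentence only if the abstract does not already contain it."""
    if sentence not in abstract:
        abstract.append(sentence)


summary: List[str] = []
sentence = "The maximum entropy model is trained on labelled data."
phrase = ["maximum", "entropy", "model"]
if is_significant_phrase(phrase, threshold_score=5, min_count=2):
    add_to_abstract(summary, sentence)
    add_to_abstract(summary, sentence)  # second call is a no-op
print(summary)
```

Here two of the three words are longer than five characters, so the phrase qualifies and its sentence is added to the summary exactly once.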
-
Patent number: 7788094Abstract: A method for performing conditional maximum entropy modeling includes constructing a conditional maximum entropy model, and incorporating an observation confidence score into the model to reduce an effect due to an uncertain observation.Type: GrantFiled: January 29, 2007Date of Patent: August 31, 2010Assignee: Robert Bosch GmbHInventors: Farhad Farahani, Fuliang Weng, Qi Zhang