Probability Patents (Class 704/240)
  • Publication number: 20110288865
    Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
    Type: Application
    Filed: August 1, 2011
    Publication date: November 24, 2011
    Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
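A minimal sketch of the consistency-measure idea for 20110288865: received feature frames are scored by their average log-likelihood under a diagonal-covariance GMM that stands in for the artificial reference model. The function names, the hand-set model parameters and the 2-D toy features are assumptions for illustration. Perceptual feature extraction, GMM training and the MARS mapping to a quality score are not shown.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        # log of w * N(x; mu, diag(var))
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))

def consistency(frames, weights, means, variances):
    """Average per-frame log-likelihood: higher means more consistent with the model."""
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)

# Toy reference model with two components (hypothetical parameters)
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
clean = [[0.1, -0.2], [2.9, 3.1]]
distorted = [[8.0, -6.0], [10.0, 10.0]]
```

With these toy values, `consistency(clean, ...)` is much higher than `consistency(distorted, ...)`. In the described technique, a learned mapping would then turn such values into a quality score.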
  • Patent number: 8055503
    Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 8, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Robert Scarano, Lawrence Mark
  • Patent number: 8050929
    Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: November 1, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
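Ranking by descending probability, as the abstract of 8050929 describes, followed by a dynamic cut-off might look like the sketch below. The cut-off rule is an assumption: selection stops once the cumulative probability reaches a hypothetical `mass` parameter. The abstract does not state how the size of the selected set is chosen.

```python
def select_predictions(predictions, probabilities, mass=0.9):
    """Rank predictions by descending probability and keep the shortest
    prefix whose cumulative probability reaches `mass` (hypothetical rule)."""
    ranked = sorted(zip(predictions, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    selected, total = [], 0.0
    for prediction, p in ranked:
        selected.append(prediction)
        total += p
        if total >= mass:
            break
    return selected

# Three competing interpretations of one utterance (toy data)
utterances = ["play jazz", "play jams", "pay james"]
probs = [0.25, 0.6, 0.15]
```

For example, `select_predictions(utterances, probs, mass=0.8)` keeps the two most probable interpretations, `["play jams", "play jazz"]`.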
  • Patent number: 8045646
    Abstract: Provided are an apparatus for estimating a phase error and a phase error correcting system using the phase error estimating apparatus. The apparatus includes: a probability value estimating unit for estimating a negative log probability value for each transmission symbol by transforming soft output information transferred from the outside into a log a posteriori probability ratio (LAPPR) value; an APP value calculating unit for calculating an a posteriori probability (APP) value by applying a negative exponential function to the transmission symbol; an average value deciding unit for deciding an average value for each transmission symbol using the probability information entirely, partially, or selectively according to a probability information type; and a symbol phase estimating unit for estimating a phase of a symbol based on the decided average value.
    Type: Grant
    Filed: September 11, 2006
    Date of Patent: October 25, 2011
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Pan-Soo Kim, Byoung-Hak Kim, Yun-Jeong Song, Deock-Gil Oh, Ho-Jin Lee, Jun Heo, Joong-Gon Ryoo
  • Patent number: 8024188
    Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: September 20, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
  • Publication number: 20110224983
    Abstract: Described is a technology by which a probability is estimated for a token in a sequence of tokens based upon the number of times (zero or more, i.e., the actual count) that the sequence was observed in training data. The token may be a word in a word sequence, and the estimated probability may be used in a statistical language model. A discount parameter is set independently of interpolation parameters. If the sequence was observed at least once in the training data, a discount probability and an interpolation probability are computed and summed to provide the estimated probability. If the sequence was not observed, the probability is estimated by computing a backoff probability. Also described are various ways to obtain the discount parameter and interpolation parameters.
    Type: Application
    Filed: March 11, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventor: Robert Carter Moore
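The count-dependent branching described for 20110224983 can be sketched for bigrams. Observed pairs get a discounted term plus an interpolated term, and unobserved pairs get a backoff probability. Several details are assumptions rather than anything the abstract states:

* the discount `D` and interpolation weight `lam` are fixed, hand-picked constants;
* the lower-order model is a plain unigram;
* the backoff weight is computed so that each conditional distribution sums to 1;
* the class name is hypothetical.

```python
from collections import Counter

class DiscountInterpolatedBigram:
    """Seen bigrams: discounted count plus interpolated unigram.
    Unseen bigrams: unigram backoff scaled to absorb the leftover mass."""

    def __init__(self, tokens, D=0.5, lam=0.1):
        # D is set independently of lam. Assumes 0 < D < 1 and lam small
        # enough that the leftover mass below stays non-negative.
        self.D, self.lam = D, lam
        uni = Counter(tokens)
        total = sum(uni.values())
        self.p_uni = {w: c / total for w, c in uni.items()}
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.hist = Counter()   # how often each word occurs as a history
        self.followers = {}     # history -> set of observed next words
        for (h, w), c in self.bigrams.items():
            self.hist[h] += c
            self.followers.setdefault(h, set()).add(w)

    def _seen_prob(self, w, h):
        # Discount probability + interpolation probability
        c_hw, c_h = self.bigrams[(h, w)], self.hist[h]
        return (c_hw - self.D) / c_h + self.lam * self.p_uni[w]

    def prob(self, w, h):
        if self.hist[h] == 0:            # history never seen: unigram only
            return self.p_uni[w]
        if self.bigrams[(h, w)] > 0:     # pair observed at least once
            return self._seen_prob(w, h)
        # Pair not observed: back off, sharing the leftover mass among
        # unseen followers in proportion to their unigram probabilities.
        seen = self.followers[h]
        leftover = 1.0 - sum(self._seen_prob(v, h) for v in seen)
        unseen_uni = 1.0 - sum(self.p_uni[v] for v in seen)
        return leftover * self.p_uni[w] / unseen_uni

model = DiscountInterpolatedBigram("the cat sat on the mat the cat ran".split())
```

With this toy corpus, `model.prob("cat", "the")` exceeds `model.prob("mat", "the")`, which in turn exceeds the backoff estimate `model.prob("sat", "the")`.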
  • Publication number: 20110218803
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Application
    Filed: March 4, 2011
    Publication date: September 8, 2011
    Applicant: DEUTSCHE TELEKOM AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
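The entropy step of 20110218803 can be sketched by reducing each frame's phoneme-posterior vector to its Shannon entropy. The vector is assumed here to come from some upstream classifier, which is not shown. Reading low entropy as high intelligibility is the usual interpretation, but it is stated here as an assumption. The function names are hypothetical.

```python
import math

def frame_entropy(posteriors):
    """Shannon entropy (in bits) of one frame's phoneme-posterior vector."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0)

def intelligibility_scores(frames):
    """Per-frame entropies; a lower value is taken to mean a more confidently
    recognized, and therefore more intelligible, frame (assumption)."""
    return [frame_entropy(f) for f in frames]

confident = [1.0, 0.0, 0.0, 0.0]    # one phoneme clearly wins
confused = [0.25, 0.25, 0.25, 0.25]  # all phonemes equally likely
```

A one-hot frame has entropy 0 bits, while a uniform posterior over four phonemes has the maximum of 2 bits.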
  • Patent number: 8014999
    Abstract: The invention provides a softscaled frequency compensation function that allows the evaluation of a first quality measure, indicating the global impact of all distortions in an audio transmission system including linear frequency response distortions, and a second quality measure that only takes into account the impact of linear frequency response distortions. The softscaled frequency compensation function is derived from a softscaled ratio between the time-integrated output and input power density functions. The first quality measure is derived from the difference loudness density function as a function of time and frequency, using the frequency-compensated input loudness density function and the gain-compensated output loudness density function, both as a function of time and frequency, in the same manner as in ITU-T Recommendation P.862.
    Type: Grant
    Filed: September 20, 2005
    Date of Patent: September 6, 2011
    Assignee: Nederlandse Organisatie voor toegepast-natuurwetenschappelijk Onderzoek TNO
    Inventor: John Gerard Beerends
  • Patent number: 8014536
    Abstract: Improved audio source separation is provided by providing an audio dictionary for each source to be separated. Thus the invention can be regarded as providing “partially blind” source separation as opposed to the more commonly considered “blind” source separation problem, where no prior information about the sources is given. The audio dictionaries are probabilistic source models, and can be derived from training data from the sources to be separated, or from similar sources. Thus a library of audio dictionaries can be developed to aid in source separation. An unmixing and deconvolutive transformation can be inferred by maximum likelihood (ML) given the received signals and the selected audio dictionaries as input to the ML calculation. Optionally, frequency-domain filtering of the separated signal estimates can be performed prior to reconstructing the time-domain separated signal estimates. Such filtering can be regarded as providing an “audio skin” for a recovered signal.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 6, 2011
    Assignee: Golden Metallic, Inc.
    Inventor: Hagai Thomas Attias
  • Patent number: 8015007
    Abstract: A speech recognition apparatus includes a first grammar storage unit configured to store one or more grammar segments, a second grammar storage unit configured to store one or more grammar segments, a first decoder configured to carry out a decoding process by referring to the grammar segment stored in the second grammar storage unit, a grammar transfer unit configured to transfer a trailing grammar segment from the first grammar storage unit to the second grammar storage unit, a second decoder configured to operate in parallel to the grammar transfer unit and carry out the decoding process by referring to the grammar segment stored in the second grammar storage unit, and a recognition control unit configured to monitor the state of transfer of the trailing grammar segment carried out by the grammar transfer unit and to activate both decoders by switching their operation according to the state of transfer of the grammar segment.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: September 6, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masaru Sakai
  • Patent number: 8005665
    Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.
    Type: Grant
    Filed: July 23, 2010
    Date of Patent: August 23, 2011
    Assignee: Schukhaus Group GmbH, LLC
    Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
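A rough sketch of the phrase test and abstract-building steps in 8005665. The following are assumptions, not taken from the abstract:

* a word's score is simply its character length;
* the `threshold` and `min_count` values are arbitrary;
* sentences are split naively on terminal punctuation;
* the function names are hypothetical.

```python
import re

def is_significant(words, threshold=6, min_count=2):
    """A word sequence is significant if at least `min_count` of its words
    score above `threshold`; here a word's score is its length."""
    return sum(1 for w in words if len(w) > threshold) >= min_count

def update_abstract(document, phrase, abstract):
    """If `phrase` is significant, find the first sentence containing it and
    append that sentence to `abstract` unless it is already included."""
    if not is_significant(phrase.split()):
        return abstract
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        if phrase in sentence:
            if sentence not in abstract:
                abstract.append(sentence)
            break
    return abstract

doc = "Cats sleep. Probabilistic segmentation improves transcription. Dogs bark."
```

Calling `update_abstract(doc, "Probabilistic segmentation improves", [])` adds the middle sentence. Calling it again with the resulting abstract leaves the abstract unchanged, because the sentence is already present.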
  • Patent number: 8000962
    Abstract: A method and system for using input signal quality in an automatic speech recognition system. The method includes measuring the quality of an input signal into a speech recognition system and varying a rejection threshold of the speech recognition system at runtime in dependence on the measurement of the input signal quality. If the measurement of the input signal quality is low, the rejection threshold is reduced and, if the measurement of the input signal quality is high, the rejection threshold is increased. The measurement of the input signal quality may be based on one or more of the measurements of signal-to-noise ratio, loudness, including clipping, and speech signal duration.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: August 16, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: John Doyle, John Brian Pickering
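One way to vary the rejection threshold of 8000962 at runtime is a clamped linear map from measured signal-to-noise ratio. The linear form and every constant (`base`, `low_snr`, `high_snr`, `spread`) are illustrative assumptions. The abstract also allows loudness, clipping and duration measurements, which are omitted here.

```python
def rejection_threshold(snr_db, base=0.5, low_snr=5.0, high_snr=25.0, spread=0.2):
    """Lower the threshold for poor-quality input and raise it for clean input.
    All constants are hypothetical."""
    # Clamp the SNR into [low_snr, high_snr] and scale it to t in [0, 1].
    t = (min(max(snr_db, low_snr), high_snr) - low_snr) / (high_snr - low_snr)
    # t = 0 gives base - spread; t = 1 gives base + spread.
    return base - spread + 2 * spread * t

def accept(confidence, snr_db):
    """Accept a recognition result only if its confidence clears the threshold."""
    return confidence >= rejection_threshold(snr_db)
```

With these constants, a result with confidence 0.4 is accepted from noisy input (SNR 0 dB, threshold 0.3) but rejected from clean input (SNR 30 dB, threshold 0.7).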
  • Patent number: 7996214
    Abstract: Disclosed are a system and method for exploiting information in an utterance for dialog act tagging. An exemplary method includes receiving a user utterance, computing at periodic intervals at least one parameter in the user utterance, quantizing the at least one parameter at each periodic interval, approximating conditional probabilities using an n-gram over a sliding window over the periodic intervals and tagging the utterance as a dialog act based on the approximated conditional probabilities.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
  • Publication number: 20110184735
    Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
    Type: Application
    Filed: January 22, 2010
    Publication date: July 28, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
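A simplified sketch of the acoustic/visual cross-check in 20110184735. It assumes both the microphone-array localizer and the image-based person detector report a horizontal angle in degrees. It also assumes that a segment matching no visible person has its confidence multiplied by a fixed penalty. The tolerance, penalty and function name are hypothetical; the abstract does not specify the comparison or the adjustment rule.

```python
def adjust_confidence(confidence, acoustic_angle, person_angles,
                      tolerance_deg=10.0, penalty=0.5):
    """Keep the recognition confidence if the speech direction matches a person
    seen in the image; otherwise scale it down (hypothetical rule)."""
    matched = any(abs(acoustic_angle - a) <= tolerance_deg for a in person_angles)
    return confidence if matched else confidence * penalty

people = [-30.0, 15.0]  # angles of people detected in the image (toy data)
```

A segment arriving from 12 degrees matches the person at 15 degrees and keeps its confidence. One arriving from 60 degrees, with nobody in view there, has its confidence halved.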
  • Publication number: 20110173000
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Application
    Filed: December 19, 2008
    Publication date: July 14, 2011
    Inventors: Hitoshi Yamamoto, Miki Kiyokazu
  • Patent number: 7970613
    Abstract: Use of runtime memory may be reduced in a data processing algorithm that uses one or more probability distribution functions. Each probability distribution function may be characterized by one or more uncompressed mean values and one or more variance values. The uncompressed mean and variance values may be represented by α-bit floating point numbers, where α is an integer greater than 1. The probability distribution functions are converted to compressed probability functions having compressed mean and/or variance values represented as β-bit integers, where β is less than α, whereby the compressed mean and/or variance values occupy less memory space than the uncompressed mean and/or variance values. Portions of the data processing algorithm can be performed with the compressed mean and variance values.
    Type: Grant
    Filed: November 12, 2005
    Date of Patent: June 28, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
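The memory saving in 7970613 can be illustrated with plain linear quantization. Floating-point mean or variance values become 8-bit unsigned integer codes plus a shared offset and scale. This common scheme is swapped in for illustration only: the abstract does not say which mapping is used, and the function names and bit width are assumptions.

```python
def compress(values, bits=8):
    """Quantize floats to unsigned `bits`-bit integer codes using a shared
    offset (the minimum) and scale (linear quantization)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def decompress(codes, lo, scale):
    """Approximate reconstruction of the original values."""
    return [lo + c * scale for c in codes]

means = [-1.5, 0.0, 0.25, 2.0]  # toy Gaussian means
codes, lo, scale = compress(means)
```

The codes could be stored in `array('B', codes)` at one byte each, instead of the four bytes per value typical of `array('f', ...)`. The reconstruction error is at most half a quantization step.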
  • Patent number: 7970614
    Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 28, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
  • Publication number: 20110153326
    Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector, including predetermined portions of the voice activity detection indication and the extracted features, from the subscriber unit to the server. The system also includes a module that generates additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and provides them to the speech server.
    Type: Application
    Filed: February 9, 2011
    Publication date: June 23, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: HARINATH GARUDADRI, HYNEK HERMANSKY, LUKAS BURGET, PRATIBHA JAIN, SACHIN KAJAREKAR, SUNIL SIVADAS, STEPHANE N. DUPONT, MARIA CARMEN BENITEZ ORTUZAR, NELSON H. MORGAN
  • Publication number: 20110144986
    Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios. Different calibration models may be used with different usage scenarios, e.g., during different conditions. The calibration model may comprise a maximum entropy classifier with distribution constraints, trained with continuous raw confidence scores and multi-valued word tokens, and/or other distributions and extracted features.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Jinyu Li
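A binary maximum entropy classifier over a set of features is equivalent to logistic regression. The calibration idea in 20110144986 can therefore be sketched as a one-feature logistic model that maps a raw engine confidence to a calibrated probability of correctness. This is a simplification under assumptions: the distribution constraints and word-token features mentioned in the abstract are omitted, the model is fitted by plain batch gradient descent, and the learning rate, epoch count, function names and toy training set are hypothetical.

```python
import math

def train_calibrator(raw_scores, labels, lr=0.5, epochs=2000):
    """Fit p(correct | raw) = sigmoid(a * raw + b) by batch gradient descent
    on the average log-loss."""
    a, b = 0.0, 0.0
    n = len(raw_scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(raw_scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(raw, a, b):
    """Calibrated confidence for one raw engine score."""
    return 1.0 / (1.0 + math.exp(-(a * raw + b)))

# Toy calibration set: raw engine scores and whether each result was correct
raw = [0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 1, 1, 0, 1, 0, 0]
a, b = train_calibrator(raw, labels)
```

A separate `(a, b)` pair would be trained per usage scenario, matching the abstract's point that different calibration models may serve different conditions.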
  • Publication number: 20110137651
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that are based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example one caused by limited input devices physically located on the mobile device.
    Type: Application
    Filed: February 14, 2011
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. ROSE, Sarangarajan PATHASARATHY, Aaron Edward ROSENBERG, Shrikanth Sambasivan NARAYANAN
  • Publication number: 20110131042
    Abstract: Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides a likelihood that a speech signal input from a given phoneme sequence will occur. A linguistic likelihood computation means (702) provides a likelihood that a given word sequence will occur. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to provide a word sequence with the maximum likelihood of occurring from a speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods depending on whether or not the speaker who generated the acoustic signal input to the speech recognition means has the turn to speak.
    Type: Application
    Filed: May 12, 2009
    Publication date: June 2, 2011
    Inventor: Kentaro Nagatomo
  • Patent number: 7953598
    Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
    Type: Grant
    Filed: December 17, 2007
    Date of Patent: May 31, 2011
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin W. Brown
  • Patent number: 7945552
    Abstract: A system of the present invention stores: a first index which designates lists of keywords contained in texts from identifications of the respective texts; a second index which designates lists of texts containing keywords from identifications of the respective keywords; and the number of texts containing the respective keywords. Then, upon receiving an input of a text search condition, the system calculates an estimation of search time by the first index and an estimation of search time by the second index, and determines which one of the first and second indexes makes a search faster. Then, by using the index which has been determined to make the search faster, the system searches for keywords which appear in texts satisfying the text search condition with higher frequency.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: May 17, 2011
    Assignee: International Business Machines Corporation
    Inventors: Daisuke Takuma, Issei Yoshida, Yuta Tsuboi
  • Patent number: 7941317
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 10, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Publication number: 20110099012
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
    Type: Application
    Filed: October 23, 2009
    Publication date: April 28, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Jason WILLIAMS, Suhrid BALAKRISHNAN
  • Patent number: 7933773
    Abstract: A natural language understanding monitoring system adapted to conduct an automated dialog with a user. If the system is unable to identify from the automated dialog, to at least a predetermined level of confidence, any one of a plurality of predetermined tasks as being a particular task that the user wants to have performed, the system makes a determination of the value of a probability that further automated dialog will enable the system to identify the particular task, and determines whether or not to conduct further automated dialog with the user, in an attempt to identify the particular task, based on the relative values of the determined probability and a predetermined threshold value. The probability value determination is based on inputs from the user during the automated dialog.
    Type: Grant
    Filed: March 25, 2009
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
  • Patent number: 7930181
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: November 21, 2002
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Publication number: 20110087492
    Abstract: A speech characteristic-amount calculation circuit 31 calculates an amount of speech characteristics for each phrase in input speech. An estimation process likelihood calculation circuit 33 compares the calculated speech characteristic amount of a phrase with the speech pattern sequence information of a plurality of phrases stored in a storage unit 34 to select, for the phrase, a plurality of candidates ranked from the highest likelihood value downward. A recognition filtering device 4 determines whether or not to reject the extracted candidates based on the likelihood difference ratio, that is, the ratio of the likelihood difference between the first and second candidates to the likelihood difference between the second and third candidates.
    Type: Application
    Filed: May 11, 2009
    Publication date: April 14, 2011
    Applicant: RayTron, Inc.
    Inventors: Mitsuji Yoshida, Kazutaka Hyodo
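The likelihood difference ratio used in 20110087492 is (L1 - L2) / (L2 - L3) for the three best candidates. It can be computed and thresholded as sketched below. Two points are assumptions, since the abstract does not give a threshold or say which side leads to rejection: that a large ratio means the top candidate is accepted, and the hypothetical `min_ratio` value.

```python
def likelihood_difference_ratio(likelihoods):
    """(L1 - L2) / (L2 - L3) for the top three likelihoods, sorted descending."""
    l1, l2, l3 = sorted(likelihoods, reverse=True)[:3]
    gap_23 = l2 - l3
    if gap_23 == 0:
        # Avoid division by zero: a clear winner over a tied pair counts as
        # an infinitely large ratio; a three-way tie gives 0.
        return float("inf") if l1 > l2 else 0.0
    return (l1 - l2) / gap_23

def accept_top_candidate(likelihoods, min_ratio=2.0):
    """Accept when the winner stands out from the runner-up much more than the
    runner-up stands out from the third candidate (assumed direction)."""
    return likelihood_difference_ratio(likelihoods) >= min_ratio
```

For log-likelihoods of -100, -130 and -140 the ratio is 30 / 10 = 3.0, so the first candidate is accepted. For -100, -105 and -140 the ratio is only 5 / 35, so the candidates are rejected.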
  • Patent number: 7921012
    Abstract: A speech recognition apparatus includes a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment, a second storing unit configured to store a classification model that has shared parameters and non-shared parameters with the first acoustic model to classify second acoustic models, a recognizing unit configured to calculate a first likelihood for the input speech by applying the first acoustic model to the input speech and to obtain the calculation result for the shared parameters and a plurality of candidate words that have relatively large first likelihood values, and a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result for the shared parameters and the non-shared parameters of the classification model.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: April 5, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hiroshi Fujimura, Takashi Masuko
  • Patent number: 7912717
    Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: March 22, 2011
    Inventor: Albert Galick
  • Patent number: 7912720
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
    Type: Grant
    Filed: July 20, 2005
    Date of Patent: March 22, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Publication number: 20110046953
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Applicant: GENERAL MOTORS COMPANY
    Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Patent number: 7895040
    Abstract: According to an embodiment, a voice recognition apparatus includes acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining units, and a voice recognition method includes the processes of selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of a certain transition path only once in the interval from when the standard frame is set to when the standard frame is renewed, and storing the calculated value for use as an approximate value of the output probability in subsequent frames.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Shinichi Tanaka
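The caching idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not Toshiba's implementation: frames are reduced to scalars, `active_paths` is a hypothetical stand-in for whatever the beam search keeps alive at each frame, and renewing the standard frame every `period` frames is an assumption, since the abstract does not say when renewal happens.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def beam_output_probs(
    frames: Sequence[float],
    active_paths: Callable[[int], List[str]],     # hypothetical: paths kept by the beam at frame t
    output_prob: Callable[[str, float], float],   # expensive per-path output probability
    period: int = 3,                              # assumption: standard frame renewed every `period` frames
) -> Tuple[List[Dict[str, float]], int]:
    """Return per-frame output probabilities for beam-selected paths and
    the number of real output-probability evaluations performed."""
    cache: Dict[str, float] = {}
    evaluations = 0
    per_frame: List[Dict[str, float]] = []
    for t, x in enumerate(frames):
        if t % period == 0:
            cache.clear()  # a new standard frame is set: stored values expire
        scores: Dict[str, float] = {}
        for path in active_paths(t):
            if path not in cache:
                cache[path] = output_prob(path, x)  # computed once per interval
                evaluations += 1
            scores[path] = cache[path]  # reused as an approximate value
        per_frame.append(scores)
    return per_frame, evaluations
```

With two paths alive for six frames and `period=3`, the sketch makes four evaluations instead of twelve; the price is that frames inside an interval see a slightly stale value.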
  • Publication number: 20110040561
    Abstract: A method for compensating inter-session variability for automatic extraction of information from an input voice signal representing an utterance of a speaker, includes: processing the input voice signal to provide feature vectors each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.
    Type: Application
    Filed: May 16, 2006
    Publication date: February 17, 2011
    Inventors: Claudio Vair, Daniele Colibro, Pietro Laface
  • Patent number: 7890325
    Abstract: Speech recognition, such as command and control speech recognition, generally uses a context free grammar to constrain the decoding process. Word or subword background models are constructed to repopulate the dynamic hypothesis space, especially when word sparseness is at issue. The background models can later be used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence. The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
    Type: Grant
    Filed: March 16, 2006
    Date of Patent: February 15, 2011
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Ye Tian, Jian-Lai Zhou, Frank Kao-Ping K. Soong
  • Publication number: 20110035216
    Abstract: The invention can recognize several languages at the same time without using samples. The key technique is that features of known words in any language are extracted from unknown words or continuous speech. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filtering and without overlap, to normalize the variable-length signal waveform of a word, which has one to several syllables, into a 12×12 matrix as the feature of the word. The invention can improve the feature so that an unknown sentence is recognized correctly. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Application
    Filed: August 5, 2009
    Publication date: February 10, 2011
    Inventors: Tze Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 7877258
    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: January 25, 2011
    Assignee: Google Inc.
    Inventors: Ciprian Chelba, Thorsten Brants
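To make the trie idea in the abstract above concrete, here is a minimal sketch of storing an n-gram collection in a trie so that n-grams sharing a prefix share nodes. The compact encoding claimed in the patent is not described in the abstract, so this dictionary-of-children layout is an assumption chosen for readability, not Google's representation.

```python
from typing import Dict, Iterable, Optional, Tuple

class NgramTrie:
    """Each node may hold the probability of the n-gram ending at it."""

    def __init__(self) -> None:
        self.children: Dict[str, "NgramTrie"] = {}
        self.prob: Optional[float] = None

    def insert(self, ngram: Tuple[str, ...], prob: float) -> None:
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NgramTrie())  # shared prefixes reuse nodes
        node.prob = prob

    def lookup(self, ngram: Tuple[str, ...]) -> Optional[float]:
        node: Optional[NgramTrie] = self
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None  # n-gram not in the collection
        return node.prob

def build_trie(collection: Iterable[Tuple[Tuple[str, ...], float]]) -> NgramTrie:
    root = NgramTrie()
    for ngram, prob in collection:
        root.insert(ngram, prob)
    return root
```

A real language model would also need a backoff or smoothing rule to score strings whose n-grams are missing; that part is outside what the abstract describes and is left out here.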
  • Publication number: 20110015925
    Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said spee
    Type: Application
    Filed: March 26, 2010
    Publication date: January 20, 2011
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales
  • Patent number: 7870000
    Abstract: The present disclosure relates to prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: January 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Gerald M. McCobb, Paritosh D. Patel, Marc White
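The confidence-based split in the abstract above can be sketched as below. The threshold value, the use of one threshold for both the utterance and element levels, and the prompt wording are all hypothetical; the abstract only says each level has "an associated certainty threshold".

```python
from typing import Dict, List, Tuple

def handle_utterance(
    utterance_conf: float,
    elements: Dict[str, Tuple[str, float]],  # data field -> (value, element-level confidence)
    threshold: float = 0.7,                  # hypothetical certainty threshold
) -> Tuple[Dict[str, str], List[str]]:
    """Return (values to store in their data fields, fields to prompt for again)."""
    if utterance_conf >= threshold:
        return {f: v for f, (v, _) in elements.items()}, []  # whole utterance trusted
    stored = {f: v for f, (v, c) in elements.items() if c > threshold}
    reprompt = [f for f, (_, c) in elements.items() if c <= threshold]
    return stored, reprompt

def reprompt_text(fields: List[str]) -> str:
    # hypothetical prompt wording for the low-confidence set
    return "Please repeat the " + " and ".join(fields) + "."
```

Only the low-confidence fields are asked for again, so the caller does not have to repeat values the recognizer already got right.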
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 7848926
    Abstract: A speech recognition system is provided where a user may more efficiently and easily correct a recognition error resulting from speech recognition. The system compares multiple inputted words with multiple stored words and determines a most-competitive word candidate. The system selects one or more competitive words that have competitive probabilities close to the competitive probability of the most-competitive word candidate and displays the one or more competitive words adjacent to the most-competitive word candidate. The system selects an appropriate correction word from the one or more competitive words and replaces the most-competitive word candidate with the correction word.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: December 7, 2010
    Assignee: National Institute of Advanced Industrial Science and Technology
    Inventors: Masataka Goto, Jun Ogata
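A minimal sketch of the candidate selection and replacement described above, assuming the recognizer already supplies a competitive probability for each candidate at one word position. The closeness `margin` is a hypothetical parameter; the abstract only says the competitors have probabilities "close to" that of the top candidate.

```python
from typing import List, Tuple

def competitive_words(
    candidates: List[Tuple[str, float]],  # (word, competitive probability) at one position
    margin: float = 0.2,                  # hypothetical closeness margin
) -> Tuple[str, List[str]]:
    """Return the most-competitive word and the competitors to display next to it."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_word, best_p = ranked[0]
    close = [w for w, p in ranked[1:] if best_p - p <= margin]
    return best_word, close

def apply_correction(hypothesis: List[str], position: int, correction: str) -> List[str]:
    """Replace the word at `position` with the correction the user picked."""
    fixed = list(hypothesis)
    fixed[position] = correction
    return fixed
```

Filtering by distance from the top probability, rather than showing a fixed number of alternatives, keeps the displayed list short when the recognizer is confident.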
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram, which cumulates a probability distribution function in order from the largest to the smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in order from the largest to the smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
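The backward cumulative distribution above can be sketched for one feature dimension as follows. The abstract does not name the target distribution; mapping onto a standard normal, as in conventional histogram equalization, is an assumption here, as is the half-sample offset that keeps tail probabilities strictly between 0 and 1.

```python
from statistics import NormalDist
from typing import List

def backward_cdf(values: List[float]) -> List[float]:
    """For each value x, the fraction of samples >= x, i.e. the empirical PDF
    cumulated from the largest value down to the smallest."""
    n = len(values)
    return [sum(1 for v in values if v >= x) / n for x in values]

def normalize(values: List[float]) -> List[float]:
    """Map each value to y such that P(Y >= y) matches its backward CDF,
    with Y standard normal (assumed reference distribution)."""
    n = len(values)
    ref = NormalDist()  # mean 0, standard deviation 1
    out = []
    for b in backward_cdf(values):
        tail = b - 0.5 / n  # half-sample offset: tail lies in (0, 1)
        out.append(ref.inv_cdf(1.0 - tail))
    return out
```

In use, this would run separately on each feature dimension (for example each cepstral coefficient) over an utterance; the mapping is monotone, so the ordering of values is preserved and tied values map to the same output.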
  • Patent number: 7822614
    Abstract: A language analyzer performs speech recognition on speech input by a speech input unit, identifies a possible word represented by the speech and its score, and supplies word data representing them to an agent processing unit. The agent processing unit stores process item data, which defines a data acquisition process to acquire word data or the like, a discrimination process, and an input/output process, and wires, i.e., data that define a transition from one process to another and give a weighting factor to the transition. It executes a flow represented generally by the process item data and the wires to thereby control devices belonging to an input/output target device group. The process in the flow to which the transition takes place is determined by the weighting factor of each wire, which is determined by the connection relationship between the point where the process has proceeded and the wire, and by the score of the word data.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: October 26, 2010
    Assignee: Kabushikikaisha Kenwood
    Inventor: Rika Koyama
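One way to read the weighted-transition rule above is sketched below. It is only an interpretation: the abstract does not give a formula, so combining a wire's weighting factor with the recognized word's score by multiplication, and tying each wire to a single word, are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Wire:
    source: str    # process item the flow has reached
    target: str    # process item to transition to
    weight: float  # weighting factor of the wire
    word: str      # hypothetical: word whose recognition score this wire depends on

def next_process(current: str, wires: List[Wire],
                 word_scores: Dict[str, float]) -> Optional[str]:
    """Pick the connected wire with the largest weight x word score."""
    best: Optional[str] = None
    best_value = 0.0
    for w in wires:
        if w.source != current:
            continue  # only wires connected to the current point are candidates
        value = w.weight * word_scores.get(w.word, 0.0)
        if value > best_value:
            best, best_value = w.target, value
    return best
```

If no connected wire has a positive combined value, the sketch returns `None`, i.e. the flow stays where it is.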
  • Patent number: 7813925
    Abstract: At adjacent times, or when the observation signal changes only slightly, the distribution that maximizes the output probability of a mixture distribution is highly likely not to change. Using this fact, when the output probability of a mixture-distribution HMM is obtained, the distribution giving the maximum output probability is stored. At adjacent times, or when a small change in the observation signal is detected, the output probability of the stored distribution is used as the output probability of the mixture distribution. This avoids calculating the output probabilities of the other distributions when calculating the output probability of the mixture distribution, thereby reducing the computation required for output probabilities.
    Type: Grant
    Filed: April 6, 2006
    Date of Patent: October 12, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hiroki Yamamoto, Masayuki Yamada
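A minimal sketch of the shortcut above for a diagonal-covariance Gaussian mixture scored with the common max approximation. The "small change" test (Euclidean distance to the previous frame below `eps`) and its value are assumptions; the abstract does not say how a small change is detected.

```python
import math
from typing import List, Optional, Sequence, Tuple

Component = Tuple[float, List[float], List[float]]  # (weight, mean, diagonal variance)

def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

class MaxMixtureScorer:
    def __init__(self, comps: List[Component], eps: float = 0.1) -> None:
        self.comps = comps
        self.eps = eps  # hypothetical "small change" threshold
        self.prev_x: Optional[List[float]] = None
        self.best: Optional[int] = None  # stored maximizing component
        self.evaluations = 0

    def _score(self, k: int, x: Sequence[float]) -> float:
        self.evaluations += 1
        w, mean, var = self.comps[k]
        return math.log(w) + log_gauss(x, mean, var)

    def log_output_prob(self, x: List[float]) -> float:
        small_change = self.prev_x is not None and math.dist(x, self.prev_x) < self.eps
        self.prev_x = list(x)
        if small_change and self.best is not None:
            return self._score(self.best, x)  # evaluate only the stored component
        scores = [self._score(k, x) for k in range(len(self.comps))]
        self.best = max(range(len(scores)), key=scores.__getitem__)
        return scores[self.best]
```

With two components, a nearly unchanged frame costs one Gaussian evaluation instead of two; the saving grows with the number of mixture components.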
  • Publication number: 20100256977
    Abstract: Described is a technology by which a maximum entropy (MaxEnt) model, such as one used as a classifier or embedded in a conditional random field or hidden conditional random field, uses continuous features with continuous weights that are continuous functions of the feature values (instead of single-valued weights). The continuous weights may be approximated by a spline-based solution. In general, this converts the optimization problem into a standard log-linear optimization problem without continuous weights in a higher-dimensional space.
    Type: Application
    Filed: April 1, 2009
    Publication date: October 7, 2010
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Alejandro Acero
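The conversion described above can be illustrated with the simplest spline, piecewise-linear "hat" functions. If the continuous weight is w(x) = Σ_k w_k B_k(x), then w(x)·x = Σ_k w_k (B_k(x)·x), so the expanded features B_k(x)·x carry ordinary scalar weights w_k. The choice of linear splines and the knot positions are assumptions; the patent's spline form may differ.

```python
from bisect import bisect_right
from typing import List

def hat_basis(x: float, knots: List[float]) -> List[float]:
    """Linear B-spline (hat) basis values at x; knots sorted ascending.
    x outside the knot range is clamped to the end knots."""
    x = min(max(x, knots[0]), knots[-1])
    b = [0.0] * len(knots)
    i = bisect_right(knots, x) - 1
    if i >= len(knots) - 1:  # x sits on the last knot
        b[-1] = 1.0
        return b
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    b[i], b[i + 1] = 1.0 - t, t
    return b

def expand_feature(x: float, knots: List[float]) -> List[float]:
    """Higher-dimensional features B_k(x) * x for a standard log-linear model."""
    return [bk * x for bk in hat_basis(x, knots)]
```

For example, with knots [0, 1, 2] and knot weights [1, 2, 3], the value x = 1.5 gets the interpolated weight 2.5, and the dot product of the weights with `expand_feature(1.5, knots)` equals 2.5 × 1.5.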
  • Patent number: 7809570
    Abstract: Systems and methods are provided for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: October 5, 2010
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
  • Patent number: 7809566
    Abstract: A method for use in automatic speech recognition corrects erroneous recognition elements within a recognition hypothesis. A user input is recognized as a correction hypothesis which contains various recognition elements. A non-deterministic alignment is performed to align at least a portion of the correction hypothesis with an earlier recognition hypothesis, which also contains various recognition elements, such that the recognition elements in the aligned portion of the correction hypothesis are determined most likely to correspond to a range of recognition elements in the earlier recognition hypothesis. The recognition elements in that range of the earlier recognition hypothesis are replaced with the recognition elements in the aligned portion of the correction hypothesis.
    Type: Grant
    Filed: October 13, 2006
    Date of Patent: October 5, 2010
    Assignee: Nuance Communications, Inc.
    Inventor: Ralf Meermeier
  • Patent number: 7792671
    Abstract: Outputs of an automatic probabilistic event detection system, such as a fact extraction system, a speech-to-text engine or an automatic character recognition system, are matched with comparable results produced manually or by a different system. This comparison allows statistical modeling of the run-time behavior of the event detection system. This model can subsequently be used to give supplemental or replacement data for an output sequence of the system. In particular, the model can effectively calibrate the system for use with data of a particular statistical nature.
    Type: Grant
    Filed: February 5, 2004
    Date of Patent: September 7, 2010
    Assignee: Verint Americas Inc.
    Inventor: Michael Brand
  • Patent number: 7792667
    Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing it is retrieved from the document. An abstract of the document is searched to determine whether the sentence has previously been included in the abstract. If not, the sentence is added to the abstract.
    Type: Grant
    Filed: September 26, 2008
    Date of Patent: September 7, 2010
    Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
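The phrase test and abstract update described above can be sketched as follows. Scoring a word by its length follows the abstract; the threshold of 6 characters, the minimum count of 2, and the naive sentence splitter are hypothetical choices for illustration.

```python
import re
from typing import List

def is_significant(words: List[str], threshold: int = 6, min_count: int = 2) -> bool:
    """Each word's score is its length; the sequence is significant when at least
    `min_count` words score above `threshold` (both values hypothetical)."""
    return sum(1 for w in words if len(w) > threshold) >= min_count

def update_abstract(document: str, words: List[str], abstract: List[str]) -> List[str]:
    """Add the first sentence containing a significant phrase, unless the
    abstract already holds that sentence."""
    if not is_significant(words):
        return abstract
    phrase = " ".join(words)
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):  # naive sentence split
        if phrase in sentence:
            return abstract if sentence in abstract else abstract + [sentence]
    return abstract
```

Checking the abstract before appending is what keeps repeated occurrences of the same phrase from duplicating sentences.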
  • Patent number: 7788094
    Abstract: A method for performing conditional maximum entropy modeling includes constructing a conditional maximum entropy model, and incorporating an observation confidence score into the model to reduce an effect due to an uncertain observation.
    Type: Grant
    Filed: January 29, 2007
    Date of Patent: August 31, 2010
    Assignee: Robert Bosch GmbH
    Inventors: Farhad Farahani, Fuliang Weng, Qi Zhang
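A minimal sketch of a conditional maximum entropy model in which an observation confidence score damps uncertain observations. The abstract does not say how the confidence is incorporated; multiplying each feature value by its observation confidence is one simple possibility assumed here, and the feature and label names in any usage are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def cond_maxent(
    features: Dict[str, float],             # observed feature -> value
    confidence: Dict[str, float],           # observation confidence in [0, 1]
    weights: Dict[Tuple[str, str], float],  # (feature, label) -> lambda
    labels: List[str],
) -> Dict[str, float]:
    """p(y | x) proportional to exp(sum_f lambda_{f,y} * value_f * confidence_f)."""
    scores = {}
    for y in labels:
        # assumption: scaling by confidence shrinks an uncertain feature's contribution
        scores[y] = sum(weights.get((f, y), 0.0) * v * confidence.get(f, 1.0)
                        for f, v in features.items())
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {y: math.exp(s - m) / z for y, s in scores.items()}
```

With this choice, an observation with confidence 0 contributes nothing, so the model falls back to whatever the remaining features support (a uniform distribution if there are none).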