Probability Patents (Class 704/240)

Method and system for predicting understanding errors in a task classification system

Patent number: 7487088

Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications. The method may include determining whether a probability of understanding the user's input communication exceeds a first threshold. If the first threshold is exceeded, further dialog is conducted with the user. Otherwise, the user may be directed to a human for assistance. The method also illustratively determines whether the probability also exceeds a second threshold, the second threshold being higher than the first. If so, then further dialog is conducted with the user using the current dialog strategy. However, if the probability falls between a first threshold and a second threshold, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.
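
The two-threshold decision described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the function name, the action labels, and the default threshold values are assumptions introduced here, and the abstract does not say how the probability of understanding is estimated.

```python
def route_dialog(p_understanding, first_threshold=0.4, second_threshold=0.7):
    """Choose the next dialog action from a probability of understanding.

    The threshold defaults and the returned labels are hypothetical.
    """
    if not first_threshold < second_threshold:
        raise ValueError("the second threshold must be higher than the first")
    if p_understanding <= first_threshold:
        # First threshold not exceeded: hand the user over to a human.
        return "transfer_to_human"
    if p_understanding > second_threshold:
        # Both thresholds exceeded: keep the current dialog strategy.
        return "continue_current_strategy"
    # Between the two thresholds: continue, but adapt the strategy.
    return "adapt_dialog_strategy"


for p in (0.2, 0.5, 0.9):
    print(p, route_dialog(p))
```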

Type: Grant

Filed: May 12, 2006

Date of Patent: February 3, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
Method and system for computing or determining confidence scores for parse trees at all levels

Publication number: 20090030686

Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.

Type: Application

Filed: July 27, 2007

Publication date: January 29, 2009

Inventors: Fuliang Weng, Feng Lin, Zhe Feng
Energy feature extraction method for noisy speech recognition

Patent number: 7480614

Abstract: The present invention provides an energy feature extraction method for noisy speech recognition. At first, noisy speech energy of an input noisy speech is computed. Next, the noise energy in the input noisy speech is estimated. Then, the estimated noise energy is subtracted from the noisy speech energy to obtain estimated clean speech energy. Finally, delta operations are performed on the log of the estimated clean speech energy to determine the energy derivative features for the noisy speech.
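
The four steps in this abstract (frame energy, noise estimate, subtraction, delta of the log energy) might look roughly like the sketch below. Several details are assumptions made here for illustration and are not stated in the abstract: the noise energy is taken as the mean energy of the first few frames, a small floor keeps the logarithm defined, and the delta is a simple two-frame central difference.

```python
import math


def frame_energies(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def delta(values):
    """Central difference over neighbouring frames, edges replicated."""
    n = len(values)
    return [(values[min(t + 1, n - 1)] - values[max(t - 1, 0)]) / 2.0
            for t in range(n)]


def energy_delta_features(samples, frame_len, noise_frames=2, floor=1e-6):
    noisy = frame_energies(samples, frame_len)
    # Assumption: the leading frames contain noise only.
    noise = sum(noisy[:noise_frames]) / noise_frames
    # Subtract the noise estimate; floor the result so log() is defined.
    clean = [max(e - noise, floor) for e in noisy]
    log_clean = [math.log(e) for e in clean]
    return delta(log_clean)


signal = [0.1] * 8 + [1.0] * 8 + [0.1] * 8
print(energy_delta_features(signal, frame_len=4))
```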

Type: Grant

Filed: December 30, 2003

Date of Patent: January 20, 2009

Assignee: Industrial Technology Research Institute

Inventor: Tai-Huei Huang
Method of speech recognition using multimodal variational inference with switching state space models

Patent number: 7480615

Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution but saves computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is invented that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.

Type: Grant

Filed: January 20, 2004

Date of Patent: January 20, 2009

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng, Leo Lee
Signal detection using maximum a posteriori likelihood and noise spectral difference

Patent number: 7475012

Abstract: Robust signal detection against various types of background noise is implemented. According to a signal detection apparatus, the feature amount of an input signal sequence and the feature amount of a noise component contained in the signal sequence are extracted. After that, the first likelihood indicating probability that the signal sequence is detected and the second likelihood indicating probability that the noise component is detected are calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. Additionally, a likelihood ratio indicating the ratio between the first likelihood and the second likelihood is calculated. Detection of the signal sequence is determined on the basis of the likelihood ratio.

Type: Grant

Filed: December 9, 2004

Date of Patent: January 6, 2009

Assignee: Canon Kabushiki Kaisha

Inventors: Philip Garner, Toshiaki Fukada, Yasuhiro Komori
Sound identification apparatus

Patent number: 7473838

Abstract: A sound identification apparatus which reduces the chance of a drop in the identification rate, including: a frame sound feature extraction unit which extracts a sound feature per frame of an inputted audio signal; a frame likelihood calculation unit which calculates a frame likelihood of the sound feature in each frame, for each of a plurality of sound models; a confidence measure judgment unit which judges a confidence measure based on the frame likelihood; a cumulative likelihood output unit time determination unit which determines a cumulative likelihood output unit time based on the confidence measure; a cumulative likelihood calculation unit which calculates a cumulative likelihood in which the frame likelihoods of the frames included in the cumulative likelihood output unit time are cumulated, for each sound model; a sound type candidate judgment unit which determines, for each cumulative likelihood output unit time, a sound type corresponding to the sound model that has a maximum cumulative likelihood

Type: Grant

Filed: April 9, 2007

Date of Patent: January 6, 2009

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Tetsu Suzuki, Yoshihisa Nakatoh, Shinichi Yoshizawa
Automated dialog system and method

Patent number: 7472060

Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications in a dialog with the user. The probability of conducting a successful dialog with the user is determined based, at least in part, on understanding data from at least one prior dialog exchange of the dialog.

Type: Grant

Filed: September 6, 2005

Date of Patent: December 30, 2008

Assignee: AT&T Corp.

Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
Efficient recursive clustering based on a splitting function derived from successive eigen-decompositions

Patent number: 7472062

Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.

Type: Grant

Filed: January 4, 2002

Date of Patent: December 30, 2008

Assignee: International Business Machines Corporation

Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES

Publication number: 20080312921

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Application

Filed: August 20, 2008

Publication date: December 18, 2008

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Decoding multiple HMM sets using a single sentence grammar

Patent number: 7464033

Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, typically recognition search methods require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
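
The (M-1)/M figure follows from simple arithmetic: a conventional search network holds M sub-networks, while the described method needs storage for only one. A small sketch of that calculation, with hypothetical function and parameter names:

```python
def network_memory_saving(num_hmm_sets, subnetwork_size):
    """Fraction of network storage saved by keeping one sub-network instead of M."""
    conventional = num_hmm_sets * subnetwork_size  # M sub-networks
    shared = subnetwork_size                       # a single sub-network
    return (conventional - shared) / conventional


# With M = 4 acoustic environments the saving is (4 - 1) / 4.
print(network_memory_saving(4, 1000))  # 0.75
```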

Type: Grant

Filed: February 4, 2005

Date of Patent: December 9, 2008

Assignee: Texas Instruments Incorporated

Inventor: Yifan Gong
Speech recognition utilizing multitude of speech features

Patent number: 7464031

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Grant

Filed: November 28, 2003

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Method of pattern recognition using noise reduction uncertainty

Patent number: 7460992

Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.
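
The variance adjustment mentioned in the last sentence can be illustrated with a one-dimensional sketch. It assumes a single scalar feature and a single Gaussian per state, which is a simplification made here; the function names are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def state_log_likelihood(cleaned, uncertainty_var, mean, var):
    """Score a cleaned feature against a state's Gaussian.

    The model variance is increased by the variance estimated for the
    noise-removal step, so unreliable cleaned values are penalised less.
    """
    return log_gaussian(cleaned, mean, var + uncertainty_var)


# A cleaned value far from the state mean scores higher once the
# uncertainty of the noise removal is taken into account.
print(state_log_likelihood(3.0, 0.0, mean=0.0, var=1.0))
print(state_log_likelihood(3.0, 4.0, mean=0.0, var=1.0))
```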

Type: Grant

Filed: May 16, 2006

Date of Patent: December 2, 2008

Assignee: Microsoft Corporation

Inventors: James G. Droppo, Alejandro Acero, Li Deng
Method of automatic processing of a speech signal

Patent number: 7457748

Abstract: Method of automatically processing a speech signal which comprises the steps of: determining a sequence of probability models corresponding to a given text; determining a sequence of acoustic strings corresponding to the diction of the given text; aligning between the sequence of acoustic strings and the sequence of models; and determining a confidence index of acoustic alignment for each association between a model and an acoustic segment. Each determining step of an alignment confidence index is carried out at least from a combination of the model probability, a priori model probabilities and the average duration of occupancy of the models.

Type: Grant

Filed: August 12, 2003

Date of Patent: November 25, 2008

Assignee: France Telecom

Inventors: Samir Nefti, Olivier Boeffard
Method of modeling single data class from multi-class data

Patent number: 7454337

Abstract: The present invention is a method of modeling a single class of data from data containing multiple classes of data of the same type of data by first receiving a collection of data that includes data from multiple classes of data of the same type where the amount of data of the single class of data exceeds that of any other class of data. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the speech collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model of the type of the first statistical model is generated for the combined subsets and used as the model of the single class of data.

Type: Grant

Filed: May 13, 2004

Date of Patent: November 18, 2008

Assignee: The United States of America as represented by the Director, National Security Agency

Inventors: David C. Smith, Daniel J. Richman
Variational inference and learning for segmental switching state space models of hidden speech dynamics

Patent number: 7454336

Abstract: A system and method that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of segmental switching state space model that employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector is provided. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussian (MOG) posterior and/or based upon an approximate hidden Markov model (HMM) posterior using a variational technique.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 18, 2008

Assignee: Microsoft Corporation

Inventors: Hagai Attias, Li Deng, Leo J. Lee
Removing noise from feature vectors

Patent number: 7451083

Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. One aspect of the invention includes using an iterative approach to identify the clean signal feature vector. Another aspect of the invention includes using the variance of a set of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.

Type: Grant

Filed: July 20, 2005

Date of Patent: November 11, 2008

Assignee: Microsoft Corporation

Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
Method and apparatus for generating a language independent document abstract

Patent number: 7447626

Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequences that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.
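
A rough sketch of the procedure in this abstract follows. The abstract does not give the scoring function, the window size, or the thresholds, so the choices below (score equals word length, a three-word window, and the default threshold values) are assumptions for illustration only.

```python
import re


def word_score(word):
    # Assumption: the score is simply the word length.
    return len(word)


def is_significant(words, threshold_score, min_count):
    """True if enough words in the sequence score above the threshold."""
    high = sum(1 for w in words if word_score(w) > threshold_score)
    return high >= min_count


def build_abstract(document, window=3, threshold_score=6, min_count=2):
    abstract = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)
        # Slide a fixed-size window of words across the sentence.
        windows = [words[i:i + window]
                   for i in range(max(len(words) - window + 1, 1))]
        if any(is_significant(w, threshold_score, min_count) for w in windows):
            # Add the sentence only if it is not already in the abstract.
            if sentence not in abstract:
                abstract.append(sentence)
    return abstract


text = "Cats sit. Statistical language modelling helps recognition. Cats sit."
print(build_abstract(text))
```

Because the score depends only on word length, the sketch needs no dictionary or stop-word list, which is consistent with the language-independent framing in the title.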

Type: Grant

Filed: December 21, 2004

Date of Patent: November 4, 2008

Assignee: UDICO Holdings

Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
System, method and computer program product for large-scale street name speech recognition

Patent number: 7444284

Abstract: A system, method and computer program product are provided for speech recognition. During operation, a database of words is maintained. Initially, a probability is assigned to each of the words which indicates a prevalence of use of the word. Further, an utterance is received for speech recognition purposes. Such utterance is matched with one of the words in the database based at least in part on the probability.

Type: Grant

Filed: November 15, 2004

Date of Patent: October 28, 2008

Assignee: BeVocal, Inc.

Inventor: Bertrand A. Damiba
Automated dialog method with first and second thresholds for adapted dialog strategy

Patent number: 7440893

Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications. The method illustratively determines whether a probability of understanding the user's input communication exceeds a first threshold. If the first threshold is exceeded, further dialog is conducted with the user. Otherwise, the user may be directed to a human for assistance. The method also illustratively determines whether the probability also exceeds a second threshold, the second threshold being higher than the first. If so, then further dialog is conducted with the user using the current dialog strategy. However, if the probability falls between a first threshold and a second threshold, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.

Type: Grant

Filed: September 6, 2005

Date of Patent: October 21, 2008

Assignee: AT&T Corp.

Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
Speech recognition apparatus

Patent number: 7437288

Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.

Type: Grant

Filed: March 11, 2002

Date of Patent: October 14, 2008

Assignee: NEC Corporation

Inventor: Koichi Shinoda
PARTIALLY FILLING MIXED-INITIATIVE FORMS FROM UTTERANCES HAVING SUB-THRESHOLD CONFIDENCE SCORES BASED UPON WORD-LEVEL CONFIDENCE DATA

Publication number: 20080243502

Abstract: The invention discloses prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played. Accordingly, data fields are partially filled in based upon the original speech utterance, where a second prompt for unfilled fields is played.
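
The partial-fill behaviour can be sketched as below. The data layout (a mapping from field name to a value and confidence pair), the function name, and the threshold defaults are assumptions introduced for illustration; the abstract does not specify them.

```python
def fill_form(utterance_confidence, elements,
              utterance_threshold=0.8, element_threshold=0.6):
    """Return (filled_fields, fields_to_reprompt).

    elements maps a field name to a (value, confidence) pair.
    """
    if utterance_confidence >= utterance_threshold:
        # The whole utterance is trusted: fill every field.
        return {field: value for field, (value, _) in elements.items()}, []
    # Sub-threshold utterance: keep only the confidently recognised elements.
    filled = {field: value
              for field, (value, conf) in elements.items()
              if conf > element_threshold}
    reprompt = [field for field in elements if field not in filled]
    return filled, reprompt


slots = {"city": ("Boston", 0.9), "date": ("May 3", 0.4)}
print(fill_form(0.5, slots))
```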

Type: Application

Filed: March 28, 2007

Publication date: October 2, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: SOONTHORN ATIVANICHAYAPHONG, Gerald M. McCobb, PARITOSH D. PATEL, MARC WHITE
SYSTEM AND METHOD FOR ADDRESSING CHANNEL MISMATCH THROUGH CLASS SPECIFIC TRANSFORMS

Publication number: 20080235007

Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.

Type: Application

Filed: June 3, 2008

Publication date: September 25, 2008

Inventors: Jiri Navratil, Jason Pelecanos, Ganesh N. Ramaswamy
Dynamic N-best algorithm to reduce recognition errors

Patent number: 7421387

Abstract: A method for reducing recognition errors. The method includes receiving an N-best list associated with an input of a computer based recognition system. The N-best list includes one or more hypotheses and associated confidence values. The input is classified in response to the N-best list, resulting in a classification. A re-scoring algorithm that is tuned for the classification is selected. The re-scoring algorithm is applied to the N-best list to create a re-scored N-best list. A hypothesis for the value of the input is selected based on the re-scored N-best list.

Type: Grant

Filed: May 18, 2004

Date of Patent: September 2, 2008

Assignee: General Motors Corporation

Inventor: Kurt S. Godden
Segmentation posterior based boundary point determination

Publication number: 20080189109

Abstract: Boundary points for speech in an audio signal are determined based on posterior probabilities for the boundary points given a set of possible segmentations of the audio signal. The boundary point posterior probability is determined based on a set of level posterior probabilities that each provide the probability of a sequence of feature vectors given one of the segmentations in the set of possible segmentations.

Type: Application

Filed: February 5, 2007

Publication date: August 7, 2008

Applicant: Microsoft Corporation

Inventors: Yu Shi, Frank Kao-Ping Soong
Speech recognition device using statistical language model

Patent number: 7409342

Abstract: A speech recognizing device. Natural speech recognizing means recognizes speech input in an application program by dictation. Recognition result converting means converts a recognition result from said natural speech recognizing means into a final recognition result processable by said application program on the basis of a grammar to be used for recognizing said input speech in a grammar method. The recognition result converting means further comprises candidate sentence generating means for evolving said grammar to generate candidate sentences that are candidates for said final recognition result; and matching means for selecting a candidate sentence as said final recognition result among the candidate sentences by matching said candidate sentences generated by said candidate sentence generating means against the recognition result by said natural speech recognizing means.

Type: Grant

Filed: March 31, 2004

Date of Patent: August 5, 2008

Assignee: International Business Machines Corporation

Inventors: Hiroaki Kashima, Yoshinori Tahara, Daisuke Tomoda
Representation of a deleted interpolation N-gram language model in ARPA standard format

Patent number: 7406416

Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.

Type: Grant

Filed: March 26, 2004

Date of Patent: July 29, 2008

Assignee: Microsoft Corporation

Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
Method of recognizing phones in speech of any language

Patent number: 7406408

Abstract: Method of recognizing phones in speech of any language. Acquire phones for all languages and a set of languages. Acquire a pronunciation dictionary, a transcript of speech for the set of languages, and speech for the transcript. Receive speech containing unknown phones. If the speech's language is unknown, compare it to the phones for all languages to determine the phones. If the language is known but no phones were acquired in that language, compare the speech to the phones for all languages to determine the phones. If phones were acquired in the speech's language but no corresponding pronunciation dictionary was acquired, compare the speech to the phones for all languages to determine the phones. If a pronunciation dictionary was acquired for the phones in the speech's language but no transcript was acquired then compare the speech to the phones for all languages to determine the phones.

Type: Grant

Filed: August 24, 2004

Date of Patent: July 29, 2008

Assignee: The United States of America as represented by the Director, National Security Agency

Inventors: Bradley C. Lackey, Patrick J. Schone, Brenton D. Walker
SYSTEM FOR CLASSIFICATION OF VOICE SIGNALS

Publication number: 20080154595

Abstract: A system and method for classifying a voice signal to one of a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal. The system includes an acoustic processor and a classifier. The acoustic processor extracts features that are characteristic of the voice signal and generates feature vectors using the extracted spectral features. The classifier uses the feature vectors to compute the probability that the voice signal belongs to each of the predefined categories and classifies the voice signal to a predefined category that is associated with the highest probability.

Type: Application

Filed: March 4, 2008

Publication date: June 26, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Israel Nelken
System and method for classification of voice signals

Patent number: 7389230

Abstract: A system and method for classifying a voice signal to one of a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal. The system includes an acoustic processor and a classifier. The acoustic processor extracts features that are characteristic of the voice signal and generates feature vectors using the extracted spectral features. The classifier uses the feature vectors to compute the probability that the voice signal belongs to each of the predefined categories and classifies the voice signal to a predefined category that is associated with the highest probability.

Type: Grant

Filed: April 22, 2003

Date of Patent: June 17, 2008

Assignee: International Business Machines Corporation

Inventor: Israel Nelken
Method and system for high-speed speech recognition

Publication number: 20080140399

Abstract: Provided is a method and system for high-speed speech recognition. On the basis of a continuous density hidden Markov model (CDHMM) using a Gaussian mixture model (GMM) for an observation probability, the method and system add only K Gaussian components highly contributing to a state-specific observation probability for an input feature vector and calculate the state-specific observation probability. Thus, in the aspect of the recognition ratio, the degree of approximation of a state-specific observation probability increases, thereby minimizing deterioration of speech recognition performance. In addition, in the aspect of the amount of computation, the number of addition operations required for computing an observation probability is reduced, in comparison with conventional speech recognition that adds all Gaussian probabilities of an input feature vector and uses it for a state-specific observation probability, thereby reducing the total amount of computation required for speech recognition.
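
The approximation in this abstract, summing only the K largest Gaussian contributions instead of all of them, can be sketched for one-dimensional features as follows. This is a simplified illustration, not the patented procedure: the sketch evaluates every component and then keeps the K largest, whereas how the contributing components are selected is not described in the abstract. All names are hypothetical.

```python
import heapq
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def observation_probability(x, components, k=None):
    """GMM observation probability for one state.

    components is a list of (weight, mean, variance) tuples. With k set,
    only the k largest weighted contributions are added.
    """
    contributions = [w * gaussian_pdf(x, m, v) for w, m, v in components]
    if k is None:
        return sum(contributions)
    return sum(heapq.nlargest(k, contributions))


gmm = [(0.5, 0.0, 1.0), (0.3, 5.0, 1.0), (0.2, -5.0, 1.0)]
print(observation_probability(0.0, gmm))
print(observation_probability(0.0, gmm, k=1))
```

When one or two components dominate, as in this example, the truncated sum stays very close to the full sum while needing fewer additions.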

Type: Application

Filed: July 30, 2007

Publication date: June 12, 2008

Inventor: Hoon Chung
Identifying language attributes through probabilistic analysis

Patent number: 7386438

Abstract: A system and method for identifying language attributes through probabilistic analysis is described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within each training document are evaluated. For each language class, a probability for the document properties set conditioned on the occurrence of the language class is calculated. Byte occurrences within each training document are evaluated. For each language class, a probability for the byte occurrences conditioned on the occurrence of the language class is calculated.
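
The byte-occurrence part of this abstract resembles a naive Bayes classifier over bytes, which might be sketched as below. This is an assumption-laden illustration rather than the described system: it ignores the document-property probabilities, treats bytes as independent, uses add-one smoothing, and assumes equal class priors. The function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter


def train_byte_model(training_docs):
    """training_docs: list of ((language, encoding), bytes) pairs.

    Returns a mapping from language class to 256 log-probabilities,
    one per byte value, estimated with add-one smoothing.
    """
    counts = {}
    for language_class, data in training_docs:
        counts.setdefault(language_class, Counter()).update(data)
    model = {}
    for language_class, byte_counts in counts.items():
        total = sum(byte_counts.values()) + 256
        model[language_class] = [math.log((byte_counts[b] + 1) / total)
                                 for b in range(256)]
    return model


def classify(model, data):
    """Pick the class with the highest log-probability of the bytes."""
    return max(model, key=lambda cls: sum(model[cls][b] for b in data))


docs = [
    (("en", "ascii"), b"the quick brown fox"),
    (("de", "latin-1"), b"\xfcber die stra\xdfe"),
]
model = train_byte_model(docs)
print(classify(model, b"gr\xf6\xdfe"))
```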

Type: Grant

Filed: August 4, 2003

Date of Patent: June 10, 2008

Assignee: Google Inc.

Inventors: Alexander Franz, Brian Milch, Eric Jackson, Jenny Zhou, Benjamin Diament
Discriminative training of language models for text and speech classification

Patent number: 7379867

Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.

Type: Grant

Filed: June 3, 2003

Date of Patent: May 27, 2008

Assignee: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
INFERRING OPINIONS BASED ON LEARNED PROBABILITIES

Publication number: 20080097758

Abstract: An opinion system infers the opinion of a sentence of a product review based on a probability that the sentence contains certain sequences of parts of speech that are commonly used to express an opinion as indicated by the training data and the probabilities of the training data. When provided with the sentence, the opinion system identifies possible sequences of parts of speech of the sentence that are commonly used to express an opinion and the probability that the sequence is the correct sequence for the sentence. For each sequence, the opinion system then retrieves a probability derived from the training data that the sequence contains an opinion word that expresses an opinion. The opinion system then retrieves a probability from the training data that the opinion words of the sentence are used to express an opinion. The opinion system then combines the probabilities to generate an overall probability that the sentence with that sequence expresses an opinion.

Type: Application

Filed: October 23, 2006

Publication date: April 24, 2008

Applicant: Microsoft Corporation

Inventors: Hua Li, Jian-Lai Zhou, Zheng Chen, Jian Wang, Dongmei Zhang
Minimum classification error training with growth transformation optimization

Publication number: 20080091424

Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.

Type: Application

Filed: October 16, 2006

Publication date: April 17, 2008

Applicant: Microsoft Corporation

Inventors: Xiaodong He, Li Deng
Method and apparatus for performing observation probability calculations

Patent number: 7356466

Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of an input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, and stores the L extracted outputs based on an order of magnitude of the extracted L outputs.

Type: Grant

Filed: June 20, 2003

Date of Patent: April 8, 2008

Assignee: Samsung Electronics Co., Ltd.

Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
Software for statistical analysis of speech

Patent number: 7346509

Abstract: Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words.

Type: Grant

Filed: September 26, 2003

Date of Patent: March 18, 2008

Assignee: Callminer, Inc.

Inventor: Jeffrey A. Gallino
Caption Correction Device

Publication number: 20080040111

Abstract: A device of the present invention obtains a character string of a speech recognition result and a confidence factor thereof. A time monitor monitors time and determines whether or not processing is delayed by checking the confidence factor and time status. When the processing is not delayed, a checker is asked to perform manual judgment. In this event, speech is processed and the manual judgment of the speech recognition result is performed on the basis of the processed speech. When the processing is delayed, automatic judgment is performed by use of the confidence factor. When the character string is judged to be correct as a result of the manual judgment or the automatic judgment, the character string is displayed as a confirmed character string. When the character string is judged to be incorrect, automatic correction is performed by matching on the basis of a next candidate obtained by the speech recognition, texts and attributes of the presentation, a script text, and the like.

Type: Application

Filed: March 21, 2007

Publication date: February 14, 2008

Inventors: Kohtaroh Miyamoto, Kenichi Arakawa, Toshiya Ohgane
Speech recognition concept confidence measurement

Patent number: 7324940

Abstract: Systems and methods for determining a confidence score associated with a decoding output of a speech recognition engine. In one embodiment, a method of determining the confidence score comprises arranging time frame and acoustic score data into an array, determining a phoneme sequence in the array that yields the highest sum of acoustic scores under certain constraints, e.g., minimum number of time frames and order of phonemes in a phoneme string. A relative score is derived by applying a functional relationship between the acoustic score and different sums comprising acoustic scores from the array. The confidence score, in some embodiments, depends at least in part on the relative score and a measure of ambiguity associated with similar sounding phrases being included in different concepts of a specified grammar.

Type: Grant

Filed: February 27, 2004

Date of Patent: January 29, 2008

Assignee: Lumen Vox, LLC

Inventors: Edward S. Miller, James F. Blake, II, Kyle N. Danielson, Keith C. Herold
Fast feature selection method and system for maximum entropy modeling

Patent number: 7324927

Abstract: A method to select features for maximum entropy modeling in which the gains for all candidate features are determined during an initialization stage and gains for only top-ranked features are determined during each feature selection stage. The candidate features are ranked in an ordered list based on the determined gains, a top-ranked feature in the ordered list with a highest gain is selected, and the model is adjusted using the selected top-ranked feature.

Type: Grant

Filed: July 3, 2003

Date of Patent: January 29, 2008

Assignees: Robert Bosch GmbH, The Board Of Trustees Of The Leland Stanford Junior University

Inventors: Fuliang Weng, Yaqian Zhou
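
The selection scheme in this abstract (gains for all candidates at initialization, then gains for only the top-ranked candidates at each stage) resembles a lazy greedy search over a priority queue. The sketch below is one way to express that idea, not the patented method itself. It assumes a caller-supplied `gain(feature, selected)` function and assumes that a feature's gain never increases as the model grows, so a stale gain is an upper bound; both are assumptions, not details from the abstract.

```python
import heapq


def select_features(candidates, gain, k):
    """Greedily pick up to k features, re-scoring only top-ranked candidates."""
    selected = []
    # Initialization stage: compute the gain of every candidate once.
    heap = [(-gain(f, selected), f) for f in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, feature = heapq.heappop(heap)
        # Re-score only the candidate currently at the top of the ranking.
        fresh = gain(feature, selected)
        if not heap or fresh >= -heap[0][0]:
            # Still at least as good as the best stale gain: select it.
            selected.append(feature)
        else:
            # Its gain dropped: put it back with the updated value.
            heapq.heappush(heap, (-fresh, feature))
    return selected
```

Under the stated assumption, candidates far down the ranking are never re-scored, which is where the savings over re-evaluating every candidate at every stage would come from.
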
Method and apparatus for determining an estimate

Patent number: 7318028

Abstract: For determining an estimate of a need for information units for encoding a signal, a measure for the distribution of the energy in the frequency band is taken into account in addition to the admissible interference for a frequency band and an energy of the frequency band. With this, a better estimate of the need for information units is obtained, so that coding can be done more efficiently and more accurately.

Type: Grant

Filed: August 31, 2006

Date of Patent: January 8, 2008

Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.

Inventors: Michael Schug, Johannes Hilpert, Stefan Geyersberger, Max Neuendorf
Speech recognition apparatus and speech recognition method

Patent number: 7310601

Abstract: The present invention provides a speech recognition apparatus which appropriately performs speech recognition by generating, in real time, language models adapted to a new topic even in the case where topics are changed.

Type: Grant

Filed: December 8, 2005

Date of Patent: December 18, 2007

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Makoto Nishizaki, Yoshihisa Nakatoh, Maki Yamada, Shinichi Yoshizawa
Removing noise from feature vectors

Patent number: 7310599

Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. Aspects of the invention use mixtures of distributions of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.

Type: Grant

Filed: July 20, 2005

Date of Patent: December 18, 2007

Assignee: Microsoft Corporation

Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
Language recognition using a similarity measure

Patent number: 7310600

Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.

Type: Grant

Filed: October 25, 2000

Date of Patent: December 18, 2007

Assignee: Canon Kabushiki Kaisha

Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
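
The dynamic programming match described in this abstract can be illustrated with a standard alignment recurrence over two phoneme sequences. The sketch below is a simplified illustration, not the patent's exact formulation: it treats all scores as log probabilities to be maximized, omits the confidence data, and uses hypothetical lookup tables (`confusion`, `insertion`, `deletion`) with a hypothetical `default` penalty for missing entries.

```python
def align_score(seq_a, seq_b, confusion, insertion, deletion, default=-5.0):
    """Best total score for aligning phoneme sequence seq_a with seq_b."""
    n, m = len(seq_a), len(seq_b)
    neg_inf = float("-inf")
    # best[i][j] = best score aligning the first i phonemes of seq_a
    # with the first j phonemes of seq_b.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Match or substitution, scored by the confusion table.
                s = confusion.get((seq_a[i - 1], seq_b[j - 1]), default)
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + s)
            if i > 0:
                # Phoneme of seq_a has no counterpart in seq_b (deletion).
                s = deletion.get(seq_a[i - 1], default)
                best[i][j] = max(best[i][j], best[i - 1][j] + s)
            if j > 0:
                # Phoneme of seq_b has no counterpart in seq_a (insertion).
                s = insertion.get(seq_b[j - 1], default)
                best[i][j] = max(best[i][j], best[i][j - 1] + s)
    return best[n][m]
```

In a trained system the three tables would come from the training session the abstract mentions; here they are plain dictionaries so the recurrence can be read on its own.
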
Voice command processing system and computer therefor, and voice command processing method

Patent number: 7299187

Abstract: When a user issued voice command does not match grammars registered in advance, the voice command is identified as a sentence (step S305). This sentence is compared with the registered grammars to calculate a similarity (step S307). When the similarity is higher than a first threshold value (TH1), the voice command is executed (step S315). When the similarity is equal to or lower than the first threshold value (TH1) and higher than a second threshold value (TH2), command choices are displayed for the user and the user is permitted to select a command to be executed (step S319). When the similarity is equal to or lower than the second threshold value (TH2), the command is not executed (step S321). Furthermore, once a command has been executed it is added as a grammar, so that it can be identified when next it is used.

Type: Grant

Filed: February 10, 2003

Date of Patent: November 20, 2007

Assignee: International Business Machines Corporation

Inventors: Yoshinori Tahara, Daisuke Tomoda, Kikuo Mitsubo, Yoshinori Atake
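
The two-threshold decision in this abstract maps directly onto a small branching function. The sketch below follows the comparisons as stated (strictly higher than TH1 executes, strictly higher than TH2 offers choices, otherwise no execution); the function name, return labels, and default threshold values are assumptions for illustration only.

```python
def dispatch_command(similarity, th1=0.8, th2=0.5):
    """Decide what to do with a voice command that matched no registered grammar.

    th1 and th2 are hypothetical values; the abstract only requires that the
    first threshold be the higher of the two.
    """
    if th2 >= th1:
        raise ValueError("th1 must be greater than th2")
    if similarity > th1:
        return "execute"        # corresponds to step S315
    if similarity > th2:
        return "offer_choices"  # corresponds to step S319
    return "reject"             # corresponds to step S321
```

For example, with the default thresholds a similarity of exactly 0.8 falls into the middle band, because the abstract sends values equal to TH1 to the choice display rather than to execution.
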
Systems and methods for using one-dimensional Gaussian distributions to model speech

Patent number: 7295978

Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).

Type: Grant

Filed: September 5, 2000

Date of Patent: November 13, 2007

Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.

Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
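
The address, table lookup, and probability steps in this abstract can be illustrated for a single one-dimensional Gaussian. The sketch below is a minimal illustration under assumptions: it quantizes a scalar input into uniform bins over a fixed range, stores the squared distance term at each bin center, and returns a log probability. The range, bin count, and all function names are hypothetical and are not taken from the patent.

```python
import math


def build_distance_table(mean, var, lo, hi, bins):
    """Precompute (x - mean)^2 / (2 * var) at the center of each bin."""
    step = (hi - lo) / bins
    return [((lo + (k + 0.5) * step) - mean) ** 2 / (2.0 * var)
            for k in range(bins)]


def address_of(x, lo, hi, bins):
    """Map an input value to a table address, clamping out-of-range inputs."""
    k = int((x - lo) / (hi - lo) * bins)
    return min(max(k, 0), bins - 1)


def log_probability(x, table, var, lo, hi):
    """Approximate the Gaussian log density from the looked-up distance."""
    distance = table[address_of(x, lo, hi, len(table))]
    return -0.5 * math.log(2.0 * math.pi * var) - distance
```

The trade-off is the usual one for lookup tables: the per-frame cost becomes an index computation and a memory read, at the price of a quantization error that shrinks as the number of bins grows.
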
System and method for user modeling to enhance named entity recognition

Patent number: 7289956

Abstract: The present invention employs user modeling to model a user's behavior patterns. The user's behavior patterns are then used to influence named entity (NE) recognition.

Type: Grant

Filed: May 27, 2003

Date of Patent: October 30, 2007

Assignee: Microsoft Corporation

Inventors: Dong Yu, Peter K. L. Mau, Kuansan Wang, Milind Mahajan, Alejandro Acero
Method of determining uncertainty associated with acoustic distortion-based noise reduction

Patent number: 7289955

Abstract: A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.

Type: Grant

Filed: December 20, 2006

Date of Patent: October 30, 2007

Assignee: Microsoft Corporation

Inventors: Li Deng, Alejandro Acero, James G. Droppo
Method for learning linguistically valid word pronunciations from acoustic data

Patent number: 7280963

Abstract: A computerized method is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The method includes graphing sets of initial pronunciations; thereafter in an ASR subsystem determining a highest-scoring set of initial pronunciations; generating sets of alternate pronunciations, wherein each set of alternate pronunciations includes the highest-scoring set of initial pronunciations with a lowest-probability phone of the highest-scoring initial pronunciation substituted with a unique-substitute phone; graphing the sets of alternate pronunciations; determining in the ASR subsystem a highest-scoring set of alternate pronunciations; and adding to a pronunciation dictionary the highest-scoring set of alternate pronunciations.

Type: Grant

Filed: September 12, 2003

Date of Patent: October 9, 2007

Assignee: Nuance Communications, Inc.

Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
Apparatus, method and computer program product for recognizing speech

Publication number: 20070225980

Abstract: A speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on likelihood of the first recognition candidates; a second-candidate selecting unit that extracts recognition candidates of an object word contained in the first speech and recognition candidates of a clue word from second recognition candidates, acquires the relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects a recognition result of the second speech based on the acquired relevance ratio; a correction-portion identifying unit that identifies a portion corresponding to the object word in the first speech; and a correcting unit that corrects the word in the identified portion.

Type: Application

Filed: March 1, 2007

Publication date: September 27, 2007

Inventor: Kazuo Sumita
Speech detection and enhancement using audio/video fusion

Patent number: 7269560

Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

Type: Grant

Filed: June 27, 2003

Date of Patent: September 11, 2007

Assignee: Microsoft Corporation

Inventors: John R. Hershey, Trausti Thor Kristjansson, Hagai Attias, Nebojsa Jojic
