Probability Patents (Class 704/240)
-
Patent number: 7487088
Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications. The method may include determining whether a probability of understanding the user's input communication exceeds a first threshold. If the first threshold is exceeded, further dialog is conducted with the user. Otherwise, the user may be directed to a human for assistance. The method also illustratively determines whether the probability exceeds a second threshold, the second threshold being higher than the first. If so, then further dialog is conducted with the user using the current dialog strategy. However, if the probability falls between the first and second thresholds, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.
Type: Grant
Filed: May 12, 2006
Date of Patent: February 3, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
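The two-threshold routing described in this abstract can be sketched as a small decision function. This is a minimal illustration, not the patented method: the `Action` names, the example threshold values, and the choice to treat a probability exactly equal to a threshold as not exceeding it are all assumptions.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical routing outcomes for one dialog turn."""
    TRANSFER_TO_HUMAN = "transfer_to_human"
    ADAPT_STRATEGY = "adapt_strategy"
    CONTINUE_CURRENT = "continue_current"


def route_turn(p_understood: float, low: float, high: float) -> Action:
    """Route a turn using two thresholds (low < high).

    p_understood is the estimated probability that the system understood
    the user's input; a value equal to a threshold does not exceed it.
    """
    if not low < high:
        raise ValueError("second threshold must be higher than the first")
    if p_understood <= low:
        # Does not exceed the first threshold: direct the user to a human.
        return Action.TRANSFER_TO_HUMAN
    if p_understood <= high:
        # Between the thresholds: keep the dialog going with an adapted
        # strategy (for example, more explicit confirmation prompts).
        return Action.ADAPT_STRATEGY
    # Exceeds both thresholds: continue with the current strategy.
    return Action.CONTINUE_CURRENT


if __name__ == "__main__":
    for p in (0.2, 0.6, 0.9):
        print(p, route_turn(p, low=0.4, high=0.8).value)
```

With thresholds 0.4 and 0.8, the three sample probabilities map to `transfer_to_human`, `adapt_strategy`, and `continue_current` respectively.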
-
Publication number: 20090030686
Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.
Type: Application
Filed: July 27, 2007
Publication date: January 29, 2009
Inventors: Fuliang Weng, Feng Lin, Zhe Feng
-
Patent number: 7480614
Abstract: The present invention provides an energy feature extraction method for noisy speech recognition. First, the energy of the input noisy speech is computed. Next, the noise energy in the input noisy speech is estimated. Then, the estimated noise energy is subtracted from the noisy speech energy to obtain an estimated clean speech energy. Finally, delta operations are performed on the log of the estimated clean speech energy to determine the energy derivative features for the noisy speech.
Type: Grant
Filed: December 30, 2003
Date of Patent: January 20, 2009
Assignee: Industrial Technology Research Institute
Inventor: Tai-Huei Huang
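A minimal NumPy sketch of the four steps in this abstract follows. Several details are assumptions not found in the source: the signal is already split into frames, the noise energy is estimated as the mean energy of the first few frames (assumed to be speech-free), negative differences are floored before taking the log, and deltas use the common regression formula over plus or minus 2 frames.

```python
import numpy as np


def clean_log_energy_deltas(frames, noise_frames=10, floor=1e-10, delta_width=2):
    """Sketch of noise-subtracted log-energy derivative features.

    frames: 2-D array (num_frames, samples_per_frame) of noisy speech.
    noise_frames: ASSUMPTION - the first N frames are noise only and their
        mean energy serves as the noise energy estimate.
    """
    frames = np.asarray(frames, dtype=float)
    noisy_energy = np.sum(frames ** 2, axis=1)           # step 1: noisy speech energy
    noise_energy = noisy_energy[:noise_frames].mean()    # step 2: noise energy estimate
    clean_energy = np.maximum(noisy_energy - noise_energy, floor)  # step 3: subtract
    log_e = np.log(clean_energy)

    # Step 4: regression delta over +/- delta_width frames, edges repeated.
    n, w = len(log_e), delta_width
    padded = np.pad(log_e, w, mode="edge")
    num = sum(k * (padded[w + k: n + w + k] - padded[w - k: n + w - k])
              for k in range(1, w + 1))
    den = 2 * sum(k * k for k in range(1, w + 1))
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = 0.01 * rng.standard_normal((10, 160))  # noise-only lead-in
    loud = rng.standard_normal((10, 160))          # speech-like frames
    feats = clean_log_energy_deltas(np.vstack([quiet, loud]))
    print(feats.shape)  # (20,)
```

The delta is strongly positive around frame 10, where the synthetic signal switches from noise to the louder segment.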
-
Patent number: 7480615
Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution but reduces computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
Type: Grant
Filed: January 20, 2004
Date of Patent: January 20, 2009
Assignee: Microsoft Corporation
Inventors: Hagai Attias, Li Deng, Leo Lee
-
Patent number: 7475012
Abstract: Robust signal detection against various types of background noise is implemented. In a signal detection apparatus, the feature amount of an input signal sequence and the feature amount of a noise component contained in the signal sequence are extracted. Then a first likelihood, indicating the probability that the signal sequence is detected, and a second likelihood, indicating the probability that the noise component is detected, are calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. A likelihood ratio between the first likelihood and the second likelihood is then calculated, and detection of the signal sequence is determined on the basis of that ratio.
Type: Grant
Filed: December 9, 2004
Date of Patent: January 6, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Garner, Toshiaki Fukada, Yasuhiro Komori
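The likelihood-ratio decision in this abstract can be illustrated with a textbook detector. The abstract does not say which distributions are used, so this sketch swaps in a common assumption: each spectral bin's power is exponentially distributed, with mean equal to the noise power when only noise is present and scaled by (1 + SNR) when the signal is present. The function names, the default SNR, and the zero threshold are hypothetical.

```python
import numpy as np


def log_likelihood_ratio(power_spectrum, noise_power, snr):
    """Mean per-bin log of p(x | signal present) / p(x | noise only).

    ASSUMPTION: each bin's power is exponentially distributed, with mean
    noise_power under the noise-only hypothesis and noise_power * (1 + snr)
    when the signal is present (snr is the predetermined a priori SNR).
    """
    gamma = np.asarray(power_spectrum, float) / np.asarray(noise_power, float)
    # log[(1 / (1 + snr)) * exp(gamma * snr / (1 + snr))] for each bin
    per_bin = gamma * snr / (1.0 + snr) - np.log1p(snr)
    return float(np.mean(per_bin))


def signal_detected(power_spectrum, noise_power, snr=1.0, threshold=0.0):
    """Declare the signal present when the log ratio exceeds a threshold."""
    return log_likelihood_ratio(power_spectrum, noise_power, snr) > threshold


if __name__ == "__main__":
    print(signal_detected([1.0, 1.0], [1.0, 1.0]))    # False: looks like noise
    print(signal_detected([10.0, 10.0], [1.0, 1.0]))  # True: well above noise
```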
-
Patent number: 7473838
Abstract: A sound identification apparatus which reduces the chance of a drop in the identification rate, including: a frame sound feature extraction unit which extracts a sound feature per frame of an inputted audio signal; a frame likelihood calculation unit which calculates a frame likelihood of the sound feature in each frame, for each of a plurality of sound models; a confidence measure judgment unit which judges a confidence measure based on the frame likelihood; a cumulative likelihood output unit time determination unit which determines a cumulative likelihood output unit time based on the confidence measure; a cumulative likelihood calculation unit which calculates a cumulative likelihood in which the frame likelihoods of the frames included in the cumulative likelihood output unit time are cumulated, for each sound model; a sound type candidate judgment unit which determines, for each cumulative likelihood output unit time, a sound type corresponding to the sound model that has a maximum cumulative likelihood
Type: Grant
Filed: April 9, 2007
Date of Patent: January 6, 2009
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Tetsu Suzuki, Yoshihisa Nakatoh, Shinichi Yoshizawa
-
Patent number: 7472060
Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications in a dialog with the user. The probability of conducting a successful dialog with the user is determined based, at least in part, on understanding data from at least one prior dialog exchange of the dialog.
Type: Grant
Filed: September 6, 2005
Date of Patent: December 30, 2008
Assignee: AT&T Corp.
Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
-
Patent number: 7472062
Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.
Type: Grant
Filed: January 4, 2002
Date of Patent: December 30, 2008
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
-
Publication number: 20080312921
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
Type: Application
Filed: August 20, 2008
Publication date: December 18, 2008
Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
-
Patent number: 7464033
Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs a network that is only the size of a single sub-network and yet provides the same recognition performance, thus reducing the memory requirements for network storage by (M-1)/M.
Type: Grant
Filed: February 4, 2005
Date of Patent: December 9, 2008
Assignee: Texas Instruments Incorporated
Inventor: Yifan Gong
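The storage saving quoted in this abstract follows from simple arithmetic, assuming each of the M environment-specific sub-networks has the same size S and the single shared network is also of size S:

```latex
\text{saving} = \frac{M S - S}{M S} = \frac{M-1}{M},
\qquad \text{e.g. } M = 4:\quad \frac{3}{4} = 75\%.
```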
-
Patent number: 7464031
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
Type: Grant
Filed: November 28, 2003
Date of Patent: December 9, 2008
Assignee: International Business Machines Corporation
Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
-
Patent number: 7460992
Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. At the same time, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution by increasing the variance of each Gaussian distribution by an amount equal to the estimated variance of the cleaned signal; the modified distribution is used in decoding the phonetic state sequence in a pattern recognition task.
Type: Grant
Filed: May 16, 2006
Date of Patent: December 2, 2008
Assignee: Microsoft Corporation
Inventors: James G. Droppo, Alejandro Acero, Li Deng
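The variance-widening step described in this abstract can be sketched for a single diagonal-covariance Gaussian. This is an illustration under assumptions: a real recognizer would apply it to every Gaussian in every state, and the function names here are hypothetical.

```python
import numpy as np


def log_gaussian(x, mean, var):
    """Diagonal-covariance Gaussian log density, summed over dimensions."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))


def uncertainty_log_likelihood(x_clean, mean, var, cleaning_var):
    """Score a cleaned feature vector after increasing each model variance by
    the estimated variance of the cleaned signal (the noise-removal uncertainty)."""
    return log_gaussian(x_clean, mean, np.asarray(var, dtype=float) + cleaning_var)


if __name__ == "__main__":
    plain = log_gaussian([3.0], [0.0], [1.0])
    widened = uncertainty_log_likelihood([3.0], [0.0], [1.0], cleaning_var=3.0)
    print(plain < widened)  # True: an uncertain outlier is penalized less
```

With zero uncertainty the score reduces to the ordinary Gaussian likelihood; as the uncertainty grows, frames where noise removal was unreliable have less influence on the decoded state sequence.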
-
Patent number: 7457748
Abstract: A method of automatically processing a speech signal, which comprises the steps of: determining a sequence of probability models corresponding to a given text; determining a sequence of acoustic strings corresponding to the diction of the given text; aligning the sequence of acoustic strings with the sequence of models; and determining a confidence index of acoustic alignment for each association between a model and an acoustic segment. Each alignment confidence index is determined at least from a combination of the model probability, the a priori model probabilities, and the average duration of occupancy of the models.
Type: Grant
Filed: August 12, 2003
Date of Patent: November 25, 2008
Assignee: France Telecom
Inventors: Samir Nefti, Olivier Boeffard
-
Patent number: 7454337
Abstract: The present invention is a method of modeling a single class of data from data containing multiple classes of data of the same type of data by first receiving a collection of data that includes data from multiple classes of data of the same type where the amount of data of the single class of data exceeds that of any other class of data. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the speech collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model of the type of the first statistical model is generated for the combined subsets and used as the model of the single class of data.
Type: Grant
Filed: May 13, 2004
Date of Patent: November 18, 2008
Assignee: The United States of America as represented by the Director, National Security Agency
Inventors: David C. Smith, Daniel J. Richman
-
Patent number: 7454336
Abstract: A system and method are provided that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model, which employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) are provided for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production. For example, modification of model parameters can be based upon an approximate mixture of Gaussian (MOG) posterior and/or upon an approximate hidden Markov model (HMM) posterior using a variational technique.
Type: Grant
Filed: June 20, 2003
Date of Patent: November 18, 2008
Assignee: Microsoft Corporation
Inventors: Hagai Attias, Li Deng, Leo J. Lee
-
Patent number: 7451083
Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. One aspect of the invention includes using an iterative approach to identify the clean signal feature vector. Another aspect of the invention includes using the variance of a set of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
Type: Grant
Filed: July 20, 2005
Date of Patent: November 11, 2008
Assignee: Microsoft Corporation
Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
-
Patent number: 7447626
Abstract: A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing it is retrieved from the document. An abstract of the document is searched to determine whether the sentence has already been included in the abstract. If not, the sentence is added to the abstract.
Type: Grant
Filed: December 21, 2004
Date of Patent: November 4, 2008
Assignee: UDICO Holdings
Inventors: Garnet R. Chaney, Robert F. Richardson, Seymour I. Rubinstein
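The phrase test and abstract update in this entry can be sketched in a few lines. Assumptions not in the source: the score is simply the word's character length, the default threshold and minimum count are arbitrary, and the abstract is held as a Python list of sentences. All function names are hypothetical.

```python
def word_score(word: str) -> int:
    # ASSUMPTION: the score is the word's length in characters.
    return len(word)


def is_significant_phrase(words, threshold: int, min_count: int) -> bool:
    """A phrase is significant if at least min_count words score above threshold."""
    return sum(1 for w in words if word_score(w) > threshold) >= min_count


def update_abstract(abstract: list, sentence: str, phrase: str,
                    threshold: int = 6, min_count: int = 2) -> list:
    """Append the sentence containing `phrase` to the abstract if the phrase
    is significant and the sentence is not already in the abstract."""
    significant = is_significant_phrase(phrase.split(), threshold, min_count)
    if significant and sentence not in abstract:
        abstract.append(sentence)
    return abstract


if __name__ == "__main__":
    summary = []
    s = "We study probabilistic language identification."
    update_abstract(summary, s, "probabilistic language identification")
    update_abstract(summary, s, "probabilistic language identification")  # duplicate
    update_abstract(summary, "The cat sat.", "the cat sat")  # short words only
    print(summary)  # ['We study probabilistic language identification.']
```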
-
Patent number: 7444284
Abstract: A system, method and computer program product are provided for speech recognition. During operation, a database of words is maintained. Initially, a probability is assigned to each of the words, indicating the prevalence of use of the word. Further, an utterance is received for speech recognition purposes. The utterance is matched with one of the words in the database based at least in part on the probability.
Type: Grant
Filed: November 15, 2004
Date of Patent: October 28, 2008
Assignee: BeVocal, Inc.
Inventor: Bertrand A. Damiba
-
Patent number: 7440893
Abstract: This invention concerns a method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications. The method illustratively determines whether a probability of understanding the user's input communication exceeds a first threshold. If the first threshold is exceeded, further dialog is conducted with the user. Otherwise, the user may be directed to a human for assistance. The method also illustratively determines whether the probability exceeds a second threshold, the second threshold being higher than the first. If so, then further dialog is conducted with the user using the current dialog strategy. However, if the probability falls between the first and second thresholds, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.
Type: Grant
Filed: September 6, 2005
Date of Patent: October 21, 2008
Assignee: AT&T Corp.
Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
-
Patent number: 7437288
Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
Type: Grant
Filed: March 11, 2002
Date of Patent: October 14, 2008
Assignee: NEC Corporation
Inventor: Koichi Shinoda
-
Publication number: 20080243502
Abstract: The invention discloses prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played. Accordingly, data fields are partially filled in based upon the original speech utterance, where a second prompt for unfilled fields is played.
Type: Application
Filed: March 28, 2007
Publication date: October 2, 2008
Applicant: International Business Machines Corporation
Inventors: Soonthorn Ativanichayaphong, Gerald M. McCobb, Paritosh D. Patel, Marc White
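The partial-fill logic described in this abstract can be sketched as a single function. The field names, the shared 0.7 threshold, and the behavior when the utterance-level score is already above the threshold (fill everything at once) are assumptions for illustration; the abstract only describes the case where the utterance-level score falls below it.

```python
def fill_fields(elements, utterance_conf, threshold=0.7):
    """Partially fill data fields from one multi-element utterance.

    elements maps a field name to (recognized_value, element_confidence).
    Returns (filled, reprompt): values stored now, and fields to prompt for again.
    """
    if utterance_conf >= threshold:
        # ASSUMPTION: a confident utterance fills every field at once.
        return {f: v for f, (v, _) in elements.items()}, []
    # Otherwise fall back to element-level confidence scores.
    filled = {f: v for f, (v, c) in elements.items() if c >= threshold}
    reprompt = [f for f, (_, c) in elements.items() if c < threshold]
    return filled, reprompt


if __name__ == "__main__":
    filled, reprompt = fill_fields(
        {"city": ("Boston", 0.92), "date": ("May third", 0.41)},
        utterance_conf=0.6)
    print(filled, reprompt)  # {'city': 'Boston'} ['date']
```

A dialog manager would then store `filled` and play a second prompt covering only the fields in `reprompt`.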
-
Publication number: 20080235007
Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.
Type: Application
Filed: June 3, 2008
Publication date: September 25, 2008
Inventors: Jiri Navratil, Jason Pelecanos, Ganesh N. Ramaswamy
-
Patent number: 7421387
Abstract: A method for reducing recognition errors. The method includes receiving an N-best list associated with an input of a computer based recognition system. The N-best list includes one or more hypotheses and associated confidence values. The input is classified in response to the N-best list, resulting in a classification. A re-scoring algorithm that is tuned for the classification is selected. The re-scoring algorithm is applied to the N-best list to create a re-scored N-best list. A hypothesis for the value of the input is selected based on the re-scored N-best list.
Type: Grant
Filed: May 18, 2004
Date of Patent: September 2, 2008
Assignee: General Motors Corporation
Inventor: Kurt S. Godden
-
Publication number: 20080189109
Abstract: Boundary points for speech in an audio signal are determined based on posterior probabilities for the boundary points given a set of possible segmentations of the audio signal. The boundary point posterior probability is determined based on a set of level posterior probabilities that each provide the probability of a sequence of feature vectors given one of the segmentations in the set of possible segmentations.
Type: Application
Filed: February 5, 2007
Publication date: August 7, 2008
Applicant: Microsoft Corporation
Inventors: Yu Shi, Frank Kao-Ping Soong
-
Patent number: 7409342
Abstract: A speech recognizing device. Natural speech recognizing means recognizes speech input in an application program by dictation. Recognition result converting means converts a recognition result from said natural speech recognizing means into a final recognition result processable by said application program on the basis of a grammar to be used for recognizing said input speech in a grammar method. The recognition result converting means further comprises candidate sentence generating means for evolving said grammar to generate candidate sentences that are candidates for said final recognition result; and matching means for selecting a candidate sentence as said final recognition result among the candidate sentences by matching said candidate sentences generated by said candidate sentence generating means against the recognition result by said natural speech recognizing means.
Type: Grant
Filed: March 31, 2004
Date of Patent: August 5, 2008
Assignee: International Business Machines Corporation
Inventors: Hiroaki Kashima, Yoshinori Tahara, Daisuke Tomoda
-
Patent number: 7406416
Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.
Type: Grant
Filed: March 26, 2004
Date of Patent: July 29, 2008
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
-
Patent number: 7406408
Abstract: Method of recognizing phones in speech of any language. Acquire phones for all languages and a set of languages. Acquire a pronunciation dictionary, a transcript of speech for the set of languages, and speech for the transcript. Receive speech containing unknown phones. If the speech's language is unknown, compare it to the phones for all languages to determine the phones. If the language is known but no phones were acquired in that language, compare the speech to the phones for all languages to determine the phones. If phones were acquired in the speech's language but no corresponding pronunciation dictionary was acquired, compare the speech to the phones for all languages to determine the phones. If a pronunciation dictionary was acquired for the phones in the speech's language but no transcript was acquired then compare the speech to the phones for all languages to determine the phones.
Type: Grant
Filed: August 24, 2004
Date of Patent: July 29, 2008
Assignee: The United States of America as represented by the Director, National Security Agency
Inventors: Bradley C. Lackey, Patrick J. Schone, Brenton D. Walker
-
Publication number: 20080154595
Abstract: A system and method for classifying a voice signal to one of a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal. The system includes an acoustic processor and a classifier. The acoustic processor extracts features that are characteristic of the voice signal and generates feature vectors using the extracted spectral features. The classifier uses the feature vectors to compute the probability that the voice signal belongs to each of the predefined categories and classifies the voice signal to a predefined category that is associated with the highest probability.
Type: Application
Filed: March 4, 2008
Publication date: June 26, 2008
Applicant: International Business Machines Corporation
Inventor: Israel Nelken
-
Patent number: 7389230
Abstract: A system and method for classifying a voice signal to one of a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal. The system includes an acoustic processor and a classifier. The acoustic processor extracts features that are characteristic of the voice signal and generates feature vectors using the extracted spectral features. The classifier uses the feature vectors to compute the probability that the voice signal belongs to each of the predefined categories and classifies the voice signal to a predefined category that is associated with the highest probability.
Type: Grant
Filed: April 22, 2003
Date of Patent: June 17, 2008
Assignee: International Business Machines Corporation
Inventor: Israel Nelken
-
Publication number: 20080140399
Abstract: Provided is a method and system for high-speed speech recognition. On the basis of a continuous density hidden Markov model (CDHMM) using a Gaussian mixture model (GMM) for the observation probability, the method and system add only the K Gaussian components that contribute most to a state-specific observation probability for an input feature vector, and calculate the state-specific observation probability from them. Thus, in terms of recognition rate, the state-specific observation probability remains a close approximation, minimizing deterioration of speech recognition performance. In terms of computation, the number of addition operations required for computing an observation probability is reduced compared with conventional speech recognition, which adds the Gaussian probabilities of all components for an input feature vector to obtain the state-specific observation probability; this reduces the total amount of computation required for speech recognition.
Type: Application
Filed: July 30, 2007
Publication date: June 12, 2008
Inventor: Hoon Chung
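The top-K approximation in this abstract can be sketched with NumPy in the log domain. This is an illustrative assumption-laden sketch, not the published method: it uses diagonal covariances and a log-sum-exp for numerical stability, and it still evaluates every component's density; only the final summation is restricted to the K largest terms. Because the dropped terms are positive, the approximation can only underestimate the exact value.

```python
import numpy as np


def component_log_probs(x, weights, means, variances):
    """log(w_m * N(x; mu_m, diag(var_m))) for every mixture component m."""
    x = np.asarray(x, float)
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    return np.log(np.asarray(weights, float)) + log_norm + log_exp


def log_sum_exp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return float(m + np.log(np.sum(np.exp(a - m))))


def state_log_prob(x, weights, means, variances, k=None):
    """State observation log-probability of a GMM.

    k=None sums all components (exact); otherwise only the k largest
    component terms are summed, which approximates the exact value from below.
    """
    comp = component_log_probs(x, weights, means, variances)
    if k is not None and k < len(comp):
        # np.partition places the k largest values in the last k slots.
        comp = np.partition(comp, len(comp) - k)[len(comp) - k:]
    return log_sum_exp(comp)


if __name__ == "__main__":
    w = [0.4, 0.3, 0.2, 0.1]
    mu = [[0, 0], [1, 1], [3, 3], [-2, 1]]
    var = [[1, 1], [1, 1], [2, 2], [1, 0.5]]
    x = [0.5, 0.5]
    print(state_log_prob(x, w, mu, var), state_log_prob(x, w, mu, var, k=2))
```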
-
Patent number: 7386438
Abstract: A system and method for identifying language attributes through probabilistic analysis is described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within each training document are evaluated. For each language class, a probability for the document properties set conditioned on the occurrence of the language class is calculated. Byte occurrences within each training document are evaluated. For each language class, a probability for the byte occurrences conditioned on the occurrence of the language class is calculated.
Type: Grant
Filed: August 4, 2003
Date of Patent: June 10, 2008
Assignee: Google Inc.
Inventors: Alexander Franz, Brian Milch, Eric Jackson, Jenny Zhou, Benjamin Diament
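The byte-occurrence part of this abstract resembles a naive Bayes classifier over bytes. The sketch below is an assumption-based illustration rather than the patented system: it ignores the document-property probabilities, models bytes as independent unigrams with add-one smoothing, and uses a uniform prior over classes. The class labels and training strings are made up.

```python
import math
from collections import Counter


def train_byte_models(training_docs):
    """training_docs maps a language class (language + encoding) to a list of
    bytes documents. Returns class -> list of 256 smoothed log-probabilities."""
    models = {}
    for label, docs in training_docs.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc)  # iterating a bytes object yields ints 0..255
        total = sum(counts.values()) + 256  # add-one (Laplace) smoothing
        models[label] = [math.log((counts[b] + 1) / total) for b in range(256)]
    return models


def classify(doc: bytes, models):
    """Return the class maximizing the sum of log P(byte | class), uniform prior."""
    return max(models, key=lambda label: sum(models[label][b] for b in doc))


if __name__ == "__main__":
    models = train_byte_models({
        "en/ascii": [b"hello world the quick brown fox"],
        "ru/utf-8": ["привет мир".encode("utf-8")],
    })
    print(classify("мир привет".encode("utf-8"), models))  # ru/utf-8
    print(classify(b"the world", models))                  # en/ascii
```

Because each class pairs a language with an encoding, the same text in two different encodings would be two separate classes with different byte distributions.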
-
Patent number: 7379867
Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
Type: Grant
Filed: June 3, 2003
Date of Patent: May 27, 2008
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Publication number: 20080097758
Abstract: An opinion system infers the opinion of a sentence of a product review based on a probability that the sentence contains certain sequences of parts of speech that are commonly used to express an opinion as indicated by the training data and the probabilities of the training data. When provided with the sentence, the opinion system identifies possible sequences of parts of speech of the sentence that are commonly used to express an opinion and the probability that the sequence is the correct sequence for the sentence. For each sequence, the opinion system then retrieves a probability derived from the training data that the sequence contains an opinion word that expresses an opinion. The opinion system then retrieves a probability from the training data that the opinion words of the sentence are used to express an opinion. The opinion system then combines the probabilities to generate an overall probability that the sentence with that sequence expresses an opinion.
Type: Application
Filed: October 23, 2006
Publication date: April 24, 2008
Applicant: Microsoft Corporation
Inventors: Hua Li, Jian-Lai Zhou, Zheng Chen, Jian Wang, Dongmei Zhang
-
Publication number: 20080091424
Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
Type: Application
Filed: October 16, 2006
Publication date: April 17, 2008
Applicant: Microsoft Corporation
Inventors: Xiaodong He, Li Deng
-
Patent number: 7356466
Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of an input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, and stores the L extracted outputs based on an order of magnitude of the extracted L outputs.
Type: Grant
Filed: June 20, 2003
Date of Patent: April 8, 2008
Assignee: Samsung Electronics Co., Ltd.
Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
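The three operation units and the comparator in this abstract can be mirrored in software as a rough sketch. The abstract does not say what the subtraction result is multiplied by or whether the comparator keeps the largest or smallest outputs. This sketch assumes multiplication by the inverse standard deviation, treats the weighted value as a per-component constant, and keeps the L smallest scores (smaller meaning a closer match). All names are hypothetical.

```python
import heapq

import numpy as np


def best_mixture_scores(x, means, inv_stds, weight_terms, L):
    """Return the L smallest (score, component_index) pairs in sorted order.

    ASSUMPTIONS: the multiplier is the inverse standard deviation, and
    weight_terms holds one hypothetical weighted value per component.
    """
    x = np.asarray(x, float)
    # First operation unit: subtract the mean, then scale.
    first = (x - np.asarray(means, float)) * np.asarray(inv_stds, float)
    # Second operation unit: square and accumulate over the N dimensions.
    second = np.sum(first ** 2, axis=1)
    # Third operation unit: subtract the weighted value.
    third = second - np.asarray(weight_terms, float)
    # Comparator: keep the L best (smallest) outputs, ordered by magnitude.
    return heapq.nsmallest(L, zip(third.tolist(), range(len(third))))


if __name__ == "__main__":
    print(best_mixture_scores([1, 1], [[0, 0], [1, 1], [5, 5]],
                              [[1, 1]] * 3, [0, 0, 0], L=2))
    # [(0.0, 1), (2.0, 0)]
```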
-
Patent number: 7346509
Abstract: Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words.
Type: Grant
Filed: September 26, 2003
Date of Patent: March 18, 2008
Assignee: Callminer, Inc.
Inventor: Jeffrey A. Gallino
-
Publication number: 20080040111
Abstract: A device of the present invention obtains a character string of a speech recognition result and a confidence factor thereof. A time monitor monitors time and determines whether or not processing is delayed by checking the confidence factor and time status. When the processing is not delayed, a checker is asked to perform manual judgment. In this event, speech is processed and the manual judgment of the speech recognition result is performed on the basis of the processed speech. When the processing is delayed, automatic judgment is performed by use of the confidence factor. When the character string is judged to be correct as a result of the manual judgment or the automatic judgment, the character string is displayed as a confirmed character string. When the character string is judged to be incorrect, automatic correction is performed by matching on the basis of a next candidate obtained by the speech recognition, texts and attributes of the presentation, a script text, and the like.
Type: Application
Filed: March 21, 2007
Publication date: February 14, 2008
Inventors: Kohtaroh Miyamoto, Kenichi Arakawa, Toshiya Ohgane
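The routing between manual and automatic judgment can be sketched as follows. The sketch makes several simplifying assumptions. Delay is measured as a caption lag in seconds against a fixed budget, ignoring the confidence factor that the abstract also mentions. The automatic check is a plain threshold on confidence. The human checker is a callback. `is_delayed`, `judge`, and both default values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Recognized:
    text: str
    confidence: float  # recognizer's confidence factor, assumed to lie in [0, 1]


def is_delayed(lag_seconds: float, max_lag_seconds: float = 2.0) -> bool:
    """Hypothetical time monitor: processing is 'delayed' once lag exceeds a budget."""
    return lag_seconds > max_lag_seconds


def judge(result: Recognized, lag_seconds: float,
          manual_check: Callable[[str], bool],
          auto_threshold: float = 0.8) -> Tuple[bool, str]:
    """Return (judged_correct, route), where route is 'manual' or 'automatic'."""
    if not is_delayed(lag_seconds):
        return manual_check(result.text), "manual"          # a human checker decides
    return result.confidence >= auto_threshold, "automatic"  # fall back to confidence
```

A result judged incorrect would then go on to the candidate-matching correction step, which this sketch leaves out.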
-
Patent number: 7324940
Abstract: Systems and methods for determining a confidence score associated with a decoding output of a speech recognition engine. In one embodiment, a method of determining the confidence score comprises arranging time frame and acoustic score data into an array, determining a phoneme sequence in the array that yields the highest sum of acoustic scores under certain constraints, e.g., minimum number of time frames and order of phonemes in a phoneme string. A relative score is derived by applying a functional relationship between the acoustic score and different sums comprising acoustic scores from the array. The confidence score, in some embodiments, depends at least in part on the relative score and a measure of ambiguity associated with similar sounding phrases being included in different concepts of a specified grammar.
Type: Grant
Filed: February 27, 2004
Date of Patent: January 29, 2008
Assignee: Lumen Vox, LLC
Inventors: Edward S. Miller, James F. Blake, II, Kyle N. Danielson, Keith C. Herold
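The search for the highest-scoring phoneme sequence under a minimum-duration and phoneme-order constraint is a dynamic-programming segmentation problem. The sketch below is one way to solve it. It assumes non-negative per-frame scores (for example, posteriors) arranged as `scores[frame][phoneme]` and requires `min_frames >= 1`. The "relative score" here, the constrained best divided by the unconstrained per-frame maximum, is a hypothetical choice of functional relationship and is not taken from the patent.

```python
import math
from typing import Sequence


def best_alignment_score(scores: Sequence[Sequence[float]], min_frames: int) -> float:
    """scores[t][p]: score of the p-th phoneme of the phoneme string at frame t.

    Splits the T frames into P consecutive segments, one per phoneme and in
    phoneme order, each at least min_frames long, and returns the best total.
    Returns -inf if no such split exists.
    """
    T = len(scores)
    P = len(scores[0]) if T else 0
    neg = -math.inf
    # prefix[p][t] = scores[0][p] + ... + scores[t-1][p]
    prefix = [[0.0] * (T + 1) for _ in range(P)]
    for p in range(P):
        for t in range(T):
            prefix[p][t + 1] = prefix[p][t] + scores[t][p]
    # D[p][t]: best score covering frames [0, t) with the first p phonemes
    D = [[neg] * (T + 1) for _ in range(P + 1)]
    D[0][0] = 0.0
    for p in range(1, P + 1):
        for t in range(p * min_frames, T + 1):
            for s in range((p - 1) * min_frames, t - min_frames + 1):
                if D[p - 1][s] > neg:
                    seg = prefix[p - 1][t] - prefix[p - 1][s]
                    D[p][t] = max(D[p][t], D[p - 1][s] + seg)
    return D[P][T]


def relative_score(scores: Sequence[Sequence[float]], min_frames: int) -> float:
    """Hypothetical relative score: constrained best / unconstrained per-frame max."""
    best = best_alignment_score(scores, min_frames)
    ceiling = sum(max(row) for row in scores)
    if best == -math.inf or ceiling <= 0.0:
        return 0.0
    return best / ceiling
```

A relative score near 1 means the ordered, duration-constrained path explains the audio almost as well as picking the best phoneme independently at every frame.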
-
Patent number: 7324927
Abstract: A method to select features for maximum entropy modeling in which the gains for all candidate features are determined during an initialization stage and gains for only top-ranked features are determined during each feature selection stage. The candidate features are ranked in an ordered list based on the determined gains, a top-ranked feature in the ordered list with a highest gain is selected, and the model is adjusted using the selected top-ranked feature.
Type: Grant
Filed: July 3, 2003
Date of Patent: January 29, 2008
Assignees: Robert Bosch GmbH, The Board Of Trustees Of The Leland Stanford Junior University
Inventors: Fuliang Weng, Yaqian Zhou
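Computing every gain once and afterwards re-evaluating only the top-ranked candidate behaves like a lazy greedy search with a priority queue. The sketch below shows that pattern. It assumes a gain never increases as features are added, so stale gains remain valid upper bounds. The `gain(feature, selected)` callback is hypothetical and stands in for fitting the maximum entropy model.

```python
import heapq
from typing import Callable, Hashable, List, Sequence


def select_features(
    candidates: Sequence[Hashable],
    gain: Callable[[Hashable, List[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Greedily pick up to k features, recomputing only the top-ranked gain.

    gain(f, selected) is a hypothetical callback returning the gain of adding f
    to a model that already contains `selected`; it is assumed not to increase
    as more features are selected, so stale gains act as upper bounds.
    """
    selected: List[Hashable] = []
    # Initialization stage: compute the gain of every candidate once
    heap = [(-gain(f, selected), i, f) for i, f in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, i, f = heapq.heappop(heap)
        fresh = gain(f, selected)  # recompute only the current top-ranked feature
        if not heap or fresh >= -heap[0][0]:
            selected.append(f)  # still ranks first; the model would be adjusted here
        else:
            heapq.heappush(heap, (-fresh, i, f))  # re-rank with its updated gain
    return selected
```

The candidate index `i` in each heap entry breaks ties, so the features themselves never need to be comparable.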
-
Patent number: 7318028
Abstract: For determining an estimate of a need for information units for encoding a signal, a measure for the distribution of the energy in the frequency band is taken into account in addition to the admissible interference for a frequency band and an energy of the frequency band. With this, a better estimate of the need for information units is obtained, so that coding can be done more efficiently and more accurately.
Type: Grant
Filed: August 31, 2006
Date of Patent: January 8, 2008
Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.
Inventors: Michael Schug, Johannes Hilpert, Stefan Geyersberger, Max Neuendorf
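The sketch below shows why the energy distribution matters. It uses a deliberately crude model: about 0.5·log2(E/T) bits for each "effectively active" spectral line in the band. The distribution measure is the participation ratio (Σp)²/Σp², which equals n for a flat band and 1 when a single line carries all the energy. Both the bit formula and the measure are assumptions made for illustration. The patent's actual measure may differ.

```python
import math
from typing import Sequence


def effective_lines(coeffs: Sequence[float]) -> float:
    """Participation ratio of band energy: n if flat, 1 if one line holds it all (assumed measure)."""
    power = [c * c for c in coeffs]
    total = sum(power)
    if total == 0.0:
        return 0.0
    return total * total / sum(p * p for p in power)


def estimated_bits(coeffs: Sequence[float], allowed_noise: float) -> float:
    """Crude bit demand: ~0.5*log2(E/T) bits for each effectively active line."""
    if allowed_noise <= 0.0:
        raise ValueError("allowed_noise must be positive")
    energy = sum(c * c for c in coeffs)
    if energy <= allowed_noise:
        return 0.0  # the whole band may be quantized to zero
    return 0.5 * effective_lines(coeffs) * math.log2(energy / allowed_noise)
```

Take two bands with the same energy (4.0) and the same admissible interference (1.0). A flat band `[1, 1, 1, 1]` is estimated at 4 bits. A peaky band `[2, 0, 0, 0]` is estimated at 1 bit. An estimate based only on energy and interference would give both bands the same value.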
-
Patent number: 7310601
Abstract: The present invention provides a speech recognition apparatus which appropriately performs speech recognition by generating, in real time, language models adapted to a new topic even in the case where topics are changed.
Type: Grant
Filed: December 8, 2005
Date of Patent: December 18, 2007
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Makoto Nishizaki, Yoshihisa Nakatoh, Maki Yamada, Shinichi Yoshizawa
-
Patent number: 7310599
Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. Aspects of the invention use mixtures of distributions of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
Type: Grant
Filed: July 20, 2005
Date of Patent: December 18, 2007
Assignee: Microsoft Corporation
Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
-
Patent number: 7310600
Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.
Type: Grant
Filed: October 25, 2000
Date of Patent: December 18, 2007
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
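The matching described above is a weighted edit-distance alignment in which confusion, insertion, and deletion scores replace unit costs. The sketch below maximizes a total log-score. It assumes the trained scores are supplied as plain dictionaries and that missing entries fall back to a hypothetical `floor` value. Recognizer confidence data is not modeled here.

```python
from typing import Dict, Sequence, Tuple


def match_score(
    a: Sequence[str], b: Sequence[str],
    confusion: Dict[Tuple[str, str], float],
    insertion: Dict[str, float],
    deletion: Dict[str, float],
    floor: float = -10.0,
) -> float:
    """Best total log-score for aligning phoneme sequence a with phoneme sequence b.

    confusion[(p, q)]: log-score of phoneme p in `a` matching phoneme q in `b`
    insertion[q]:      log-score of an extra phoneme q in `b`
    deletion[p]:       log-score of phoneme p in `a` having no counterpart in `b`
    Missing entries fall back to `floor` (an assumed smoothing value).
    """
    n, m = len(a), len(b)
    D = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                D[i][j] = max(D[i][j], D[i - 1][j] + deletion.get(a[i - 1], floor))
            if j > 0:
                D[i][j] = max(D[i][j], D[i][j - 1] + insertion.get(b[j - 1], floor))
            if i > 0 and j > 0:
                pair = (a[i - 1], b[j - 1])
                D[i][j] = max(D[i][j], D[i - 1][j - 1] + confusion.get(pair, floor))
    return D[n][m]
```

Because the scores are log-probabilities, adding them along a path corresponds to multiplying the probabilities of the individual edit operations.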
-
Patent number: 7299187
Abstract: When a user-issued voice command does not match the grammars registered in advance, the voice command is identified as a sentence (step S305). This sentence is compared with the registered grammars to calculate a similarity (step S307). When the similarity is higher than a first threshold value (TH1), the voice command is executed (step S315). When the similarity is equal to or lower than the first threshold value (TH1) and higher than a second threshold value (TH2), command choices are displayed for the user and the user is permitted to select a command to be executed (step S319). When the similarity is equal to or lower than the second threshold value (TH2), the command is not executed (step S321). Furthermore, once a command has been executed, it is added as a grammar so that it can be identified the next time it is used.
Type: Grant
Filed: February 10, 2003
Date of Patent: November 20, 2007
Assignee: International Business Machines Corporation
Inventors: Yoshinori Tahara, Daisuke Tomoda, Kikuo Mitsubo, Yoshinori Atake
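The two-threshold decision can be sketched as follows. The abstract does not say how similarity is computed, so this sketch substitutes `difflib.SequenceMatcher.ratio()` from Python's standard library. The threshold values 0.8 (TH1) and 0.5 (TH2) are also placeholders chosen for illustration.

```python
import difflib
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in measure (ASSUMPTION); the patent does not specify the similarity
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dispatch(sentence: str, grammars: List[str],
             th1: float = 0.8, th2: float = 0.5) -> Tuple[str, List[str]]:
    """Return ('execute' | 'choose' | 'reject', matching grammar(s))."""
    ranked = sorted(((similarity(sentence, g), g) for g in grammars), reverse=True)
    if not ranked or ranked[0][0] <= th2:
        return "reject", []                       # similarity <= TH2: not executed
    if ranked[0][0] > th1:
        if sentence not in grammars:
            grammars.append(sentence)             # executed: remember as a grammar
        return "execute", [ranked[0][1]]
    return "choose", [g for s, g in ranked if s > th2]  # TH2 < similarity <= TH1
```

In the "choose" case, the caller would display the returned choices. Once the user has picked a command and it has been executed, the caller would add the sentence to `grammars` in the same way.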
-
Patent number: 7295978
Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
Type: Grant
Filed: September 5, 2000
Date of Patent: November 13, 2007
Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
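The sketch below illustrates the address-then-lookup idea. It assumes a diagonal Gaussian and a uniform scalar quantizer applied to each dimension. The per-dimension quantization bins form the "address", and squared normalized distances to each bin center are precomputed, so scoring needs only table reads and additions. The class name and the quantizer parameters (`lo`, `hi`, `bins`) are hypothetical.

```python
import math
from typing import List, Sequence


class GaussianTable:
    """Diagonal Gaussian with per-dimension distances precomputed per quantizer bin."""

    def __init__(self, mean: Sequence[float], var: Sequence[float],
                 lo: float, hi: float, bins: int) -> None:
        self.lo, self.bins = lo, bins
        self.step = (hi - lo) / bins
        centers = [lo + (k + 0.5) * self.step for k in range(bins)]
        # table[d][k] = (center_k - mean_d)^2 / var_d, computed once up front
        self.table = [[(c - m) ** 2 / v for c in centers] for m, v in zip(mean, var)]
        self.log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)

    def address(self, x: Sequence[float]) -> List[int]:
        """Quantize each dimension to a bin index, clamped to the table range."""
        return [min(self.bins - 1, max(0, math.floor((xd - self.lo) / self.step)))
                for xd in x]

    def log_prob(self, x: Sequence[float]) -> float:
        dist = sum(self.table[d][k] for d, k in enumerate(self.address(x)))
        return self.log_norm - 0.5 * dist
```

The cost of this approach is quantization error. The input is scored as if it sat at its bin center, and that trade-off is what buys the saving of the subtractions and multiplications at decode time.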
-
Patent number: 7289956
Abstract: The present invention employs user modeling to model a user's behavior patterns. The user's behavior patterns are then used to influence named entity (NE) recognition.
Type: Grant
Filed: May 27, 2003
Date of Patent: October 30, 2007
Assignee: Microsoft Corporation
Inventors: Dong Yu, Peter K. L. Mau, Kuansan Wang, Milind Mahajan, Alejandro Acero
-
Patent number: 7289955
Abstract: A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.
Type: Grant
Filed: December 20, 2006
Date of Patent: October 30, 2007
Assignee: Microsoft Corporation
Inventors: Li Deng, Alejandro Acero, James G. Droppo
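A common way to use noise-reduction uncertainty when decoding is to add it to the acoustic model's variance. With that adjustment, a feature that was cleaned with low reliability pulls less sharply toward or away from any one state. The sketch below assumes diagonal Gaussians and a per-dimension uncertainty variance. It illustrates the general idea and does not claim to reproduce this patent's specific formulation.

```python
import math
from typing import Sequence


def log_likelihood_with_uncertainty(
    x_hat: Sequence[float],      # noise-reduced feature vector
    uncertainty: Sequence[float],  # variance of the noise-reduction estimate
    mean: Sequence[float],
    var: Sequence[float],
) -> float:
    """Diagonal Gaussian log-likelihood with enhancement uncertainty added to var."""
    total = 0.0
    for xh, u, m, v in zip(x_hat, uncertainty, mean, var):
        s = v + u  # widen the state's variance by the estimate's uncertainty
        total += -0.5 * (math.log(2 * math.pi * s) + (xh - m) ** 2 / s)
    return total
```

With zero uncertainty this reduces to the ordinary Gaussian log-likelihood. For a cleaned feature far from the state mean, a larger uncertainty softens the penalty.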
-
Patent number: 7280963
Abstract: A computerized method is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The method includes graphing sets of initial pronunciations; thereafter in an ASR subsystem determining a highest-scoring set of initial pronunciations; generating sets of alternate pronunciations, wherein each set of alternate pronunciations includes the highest-scoring set of initial pronunciations with a lowest-probability phone of the highest-scoring initial pronunciation substituted with a unique-substitute phone; graphing the sets of alternate pronunciations; determining in the ASR subsystem a highest-scoring set of alternate pronunciations; and adding to a pronunciation dictionary the highest-scoring set of alternate pronunciations.
Type: Grant
Filed: September 12, 2003
Date of Patent: October 9, 2007
Assignee: Nuance Communications, Inc.
Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
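The refinement loop can be sketched in a simplified form that works on single pronunciations rather than sets. The loop keeps the best-scoring pronunciation, finds its lowest-probability phone, and tries each substitute phone in that position. The best resulting alternate is then returned for addition to the dictionary. The callbacks `score` (standing in for the ASR subsystem) and `phone_probs`, along with the `substitutes` table, are hypothetical inputs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]


def refine_pronunciation(
    prons: Sequence[Pron],
    phone_probs: Callable[[Pron], List[float]],  # hypothetical per-phone probabilities
    substitutes: Dict[str, List[str]],           # hypothetical substitute phones
    score: Callable[[Pron], float],              # hypothetical ASR scoring callback
) -> Pron:
    """Return the best alternate pronunciation (or the best initial one if none)."""
    best = max(prons, key=score)                 # highest-scoring initial pronunciation
    probs = phone_probs(best)
    weakest = min(range(len(best)), key=lambda i: probs[i])  # lowest-probability phone
    alternates = [best[:weakest] + (alt,) + best[weakest + 1:]
                  for alt in substitutes.get(best[weakest], [])]
    if not alternates:
        return best
    return max(alternates, key=score)            # highest-scoring alternate
```

The caller would then append the returned pronunciation to the word's dictionary entry, for example with `dictionary.setdefault(word, []).append(pron)`.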
-
Publication number: 20070225980
Abstract: A speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on the likelihood of the first recognition candidates; a second-candidate selecting unit that extracts recognition candidates of an object word contained in the first speech and recognition candidates of a clue word from second recognition candidates, acquires the relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects a recognition result of the second speech based on the acquired relevance ratio; a correction-portion identifying unit that identifies a portion corresponding to the object word in the first speech; and a correcting unit that corrects the word in the identified portion.
Type: Application
Filed: March 1, 2007
Publication date: September 27, 2007
Inventor: Kazuo Sumita
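Below is a simplified sketch of the correction flow. It picks the object-word candidate that has the highest relevance ratio with any clue-word candidate, then replaces the word in the first result that looks most similar to it. Several pieces are assumptions: the relevance table, the use of `difflib` string similarity to identify the correction portion (the apparatus works on speech segments), and the example words.

```python
import difflib
from typing import Dict, List, Optional, Sequence, Tuple


def pick_object_word(object_cands: Sequence[str], clue_cands: Sequence[str],
                     relevance: Dict[Tuple[str, str], float]) -> Optional[str]:
    """Choose the object-word candidate whose best clue-word pairing is most related."""
    pairs = [(relevance.get((o, c), 0.0), o) for o in object_cands for c in clue_cands]
    return max(pairs)[1] if pairs else None


def correct_first_result(first_words: List[str], object_word: str) -> List[str]:
    """Replace the word in the first result that most resembles the object word
    (string similarity is an ASSUMED stand-in for locating the speech portion)."""
    pos = max(range(len(first_words)),
              key=lambda i: difflib.SequenceMatcher(None, first_words[i], object_word).ratio())
    return first_words[:pos] + [object_word] + first_words[pos + 1:]
```

For example, suppose the first utterance was recognized as "book a flight to austen" and the user then says "Austin, Texas". A high relevance ratio for ("austin", "texas") selects "austin", which then replaces "austen".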
-
Patent number: 7269560
Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-modal, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30-second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.
Type: Grant
Filed: June 27, 2003
Date of Patent: September 11, 2007
Assignee: Microsoft Corporation
Inventors: John R. Hershey, Trausti Thor Kristjansson, Hagai Attias, Nebojsa Jojic