Probability Patents (Class 704/240)
  • Patent number: 8892437
    Abstract: Example embodiments of the present invention may include a method that provides transcribing spoken utterances occurring during a call and assigning to each of the spoken utterances a corresponding set of first classifications. The method may also include determining a confidence rating associated with each of the spoken utterances and the assigned set of first classifications, and performing at least one of reclassifying the spoken utterances with new classifications based on at least one additional classification operation, and adding the assigned first classifications and the corresponding plurality of spoken utterances to a training data set.
    Type: Grant
    Filed: November 13, 2013
    Date of Patent: November 18, 2014
    Assignee: West Corporation
    Inventor: Silke Witt-Ehsani
  • Patent number: 8886540
    Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communications facility, loading the results into the software application and simultaneously displaying the results as a set of words and as a set of application results based on those words.
    Type: Grant
    Filed: August 1, 2008
    Date of Patent: November 11, 2014
    Assignee: Vlingo Corporation
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
  • Patent number: 8886533
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: October 25, 2011
    Date of Patent: November 11, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
  • Patent number: 8886532
    Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
    Type: Grant
    Filed: October 27, 2010
    Date of Patent: November 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Levit, Bruce Melvin Buntschuh
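The abstract above leaves open how utterance-level and dialog-level features are combined into one confidence score. A minimal sketch, assuming a logistic combiner with hypothetical feature names and weights (the patent does not fix a particular model):

```python
import math

def utterance_confidence(utt_feats, dialog_feats, weights, bias=0.0):
    """Combine features of the utterance itself with features of the
    earlier dialog events into a single confidence score via a
    logistic model. The feature names and weights are illustrative."""
    feats = {**utt_feats, **dialog_feats}
    z = bias + sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```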
  • Patent number: 8873868
    Abstract: An apparatus is provided for classifying targets into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of a name of an object and an image of the object; a unit configured to calculate a speech confidence level of a speech for the name of the object with reference to a spoken sound of a name of a known object; a unit configured to calculate an image confidence level of an image of an object with respect to an image of a known object; and a unit configured to compare an evaluation value, which is obtained by combining the speech confidence level and image confidence level, with a threshold value, and classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: October 28, 2014
    Assignees: Honda Motor Co. Ltd., National University Corporation Kobe University
    Inventors: Mikio Nakano, Naoto Iwahashi, Yasuo Ariki, Yuko Ozasa, Takahiro Hori, Ryohei Nakatani
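The known/unknown decision above hinges on merging a speech confidence level and an image confidence level into one evaluation value that is compared against a threshold. A sketch of that final step, assuming a linear combination; the weight and threshold values are illustrative, not the patented formula:

```python
def classify_object(speech_conf, image_conf, weight=0.5, threshold=0.6):
    """Combine speech and image confidence levels into an evaluation
    value and classify the target object as known or unknown.
    The linear combination and the defaults are assumptions."""
    evaluation = weight * speech_conf + (1.0 - weight) * image_conf
    return "known" if evaluation >= threshold else "unknown"
```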
  • Patent number: 8856004
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: May 13, 2011
    Date of Patent: October 7, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8856002
    Abstract: A universal pattern processing system receives input data and produces the output patterns that are best associated with said data. The system uses input means for receiving and processing the input data, universal pattern decoder means for transforming models using the input data and associating output patterns with the original models that are changed least during transformation, and output means for outputting the best associated patterns chosen by the pattern decoder means.
    Type: Grant
    Filed: April 11, 2008
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
  • Patent number: 8856049
    Abstract: An apparatus for classifying an audio signal configured to: estimate at least one shaping parameter value for a plurality of samples of the audio signal; generate at least one audio signal classification value by mapping the at least one shaping parameter value to one of at least two interval estimates; and determine at least one audio signal classification decision based on the at least one audio signal classification value.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: October 7, 2014
    Assignee: Nokia Corporation
    Inventors: Adriana Vasilache, Lasse Juhani Laaksonen, Mikko Tapio Tammi, Anssi Sakari Ramo
  • Patent number: 8849663
    Abstract: A system and method may be provided to segment and/or classify an audio signal from transformed audio information. Transformed audio information representing a sound may be obtained. The transformed audio information may specify magnitude of a coefficient related to energy amplitude as a function of frequency for the audio signal and time. Features associated with the audio signal may be obtained from the transformed audio information. Individual ones of the features may be associated with a feature score relative to a predetermined speaker model. An aggregate score may be obtained based on the feature scores according to a weighting scheme. The weighting scheme may be associated with a noise and/or SNR estimation. The aggregate score may be used for segmentation to identify portions of the audio signal containing speech of one or more different speakers. For classification, the aggregate score may be used to determine a likely speaker model to identify a source of the sound in the audio signal.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: September 30, 2014
    Assignee: The Intellisis Corporation
    Inventors: David C. Bradley, Robert N. Hilton, Daniel S. Goldin, Nicholas K. Fisher, Derrick R. Roos, Eric Wiewiora
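The aggregation step above weights per-feature scores against a speaker model according to a noise/SNR estimate. A minimal sketch under assumed conventions: the SNR-derived factor and the decaying weights are stand-ins for the patent's unspecified weighting scheme:

```python
def aggregate_score(feature_scores, snr_db):
    """Aggregate per-feature scores (relative to a speaker model)
    under an SNR-dependent weighting scheme. Illustrative choice:
    noisier audio down-weights later features more strongly."""
    snr_factor = max(0.0, min(1.0, snr_db / 30.0))  # clamp to [0, 1]
    weights = [snr_factor ** i for i in range(len(feature_scores))]
    total = sum(w * s for w, s in zip(weights, feature_scores))
    return total / sum(weights)
```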
  • Patent number: 8843367
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Grant
    Filed: May 4, 2012
    Date of Patent: September 23, 2014
    Assignee: 8758271 Canada Inc.
    Inventors: Phillip Alan Hetherington, Xueman Li
  • Publication number: 20140278412
    Abstract: Characterizing an acoustic signal includes extracting a vector from the acoustic signal, where the vector contains information about the nuisance characteristics present in the acoustic signal, and computing a set of likelihoods of the vector for a plurality of classes that model a plurality of nuisance characteristics. Training a system to characterize an acoustic signal includes obtaining training data, the training data comprising a plurality of acoustic signals, where each of the plurality of acoustic signals is associated with one of a plurality of classes that indicates a presence of a specific type of nuisance characteristic, transforming each of the plurality of acoustic signals into a vector that summarizes information about the acoustic characteristics of the signal, to produce a plurality of vectors, and labeling each of the plurality of vectors with one of the plurality of classes.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: SRI International
    Inventors: Nicolas Scheffer, Luciana Ferrer
  • Publication number: 20140278411
    Abstract: A method for vocabulary integration of speech recognition comprises converting multiple speech signals into multiple words using a processor, applying confidence scores to the multiple words, classifying the multiple words into a plurality of classifications based on classification criteria and the confidence score for each word, determining if one or more of the multiple words are unrecognized based on the plurality of classifications, classifying each unrecognized word and detecting a match for the unrecognized word based on additional classification criteria, and upon detecting a match for an unrecognized word, converting at least a portion of the multiple speech signals corresponding to the unrecognized word into words.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Chun Shing Cheung
  • Patent number: 8825481
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Grant
    Filed: January 20, 2012
    Date of Patent: September 2, 2014
    Assignee: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Patent number: 8826226
    Abstract: Systems, methods, and apparatuses including computer program products for generating a custom language model. In one implementation, a method is provided. The method includes receiving a collection of documents; clustering the documents into one or more clusters; generating a cluster vector for each cluster of the one or more clusters; generating a target vector associated with a target profile; comparing the target vector with each of the cluster vectors; selecting one or more of the one or more clusters based on the comparison; and generating a language model using documents from the one or more selected clusters.
    Type: Grant
    Filed: November 5, 2008
    Date of Patent: September 2, 2014
    Assignee: Google Inc.
    Inventors: Jun Wu, Henry Ou, Xiliu Tang, Yong-Gang Wang, Yongyan Liu
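The cluster-selection step above compares a target-profile vector against each cluster vector and keeps the closest clusters for language-model training. A minimal sketch, assuming cosine similarity over sparse term-weight vectors and an illustrative cutoff (the patent does not commit to either):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_clusters(target_vec, cluster_vecs, min_sim=0.3):
    """Keep clusters whose vector is close enough to the target
    profile; their documents would then feed LM training."""
    return [cid for cid, vec in cluster_vecs.items()
            if cosine(target_vec, vec) >= min_sim]
```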
  • Patent number: 8818801
    Abstract: Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides a likelihood that a speech signal input from a given phoneme sequence will occur. A linguistic likelihood computation means (702) provides a likelihood that a given word sequence will occur. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to provide a word sequence with the maximum likelihood of occurring from a speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods when the speaker who generated the acoustic signal input to the speech recognition means does and does not have the turn to speak.
    Type: Grant
    Filed: May 12, 2009
    Date of Patent: August 26, 2014
    Assignee: NEC Corporation
    Inventor: Kentaro Nagatomo
  • Patent number: 8812291
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
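The backoff scoring described above, a backoff factor times the relative frequency of the (n-1)-gram, resembles the "Stupid Backoff" scheme associated with the same authors. A minimal sketch over raw counts; the recursion structure and the factor value are assumptions for illustration:

```python
def backoff_score(ngram, counts, alpha=0.4):
    """Score an n-gram (a tuple of tokens) by relative frequency,
    backing off to the shorter n-gram scaled by a backoff factor
    when the full n-gram is unseen. `counts` maps token tuples to
    corpus counts; alpha is an assumed backoff factor."""
    if len(ngram) == 1:
        total = sum(c for k, c in counts.items() if len(k) == 1)
        return counts.get(ngram, 0) / total
    context = ngram[:-1]
    if counts.get(ngram, 0) > 0 and counts.get(context, 0) > 0:
        return counts[ngram] / counts[context]
    return alpha * backoff_score(ngram[1:], counts, alpha)
```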
  • Patent number: 8798995
    Abstract: Topics of potential interest to a user, useful for purposes such as targeted advertising and product recommendations, can be extracted from voice content produced by a user. A computing device can capture voice content, such as when a user speaks into or near the device. One or more sniffer algorithms or processes can attempt to identify trigger words in the voice content, which can indicate a level of interest of the user. For each identified potential trigger word, the device can capture adjacent audio that can be analyzed, on the device or remotely, to attempt to determine one or more keywords associated with that trigger word. The identified keywords can be stored and/or transmitted to an appropriate location accessible to entities such as advertisers or content providers who can use the keywords to attempt to select or customize content that is likely relevant to the user.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: August 5, 2014
    Assignee: Amazon Technologies, Inc.
    Inventor: Kiran K. Edara
  • Patent number: 8798994
    Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to memory and resource requirements for computing the transform to be applied.
    Type: Grant
    Filed: February 6, 2008
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
  • Publication number: 20140214419
    Abstract: An automatic speech recognition method includes at a computer having one or more processors and memory for storing one or more programs to be executed by the processors, obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus; obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through a language model training applied on each speech corpus category; obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models; constructing a decoding resource in accordance with an acoustic model and the interpolation language model; and decoding input speech using the decoding resource, and outputting a character string with a highest probability as a recognition result of the input speech.
    Type: Application
    Filed: December 16, 2013
    Publication date: July 31, 2014
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Feng Rao, Li Lu, Bo Chen, Shuai Yue, Xiang Zhang, Eryu Wang, Dadong Xie, Lou Li, Duling Lu
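The merging step above, weighted interpolation of the per-category language models into one interpolation model, can be sketched as a weighted average of each n-gram's probability. A minimal illustration of that step only, not the full training flow:

```python
def interpolate_lms(classified_lms, weights):
    """Merge per-category language models into one interpolation
    model by weighted averaging of each n-gram's probability.
    Each LM is a dict {ngram: prob}; weights should sum to 1."""
    merged = {}
    for lm, w in zip(classified_lms, weights):
        for ngram, prob in lm.items():
            merged[ngram] = merged.get(ngram, 0.0) + w * prob
    return merged
```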
  • Patent number: 8793130
    Abstract: A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: July 29, 2014
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Yun-Cheng Ju, Dong Yu
  • Patent number: 8781825
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: July 15, 2014
    Assignee: Sensory, Incorporated
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Patent number: 8762154
    Abstract: Example embodiments of the present invention may include a method that includes collecting caller response timings to each of a plurality of dialog states conducted during a call, and estimating a plurality of parameters based on the caller response timings. The method may also include selecting a response completeness value responsive to the estimated plurality of parameters, the response completeness value is used to calculate at least one optimal timeout value. The method may also include selecting the at least one optimal timeout value, and setting the at least one optimal timeout value for each of the corresponding dialog states. The timeout value(s) may be used for subsequent calls to provide optimal user satisfaction and call success rates.
    Type: Grant
    Filed: August 15, 2011
    Date of Patent: June 24, 2014
    Assignee: West Corporation
    Inventor: Silke Witt-Ehsani
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
  • Patent number: 8744856
    Abstract: A computer implemented method, system and computer program product for evaluating pronunciation. Known phonemes are stored in a computer memory. A spoken utterance corresponding to a target utterance, comprised of a sequence of target phonemes, is received and stored in a computer memory. The spoken utterance is segmented into a sequence of spoken phonemes, each corresponding to a target phoneme. For each spoken phoneme, a relative posterior probability is calculated that the spoken phoneme is the corresponding target phoneme. If the calculated probability is greater than a first threshold, an indication that the target phoneme was pronounced correctly is output; if less than a first threshold, an indication that the target phoneme was pronounced incorrectly is output. If the probability is less than a first threshold and greater than a second threshold, an indication that pronunciation of the target phoneme was acceptable is output.
    Type: Grant
    Filed: February 21, 2012
    Date of Patent: June 3, 2014
    Assignee: Carnegie Speech Company
    Inventor: Mosur K. Ravishankar
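The two-threshold decision rule above maps a phoneme's relative posterior probability to one of three verdicts. A sketch of that final step; the threshold values are illustrative, not taken from the patent:

```python
def grade_phoneme(posterior, high=0.7, low=0.4):
    """Grade a spoken phoneme's relative posterior probability of
    being the target phoneme: above the first (high) threshold is
    correct, between the two thresholds is acceptable, below the
    second (low) threshold is incorrect."""
    if posterior > high:
        return "correct"
    if posterior > low:
        return "acceptable"
    return "incorrect"
```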
  • Publication number: 20140149113
    Abstract: A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods.
    Type: Application
    Filed: November 27, 2012
    Publication date: May 29, 2014
    Applicant: LONGSAND LIMITED
    Inventor: Maha Kadirkamanathan
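The forward and backward probabilities the abstract accumulates per breakout period are the standard HMM quantities. A minimal sketch for one observation sequence, not the patent's breakout-period partitioning:

```python
def forward_backward(obs_probs, trans, init):
    """Forward and backward probabilities for one observation sequence.
    obs_probs[t][s] is P(obs_t | state s); trans[i][j] is P(j | i);
    init[s] is the initial state distribution."""
    n_states, T = len(init), len(obs_probs)
    fwd = [[0.0] * n_states for _ in range(T)]
    bwd = [[0.0] * n_states for _ in range(T)]
    for s in range(n_states):                       # forward initialization
        fwd[0][s] = init[s] * obs_probs[0][s]
    for t in range(1, T):                           # forward recursion
        for s in range(n_states):
            fwd[t][s] = obs_probs[t][s] * sum(
                fwd[t - 1][i] * trans[i][s] for i in range(n_states))
    for s in range(n_states):                       # backward initialization
        bwd[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):                  # backward recursion
        for s in range(n_states):
            bwd[t][s] = sum(trans[s][j] * obs_probs[t + 1][j] * bwd[t + 1][j]
                            for j in range(n_states))
    return fwd, bwd
```

Summing the final forward column and summing the initial states weighted by the backward column both yield the sequence likelihood, a standard sanity check on the recursion.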
  • Publication number: 20140149116
    Abstract: There are provided a speech synthesis device, a speech synthesis method and a speech synthesis program which can represent a phoneme as a duration shorter than a duration upon modeling according to a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81 which, by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updates a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme.
    Type: Application
    Filed: June 8, 2012
    Publication date: May 29, 2014
    Applicant: NEC CORPORATION
    Inventors: Yasuyuki Mitsui, Masanori Kato, Reishi Kondo
  • Patent number: 8738367
    Abstract: A speech signal processing device is equipped with a power acquisition unit, a probability distribution acquisition unit, and a correspondence degree determination unit. The power acquisition unit accepts an inputted speech signal and, based on the accepted speech signal, acquires power representing the intensity of a speech sound represented by the speech signal. The probability distribution acquisition unit acquires a probability distribution using the intensity of the power acquired by the power acquisition unit as a random variable. The correspondence degree determination unit determines whether a correspondence degree representing a degree that power acquired by the power acquisition unit in a case that a predetermined reference speech signal is inputted into the power acquisition unit corresponds with predetermined reference power is higher than a predetermined reference correspondence degree, based on the probability distribution acquired by the probability distribution acquisition unit.
    Type: Grant
    Filed: February 18, 2010
    Date of Patent: May 27, 2014
    Assignee: NEC Corporation
    Inventor: Tadashi Emori
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
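The space saving above comes from storing only the acoustic parameters that actually moved during MAP adaptation. A sketch of that sparse-delta idea; simple magnitude thresholding stands in for the patented sparseness constraint:

```python
def sparse_adaptation_delta(baseline, adapted, min_change=1e-3):
    """Keep only the parameters whose adapted value moved more than
    min_change from the baseline model, so the per-user adaptation
    data stores a small fraction of the full parameter set."""
    return {name: adapted[name] - baseline[name]
            for name in baseline
            if abs(adapted[name] - baseline[name]) > min_change}
```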
  • Publication number: 20140129221
    Abstract: A sound recognition device includes a storage for storing a comment that is input while the user listening to sounds emitted as multimedia data being played. The sound recognition device further includes an extractor for extracting a word that appears in a set of sentences that contains the stored comment, and candidate words that contain co-occurrences of the word in the set of sentences. Furthermore, the sound recognition device includes a sound recognizer for recognizing sounds emitted as the multimedia data being played, based on the extracted candidate words.
    Type: Application
    Filed: March 22, 2013
    Publication date: May 8, 2014
    Applicant: Dwango Co., Ltd.
  • Patent number: 8711015
    Abstract: The invention relates to compressing sparse data sets that contain sequences of data values and position information therefor. The position information may be in the form of position indices defining active positions of the data values in a sparse vector of length N. The position information is encoded into the data values by adjusting one or more of the data values within a pre-defined tolerance range, so that a pre-defined mapping function of the data values and their positions is close to a target value. In one embodiment, the mapping function is defined using a sub-set of N filler values whose elements are used to fill empty positions in the input sparse data vector. At the decoder, the correct data positions are identified by searching through possible sub-sets of filler values.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: April 29, 2014
    Assignee: Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, through the Communications Research Centre Canada
    Inventors: Frederic Mustiere, Hossein Najaf-Zadeh, Ramin Pishehvar, Hassan Lahdili, Louis Thibault, Martin Bouchard
  • Patent number: 8712773
    Abstract: The present invention relates to a method for modeling common-language speech recognition, by a computer, under the influence of multiple dialects, and concerns the technical field of speech recognition by computer. In this method, a triphone standard common-language model is first generated based on training data of the standard common language, and first and second monophone dialectal-accented common-language models are generated based on development data of dialectal-accented common languages of the first kind and second kind, respectively. Then a temporary merged model is obtained in such a manner that the first dialectal-accented common-language model is merged into the standard common-language model according to a first confusion matrix obtained by recognizing the development data of the first dialectal-accented common language using the standard common-language model.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: April 29, 2014
    Assignees: Sony Computer Entertainment Inc., Tsinghua University
    Inventors: Fang Zheng, Xi Xiao, Linquan Liu, Zhan You, Wenxiao Cao, Makoto Akabane, Ruxin Chen, Yoshikazu Takahashi
  • Publication number: 20140114660
    Abstract: A method and device for speaker recognition are provided. In the present invention, identifiability re-estimation is performed on a first vector (namely, a weight vector) in a score function by adopting a support vector machine (SVM), so that a recognition result of a characteristic parameter of a test voice is more accurate, thereby improving identifiability of speaker recognition.
    Type: Application
    Filed: December 31, 2013
    Publication date: April 24, 2014
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Xiang Zhang, Hualin Wan, Jun Zhang
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8700398
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, Jr.
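The paired-threshold constraint this abstract describes (low must stay at or below high) can be sketched in a few lines. The class name, default values, and the accept/confirm/reject labels below are illustrative, not taken from the patent:

```python
class ConfidenceThresholds:
    """Paired high/low confidence thresholds with the invariant low <= high.

    Scores at or above `high` are accepted, scores below `low` are rejected,
    and scores in between trigger a confirmation prompt.
    """

    def __init__(self, low=0.3, high=0.7):
        if not 0.0 <= low <= high <= 1.0:
            raise ValueError("require 0 <= low <= high <= 1")
        self.low, self.high = low, high

    def set_low(self, value):
        # The low-threshold control is clamped so it cannot exceed high.
        self.low = min(max(0.0, value), self.high)

    def set_high(self, value):
        # Lowering high may drag low down with it to keep the invariant.
        self.high = min(max(0.0, value), 1.0)
        self.low = min(self.low, self.high)

    def decide(self, score):
        if score >= self.high:
            return "accept"
        if score < self.low:
            return "reject"
        return "confirm"
```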
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula

  • Patent number: 8694317
    Abstract: Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: April 8, 2014
    Assignee: Aurix Limited
    Inventors: Adrian I Skilling, Howard A K Wright
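The index-then-search scheme in this abstract (keep a score for every reference phone per frame, then locate a term by dynamic programming) can be sketched as follows; the function name, score convention (lower = closer to the phone model), and alignment rule are assumptions for illustration:

```python
def search_index(index, term):
    """Find the best-scoring occurrence of a phone sequence `term` in a
    phonetic index. index[t][ph] holds a score for EVERY reference phone at
    frame t, so no phone information was discarded at indexing time. A simple
    dynamic program lets each phone of the term absorb one or more consecutive
    frames; returns (cost, end_frame) of the best match."""
    INF = float("inf")
    prev = [INF] * len(term)
    best = (INF, -1)
    for t, frame in enumerate(index):
        cur = []
        for j, ph in enumerate(term):
            # Either extend the current phone, or advance from the previous
            # phone; the first phone may start a fresh match at any frame.
            enter = 0.0 if j == 0 else prev[j - 1]
            cur.append(min(prev[j], enter) + frame[ph])
        if cur[-1] < best[0]:
            best = (cur[-1], t)
        prev = cur
    return best
```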
  • Patent number: 8694316
    Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
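The final step of this abstract, evaluating per-phone confidence scores against criteria to steer prompt selection, might look like the sketch below. The threshold value and the three prompt-style labels are hypothetical:

```python
def choose_prompt(phone_scores, low_threshold=0.4):
    """Pick a re-prompt style from per-phone recognition confidence scores.

    phone_scores: list of (phone, confidence) pairs for the best-matching word.
    All phones confident -> brief confirmation; a minority of weak phones ->
    ask the caller to repeat; mostly weak -> fall back to a directed prompt.
    """
    weak = [p for p, s in phone_scores if s < low_threshold]
    if not weak:
        return "confirm_short"
    if len(weak) <= len(phone_scores) // 2:
        return "ask_repeat"
    return "directed_prompt"
```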
  • Patent number: 8682661
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech input. In one aspect, a method includes receiving a user input and a grammar including annotations, the user input comprising audio data and the annotations providing syntax and semantics to the grammar, retrieving third-party statistical speech recognition information, the statistical speech recognition information being transmitted over a network, generating a statistical language model (SLM) based on the grammar and the statistical speech recognition information, the SLM preserving semantics of the grammar, processing the user input using the SLM to generate one or more results, comparing the one or more results to candidates provided in the grammar, identifying a particular candidate of the grammar based on the comparing, and providing the particular candidate for input to an application executed on a computing device.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: March 25, 2014
    Assignee: Google Inc.
    Inventors: Johan Schalkwyk, Bjorn Bringert, David P. Singleton
  • Patent number: 8682668
    Abstract: A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 108 is provided with a word dictionary 203 that defines a phoneme string of a word, a language model 202 that imparts a score of appearance easiness of a word, and a smoothing language model score look-ahead value calculation means 201. The smoothing language model score look-ahead value calculation means 201 obtains a language model score look-ahead value at each phoneme in the word from the phoneme string of the word defined by the word dictionary 203 and the language model score defined by the language model 202 so that the language model score look-ahead values are prevented from concentrating on the beginning of the word.
    Type: Grant
    Filed: March 27, 2009
    Date of Patent: March 25, 2014
    Assignee: NEC Corporation
    Inventors: Koji Okabe, Ryosuke Isotani, Kiyoshi Yamabana, Ken Hanazawa
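The smoothing idea in this abstract, spreading a word's language-model score across its phonemes instead of concentrating it at the word beginning, can be illustrated with a minimal sketch (uniform distribution is one simple smoothing choice, not necessarily the patent's exact scheme):

```python
def smoothed_lookahead(lm_score, num_phonemes):
    """Distribute a word's language-model (log) score evenly over its
    phonemes. Returns the cumulative look-ahead value applied by each phoneme
    position, so the full score is only committed at the word end rather than
    all at once at the first phoneme."""
    per_phone = lm_score / num_phonemes
    return [per_phone * (i + 1) for i in range(num_phonemes)]
```

With the full score applied at the first phoneme, partial hypotheses for long words are penalized early and risk being pruned; spreading it keeps the beam comparison fairer.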
  • Patent number: 8682660
    Abstract: A system and a method to correct semantic interpretation recognition errors presented in this invention applies to Automatic Speech Recognition systems returning recognition results with semantic interpretations. The method finds the most likely intended semantic interpretation given the recognized sequence of words and the recognized semantic interpretation. The key point is the computation of the conditional probability of the recognized sequence of words given the recognized semantic interpretation and a particular intended semantic interpretation. It is done with the use of Conditional Language Models which are Statistical Language Models trained on a corpus of utterances collected under the condition of a particular recognized semantic interpretation and a particular intended semantic interpretation. Based on these conditional probabilities and the joint probabilities of the recognized and intended semantic interpretations, new semantic interpretation confidences are computed.
    Type: Grant
    Filed: May 16, 2009
    Date of Patent: March 25, 2014
    Assignee: Resolvity, Inc.
    Inventors: Yevgeniy Lyudovyk, Jacek Jarmulak
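The combination step this abstract describes is essentially Bayes' rule over intended interpretations: P(I | W, R) is proportional to P(W | R, I) x P(R, I), where W is the recognized word sequence, R the recognized interpretation, and I a candidate intended interpretation. A minimal sketch, with the conditional-LM and joint probabilities supplied as plain dictionaries:

```python
def intended_confidences(p_words_given, p_joint):
    """New semantic-interpretation confidences via Bayes' rule.

    p_words_given: dict intended -> P(W | R, I), from a conditional language
                   model trained on utterances with that (R, I) condition.
    p_joint:       dict intended -> P(R, I), the joint probability of the
                   recognized and intended interpretations.
    Returns a normalized posterior over intended interpretations.
    """
    scores = {i: p_words_given[i] * p_joint[i] for i in p_words_given}
    z = sum(scores.values())
    return {i: s / z for i, s in scores.items()}
```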
  • Patent number: 8677236
    Abstract: Word predictions in a message are selected or prioritized based on the recipient of the message and a previous location of use by a user. An input history is created based on messages sent to the recipient from the user at a particular location (e.g., global positioning system coordinates). As the user composes subsequent messages, a current location of the user is determined. Word predictions are performed based on a comparison of the current location to the previous locations, and based on the recipient(s). In further embodiments, location-aware spell-check functionality is provided for the messages.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: March 18, 2014
    Assignee: Microsoft Corporation
    Inventors: Jason Michael Bower, Rui Li, Kenichi Morimoto, Honghui Sun, Simon Liu
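The recipient- and location-conditioned prediction in this abstract can be sketched with a simple history lookup; the boost weights, radius, and distance approximation below are all illustrative choices:

```python
import math

def predict_words(history, recipient, current_loc, prefix, radius_km=5.0, top=3):
    """Rank word completions from a per-recipient input history, boosting
    words previously typed near the user's current location.

    history: list of (recipient, (lat, lon), word) records from sent messages.
    """
    def dist_km(a, b):
        # Rough equirectangular distance; adequate at city scale.
        dlat = (a[0] - b[0]) * 111.0
        dlon = (a[1] - b[1]) * 111.0 * math.cos(math.radians(a[0]))
        return math.hypot(dlat, dlon)

    scores = {}
    for rcpt, loc, word in history:
        if rcpt != recipient or not word.startswith(prefix):
            continue
        boost = 2.0 if dist_km(loc, current_loc) <= radius_km else 1.0
        scores[word] = scores.get(word, 0.0) + boost
    return sorted(scores, key=scores.get, reverse=True)[:top]
```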
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
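The cost adjustment in this abstract (penalize language-model words that sound like rule-based grammar words) might be sketched as below. Note the similarity proxy here compares letter strings via normalized edit distance purely for illustration; a real system would compare phone sequences:

```python
def similarity(a, b):
    """Crude acoustic-similarity proxy: 1 - normalized edit distance."""
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1.0 - d[m][n] / max(m, n)

def penalized_costs(lm_costs, grammar_words, penalty=2.0, threshold=0.75):
    """Raise the transition cost of any language-model word that is close to
    some rule-based grammar word, biasing recognition toward the grammar when
    the two would be acoustically confusable."""
    out = dict(lm_costs)
    for w in out:
        if any(similarity(w, g) >= threshold for g in grammar_words):
            out[w] += penalty
    return out
```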
  • Patent number: 8676581
    Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
    Type: Grant
    Filed: January 22, 2010
    Date of Patent: March 18, 2014
    Assignee: Microsoft Corporation
    Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
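The confidence adjustment this abstract describes, comparing the acoustic direction of origin against people visible to the image sensor, reduces to a simple check in one dimension. The angle tolerance and the boost/cut factors below are hypothetical:

```python
def adjust_confidence(confidence, acoustic_angle, person_angles,
                      tol=10.0, boost=1.2, cut=0.5):
    """Adjust a recognition confidence value using audio/visual location cues:
    if the microphone-array direction of origin matches a person seen in the
    camera's field of view (within `tol` degrees), boost the confidence;
    otherwise cut it, suppressing false positives from off-camera sources."""
    if any(abs(acoustic_angle - a) <= tol for a in person_angles):
        return min(1.0, confidence * boost)
    return confidence * cut
```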
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
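For a single frequency bin, the estimator in this abstract can be sketched as a power-level histogram decayed by an exponential moving average, with the noise estimate read off at a cumulative-frequency point. The bin count, decay rate, and percentile below are illustrative defaults:

```python
def estimate_noise_power(powers, num_bins=100, max_db=80.0, alpha=0.05, pct=0.5):
    """Estimate noise power for one frequency bin from per-frame power values
    (in dB): maintain a cumulative histogram weighted by an exponential moving
    average, then return the level at cumulative fraction `pct` (a running
    median for pct=0.5), which tracks the noise floor under sporadic speech."""
    hist = [0.0] * num_bins
    for p in powers:
        idx = min(num_bins - 1, max(0, int(p / max_db * num_bins)))
        # Exponential moving average: decay every bin, bump the observed one.
        hist = [(1 - alpha) * h for h in hist]
        hist[idx] += alpha
    total = sum(hist)
    cum = 0.0
    for i, h in enumerate(hist):
        cum += h
        if cum >= pct * total:
            return (i + 0.5) / num_bins * max_db  # bin center, in dB
    return max_db
```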
  • Patent number: 8655656
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: February 18, 2014
    Assignee: Deutsche Telekom AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
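The entropy step this abstract describes maps directly to code: a peaked phoneme posterior (one clear phoneme) has low entropy and high intelligibility, a flat posterior the opposite. Normalizing by the maximum entropy log(N) to get a [0, 1] score is an illustrative choice, not necessarily the patent's exact measure:

```python
import math

def frame_intelligibility(posteriors, eps=1e-12):
    """Intelligibility score for one frame as 1 minus the normalized entropy
    of its phoneme posterior vector. Near 1 when the statistical model is
    confident about a single phoneme; near 0 when the posterior is flat."""
    entropy = -sum(p * math.log(p + eps) for p in posteriors)
    max_entropy = math.log(len(posteriors))
    return 1.0 - entropy / max_entropy
```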
  • Patent number: 8655657
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: February 18, 2014
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu
  • Patent number: 8655647
    Abstract: Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is only added to the model when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process such that the backoff probability estimate is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning.
    Type: Grant
    Filed: March 11, 2010
    Date of Patent: February 18, 2014
    Assignee: Microsoft Corporation
    Inventor: Robert Carter Moore
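The selection rule this abstract states, skip the higher-order estimate when the backoff estimate lies within the maximum-likelihood set or between the higher-order estimate and that set, can be sketched as an interval test. Representing the maximum-likelihood set as a closed interval [ml_low, ml_high] is a simplifying assumption:

```python
def should_add_ngram(higher_est, backoff_est, ml_low, ml_high):
    """Decide whether a higher-order N-gram probability estimate earns a place
    in the model. [ml_low, ml_high] approximates the maximum-likelihood set of
    probabilities consistent with the N-gram's training counts."""
    if ml_low <= backoff_est <= ml_high:
        return False  # backoff already consistent with the training data
    if backoff_est < ml_low and higher_est <= backoff_est:
        return False  # backoff lies between the higher-order estimate and the set
    if backoff_est > ml_high and higher_est >= backoff_est:
        return False  # same, from above
    return True
```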
  • Patent number: 8650031
    Abstract: Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and acoustic model to recognize spoken queries, such as spoken queries for searching a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Combined scores from a speech recognizer and a co-occurrence engine can be combined to select a best hypothesis with a lower word error rate.
    Type: Grant
    Filed: July 31, 2011
    Date of Patent: February 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Joseph Vozila, Nathan Bodenstab
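The rescoring step this abstract describes, combining recognizer scores with co-occurrence frequencies to pick a better 1-best hypothesis, can be sketched as a log-linear interpolation over adjacent word pairs. The weight, the log1p transform, and the pair-averaging are illustrative choices:

```python
import math

def rescore(nbest, cooc_count, weight=0.3):
    """Rescore ASR n-best hypotheses with a co-occurrence model.

    nbest:      list of (text, asr_log_score) hypotheses.
    cooc_count: callable (w1, w2) -> how often the pair co-occurs in documents.
    Returns the text of the best hypothesis under the combined score.
    """
    def cooc_score(words):
        if len(words) < 2:
            return 0.0
        pairs = zip(words, words[1:])
        return sum(math.log1p(cooc_count(a, b)) for a, b in pairs) / (len(words) - 1)

    rescored = [(text, (1 - weight) * s + weight * cooc_score(text.split()))
                for text, s in nbest]
    return max(rescored, key=lambda x: x[1])[0]
```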
  • Patent number: 8627230
    Abstract: A method, system, and computer program product for intelligent command prediction are provided. The method includes determining a command prediction preference associated with a user from user profile data, and selecting one or more command history repositories responsive to the command prediction preference. The one or more command history repositories include command history data collected from a plurality of users and classification data associated with the plurality of users. The method also includes calculating command probabilities for commands in the command history data of the selected one or more command history repositories as a function of the classification data associated with the plurality of users in relation to the user. The method additionally includes presenting a next suggested command as a command from the command history data of the selected one or more command history repositories with a highest calculated command probability.
    Type: Grant
    Filed: November 24, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Olivier Boehler, Gisela C. Cheng, Anuja Deedwaniya, Zamir G. Gonzalez, Shayne M. Grant, Jagadish B. Kotra
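The probability calculation this abstract describes, weighting shared command history by how each author's classification relates to the current user, might look like the following sketch; the similarity table and record format are hypothetical:

```python
def next_command(history, user_class, class_similarity):
    """Suggest the next command from a shared history repository, weighting
    each record by the similarity between its author's classification and the
    current user's, then returning the highest-probability command.

    history:          list of (author_class, command) records.
    class_similarity: dict (user_class, author_class) -> weight in [0, 1].
    """
    probs = {}
    for author_class, cmd in history:
        w = class_similarity.get((user_class, author_class), 0.0)
        probs[cmd] = probs.get(cmd, 0.0) + w
    total = sum(probs.values()) or 1.0
    probs = {c: p / total for c, p in probs.items()}
    return max(probs, key=probs.get)
```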
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales