Probability Patents (Class 704/240)
  • Patent number: 8892437
    Abstract: Example embodiments of the present invention may include a method that provides transcribing spoken utterances occurring during a call and assigning to each of the spoken utterances a corresponding set of first classifications. The method may also include determining a confidence rating associated with each of the spoken utterances and the assigned set of first classifications, and performing at least one of reclassifying the spoken utterances with new classifications based on at least one additional classification operation, and adding the assigned first classifications and the corresponding plurality of spoken utterances to a training data set.
    Type: Grant
    Filed: November 13, 2013
    Date of Patent: November 18, 2014
    Assignee: West Corporation
    Inventor: Silke Witt-Ehsani
  • Patent number: 8886540
    Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communications facility, loading the results into the software application and simultaneously displaying the results as a set of words and as a set of application results based on those words.
    Type: Grant
    Filed: August 1, 2008
    Date of Patent: November 11, 2014
    Assignee: Vlingo Corporation
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
  • Patent number: 8886533
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: October 25, 2011
    Date of Patent: November 11, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
  • Patent number: 8886532
    Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
    Type: Grant
    Filed: October 27, 2010
    Date of Patent: November 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Levit, Bruce Melvin Buntschuh
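The abstract above leaves open how utterance-level and dialog-level features are combined into one confidence score. A minimal sketch, assuming a logistic combiner with hypothetical feature names and weights (the patent does not fix a particular model):

```python
import math

def utterance_confidence(utt_feats, dialog_feats, weights, bias=0.0):
    """Combine features of the utterance itself with features of the
    earlier dialog events into a single confidence score via a
    logistic model. The feature names and weights are illustrative."""
    feats = {**utt_feats, **dialog_feats}
    z = bias + sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```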
  • Patent number: 8873868
    Abstract: An apparatus is provided for classifying targets into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of a name of an object and an image of the object; a unit configured to calculate a speech confidence level of a speech for the name of the object with reference to a spoken sound of a name of a known object; a unit configured to calculate an image confidence level of an image of an object with respect to an image of a known object; and a unit configured to compare an evaluation value, which is obtained by combining the speech confidence level and image confidence level, with a threshold value, and classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: October 28, 2014
    Assignees: Honda Motor Co. Ltd., National University Corporation Kobe University
    Inventors: Mikio Nakano, Naoto Iwahashi, Yasuo Ariki, Yuko Ozasa, Takahiro Hori, Ryohei Nakatani
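The known/unknown decision above hinges on merging a speech confidence level and an image confidence level into one evaluation value that is compared against a threshold. A sketch of that final step, assuming a linear combination; the weight and threshold values are illustrative, not the patented formula:

```python
def classify_object(speech_conf, image_conf, weight=0.5, threshold=0.6):
    """Combine speech and image confidence levels into an evaluation
    value and classify the target object as known or unknown.
    The linear combination and the defaults are assumptions."""
    evaluation = weight * speech_conf + (1.0 - weight) * image_conf
    return "known" if evaluation >= threshold else "unknown"
```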
  • Patent number: 8856004
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: May 13, 2011
    Date of Patent: October 7, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8856002
    Abstract: A universal pattern processing system receives input data and produces the output patterns that are best associated with said data. The system uses input means for receiving and processing the input data, universal pattern decoder means for transforming models using the input data and associating output patterns with the original models that are changed least during transformation, and output means for outputting the best associated patterns chosen by the pattern decoder means.
    Type: Grant
    Filed: April 11, 2008
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
  • Patent number: 8856049
    Abstract: An apparatus for classifying an audio signal configured to: estimate at least one shaping parameter value for a plurality of samples of the audio signal; generate at least one audio signal classification value by mapping the at least one shaping parameter value to one of at least two interval estimates; and determine at least one audio signal classification decision based on the at least one audio signal classification value.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: October 7, 2014
    Assignee: Nokia Corporation
    Inventors: Adriana Vasilache, Lasse Juhani Laaksonen, Mikko Tapio Tammi, Anssi Sakari Ramo
  • Patent number: 8849663
    Abstract: A system and method may be provided to segment and/or classify an audio signal from transformed audio information. Transformed audio information representing a sound may be obtained. The transformed audio information may specify magnitude of a coefficient related to energy amplitude as a function of frequency for the audio signal and time. Features associated with the audio signal may be obtained from the transformed audio information. Individual ones of the features may be associated with a feature score relative to a predetermined speaker model. An aggregate score may be obtained based on the feature scores according to a weighting scheme. The weighting scheme may be associated with a noise and/or SNR estimation. The aggregate score may be used for segmentation to identify portions of the audio signal containing speech of one or more different speakers. For classification, the aggregate score may be used to determine a likely speaker model to identify a source of the sound in the audio signal.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: September 30, 2014
    Assignee: The Intellisis Corporation
    Inventors: David C. Bradley, Robert N. Hilton, Daniel S. Goldin, Nicholas K. Fisher, Derrick R. Roos, Eric Wiewiora
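The aggregation step above weights per-feature scores against a speaker model according to a noise/SNR estimate. A minimal sketch under assumed conventions: the SNR-derived factor and the decaying weights are stand-ins for the patent's unspecified weighting scheme:

```python
def aggregate_score(feature_scores, snr_db):
    """Aggregate per-feature scores (relative to a speaker model)
    under an SNR-dependent weighting scheme. Illustrative choice:
    noisier audio down-weights later features more strongly."""
    snr_factor = max(0.0, min(1.0, snr_db / 30.0))  # clamp to [0, 1]
    weights = [snr_factor ** i for i in range(len(feature_scores))]
    total = sum(w * s for w, s in zip(weights, feature_scores))
    return total / sum(weights)
```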
  • Patent number: 8843367
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Grant
    Filed: May 4, 2012
    Date of Patent: September 23, 2014
    Assignee: 8758271 Canada Inc.
    Inventors: Phillip Alan Hetherington, Xueman Li
  • Publication number: 20140278412
    Abstract: Characterizing an acoustic signal includes extracting a vector from the acoustic signal, where the vector contains information about the nuisance characteristics present in the acoustic signal, and computing a set of likelihoods of the vector for a plurality of classes that model a plurality of nuisance characteristics. Training a system to characterize an acoustic signal includes obtaining training data, the training data comprising a plurality of acoustic signals, where each of the plurality of acoustic signals is associated with one of a plurality of classes that indicates a presence of a specific type of nuisance characteristic, transforming each of the plurality of acoustic signals into a vector that summarizes information about the acoustic characteristics of the signal, to produce a plurality of vectors, and labeling each of the plurality of vectors with one of the plurality of classes.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: SRI International
    Inventors: Nicolas Scheffer, Luciana Ferrer
  • Publication number: 20140278411
    Abstract: A method for vocabulary integration of speech recognition comprises converting multiple speech signals into multiple words using a processor, applying confidence scores to the multiple words, classifying the multiple words into a plurality of classifications based on classification criteria and the confidence score for each word, determining if one or more of the multiple words are unrecognized based on the plurality of classifications, classifying each unrecognized word and detecting a match for the unrecognized word based on additional classification criteria, and upon detecting a match for an unrecognized word, converting at least a portion of the multiple speech signals corresponding to the unrecognized word into words.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Chun Shing Cheung
  • Patent number: 8825481
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Grant
    Filed: January 20, 2012
    Date of Patent: September 2, 2014
    Assignee: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Patent number: 8826226
    Abstract: Systems, methods, and apparatuses including computer program products for generating a custom language model. In one implementation, a method is provided. The method includes receiving a collection of documents; clustering the documents into one or more clusters; generating a cluster vector for each cluster of the one or more clusters; generating a target vector associated with a target profile; comparing the target vector with each of the cluster vectors; selecting one or more of the one or more clusters based on the comparison; and generating a language model using documents from the one or more selected clusters.
    Type: Grant
    Filed: November 5, 2008
    Date of Patent: September 2, 2014
    Assignee: Google Inc.
    Inventors: Jun Wu, Henry Ou, Xiliu Tang, Yong-Gang Wang, Yongyan Liu
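The cluster-selection step above compares a target-profile vector against each cluster vector and keeps the closest clusters for language-model training. A minimal sketch, assuming cosine similarity over sparse term-weight vectors and an illustrative cutoff (the patent does not commit to either):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_clusters(target_vec, cluster_vecs, min_sim=0.3):
    """Keep clusters whose vector is close enough to the target
    profile; their documents would then feed LM training."""
    return [cid for cid, vec in cluster_vecs.items()
            if cosine(target_vec, vec) >= min_sim]
```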
  • Patent number: 8818801
    Abstract: Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides a likelihood that a speech signal input from a given phoneme sequence will occur. A linguistic likelihood computation means (702) provides a likelihood that a given word sequence will occur. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to provide a word sequence with the maximum likelihood of occurring from a speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods when the speaker who generated the acoustic signal input to the speech recognition means does and does not have the turn to speak.
    Type: Grant
    Filed: May 12, 2009
    Date of Patent: August 26, 2014
    Assignee: NEC Corporation
    Inventor: Kentaro Nagatomo
  • Patent number: 8812291
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
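The backoff scoring described above, a backoff factor times the relative frequency of the (n-1)-gram, resembles the "Stupid Backoff" scheme associated with the same authors. A minimal sketch over raw counts; the recursion structure and the factor value are assumptions for illustration:

```python
def backoff_score(ngram, counts, alpha=0.4):
    """Score an n-gram (a tuple of tokens) by relative frequency,
    backing off to the shorter n-gram scaled by a backoff factor
    when the full n-gram is unseen. `counts` maps token tuples to
    corpus counts; alpha is an assumed backoff factor."""
    if len(ngram) == 1:
        total = sum(c for k, c in counts.items() if len(k) == 1)
        return counts.get(ngram, 0) / total
    context = ngram[:-1]
    if counts.get(ngram, 0) > 0 and counts.get(context, 0) > 0:
        return counts[ngram] / counts[context]
    return alpha * backoff_score(ngram[1:], counts, alpha)
```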
  • Patent number: 8798995
    Abstract: Topics of potential interest to a user, useful for purposes such as targeted advertising and product recommendations, can be extracted from voice content produced by a user. A computing device can capture voice content, such as when a user speaks into or near the device. One or more sniffer algorithms or processes can attempt to identify trigger words in the voice content, which can indicate a level of interest of the user. For each identified potential trigger word, the device can capture adjacent audio that can be analyzed, on the device or remotely, to attempt to determine one or more keywords associated with that trigger word. The identified keywords can be stored and/or transmitted to an appropriate location accessible to entities such as advertisers or content providers who can use the keywords to attempt to select or customize content that is likely relevant to the user.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: August 5, 2014
    Assignee: Amazon Technologies, Inc.
    Inventor: Kiran K. Edara
  • Patent number: 8798994
    Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to memory and resource requirements for computing the transform to be applied.
    Type: Grant
    Filed: February 6, 2008
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
  • Publication number: 20140214419
    Abstract: An automatic speech recognition method includes at a computer having one or more processors and memory for storing one or more programs to be executed by the processors, obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus; obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through a language model training applied on each speech corpus category; obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models; constructing a decoding resource in accordance with an acoustic model and the interpolation language model; and decoding input speech using the decoding resource, and outputting a character string with a highest probability as a recognition result of the input speech.
    Type: Application
    Filed: December 16, 2013
    Publication date: July 31, 2014
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Feng Rao, Li Lu, Bo Chen, Shuai Yue, Xiang Zhang, Eryu Wang, Dadong Xie, Lou Li, Duling Lu
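The merging step above, weighted interpolation of the per-category language models into one interpolation model, can be sketched as a weighted average of each n-gram's probability. A minimal illustration of that step only, not the full training flow:

```python
def interpolate_lms(classified_lms, weights):
    """Merge per-category language models into one interpolation
    model by weighted averaging of each n-gram's probability.
    Each LM is a dict {ngram: prob}; weights should sum to 1."""
    merged = {}
    for lm, w in zip(classified_lms, weights):
        for ngram, prob in lm.items():
            merged[ngram] = merged.get(ngram, 0.0) + w * prob
    return merged
```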
  • Patent number: 8793130
    Abstract: A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: July 29, 2014
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Yun-Cheng Ju, Dong Yu
  • Patent number: 8781825
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: July 15, 2014
    Assignee: Sensory, Incorporated
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Patent number: 8762154
    Abstract: Example embodiments of the present invention may include a method that includes collecting caller response timings to each of a plurality of dialog states conducted during a call, and estimating a plurality of parameters based on the caller response timings. The method may also include selecting a response completeness value responsive to the estimated plurality of parameters, the response completeness value is used to calculate at least one optimal timeout value. The method may also include selecting the at least one optimal timeout value, and setting the at least one optimal timeout value for each of the corresponding dialog states. The timeout value(s) may be used for subsequent calls to provide optimal user satisfaction and call success rates.
    Type: Grant
    Filed: August 15, 2011
    Date of Patent: June 24, 2014
    Assignee: West Corporation
    Inventor: Silke Witt-Ehsani
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
  • Patent number: 8744856
    Abstract: A computer implemented method, system and computer program product for evaluating pronunciation. Known phonemes are stored in a computer memory. A spoken utterance corresponding to a target utterance, comprised of a sequence of target phonemes, is received and stored in a computer memory. The spoken utterance is segmented into a sequence of spoken phonemes, each corresponding to a target phoneme. For each spoken phoneme, a relative posterior probability is calculated that the spoken phoneme is the corresponding target phoneme. If the calculated probability is greater than a first threshold, an indication that the target phoneme was pronounced correctly is output; if less than a first threshold, an indication that the target phoneme was pronounced incorrectly is output. If the probability is less than a first threshold and greater than a second threshold, an indication that pronunciation of the target phoneme was acceptable is output.
    Type: Grant
    Filed: February 21, 2012
    Date of Patent: June 3, 2014
    Assignee: Carnegie Speech Company
    Inventor: Mosur K. Ravishankar
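The two-threshold decision rule above maps a phoneme's relative posterior probability to one of three verdicts. A sketch of that final step; the threshold values are illustrative, not taken from the patent:

```python
def grade_phoneme(posterior, high=0.7, low=0.4):
    """Grade a spoken phoneme's relative posterior probability of
    being the target phoneme: above the first (high) threshold is
    correct, between the two thresholds is acceptable, below the
    second (low) threshold is incorrect."""
    if posterior > high:
        return "correct"
    if posterior > low:
        return "acceptable"
    return "incorrect"
```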
  • Publication number: 20140149113
    Abstract: A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods.
    Type: Application
    Filed: November 27, 2012
    Publication date: May 29, 2014
    Applicant: LONGSAND LIMITED
    Inventor: Maha Kadirkamanathan
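The forward and backward probabilities the abstract accumulates per breakout period are the standard HMM quantities. A minimal sketch for one observation sequence, not the patent's breakout-period partitioning:

```python
def forward_backward(obs_probs, trans, init):
    """Forward and backward probabilities for one observation sequence.
    obs_probs[t][s] is P(obs_t | state s); trans[i][j] is P(j | i);
    init[s] is the initial state distribution."""
    n_states, T = len(init), len(obs_probs)
    fwd = [[0.0] * n_states for _ in range(T)]
    bwd = [[0.0] * n_states for _ in range(T)]
    for s in range(n_states):                       # forward initialization
        fwd[0][s] = init[s] * obs_probs[0][s]
    for t in range(1, T):                           # forward recursion
        for s in range(n_states):
            fwd[t][s] = obs_probs[t][s] * sum(
                fwd[t - 1][i] * trans[i][s] for i in range(n_states))
    for s in range(n_states):                       # backward initialization
        bwd[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):                  # backward recursion
        for s in range(n_states):
            bwd[t][s] = sum(trans[s][j] * obs_probs[t + 1][j] * bwd[t + 1][j]
                            for j in range(n_states))
    return fwd, bwd
```

Summing the final forward column and summing the initial states weighted by the backward column both yield the sequence likelihood, a standard sanity check on the recursion.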
  • Publication number: 20140149116
    Abstract: There are provided a speech synthesis device, a speech synthesis method and a speech synthesis program which can represent a phoneme as a duration shorter than a duration upon modeling according to a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81 which, by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updates a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme.
    Type: Application
    Filed: June 8, 2012
    Publication date: May 29, 2014
    Applicant: NEC CORPORATION
    Inventors: Yasuyuki Mitsui, Masanori Kato, Reishi Kondo
  • Patent number: 8738367
    Abstract: A speech signal processing device is equipped with a power acquisition unit, a probability distribution acquisition unit, and a correspondence degree determination unit. The power acquisition unit accepts an inputted speech signal and, based on the accepted speech signal, acquires power representing the intensity of a speech sound represented by the speech signal. The probability distribution acquisition unit acquires a probability distribution using the intensity of the power acquired by the power acquisition unit as a random variable. The correspondence degree determination unit determines whether a correspondence degree representing a degree that power acquired by the power acquisition unit in a case that a predetermined reference speech signal is inputted into the power acquisition unit corresponds with predetermined reference power is higher than a predetermined reference correspondence degree, based on the probability distribution acquired by the probability distribution acquisition unit.
    Type: Grant
    Filed: February 18, 2010
    Date of Patent: May 27, 2014
    Assignee: NEC Corporation
    Inventor: Tadashi Emori
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
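The space saving above comes from storing only the acoustic parameters that actually moved during MAP adaptation. A sketch of that sparse-delta idea; simple magnitude thresholding stands in for the patented sparseness constraint:

```python
def sparse_adaptation_delta(baseline, adapted, min_change=1e-3):
    """Keep only the parameters whose adapted value moved more than
    min_change from the baseline model, so the per-user adaptation
    data stores a small fraction of the full parameter set."""
    return {name: adapted[name] - baseline[name]
            for name in baseline
            if abs(adapted[name] - baseline[name]) > min_change}
```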
  • Publication number: 20140129221
    Abstract: A sound recognition device includes a storage for storing a comment that is input while the user listening to sounds emitted as multimedia data being played. The sound recognition device further includes an extractor for extracting a word that appears in a set of sentences that contains the stored comment, and candidate words that contain co-occurrences of the word in the set of sentences. Furthermore, the sound recognition device includes a sound recognizer for recognizing sounds emitted as the multimedia data being played, based on the extracted candidate words.
    Type: Application
    Filed: March 22, 2013
    Publication date: May 8, 2014
    Applicant: Dwango Co., Ltd.
  • Patent number: 8711015
    Abstract: The invention relates to compressing sparse data sets that contain sequences of data values and position information therefor. The position information may be in the form of position indices defining active positions of the data values in a sparse vector of length N. The position information is encoded into the data values by adjusting one or more of the data values within a pre-defined tolerance range, so that a pre-defined mapping function of the data values and their positions is close to a target value. In one embodiment, the mapping function is defined using a sub-set of N filler values whose elements are used to fill empty positions in the input sparse data vector. At the decoder, the correct data positions are identified by searching through possible sub-sets of filler values.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: April 29, 2014
    Assignee: Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, through the Communications Research Centre Canada
    Inventors: Frederic Mustiere, Hossein Najaf-Zadeh, Ramin Pishehvar, Hassan Lahdili, Louis Thibault, Martin Bouchard
  • Patent number: 8712773
    Abstract: The present invention relates to a method for modeling common-language speech recognition, by a computer, under the influence of multiple dialects, and concerns the technical field of speech recognition by computer. In this method, a triphone standard common-language model is first generated based on training data of the standard common language, and first and second monophone dialectal-accented common-language models are generated based on development data of dialectal-accented common languages of the first kind and second kind, respectively. Then a temporary merged model is obtained in such a manner that the first dialectal-accented common-language model is merged into the standard common-language model according to a first confusion matrix obtained by recognizing the development data of the first dialectal-accented common language using the standard common-language model.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: April 29, 2014
    Assignees: Sony Computer Entertainment Inc., Tsinghua University
    Inventors: Fang Zheng, Xi Xiao, Linquan Liu, Zhan You, Wenxiao Cao, Makoto Akabane, Ruxin Chen, Yoshikazu Takahashi
  • Publication number: 20140114660
    Abstract: A method and device for speaker recognition are provided. In the present invention, identifiability re-estimation is performed on a first vector (namely, a weight vector) in a score function by adopting a support vector machine (SVM), so that a recognition result of a characteristic parameter of a test voice is more accurate, thereby improving identifiability of speaker recognition.
    Type: Application
    Filed: December 31, 2013
    Publication date: April 24, 2014
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Xiang Zhang, Hualin Wan, Jun Zhang
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8700398
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, Jr.
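The paired-threshold constraint this abstract describes (low must stay at or below high) can be sketched in a few lines. The class name, default values, and the accept/confirm/reject labels below are illustrative, not taken from the patent:

```python
class ConfidenceThresholds:
    """Paired high/low confidence thresholds with the invariant low <= high.

    Scores at or above `high` are accepted, scores below `low` are rejected,
    and scores in between trigger a confirmation prompt.
    """

    def __init__(self, low=0.3, high=0.7):
        if not 0.0 <= low <= high <= 1.0:
            raise ValueError("require 0 <= low <= high <= 1")
        self.low, self.high = low, high

    def set_low(self, value):
        # The low-threshold control is clamped so it cannot exceed high.
        self.low = min(max(0.0, value), self.high)

    def set_high(self, value):
        # Lowering high may drag low down with it to keep the invariant.
        self.high = min(max(0.0, value), 1.0)
        self.low = min(self.low, self.high)

    def decide(self, score):
        if score >= self.high:
            return "accept"
        if score < self.low:
            return "reject"
        return "confirm"
```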
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula

  • Patent number: 8694317
    Abstract: Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: April 8, 2014
    Assignee: Aurix Limited
    Inventors: Adrian I Skilling, Howard A K Wright
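The index-then-search scheme in this abstract (keep a score for every reference phone per frame, then locate a term by dynamic programming) can be sketched as follows; the function name, score convention (lower = closer to the phone model), and alignment rule are assumptions for illustration:

```python
def search_index(index, term):
    """Find the best-scoring occurrence of a phone sequence `term` in a
    phonetic index. index[t][ph] holds a score for EVERY reference phone at
    frame t, so no phone information was discarded at indexing time. A simple
    dynamic program lets each phone of the term absorb one or more consecutive
    frames; returns (cost, end_frame) of the best match."""
    INF = float("inf")
    prev = [INF] * len(term)
    best = (INF, -1)
    for t, frame in enumerate(index):
        cur = []
        for j, ph in enumerate(term):
            # Either extend the current phone, or advance from the previous
            # phone; the first phone may start a fresh match at any frame.
            enter = 0.0 if j == 0 else prev[j - 1]
            cur.append(min(prev[j], enter) + frame[ph])
        if cur[-1] < best[0]:
            best = (cur[-1], t)
        prev = cur
    return best
```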
  • Patent number: 8694316
    Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
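The final step of this abstract, evaluating per-phone confidence scores against criteria to steer prompt selection, might look like the sketch below. The threshold value and the three prompt-style labels are hypothetical:

```python
def choose_prompt(phone_scores, low_threshold=0.4):
    """Pick a re-prompt style from per-phone recognition confidence scores.

    phone_scores: list of (phone, confidence) pairs for the best-matching word.
    All phones confident -> brief confirmation; a minority of weak phones ->
    ask the caller to repeat; mostly weak -> fall back to a directed prompt.
    """
    weak = [p for p, s in phone_scores if s < low_threshold]
    if not weak:
        return "confirm_short"
    if len(weak) <= len(phone_scores) // 2:
        return "ask_repeat"
    return "directed_prompt"
```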
  • Patent number: 8682661
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech input. In one aspect, a method includes receiving a user input and a grammar including annotations, the user input comprising audio data and the annotations providing syntax and semantics to the grammar, retrieving third-party statistical speech recognition information, the statistical speech recognition information being transmitted over a network, generating a statistical language model (SLM) based on the grammar and the statistical speech recognition information, the SLM preserving semantics of the grammar, processing the user input using the SLM to generate one or more results, comparing the one or more results to candidates provided in the grammar, identifying a particular candidate of the grammar based on the comparing, and providing the particular candidate for input to an application executed on a computing device.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: March 25, 2014
    Assignee: Google Inc.
    Inventors: Johan Schalkwyk, Bjorn Bringert, David P. Singleton
  • Patent number: 8682668
    Abstract: A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 108 is provided with a word dictionary 203 that defines a phoneme string of a word, a language model 202 that imparts a score of appearance easiness of a word, and a smoothing language model score look-ahead value calculation means 201. The smoothing language model score look-ahead value calculation means 201 obtains a language model score look-ahead value at each phoneme in the word from the phoneme string of the word defined by the word dictionary 203 and the language model score defined by the language model 202 so that the language model score look-ahead values are prevented from concentrating on the beginning of the word.
    Type: Grant
    Filed: March 27, 2009
    Date of Patent: March 25, 2014
    Assignee: NEC Corporation
    Inventors: Koji Okabe, Ryosuke Isotani, Kiyoshi Yamabana, Ken Hanazawa
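The smoothing idea in this abstract, spreading a word's language-model score across its phonemes instead of concentrating it at the word beginning, can be illustrated with a minimal sketch (uniform distribution is one simple smoothing choice, not necessarily the patent's exact scheme):

```python
def smoothed_lookahead(lm_score, num_phonemes):
    """Distribute a word's language-model (log) score evenly over its
    phonemes. Returns the cumulative look-ahead value applied by each phoneme
    position, so the full score is only committed at the word end rather than
    all at once at the first phoneme."""
    per_phone = lm_score / num_phonemes
    return [per_phone * (i + 1) for i in range(num_phonemes)]
```

With the full score applied at the first phoneme, partial hypotheses for long words are penalized early and risk being pruned; spreading it keeps the beam comparison fairer.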
  • Patent number: 8682660
    Abstract: A system and a method to correct semantic interpretation recognition errors presented in this invention applies to Automatic Speech Recognition systems returning recognition results with semantic interpretations. The method finds the most likely intended semantic interpretation given the recognized sequence of words and the recognized semantic interpretation. The key point is the computation of the conditional probability of the recognized sequence of words given the recognized semantic interpretation and a particular intended semantic interpretation. It is done with the use of Conditional Language Models which are Statistical Language Models trained on a corpus of utterances collected under the condition of a particular recognized semantic interpretation and a particular intended semantic interpretation. Based on these conditional probabilities and the joint probabilities of the recognized and intended semantic interpretations, new semantic interpretation confidences are computed.
    Type: Grant
    Filed: May 16, 2009
    Date of Patent: March 25, 2014
    Assignee: Resolvity, Inc.
    Inventors: Yevgeniy Lyudovyk, Jacek Jarmulak
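The combination step this abstract describes is essentially Bayes' rule over intended interpretations: P(I | W, R) is proportional to P(W | R, I) x P(R, I), where W is the recognized word sequence, R the recognized interpretation, and I a candidate intended interpretation. A minimal sketch, with the conditional-LM and joint probabilities supplied as plain dictionaries:

```python
def intended_confidences(p_words_given, p_joint):
    """New semantic-interpretation confidences via Bayes' rule.

    p_words_given: dict intended -> P(W | R, I), from a conditional language
                   model trained on utterances with that (R, I) condition.
    p_joint:       dict intended -> P(R, I), the joint probability of the
                   recognized and intended interpretations.
    Returns a normalized posterior over intended interpretations.
    """
    scores = {i: p_words_given[i] * p_joint[i] for i in p_words_given}
    z = sum(scores.values())
    return {i: s / z for i, s in scores.items()}
```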
  • Patent number: 8677236
    Abstract: Word predictions in a message are selected or prioritized based on the recipient of the message and a previous location of use by a user. An input history is created based on messages sent to the recipient from the user at a particular location (e.g., global positioning system coordinates). As the user composes subsequent messages, a current location of the user is determined. Word predictions are performed based on a comparison of the current location to the previous locations, and based on the recipient(s). In further embodiments, location-aware spell-check functionality is provided for the messages.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: March 18, 2014
    Assignee: Microsoft Corporation
    Inventors: Jason Michael Bower, Rui Li, Kenichi Morimoto, Honghui Sun, Simon Liu
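The recipient- and location-conditioned prediction in this abstract can be sketched with a simple history lookup; the boost weights, radius, and distance approximation below are all illustrative choices:

```python
import math

def predict_words(history, recipient, current_loc, prefix, radius_km=5.0, top=3):
    """Rank word completions from a per-recipient input history, boosting
    words previously typed near the user's current location.

    history: list of (recipient, (lat, lon), word) records from sent messages.
    """
    def dist_km(a, b):
        # Rough equirectangular distance; adequate at city scale.
        dlat = (a[0] - b[0]) * 111.0
        dlon = (a[1] - b[1]) * 111.0 * math.cos(math.radians(a[0]))
        return math.hypot(dlat, dlon)

    scores = {}
    for rcpt, loc, word in history:
        if rcpt != recipient or not word.startswith(prefix):
            continue
        boost = 2.0 if dist_km(loc, current_loc) <= radius_km else 1.0
        scores[word] = scores.get(word, 0.0) + boost
    return sorted(scores, key=scores.get, reverse=True)[:top]
```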
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
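The cost adjustment in this abstract (penalize language-model words that sound like rule-based grammar words) might be sketched as below. Note the similarity proxy here compares letter strings via normalized edit distance purely for illustration; a real system would compare phone sequences:

```python
def similarity(a, b):
    """Crude acoustic-similarity proxy: 1 - normalized edit distance."""
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1.0 - d[m][n] / max(m, n)

def penalized_costs(lm_costs, grammar_words, penalty=2.0, threshold=0.75):
    """Raise the transition cost of any language-model word that is close to
    some rule-based grammar word, biasing recognition toward the grammar when
    the two would be acoustically confusable."""
    out = dict(lm_costs)
    for w in out:
        if any(similarity(w, g) >= threshold for g in grammar_words):
            out[w] += penalty
    return out
```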
  • Patent number: 8676581
    Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
    Type: Grant
    Filed: January 22, 2010
    Date of Patent: March 18, 2014
    Assignee: Microsoft Corporation
    Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
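The confidence adjustment this abstract describes, comparing the acoustic direction of origin against people visible to the image sensor, reduces to a simple check in one dimension. The angle tolerance and the boost/cut factors below are hypothetical:

```python
def adjust_confidence(confidence, acoustic_angle, person_angles,
                      tol=10.0, boost=1.2, cut=0.5):
    """Adjust a recognition confidence value using audio/visual location cues:
    if the microphone-array direction of origin matches a person seen in the
    camera's field of view (within `tol` degrees), boost the confidence;
    otherwise cut it, suppressing false positives from off-camera sources."""
    if any(abs(acoustic_angle - a) <= tol for a in person_angles):
        return min(1.0, confidence * boost)
    return confidence * cut
```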
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
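For a single frequency bin, the estimator in this abstract can be sketched as a power-level histogram decayed by an exponential moving average, with the noise estimate read off at a cumulative-frequency point. The bin count, decay rate, and percentile below are illustrative defaults:

```python
def estimate_noise_power(powers, num_bins=100, max_db=80.0, alpha=0.05, pct=0.5):
    """Estimate noise power for one frequency bin from per-frame power values
    (in dB): maintain a cumulative histogram weighted by an exponential moving
    average, then return the level at cumulative fraction `pct` (a running
    median for pct=0.5), which tracks the noise floor under sporadic speech."""
    hist = [0.0] * num_bins
    for p in powers:
        idx = min(num_bins - 1, max(0, int(p / max_db * num_bins)))
        # Exponential moving average: decay every bin, bump the observed one.
        hist = [(1 - alpha) * h for h in hist]
        hist[idx] += alpha
    total = sum(hist)
    cum = 0.0
    for i, h in enumerate(hist):
        cum += h
        if cum >= pct * total:
            return (i + 0.5) / num_bins * max_db  # bin center, in dB
    return max_db
```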
  • Patent number: 8655656
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: February 18, 2014
    Assignee: Deutsche Telekom AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
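The entropy step this abstract describes maps directly to code: a peaked phoneme posterior (one clear phoneme) has low entropy and high intelligibility, a flat posterior the opposite. Normalizing by the maximum entropy log(N) to get a [0, 1] score is an illustrative choice, not necessarily the patent's exact measure:

```python
import math

def frame_intelligibility(posteriors, eps=1e-12):
    """Intelligibility score for one frame as 1 minus the normalized entropy
    of its phoneme posterior vector. Near 1 when the statistical model is
    confident about a single phoneme; near 0 when the posterior is flat."""
    entropy = -sum(p * math.log(p + eps) for p in posteriors)
    max_entropy = math.log(len(posteriors))
    return 1.0 - entropy / max_entropy
```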
  • Patent number: 8655657
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: February 18, 2014
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu
  • Patent number: 8655647
    Abstract: Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is only added to the model when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process such that the backoff probability estimate is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning.
    Type: Grant
    Filed: March 11, 2010
    Date of Patent: February 18, 2014
    Assignee: Microsoft Corporation
    Inventor: Robert Carter Moore
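The selection rule this abstract states, skip the higher-order estimate when the backoff estimate lies within the maximum-likelihood set or between the higher-order estimate and that set, can be sketched as an interval test. Representing the maximum-likelihood set as a closed interval [ml_low, ml_high] is a simplifying assumption:

```python
def should_add_ngram(higher_est, backoff_est, ml_low, ml_high):
    """Decide whether a higher-order N-gram probability estimate earns a place
    in the model. [ml_low, ml_high] approximates the maximum-likelihood set of
    probabilities consistent with the N-gram's training counts."""
    if ml_low <= backoff_est <= ml_high:
        return False  # backoff already consistent with the training data
    if backoff_est < ml_low and higher_est <= backoff_est:
        return False  # backoff lies between the higher-order estimate and the set
    if backoff_est > ml_high and higher_est >= backoff_est:
        return False  # same, from above
    return True
```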
  • Patent number: 8650031
    Abstract: Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and acoustic model to recognize spoken queries, such as spoken queries for searching a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Combined scores from a speech recognizer and a co-occurrence engine can be combined to select a best hypothesis with a lower word error rate.
    Type: Grant
    Filed: July 31, 2011
    Date of Patent: February 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Joseph Vozila, Nathan Bodenstab
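The rescoring step this abstract describes, combining recognizer scores with co-occurrence frequencies to pick a better 1-best hypothesis, can be sketched as a log-linear interpolation over adjacent word pairs. The weight, the log1p transform, and the pair-averaging are illustrative choices:

```python
import math

def rescore(nbest, cooc_count, weight=0.3):
    """Rescore ASR n-best hypotheses with a co-occurrence model.

    nbest:      list of (text, asr_log_score) hypotheses.
    cooc_count: callable (w1, w2) -> how often the pair co-occurs in documents.
    Returns the text of the best hypothesis under the combined score.
    """
    def cooc_score(words):
        if len(words) < 2:
            return 0.0
        pairs = zip(words, words[1:])
        return sum(math.log1p(cooc_count(a, b)) for a, b in pairs) / (len(words) - 1)

    rescored = [(text, (1 - weight) * s + weight * cooc_score(text.split()))
                for text, s in nbest]
    return max(rescored, key=lambda x: x[1])[0]
```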
  • Patent number: 8627230
    Abstract: A method, system, and computer program product for intelligent command prediction are provided. The method includes determining a command prediction preference associated with a user from user profile data, and selecting one or more command history repositories responsive to the command prediction preference. The one or more command history repositories include command history data collected from a plurality of users and classification data associated with the plurality of users. The method also includes calculating command probabilities for commands in the command history data of the selected one or more command history repositories as a function of the classification data associated with the plurality of users in relation to the user. The method additionally includes presenting a next suggested command as a command from the command history data of the selected one or more command history repositories with a highest calculated command probability.
    Type: Grant
    Filed: November 24, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Olivier Boehler, Gisela C. Cheng, Anuja Deedwaniya, Zamir G. Gonzalez, Shayne M. Grant, Jagadish B. Kotra
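The probability calculation this abstract describes, weighting shared command history by how each author's classification relates to the current user, might look like the following sketch; the similarity table and record format are hypothetical:

```python
def next_command(history, user_class, class_similarity):
    """Suggest the next command from a shared history repository, weighting
    each record by the similarity between its author's classification and the
    current user's, then returning the highest-probability command.

    history:          list of (author_class, command) records.
    class_similarity: dict (user_class, author_class) -> weight in [0, 1].
    """
    probs = {}
    for author_class, cmd in history:
        w = class_similarity.get((user_class, author_class), 0.0)
        probs[cmd] = probs.get(cmd, 0.0) + w
    total = sum(probs.values()) or 1.0
    probs = {c: p / total for c, p in probs.items()}
    return max(probs, key=probs.get)
```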
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales