Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 8706491
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Grant
    Filed: August 24, 2010
    Date of Patent: April 22, 2014
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
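The matching step this abstract describes — compare an uploaded fingerprint against stored fingerprints, filtered by location or other social factors — can be sketched as follows. This is a minimal illustration, not Facebook's implementation: the bitwise fingerprint representation, Hamming-distance threshold, and crude coordinate filter are all assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")

def match_fingerprint(client_fp: int, client_loc, database,
                      max_bits=4, max_dist=1.0):
    """Return database entries whose fingerprint is within max_bits of the
    client's, considering only entries uploaded near the client's location
    (a crude L1 proximity filter standing in for the social/location
    factors the abstract mentions)."""
    matches = []
    for entry in database:
        dx = abs(entry["loc"][0] - client_loc[0])
        dy = abs(entry["loc"][1] - client_loc[1])
        if dx + dy > max_dist:
            continue  # only compare fingerprints from nearby uploads
        if hamming(entry["fp"], client_fp) <= max_bits:
            matches.append(entry)
    return matches
```

A partial match (Hamming distance just above the threshold) would be the trigger for the follow-up capture the abstract describes.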
  • Patent number: 8700399
    Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
    Type: Grant
    Filed: July 6, 2010
    Date of Patent: April 15, 2014
    Assignee: Sensory, Inc.
    Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
  • Patent number: 8700397
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventor: Kenneth D. White
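The character-weighted scoring described above can be sketched in a few lines. Everything here is illustrative — the patent does not specify how weights are derived; a real system might weight acoustically confusable characters (e.g. "8" vs. "H") more heavily.

```python
def score_sequence(spoken: str, candidate: str, weights: dict) -> float:
    """Score a known character sequence against the recognized characters.
    Each matching character contributes its weight; mismatches contribute
    nothing.  Characters without an explicit weight default to 1.0."""
    if len(spoken) != len(candidate):
        return 0.0
    return sum(weights.get(c, 1.0)
               for s, c in zip(spoken, candidate) if s == c)

def best_match(spoken: str, known: list, weights: dict) -> str:
    """Pick the known sequence with the highest weighted score."""
    return max(known, key=lambda k: score_sequence(spoken, k, weights))
```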
  • Patent number: 8700392
    Abstract: A user can provide input to a computing device through various combinations of speech, movement, and/or gestures. A computing device can capture audio data and analyze that data to determine any speech information in the audio data. The computing device can simultaneously capture image or video information which can be used to assist in analyzing the audio information. For example, image information is utilized by the device to determine when someone is speaking, and the movement of the person's lips can be analyzed to assist in determining the words that were spoken. Any gestures or other motions can assist in the determination as well. By combining various types of data to determine user input, the accuracy of a process such as speech recognition can be improved, and the need for lengthy application training processes can be avoided.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: April 15, 2014
    Assignee: Amazon Technologies, Inc.
    Inventors: Gregory M. Hart, Ian W. Freed, Gregg Elliott Zehr, Jeffrey P. Bezos
  • Patent number: 8700398
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, Jr.
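The two-threshold policy behind this interface — accept above the high threshold, reject below the low one, confirm in between, with the low threshold constrained to never exceed the high one — can be sketched as below. Class and method names are invented for illustration.

```python
class ConfidenceThresholds:
    """Two-threshold confidence policy: scores at or above the high
    threshold are accepted, scores below the low threshold are rejected,
    and scores in between trigger a confirmation prompt."""

    def __init__(self, low=0.3, high=0.7):
        if low > high:
            raise ValueError("low threshold must be <= high threshold")
        self.low, self.high = low, high

    def set_low(self, value):
        # Mirror the constrained UI control: low can never exceed high.
        self.low = min(value, self.high)

    def set_high(self, value):
        self.high = max(value, self.low)

    def classify(self, score):
        if score >= self.high:
            return "accept"
        if score >= self.low:
            return "confirm"
        return "reject"
```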
  • Publication number: 20140100847
    Abstract: Disclosed is a voice recognition device including: first through Mth voice recognition parts each for detecting a voice interval from sound data stored in a sound data storage unit 2 to extract a feature quantity of the sound data within the voice interval, and each for carrying out a recognition process on the basis of the feature quantity extracted thereby while referring to a recognition dictionary; a voice recognition switching unit 4 for switching among the first through Mth voice recognition parts; a recognition control unit 5 for controlling the switching among the voice recognition parts by the voice recognition switching unit 4 to acquire recognition results acquired by a voice recognition part selected; and a recognition result selecting unit 6 for selecting a recognition result to be presented to a user from the recognition results acquired by the recognition control unit 5.
    Type: Application
    Filed: July 5, 2011
    Publication date: April 10, 2014
    Applicant: MITSUBISHI ELECTRIC CORPORATION
    Inventors: Jun Ishii, Michihiro Yamazaki
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 8694317
    Abstract: Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: April 8, 2014
    Assignee: Aurix Limited
    Inventors: Adrian I Skilling, Howard A K Wright
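The search side of this approach can be sketched as a dynamic program over an index that keeps a score for every reference phone in every frame. This is a simplified illustration of the idea (each query phone must cover at least one frame; lower stored scores mean a closer match); the data layout and the single returned end point are assumptions, and a real system would also recover start frames and normalise by duration.

```python
def search_term(index, term):
    """Locate the best alignment of a phone sequence `term` in a phonetic
    index, where index[f][p] is the stored distance of frame f from phone
    p.  Returns (total_cost, end_frame)."""
    INF = float("inf")
    n, m = len(index), len(term)
    # prev[j] = best cost with the first j phones matched so far;
    # prev[0] = 0 lets the term start at any frame for free.
    prev = [0.0] + [INF] * m
    best = (INF, -1)
    for f in range(n):
        cur = [0.0] + [INF] * m
        for j in range(1, m + 1):
            stay = prev[j]          # extend phone j over this frame
            advance = prev[j - 1]   # start phone j at this frame
            base = min(stay, advance)
            if base < INF:
                cur[j] = base + index[f][term[j - 1]]
        if cur[m] < best[0]:
            best = (cur[m], f)
        prev = cur
    return best
```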
  • Patent number: 8688451
    Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech using a first grammar to obtain parameter values of a first N-best list of vocabulary, comparing a parameter value of a top result of the first N-best list to a threshold value, and if the compared parameter value is below the threshold value, then additionally processing the input speech using a second grammar to obtain parameter values of a second N-best list of vocabulary. Other preferred steps include: determining the input speech to be in-vocabulary if any of the results of the first N-best list is also present within the second N-best list, but out-of-vocabulary if none of the results of the first N-best list is within the second N-best list; and providing audible feedback to the user if the input speech is determined to be out-of-vocabulary.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 1, 2014
    Assignee: General Motors LLC
    Inventors: Timothy J. Grost, Rathinavelu Chengalvarayan
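The two-pass decision logic above is straightforward to sketch. The grammar interfaces and the choice to return the first pass's top result when the input is judged in-vocabulary are illustrative assumptions, not details from the patent.

```python
def classify_utterance(speech, grammar1, grammar2, threshold):
    """Two-pass in-vocabulary check.  grammar1/grammar2 are callables
    returning an N-best list of (word, score) pairs, best first."""
    nbest1 = grammar1(speech)
    top_word, top_score = nbest1[0]
    if top_score >= threshold:
        return top_word, True            # confident: accept the first pass
    nbest2 = grammar2(speech)            # second, broader grammar
    words2 = {w for w, _ in nbest2}
    # In-vocabulary if any first-pass result also appears in the second list.
    in_vocab = any(w in words2 for w, _ in nbest1)
    return (top_word, True) if in_vocab else (None, False)
```

A `(None, False)` result is where the audible out-of-vocabulary feedback to the user would be triggered.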
  • Patent number: 8688448
    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select among the alternative segmentations and labels or enter a user-defined segmentation and label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labeling of successive parts of the document or of the entire document.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications Austria GmbH
    Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production, together with log data recording the association between each utterance and the state of the system at the moment the utterance was recorded. From the system state, which is a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Patent number: 8682664
    Abstract: The present invention discloses a method and a device for audio signal classification, and relates to the field of communications technologies, solving the problem of the high complexity of type classification of audio signals in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the signal in at least one sub-band is obtained, and the type of the signal is determined according to the obtained characteristic parameter. The present invention is mainly applied to audio signal classification scenarios, and implements audio signal classification through a relatively simple method.
    Type: Grant
    Filed: September 27, 2011
    Date of Patent: March 25, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Lijing Xu, Shunmei Wu, Liwei Chen, Qing Zhang
  • Publication number: 20140081636
    Abstract: System and method to adjust an automatic speech recognition (ASR) engine, the method including: receiving social network information from a social network; data mining the social network information to extract one or more characteristics; inferring a trend from the extracted one or more characteristics; and adjusting the ASR engine based upon the inferred trend. Embodiments of the method may further include: receiving a speech signal from a user; and recognizing the speech signal by use of the adjusted ASR engine. Further embodiments of the method may further include: producing a list of candidate matching words; and ranking the list of candidate matching words by use of the inferred trend.
    Type: Application
    Filed: September 15, 2012
    Publication date: March 20, 2014
    Applicant: Avaya Inc.
    Inventors: George W. Erhart, Valentine C. Matula, David J. Skiba
  • Patent number: 8676578
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and, when it is, the degree of highlighting, determined in accordance with the pronunciation recognition frequency of a second word, based on whether the storage information includes a second set corresponding to a first set and on the pronunciation recognition frequency of the second word when the second set is included. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: March 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
  • Patent number: 8676588
    Abstract: Streaming voice signals, such as might be received at a contact center or similar operation, are analyzed to detect the occurrence of one or more unprompted, predetermined utterances. The predetermined utterances preferably constitute a vocabulary of words and/or phrases having particular meaning within the context in which they are uttered. Detection of one or more of the predetermined utterances during a call causes a determination of response-determinative significance of the detected utterance(s). Based on the response-determinative significance of the detected utterance(s), a responsive action may be further determined. Additionally, long term storage of the call corresponding to the detected utterance may also be initiated. Conversely, calls in which no predetermined utterances are detected may be deleted from short term storage.
    Type: Grant
    Filed: May 22, 2009
    Date of Patent: March 18, 2014
    Assignee: Accenture Global Services Limited
    Inventors: Thomas J. Ryan, Biji K. Janan
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140074469
    Abstract: A method and apparatus for generating compact signatures of acoustic signals are disclosed. A method of generating acoustic signal signatures comprises the steps of dividing the input signal into multiple frames, computing the Fourier transform of each frame, computing the difference between the non-negative Fourier transform output values for the current frame and those for one of the previous frames, combining the difference values into subgroups, accumulating the difference values within each subgroup, combining the accumulated subgroup values into groups, and finding an extreme value within each group.
    Type: Application
    Filed: September 8, 2013
    Publication date: March 13, 2014
    Inventor: Sergey Zhidkov
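The signature pipeline in the claim (frame, transform, frame-to-frame difference, subgroup accumulation, per-group extreme) can be sketched end to end. The naive DFT, the subgroup and group sizes, and emitting the index of the largest subgroup are all illustrative choices; a real implementation would use an FFT and tuned parameters.

```python
import cmath

def dft_mag(frame):
    """Naive DFT magnitude (the non-negative transform output values)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame))) for k in range(n // 2)]

def signature(frames, sub=2, per_group=2):
    """Per-frame compact signature: magnitude spectrum, difference with
    the previous frame, sums over subgroups of `sub` bins, then the index
    of the extreme (largest) subgroup inside each group."""
    sig, prev = [], None
    for frame in frames:
        mag = dft_mag(frame)
        if prev is not None:
            diff = [m - p for m, p in zip(mag, prev)]
            subs = [sum(diff[i:i + sub]) for i in range(0, len(diff), sub)]
            groups = [subs[i:i + per_group]
                      for i in range(0, len(subs), per_group)]
            sig.append(tuple(g.index(max(g)) for g in groups))
        prev = mag
    return sig
```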
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe each word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve consistent behavior for all inputs across various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
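The histogram mechanism above can be sketched per frequency bin: decay every histogram bin by an exponential moving average, bump the bin for the current power level, and read the noise estimate off the cumulative curve at a chosen quantile. The level range, quantile, and decay constant here are illustrative, not the patent's values.

```python
class NoiseEstimator:
    """One frequency bin's noise-power estimate via an exponentially
    weighted cumulative histogram of power levels (in dB)."""

    def __init__(self, n_levels=64, alpha=0.95, quantile=0.5,
                 lo_db=-30.0, hi_db=30.0):
        self.hist = [0.0] * n_levels
        self.alpha, self.quantile = alpha, quantile
        self.lo, self.step = lo_db, (hi_db - lo_db) / n_levels

    def update(self, power_db):
        idx = min(len(self.hist) - 1,
                  max(0, int((power_db - self.lo) / self.step)))
        # Exponential moving average: decay all bins, bump the current one.
        self.hist = [h * self.alpha for h in self.hist]
        self.hist[idx] += 1.0 - self.alpha

    def estimate_db(self):
        """Level (bin centre) where the cumulative frequency crosses the
        chosen quantile."""
        total = sum(self.hist)
        cum = 0.0
        for i, h in enumerate(self.hist):
            cum += h
            if total and cum / total >= self.quantile:
                return self.lo + (i + 0.5) * self.step
        return self.lo
```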
  • Publication number: 20140052443
    Abstract: A voice control method of an electronic device is provided. The method includes detecting external voice in the surroundings and converting the detected voice into voice signals; periodically sensing non-vocal physical actions of the user and identifying each action; comparing the identified action with a preset action to determine whether the two are the same; extracting the voice characteristic of linguistic meaning from the voice signals when the signals are received; determining whether the extracted voice characteristic of linguistic meaning matches voice templates stored in a storage unit; and performing a particular function associated with the voice template when the storage unit stores a voice template corresponding to the voice characteristic of linguistic meaning. The electronic device is also provided.
    Type: Application
    Filed: November 1, 2012
    Publication date: February 20, 2014
    Inventor: TZU-CHIAO SUNG
  • Patent number: 8655656
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: February 18, 2014
    Assignee: Deutsche Telekom AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
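The entropy step at the heart of this method is compact: a sharply peaked posterior vector (the model is sure which phoneme it heard) has low entropy and suggests intelligible speech, while a flat posterior has high entropy. The normalisation to [0, 1] is an illustrative choice, not taken from the patent.

```python
import math

def frame_intelligibility(posteriors):
    """Entropy of a phoneme posterior vector, normalised by the maximum
    log2(K) so the result lies in [0, 1]; lower = more intelligible."""
    k = len(posteriors)
    h = -sum(p * math.log2(p) for p in posteriors if p > 0)
    return h / math.log2(k)
```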
  • Publication number: 20140039890
    Abstract: The present document relates to methods and systems for encoding an audio signal. The method comprises determining a spectral representation of the audio signal. The determining a spectral representation step may comprise determining modified discrete cosine transform, MDCT, coefficients, or a Quadrature Mirror Filter, QMF, filter bank representation of the audio signal. The method further comprises encoding the audio signal using the determined spectral representation; and classifying parts of the audio signal to be speech or non-speech based on the determined spectral representation. Finally, a loudness measure for the audio signal based on the speech parts is determined.
    Type: Application
    Filed: April 27, 2012
    Publication date: February 6, 2014
    Applicant: DOLBY INTERNATIONAL AB
    Inventors: Harald H. Mundt, Arijit Biswas, Rolf Meissner
  • Patent number: 8639506
    Abstract: Method, system and computer program for determining the matching between a first and a second sampled signal using an improved Dynamic Time Warping algorithm, called Unbounded DTW. It uses a dynamic programming algorithm to find exact start-end alignment points that are unknown a priori; the initial subsampling of the similarity matrix is performed via the definition of optimal synchronization points, allowing a very fast process.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: January 28, 2014
    Assignee: Telefonica, S.A.
    Inventors: Xavier Anguera Miro, Robert Macrae
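For context, the basic DTW recursion that Unbounded DTW builds on can be sketched as below; the patent's contribution (unknown start/end alignment points and similarity-matrix subsampling via synchronization points) sits on top of this and is not reproduced here.

```python
def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic-time-warping alignment cost between two sampled
    signals, with full-length boundary conditions."""
    INF = float("inf")
    n, m = len(a), len(b)
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = dist(a[i - 1], b[j - 1]) + min(
                dp[i - 1][j],      # advance in a only
                dp[i][j - 1],      # advance in b only
                dp[i - 1][j - 1])  # advance in both
    return dp[n][m]
```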
  • Patent number: 8639509
    Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: January 28, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Feng Lin, Zhe Feng
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Patent number: 8630860
    Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: January 14, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input; the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model, and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
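The gating step — each service request has its own minimum confidence score, and the request proceeds only if the speaker-verification confidence exceeds it — reduces to a table lookup. Service names and thresholds below are purely illustrative.

```python
def authorize_request(confidence, service, minimums, default_min=0.9):
    """Initiate the requested service only if the speaker-verification
    confidence exceeds that service's minimum score; unknown services
    fall back to a strict default."""
    required = minimums.get(service, default_min)
    return confidence > required
```

In practice a more sensitive service (e.g. moving money) would carry a higher minimum than a read-only query, which is what the per-service table expresses.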
  • Patent number: 8612223
    Abstract: There is provided a voice processing device. The device includes: a score calculation unit configured to calculate a score indicating the compatibility of a voice signal, input on the basis of an utterance of a user, with each of plural pieces of intention information indicating each of a plurality of intentions; an intention selection unit configured to select the intention information indicating the intention of the utterance of the user from among the plural pieces of intention information on the basis of the score calculated by the score calculation unit; and an intention reliability calculation unit configured to calculate the reliability of the intention information selected by the intention selection unit on the basis of the score calculated by the score calculation unit.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: December 17, 2013
    Assignee: Sony Corporation
    Inventors: Katsuki Minamino, Hitoshi Honda, Yoshinori Maeda, Hiroaki Ogawa
  • Publication number: 20130332163
    Abstract: The voiced sound interval classification device comprises a vector calculation unit which calculates, from a power spectrum time series of voice signals, a multidimensional vector series as a vector series of a power spectrum having as many dimensions as the number of microphones, a difference calculation unit which calculates, with respect to each time of the multidimensional vector series, a vector of a difference between the time and the preceding time, a sound source direction estimation unit which estimates, as a sound source direction, a main component of the differential vector, and a voiced sound interval determination unit which determines whether each sound source direction is in a voiced sound interval or a voiceless sound interval by using a predetermined voiced sound index indicative of a likelihood of a voiced sound interval of the voice signal applied at each time.
    Type: Application
    Filed: January 25, 2012
    Publication date: December 12, 2013
    Applicant: NEC CORPORATION
    Inventor: Yoshifumi Onishi
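The "main component of the differential vectors" in the abstract above can be read as a principal-component computation over frame-to-frame differences of the per-microphone power vectors. A minimal sketch under that reading (eigen-decomposition of the scatter matrix is one standard way to extract the main component; the patent's exact procedure is not given):

```python
import numpy as np

def main_direction(power_series: np.ndarray) -> np.ndarray:
    """power_series: (T, M) per-microphone power values over time.
    Returns the first principal component of the frame-to-frame
    differential vectors, taken as the sound-source direction estimate."""
    diffs = np.diff(power_series, axis=0)   # differential vectors per time step
    scatter = diffs.T @ diffs               # M x M scatter matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, -1]                   # eigenvector of largest eigenvalue
```

With two microphones whose powers rise and fall together, the estimate points along the diagonal of the two-microphone power space.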
  • Patent number: 8606578
    Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames where a score of a context independent model (CIM) associated with a CDM is used in lieu of a score for the CDM when a score for the CDM is needed and has not been pre-computed.
    Type: Grant
    Filed: June 25, 2009
    Date of Patent: December 10, 2013
    Assignee: Intel Corporation
    Inventors: Michael Eugene Deisher, Tao Ma
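The CIM-for-CDM fallback described in the abstract above is essentially a score lookup with a default. A minimal sketch (the score tables and phone labels here are hypothetical placeholders, not the patent's data structures):

```python
def acoustic_score(cdm_scores: dict[str, float],
                   cim_scores: dict[str, float],
                   model: str, context_independent: str) -> float:
    """Return the pre-computed context-dependent model (CDM) score if
    available; otherwise fall back to the score of the associated
    context-independent model (CIM)."""
    if model in cdm_scores:
        return cdm_scores[model]
    return cim_scores[context_independent]
```

During the graphical model search, this lets the decoder proceed without waiting for every CDM score to be pre-computed.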
  • Publication number: 20130325469
    Abstract: A method for providing a voice recognition function and an electronic device thereof are provided. The method includes: outputting, when a voice instruction is input, a list of prediction instructions, i.e., candidate instructions similar to the input voice instruction; updating the list of prediction instructions when a correction instruction correcting the output candidate instructions is input; and performing, if the correction instruction matches an instruction of high similarity in the updated list of prediction instructions, a voice recognition function corresponding to the voice instruction.
    Type: Application
    Filed: May 24, 2013
    Publication date: December 5, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Hee-Woon KIM, Yu-Mi AHN, Seon-Hwa KIM, Ha-Young JEON
  • Patent number: 8600765
    Abstract: Embodiments of the present invention provide a signal classification method and device, and encoding and decoding methods and devices. The encoding method includes: dividing a current frame into a low-frequency band signal and a high-frequency band signal; attenuating the high-frequency band signal or a to-be-encoded characteristic parameter of the high-frequency band signal according to an energy attenuation value of the low-frequency band signal, where the energy attenuation value indicates energy attenuation of the low-frequency band signal caused by encoding of the low-frequency band signal; and encoding the attenuated high-frequency band signal or the attenuated to-be-encoded characteristic parameter of the high-frequency band signal. The technical solutions according to the embodiments of the present invention can improve the effect of combining the low-frequency band signal and the high-frequency band signal at the decoder.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Lei Miao, Anisse Taleb
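The core operation in the abstract above, attenuating the high-band signal by the energy attenuation measured on the low band, reduces to a scaling step. A minimal sketch, assuming the attenuation value is already expressed as a linear gain factor (the patent does not fix its representation):

```python
def attenuate_high_band(high_band: list[float], attenuation: float) -> list[float]:
    """Scale high-frequency-band samples by the energy attenuation value
    measured on the encoded low band, so that both bands decay consistently
    when recombined at the decoder."""
    return [s * attenuation for s in high_band]
```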
  • Publication number: 20130317821
    Abstract: Various arrangements for detecting a type of sound, such as speech, are presented. A plurality of audio snippets may be sampled. A period of time may elapse between consecutive audio snippets. A hypothesis test may be performed using the sampled plurality of audio snippets. Such a hypothesis test may include weighting one or more hypothesis values greater than one or more other hypothesis values. Each hypothesis value may correspond to an audio snippet of the plurality of audio snippets. The hypothesis test may further include using at least the greater-weighted one or more hypothesis values to determine whether at least one audio snippet of the plurality of audio snippets comprises the type of sound.
    Type: Application
    Filed: January 2, 2013
    Publication date: November 28, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Shankar Sadasivam, Minho Jin, Leonard Henry Grokop, Edward Harrison Teague
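The weighted hypothesis test over per-snippet values can be sketched as a weighted average compared to a threshold. The weighting scheme (e.g., favoring recent snippets) and the threshold are illustrative assumptions; the patent does not pin them down:

```python
def detect_speech(hypothesis_values: list[float],
                  weights: list[float],
                  threshold: float = 0.5) -> bool:
    """Combine per-snippet hypothesis values with unequal weights
    (e.g., later snippets weighted more) and compare to a threshold."""
    total = sum(w * v for w, v in zip(weights, hypothesis_values))
    return total / sum(weights) >= threshold
```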
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
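The final comparison step in the abstract above, session metric against baseline metric, can be sketched as a simple thresholded comparison. The tolerance margin is an illustrative assumption; how the session-level metric is derived from the dictation session is the substance of the patent and is not reproduced here:

```python
def degraded(session_metric: float, baseline: float,
             tolerance: float = 0.1) -> bool:
    """Flag recognition degradation when the session-level metric
    (correlated with verbatim recognition error rate) exceeds the
    baseline metric by more than a relative tolerance."""
    return session_metric > baseline * (1.0 + tolerance)
```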
  • Patent number: 8595004
    Abstract: The problem to be solved is to robustly detect pronunciation variation examples and to acquire, with little effort, pronunciation variation rules that generalize well. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing the base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
    Type: Grant
    Filed: November 27, 2008
    Date of Patent: November 26, 2013
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
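The difference extraction step, comparing the base-form pronunciation against the sub-word recognition result, amounts to an alignment and diff over symbol sequences. A minimal sketch using a generic sequence matcher (the patent's alignment method is not specified; `difflib` stands in for it here):

```python
import difflib

def pronunciation_variations(base_form: list[str], recognized: list[str]):
    """Extract substitutions between the base-form pronunciation and the
    sub-word recognition result as candidate pronunciation variation rules."""
    matcher = difflib.SequenceMatcher(a=base_form, b=recognized)
    rules = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            rules.append((tuple(base_form[i1:i2]), tuple(recognized[j1:j2])))
    return rules
```

For example, aligning /k a t/ against a recognized /k o t/ yields the candidate rule a → o.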
  • Patent number: 8595005
    Abstract: A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system, and the probable emotional state is communicated to a user.
    Type: Grant
    Filed: April 22, 2011
    Date of Patent: November 26, 2013
    Assignee: Simple Emotion, Inc.
    Inventors: Akash Krishnan, Matthew Fernandez
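The scoring step in the abstract above, comparing extracted features against per-emotion reference samples, can be sketched with a simple distance-based score. Negative squared distance as the score and the toy two-dimensional features are illustrative assumptions; the patent's actual statistical and MFCC features and scoring system are richer:

```python
def score_emotions(features: list[float],
                   references: dict[str, list[float]]) -> str:
    """Score the utterance's feature vector against per-emotion reference
    vectors (negative squared distance) and return the most probable
    emotional state."""
    def score(ref: list[float]) -> float:
        return -sum((f - r) ** 2 for f, r in zip(features, ref))
    return max(references, key=lambda e: score(references[e]))
```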
  • Patent number: 8595006
    Abstract: A speech recognition method and system, includes receiving in a first noise environment a speech input having a sequence of observations; determining a likelihood of a sequence of words arising from the sequence of observations using an acoustic model trained to recognize speech in a second noise environment, the model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; and adapting the model trained in the second environment to that of the first environment.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: November 26, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales
  • Patent number: 8589152
    Abstract: A voice detection device includes a band-based power calculation unit that calculates, for each preset frequency width (sub-band), the total signal power (sub-band power) of the signals entered from the microphones. The voice detection device also includes a band-based noise estimation unit that estimates the sub-band noise power, and a sub-band SNR calculation unit. The sub-band SNR calculation unit calculates an SNR for each sub-band and outputs the largest of the sub-band SNRs as the SNR for the microphone of interest. The voice detection device further includes a voice/non-voice decision unit that makes the voice/non-voice decision using the SNR for the microphone of interest.
    Type: Grant
    Filed: May 26, 2009
    Date of Patent: November 19, 2013
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Masanori Tsujikawa
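The max-over-sub-bands SNR computation above can be sketched directly; the decision threshold is an illustrative assumption, and noise estimation (the band-based noise estimation unit) is taken as already done:

```python
def microphone_snr(subband_power: list[float],
                   subband_noise: list[float]) -> float:
    """Per-sub-band SNR; the largest one is reported as the SNR for
    the microphone of interest."""
    return max(p / n for p, n in zip(subband_power, subband_noise))

def is_voice(subband_power: list[float], subband_noise: list[float],
             threshold: float = 2.0) -> bool:
    """Voice/non-voice decision on the microphone-level SNR."""
    return microphone_snr(subband_power, subband_noise) >= threshold
```

Taking the maximum rather than an average lets a single high-SNR sub-band (e.g., a voiced harmonic) trigger detection even when broadband noise masks the rest.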
  • Publication number: 20130304468
    Abstract: A method for contextual voice query dilation in a Spoken Web search includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is also provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: August 8, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
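The dilation step in the abstract above, applying operators to a query-term set to produce a dilated set, can be sketched generically. The two toy operators below (lowercasing, prefix truncation) are stand-ins; the patent's dilation operators are context-driven and not reproduced here:

```python
def dilate(query_terms: list[str], operators) -> set[str]:
    """Apply each dilation operator to every query term and collect the
    union of the results as the dilated query set."""
    dilated = set(query_terms)
    for op in operators:
        for term in query_terms:
            dilated.update(op(term))
    return dilated

# Two toy dilation operators: lowercasing and prefix truncation.
ops = [lambda t: {t.lower()},
       lambda t: {t[:4]} if len(t) > 4 else set()]
```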
  • Patent number: 8583436
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: November 12, 2013
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
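The weighted long-term speech curve described above blends two predetermined curves according to the intelligibility measurement. A minimal sketch, assuming the intelligibility measurement is normalized to [0, 1] and used as a linear mixing weight (the patent does not fix the weighting function):

```python
def weighted_curve(curve_a: list[float], curve_b: list[float],
                   intelligibility: float) -> list[float]:
    """Blend two predetermined long-term average speech curves; the
    intelligibility measurement sets the per-band mixing weight."""
    w = min(max(intelligibility, 0.0), 1.0)
    return [w * a + (1.0 - w) * b for a, b in zip(curve_a, curve_b)]
```

The adaptive equalizer would then steer its coefficients toward this blended target curve.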
  • Patent number: 8577678
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: November 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
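The soft mask above takes continuous values in (0, 1) per frequency component, driven by separation reliability. A minimal sketch using a sigmoid mapping (one common choice; the patent derives the mask from distributions of speech and noise against reliability, which is not reproduced here):

```python
import math

def soft_mask(reliability: list[float], slope: float = 1.0) -> list[float]:
    """Map per-frequency separation reliability to continuous (0, 1)
    mask values via a sigmoid; a hard binary mask is the 0/1 limit."""
    return [1.0 / (1.0 + math.exp(-slope * r)) for r in reliability]

def apply_mask(spectrum: list[float], mask: list[float]) -> list[float]:
    """Weight each frequency component of the separated speech signal."""
    return [s * m for s, m in zip(spectrum, mask)]
```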
  • Publication number: 20130289987
    Abstract: A system and method are presented for negative example based performance improvements for speech recognition. The presently disclosed embodiments address identified false positives and the identification of negative examples of keywords in an Automatic Speech Recognition (ASR) system. Various methods may be used to identify negative examples of keywords. Such methods may include, for example, human listening and learning possible negative examples from a large domain specific text source. In at least one embodiment, negative examples of keywords may be used to improve the performance of an ASR system by reducing false positives.
    Type: Application
    Filed: April 26, 2013
    Publication date: October 31, 2013
    Applicant: Interactive Intelligence, Inc.
    Inventors: Aravind Ganapathiraju, Ananth Nagaraja Iyer, Felix Immanuel Wyss
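One way negative examples can reduce false positives, sketched minimally: reject a putative keyword hit when it resembles a known negative example more than the keyword itself. String-similarity scoring via `difflib` is an illustrative stand-in for the acoustic/confidence scoring an ASR system would actually use:

```python
import difflib

def accept_keyword(hit: str, keyword: str,
                   negative_examples: list[str]) -> bool:
    """Reject a keyword hit that matches a learned negative example
    (a known false-positive phrase) better than the keyword itself."""
    kw_sim = difflib.SequenceMatcher(a=hit, b=keyword).ratio()
    return all(difflib.SequenceMatcher(a=hit, b=n).ratio() <= kw_sim
               for n in negative_examples)
```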
  • Patent number: 8571865
    Abstract: Systems, methods performed by data processing apparatus, and computer storage media encoded with computer programs for: receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance; comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device; attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group; and, based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: October 29, 2013
    Assignee: Google Inc.
    Inventor: Philip Hewinson
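Restricting matching to a comparison group, as described above, can be sketched with any vector-similarity measure over voice signatures. Cosine similarity and the acceptance threshold below are illustrative assumptions; the patent does not specify the signature representation:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def identify_speaker(voice: list[float], signatures: dict,
                     threshold: float = 0.9):
    """Compare the received voice vector only against signatures of
    individuals in the comparison group; return the best match above
    the threshold, or None."""
    best, best_sim = None, threshold
    for name, sig in signatures.items():
        sim = cosine(voice, sig)
        if sim >= best_sim:
            best, best_sim = name, sim
    return best
```

Returning None (no match) would deny the communication device access to the matched individual's resources.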
  • Publication number: 20130282374
    Abstract: A speech recognition device has: hypothesis search means, which searches for an optimal solution to inputted speech data by generating a hypothesis, i.e., a bundle of words searched for as recognition result candidates; self-repair decision means, which calculates a self-repair likelihood for a word or word sequence included in the hypothesis being searched by the hypothesis search means, and decides whether or not self-repair of the word or word sequence is performed; and transparent word hypothesis generation means which, when the self-repair decision means decides that self-repair is performed, generates a transparent word hypothesis, i.e., a hypothesis that regards as a transparent word a word or word sequence included in the un-repaired interval related to that word or word sequence. The hypothesis search means then searches for an optimal solution among hypotheses that include, as search targets, the transparent word hypothesis generated by the transparent word hypothesis generation means.
    Type: Application
    Filed: January 5, 2012
    Publication date: October 24, 2013
    Applicant: NEC CORPORATION
    Inventors: Koji Okabe, Ken Hanazawa, Seiya Osada
  • Patent number: 8566091
    Abstract: A speech recognition system is provided for selecting, via a speech input, an item from a list of items. The speech recognition system detects a first speech input, recognizes the first speech input, compares the recognized first speech input with the list of items and generates a first candidate list of best matching items based on the comparison result. The system then informs the speaker of at least one of the best matching items of the first candidate list for a selection of an item by the speaker. If the intended item is not one of the best matching items presented to the speaker, the system then detects a second speech input, recognizes the second speech input, and generates a second candidate list of best matching items taking into account the comparison result obtained with the first speech input.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: October 22, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Andreas Löw, Lars König, Christian Hillebrecht
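The two-pass candidate-list flow above can be sketched as repeated fuzzy matching against the item list, excluding items the speaker has already rejected. `difflib.get_close_matches` stands in for the recognizer's comparison of the recognized input against the list; the item names are hypothetical:

```python
import difflib

def candidates(heard: str, items: list[str], exclude=()):
    """Rank list items by string similarity to the recognized speech input,
    dropping items already rejected in an earlier pass."""
    pool = [i for i in items if i not in exclude]
    return difflib.get_close_matches(heard, pool, n=3, cutoff=0.0)

items = ["Main Street", "Maple Street", "Main Avenue"]
first = candidates("Main Stret", items)              # first candidate list
second = candidates("Main Stret", items, exclude={first[0]})  # after rejection
```

The second pass takes the first comparison into account in the simplest possible way, by removing the presented-and-rejected best match from the pool.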