Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 8706491
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Grant
    Filed: August 24, 2010
    Date of Patent: April 22, 2014
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
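The matching step this abstract describes — compare an uploaded fingerprint against stored fingerprints, filtered by location or other social factors — can be sketched as follows. This is a minimal illustration, not Facebook's implementation: the bitwise fingerprint representation, Hamming-distance threshold, and crude coordinate filter are all assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")

def match_fingerprint(client_fp: int, client_loc, database,
                      max_bits=4, max_dist=1.0):
    """Return database entries whose fingerprint is within max_bits of the
    client's, considering only entries uploaded near the client's location
    (a crude L1 proximity filter standing in for the social/location
    factors the abstract mentions)."""
    matches = []
    for entry in database:
        dx = abs(entry["loc"][0] - client_loc[0])
        dy = abs(entry["loc"][1] - client_loc[1])
        if dx + dy > max_dist:
            continue  # only compare fingerprints from nearby uploads
        if hamming(entry["fp"], client_fp) <= max_bits:
            matches.append(entry)
    return matches
```

A partial match (Hamming distance just above the threshold) would be the trigger for the follow-up capture the abstract describes.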
  • Patent number: 8700399
    Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
    Type: Grant
    Filed: July 6, 2010
    Date of Patent: April 15, 2014
    Assignee: Sensory, Inc.
    Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
  • Patent number: 8700397
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventor: Kenneth D. White
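The character-weighted scoring described above can be sketched in a few lines. Everything here is illustrative — the patent does not specify how weights are derived; a real system might weight acoustically confusable characters (e.g. "8" vs. "H") more heavily.

```python
def score_sequence(spoken: str, candidate: str, weights: dict) -> float:
    """Score a known character sequence against the recognized characters.
    Each matching character contributes its weight; mismatches contribute
    nothing.  Characters without an explicit weight default to 1.0."""
    if len(spoken) != len(candidate):
        return 0.0
    return sum(weights.get(c, 1.0)
               for s, c in zip(spoken, candidate) if s == c)

def best_match(spoken: str, known: list, weights: dict) -> str:
    """Pick the known sequence with the highest weighted score."""
    return max(known, key=lambda k: score_sequence(spoken, k, weights))
```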
  • Patent number: 8700392
    Abstract: A user can provide input to a computing device through various combinations of speech, movement, and/or gestures. A computing device can capture audio data and analyze that data to determine any speech information in the audio data. The computing device can simultaneously capture image or video information which can be used to assist in analyzing the audio information. For example, image information is utilized by the device to determine when someone is speaking, and the movement of the person's lips can be analyzed to assist in determining the words that were spoken. Any gestures or other motions can assist in the determination as well. By combining various types of data to determine user input, the accuracy of a process such as speech recognition can be improved, and the need for lengthy application training processes can be avoided.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: April 15, 2014
    Assignee: Amazon Technologies, Inc.
    Inventors: Gregory M. Hart, Ian W. Freed, Gregg Elliott Zehr, Jeffrey P. Bezos
  • Patent number: 8700398
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, Jr.
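The two-threshold policy behind this interface — accept above the high threshold, reject below the low one, confirm in between, with the low threshold constrained to never exceed the high one — can be sketched as below. Class and method names are invented for illustration.

```python
class ConfidenceThresholds:
    """Two-threshold confidence policy: scores at or above the high
    threshold are accepted, scores below the low threshold are rejected,
    and scores in between trigger a confirmation prompt."""

    def __init__(self, low=0.3, high=0.7):
        if low > high:
            raise ValueError("low threshold must be <= high threshold")
        self.low, self.high = low, high

    def set_low(self, value):
        # Mirror the constrained UI control: low can never exceed high.
        self.low = min(value, self.high)

    def set_high(self, value):
        self.high = max(value, self.low)

    def classify(self, score):
        if score >= self.high:
            return "accept"
        if score >= self.low:
            return "confirm"
        return "reject"
```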
  • Publication number: 20140100847
    Abstract: Disclosed is a voice recognition device including: first through Mth voice recognition parts each for detecting a voice interval from sound data stored in a sound data storage unit 2 to extract a feature quantity of the sound data within the voice interval, and each for carrying out a recognition process on the basis of the feature quantity extracted thereby while referring to a recognition dictionary; a voice recognition switching unit 4 for switching among the first through Mth voice recognition parts; a recognition control unit 5 for controlling the switching among the voice recognition parts by the voice recognition switching unit 4 to acquire recognition results acquired by a voice recognition part selected; and a recognition result selecting unit 6 for selecting a recognition result to be presented to a user from the recognition results acquired by the recognition control unit 5.
    Type: Application
    Filed: July 5, 2011
    Publication date: April 10, 2014
    Applicant: MITSUBISHI ELECTRIC CORPORATION
    Inventors: Jun Ishii, Michihiro Yamazaki
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 8694317
    Abstract: Methods for processing audio data containing speech to produce a searchable index file and for subsequently searching such an index file are provided. The processing method uses a phonetic approach and models each frame of the audio data with a set of reference phones. A score for each of the reference phones, representing the difference of the audio from the phone model, is stored in the searchable data file for each of the phones in the reference set. A consequence of storing information regarding each of the reference phones is that the accuracy of searches carried out on the index file is not compromised by the rejection of information about particular phones. A subsequent search method is also provided which uses a simple and efficient dynamic programming search to locate instances of a search term in the audio. The methods of the present invention have particular application to the field of audio data mining.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: April 8, 2014
    Assignee: Aurix Limited
    Inventors: Adrian I Skilling, Howard A K Wright
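The search side of this approach can be sketched as a dynamic program over an index that keeps a score for every reference phone in every frame. This is a simplified illustration of the idea (each query phone must cover at least one frame; lower stored scores mean a closer match); the data layout and the single returned end point are assumptions, and a real system would also recover start frames and normalise by duration.

```python
def search_term(index, term):
    """Locate the best alignment of a phone sequence `term` in a phonetic
    index, where index[f][p] is the stored distance of frame f from phone
    p.  Returns (total_cost, end_frame)."""
    INF = float("inf")
    n, m = len(index), len(term)
    # prev[j] = best cost with the first j phones matched so far;
    # prev[0] = 0 lets the term start at any frame for free.
    prev = [0.0] + [INF] * m
    best = (INF, -1)
    for f in range(n):
        cur = [0.0] + [INF] * m
        for j in range(1, m + 1):
            stay = prev[j]          # extend phone j over this frame
            advance = prev[j - 1]   # start phone j at this frame
            base = min(stay, advance)
            if base < INF:
                cur[j] = base + index[f][term[j - 1]]
        if cur[m] < best[0]:
            best = (cur[m], f)
        prev = cur
    return best
```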
  • Patent number: 8688451
    Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech using a first grammar to obtain parameter values of a first N-best list of vocabulary, comparing a parameter value of a top result of the first N-best list to a threshold value, and if the compared parameter value is below the threshold value, then additionally processing the input speech using a second grammar to obtain parameter values of a second N-best list of vocabulary. Other preferred steps include: determining the input speech to be in-vocabulary if any of the results of the first N-best list is also present within the second N-best list, but out-of-vocabulary if none of the results of the first N-best list is within the second N-best list; and providing audible feedback to the user if the input speech is determined to be out-of-vocabulary.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 1, 2014
    Assignee: General Motors LLC
    Inventors: Timothy J. Grost, Rathinavelu Chengalvarayan
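The two-pass decision logic above is straightforward to sketch. The grammar interfaces and the choice to return the first pass's top result when the input is judged in-vocabulary are illustrative assumptions, not details from the patent.

```python
def classify_utterance(speech, grammar1, grammar2, threshold):
    """Two-pass in-vocabulary check.  grammar1/grammar2 are callables
    returning an N-best list of (word, score) pairs, best first."""
    nbest1 = grammar1(speech)
    top_word, top_score = nbest1[0]
    if top_score >= threshold:
        return top_word, True            # confident: accept the first pass
    nbest2 = grammar2(speech)            # second, broader grammar
    words2 = {w for w, _ in nbest2}
    # In-vocabulary if any first-pass result also appears in the second list.
    in_vocab = any(w in words2 for w, _ in nbest1)
    return (top_word, True) if in_vocab else (None, False)
```

A `(None, False)` result is where the audible out-of-vocabulary feedback to the user would be triggered.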
  • Patent number: 8688448
    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select among the alternative segmentations and labels or enter a user-defined segmentation and label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labeling of successive parts of the document or of the entire document.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications Austria GmbH
    Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production, together with log data recording the association between each utterance and the state of the system at the moment the utterance was recorded. From the system state, which is a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Patent number: 8682664
    Abstract: The present invention discloses a method and a device for audio signal classification, and relates to the field of communications technologies, solving the problem of the high complexity of type classification of audio signals in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the signal in at least one sub-band is obtained, and the type of the signal is determined according to the obtained characteristic parameter. The present invention is mainly applied to audio signal classification scenarios, and implements audio signal classification through a relatively simple method.
    Type: Grant
    Filed: September 27, 2011
    Date of Patent: March 25, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Lijing Xu, Shunmei Wu, Liwei Chen, Qing Zhang
  • Publication number: 20140081636
    Abstract: System and method to adjust an automatic speech recognition (ASR) engine, the method including: receiving social network information from a social network; data mining the social network information to extract one or more characteristics; inferring a trend from the extracted one or more characteristics; and adjusting the ASR engine based upon the inferred trend. Embodiments of the method may further include: receiving a speech signal from a user; and recognizing the speech signal by use of the adjusted ASR engine. Further embodiments of the method may further include: producing a list of candidate matching words; and ranking the list of candidate matching words by use of the inferred trend.
    Type: Application
    Filed: September 15, 2012
    Publication date: March 20, 2014
    Applicant: Avaya Inc.
    Inventors: George W. Erhart, Valentine C. Matula, David J. Skiba
  • Patent number: 8676578
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and, when it is, the degree of highlighting, determined in accordance with the pronunciation recognition frequency of a second word, based on whether the storage information includes a second set corresponding to a first set and on the pronunciation recognition frequency of the second word when the second set is included. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: March 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
  • Patent number: 8676588
    Abstract: Streaming voice signals, such as might be received at a contact center or similar operation, are analyzed to detect the occurrence of one or more unprompted, predetermined utterances. The predetermined utterances preferably constitute a vocabulary of words and/or phrases having particular meaning within the context in which they are uttered. Detection of one or more of the predetermined utterances during a call causes a determination of response-determinative significance of the detected utterance(s). Based on the response-determinative significance of the detected utterance(s), a responsive action may be further determined. Additionally, long term storage of the call corresponding to the detected utterance may also be initiated. Conversely, calls in which no predetermined utterances are detected may be deleted from short term storage.
    Type: Grant
    Filed: May 22, 2009
    Date of Patent: March 18, 2014
    Assignee: Accenture Global Services Limited
    Inventors: Thomas J. Ryan, Biji K. Janan
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140074469
    Abstract: A method and apparatus for generating compact signatures of acoustic signals are disclosed. A method of generating acoustic signal signatures comprises the steps of dividing the input signal into multiple frames, computing the Fourier transform of each frame, computing the difference between the non-negative Fourier transform output values for the current frame and those for one of the previous frames, combining the difference values into subgroups, accumulating the difference values within each subgroup, combining the accumulated subgroup values into groups, and finding an extreme value within each group.
    Type: Application
    Filed: September 8, 2013
    Publication date: March 13, 2014
    Inventor: Sergey Zhidkov
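The signature pipeline in the claim (frame, transform, frame-to-frame difference, subgroup accumulation, per-group extreme) can be sketched end to end. The naive DFT, the subgroup and group sizes, and emitting the index of the largest subgroup are all illustrative choices; a real implementation would use an FFT and tuned parameters.

```python
import cmath

def dft_mag(frame):
    """Naive DFT magnitude (the non-negative transform output values)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame))) for k in range(n // 2)]

def signature(frames, sub=2, per_group=2):
    """Per-frame compact signature: magnitude spectrum, difference with
    the previous frame, sums over subgroups of `sub` bins, then the index
    of the extreme (largest) subgroup inside each group."""
    sig, prev = [], None
    for frame in frames:
        mag = dft_mag(frame)
        if prev is not None:
            diff = [m - p for m, p in zip(mag, prev)]
            subs = [sum(diff[i:i + sub]) for i in range(0, len(diff), sub)]
            groups = [subs[i:i + per_group]
                      for i in range(0, len(subs), per_group)]
            sig.append(tuple(g.index(max(g)) for g in groups))
        prev = mag
    return sig
```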
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe each word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve consistent behavior for all inputs across various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
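The histogram mechanism above can be sketched per frequency bin: decay every histogram bin by an exponential moving average, bump the bin for the current power level, and read the noise estimate off the cumulative curve at a chosen quantile. The level range, quantile, and decay constant here are illustrative, not the patent's values.

```python
class NoiseEstimator:
    """One frequency bin's noise-power estimate via an exponentially
    weighted cumulative histogram of power levels (in dB)."""

    def __init__(self, n_levels=64, alpha=0.95, quantile=0.5,
                 lo_db=-30.0, hi_db=30.0):
        self.hist = [0.0] * n_levels
        self.alpha, self.quantile = alpha, quantile
        self.lo, self.step = lo_db, (hi_db - lo_db) / n_levels

    def update(self, power_db):
        idx = min(len(self.hist) - 1,
                  max(0, int((power_db - self.lo) / self.step)))
        # Exponential moving average: decay all bins, bump the current one.
        self.hist = [h * self.alpha for h in self.hist]
        self.hist[idx] += 1.0 - self.alpha

    def estimate_db(self):
        """Level (bin centre) where the cumulative frequency crosses the
        chosen quantile."""
        total = sum(self.hist)
        cum = 0.0
        for i, h in enumerate(self.hist):
            cum += h
            if total and cum / total >= self.quantile:
                return self.lo + (i + 0.5) * self.step
        return self.lo
```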
  • Publication number: 20140052443
    Abstract: A voice control method of an electronic device is provided. The method includes detecting external voice in the surroundings and converting the detected voice into voice signals; periodically sensing non-vocal physical actions of the user and identifying each action; comparing the identified action with a preset action to determine whether the two are the same; extracting the voice characteristic of linguistic meaning from the voice signals when the signals are received; determining whether the extracted voice characteristic of linguistic meaning matches voice templates stored in a storage unit; and performing a particular function associated with the voice template when the storage unit stores a voice template corresponding to the voice characteristic of linguistic meaning. The electronic device is also provided.
    Type: Application
    Filed: November 1, 2012
    Publication date: February 20, 2014
    Inventor: TZU-CHIAO SUNG
  • Patent number: 8655656
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: February 18, 2014
    Assignee: Deutsche Telekom AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
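The entropy step at the heart of this method is compact: a sharply peaked posterior vector (the model is sure which phoneme it heard) has low entropy and suggests intelligible speech, while a flat posterior has high entropy. The normalisation to [0, 1] is an illustrative choice, not taken from the patent.

```python
import math

def frame_intelligibility(posteriors):
    """Entropy of a phoneme posterior vector, normalised by the maximum
    log2(K) so the result lies in [0, 1]; lower = more intelligible."""
    k = len(posteriors)
    h = -sum(p * math.log2(p) for p in posteriors if p > 0)
    return h / math.log2(k)
```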
  • Publication number: 20140039890
    Abstract: The present document relates to methods and systems for encoding an audio signal. The method comprises determining a spectral representation of the audio signal. The determining a spectral representation step may comprise determining modified discrete cosine transform, MDCT, coefficients, or a Quadrature Mirror Filter, QMF, filter bank representation of the audio signal. The method further comprises encoding the audio signal using the determined spectral representation; and classifying parts of the audio signal to be speech or non-speech based on the determined spectral representation. Finally, a loudness measure for the audio signal based on the speech parts is determined.
    Type: Application
    Filed: April 27, 2012
    Publication date: February 6, 2014
    Applicant: DOLBY INTERNATIONAL AB
    Inventors: Harald H. Mundt, Arijit Biswas, Rolf Meissner
  • Patent number: 8639506
    Abstract: Method, system and computer program for determining the matching between a first and a second sampled signal using an improved Dynamic Time Warping algorithm, called Unbounded DTW. It uses a dynamic programming algorithm to find exact start-end alignment points that are unknown a priori; the initial subsampling of the similarity matrix is performed via the definition of optimal synchronization points, allowing a very fast process.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: January 28, 2014
    Assignee: Telefonica, S.A.
    Inventors: Xavier Anguera Miro, Robert Macrae
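For context, the basic DTW recursion that Unbounded DTW builds on can be sketched as below; the patent's contribution (unknown start/end alignment points and similarity-matrix subsampling via synchronization points) sits on top of this and is not reproduced here.

```python
def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic-time-warping alignment cost between two sampled
    signals, with full-length boundary conditions."""
    INF = float("inf")
    n, m = len(a), len(b)
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = dist(a[i - 1], b[j - 1]) + min(
                dp[i - 1][j],      # advance in a only
                dp[i][j - 1],      # advance in b only
                dp[i - 1][j - 1])  # advance in both
    return dp[n][m]
```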
  • Patent number: 8639509
    Abstract: In a confidence computing method and system, a processor may interpret speech signals as a text string or directly receive a text string as input, generate a syntactical parse tree representing the interpreted string and including a plurality of sub-trees which each represents a corresponding section of the interpreted text string, determine for each sub-tree whether the sub-tree is accurate, obtain replacement speech signals for each sub-tree determined to be inaccurate, and provide output based on corresponding text string sections of at least one sub-tree determined to be accurate.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: January 28, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Feng Lin, Zhe Feng
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Patent number: 8630860
    Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: January 14, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input; the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model, and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
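The gating step — each service request has its own minimum confidence score, and the request proceeds only if the speaker-verification confidence exceeds it — reduces to a table lookup. Service names and thresholds below are purely illustrative.

```python
def authorize_request(confidence, service, minimums, default_min=0.9):
    """Initiate the requested service only if the speaker-verification
    confidence exceeds that service's minimum score; unknown services
    fall back to a strict default."""
    required = minimums.get(service, default_min)
    return confidence > required
```

In practice a more sensitive service (e.g. moving money) would carry a higher minimum than a read-only query, which is what the per-service table expresses.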
  • Patent number: 8612223
    Abstract: There is provided a voice processing device. The device includes: a score calculation unit configured to calculate a score indicating the compatibility of a voice signal, input on the basis of an utterance of a user, with each of plural pieces of intention information indicating each of a plurality of intentions; an intention selection unit configured to select the intention information indicating the intention of the utterance of the user from among the plural pieces of intention information on the basis of the score calculated by the score calculation unit; and an intention reliability calculation unit configured to calculate the reliability of the intention information selected by the intention selection unit on the basis of the score calculated by the score calculation unit.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: December 17, 2013
    Assignee: Sony Corporation
    Inventors: Katsuki Minamino, Hitoshi Honda, Yoshinori Maeda, Hiroaki Ogawa
  • Publication number: 20130332163
    Abstract: The voiced sound interval classification device comprises a vector calculation unit which calculates, from a power spectrum time series of voice signals, a multidimensional vector series as a vector series of a power spectrum having as many dimensions as the number of microphones, a difference calculation unit which calculates, with respect to each time of the multidimensional vector series, a vector of a difference between the time and the preceding time, a sound source direction estimation unit which estimates, as a sound source direction, a main component of the differential vector, and a voiced sound interval determination unit which determines whether each sound source direction is in a voiced sound interval or a voiceless sound interval by using a predetermined voiced sound index indicative of a likelihood of a voiced sound interval of the voice signal applied at each time.
    Type: Application
    Filed: January 25, 2012
    Publication date: December 12, 2013
    Applicant: NEC CORPORATION
    Inventor: Yoshifumi Onishi
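The "main component of the differential vectors" in the abstract above can be read as a principal-component computation over frame-to-frame differences of the per-microphone power vectors. A minimal sketch under that reading (eigen-decomposition of the scatter matrix is one standard way to extract the main component; the patent's exact procedure is not given):

```python
import numpy as np

def main_direction(power_series: np.ndarray) -> np.ndarray:
    """power_series: (T, M) per-microphone power values over time.
    Returns the first principal component of the frame-to-frame
    differential vectors, taken as the sound-source direction estimate."""
    diffs = np.diff(power_series, axis=0)   # differential vectors per time step
    scatter = diffs.T @ diffs               # M x M scatter matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, -1]                   # eigenvector of largest eigenvalue
```

With two microphones whose powers rise and fall together, the estimate points along the diagonal of the two-microphone power space.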
  • Patent number: 8606578
    Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames where a score of a context independent model (CIM) associated with a CDM is used in lieu of a score for the CDM when a score for the CDM is needed and has not been pre-computed.
    Type: Grant
    Filed: June 25, 2009
    Date of Patent: December 10, 2013
    Assignee: Intel Corporation
    Inventors: Michael Eugene Deisher, Tao Ma
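The CIM-for-CDM fallback described in the abstract above is essentially a score lookup with a default. A minimal sketch (the score tables and phone labels here are hypothetical placeholders, not the patent's data structures):

```python
def acoustic_score(cdm_scores: dict[str, float],
                   cim_scores: dict[str, float],
                   model: str, context_independent: str) -> float:
    """Return the pre-computed context-dependent model (CDM) score if
    available; otherwise fall back to the score of the associated
    context-independent model (CIM)."""
    if model in cdm_scores:
        return cdm_scores[model]
    return cim_scores[context_independent]
```

During the graphical model search, this lets the decoder proceed without waiting for every CDM score to be pre-computed.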
  • Publication number: 20130325469
    Abstract: A method for providing a voice recognition function and an electronic device thereof are provided. The method includes: outputting, when a voice instruction is input, a list of prediction instructions, i.e., candidate instructions similar to the input voice instruction; updating the list of prediction instructions when a correction instruction correcting the output candidate instructions is input; and performing, if the correction instruction matches an instruction of high similarity in the updated list of prediction instructions, a voice recognition function corresponding to the voice instruction.
    Type: Application
    Filed: May 24, 2013
    Publication date: December 5, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Hee-Woon KIM, Yu-Mi AHN, Seon-Hwa KIM, Ha-Young JEON
  • Patent number: 8600765
    Abstract: Embodiments of the present invention provide a signal classification method and device, and encoding and decoding methods and devices. The encoding method includes: dividing a current frame into a low-frequency band signal and a high-frequency band signal; attenuating the high-frequency band signal or a to-be-encoded characteristic parameter of the high-frequency band signal according to an energy attenuation value of the low-frequency band signal, where the energy attenuation value indicates energy attenuation of the low-frequency band signal caused by encoding of the low-frequency band signal; and encoding the attenuated high-frequency band signal or the attenuated to-be-encoded characteristic parameter of the high-frequency band signal. The technical solutions according to the embodiments of the present invention can improve the effect of combining the low-frequency band signal and the high-frequency band signal at the decoder.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Lei Miao, Anisse Taleb
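The core operation in the abstract above, attenuating the high-band signal by the energy attenuation measured on the low band, reduces to a scaling step. A minimal sketch, assuming the attenuation value is already expressed as a linear gain factor (the patent does not fix its representation):

```python
def attenuate_high_band(high_band: list[float], attenuation: float) -> list[float]:
    """Scale high-frequency-band samples by the energy attenuation value
    measured on the encoded low band, so that both bands decay consistently
    when recombined at the decoder."""
    return [s * attenuation for s in high_band]
```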
  • Publication number: 20130317821
    Abstract: Various arrangements for detecting a type of sound, such as speech, are presented. A plurality of audio snippets may be sampled. A period of time may elapse between consecutive audio snippets. A hypothesis test may be performed using the sampled plurality of audio snippets. Such a hypothesis test may include weighting one or more hypothesis values greater than one or more other hypothesis values. Each hypothesis value may correspond to an audio snippet of the plurality of audio snippets. The hypothesis test may further include using at least the greater-weighted one or more hypothesis values to determine whether at least one audio snippet of the plurality of audio snippets comprises the type of sound.
    Type: Application
    Filed: January 2, 2013
    Publication date: November 28, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Shankar Sadasivam, Minho Jin, Leonard Henry Grokop, Edward Harrison Teague
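The weighted hypothesis test over per-snippet values can be sketched as a weighted average compared to a threshold. The weighting scheme (e.g., favoring recent snippets) and the threshold are illustrative assumptions; the patent does not pin them down:

```python
def detect_speech(hypothesis_values: list[float],
                  weights: list[float],
                  threshold: float = 0.5) -> bool:
    """Combine per-snippet hypothesis values with unequal weights
    (e.g., later snippets weighted more) and compare to a threshold."""
    total = sum(w * v for w, v in zip(weights, hypothesis_values))
    return total / sum(weights) >= threshold
```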
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
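The final comparison step in the abstract above, session metric against baseline metric, can be sketched as a simple thresholded comparison. The tolerance margin is an illustrative assumption; how the session-level metric is derived from the dictation session is the substance of the patent and is not reproduced here:

```python
def degraded(session_metric: float, baseline: float,
             tolerance: float = 0.1) -> bool:
    """Flag recognition degradation when the session-level metric
    (correlated with verbatim recognition error rate) exceeds the
    baseline metric by more than a relative tolerance."""
    return session_metric > baseline * (1.0 + tolerance)
```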
  • Patent number: 8595004
    Abstract: The problem to be solved is to robustly detect pronunciation variation examples and to acquire, with little effort, pronunciation variation rules that generalize well. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing the base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
    Type: Grant
    Filed: November 27, 2008
    Date of Patent: November 26, 2013
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
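The difference extraction step, comparing the base-form pronunciation against the sub-word recognition result, amounts to an alignment and diff over symbol sequences. A minimal sketch using a generic sequence matcher (the patent's alignment method is not specified; `difflib` stands in for it here):

```python
import difflib

def pronunciation_variations(base_form: list[str], recognized: list[str]):
    """Extract substitutions between the base-form pronunciation and the
    sub-word recognition result as candidate pronunciation variation rules."""
    matcher = difflib.SequenceMatcher(a=base_form, b=recognized)
    rules = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            rules.append((tuple(base_form[i1:i2]), tuple(recognized[j1:j2])))
    return rules
```

For example, aligning /k a t/ against a recognized /k o t/ yields the candidate rule a → o.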
  • Patent number: 8595005
    Abstract: A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system, and the probable emotional state is communicated to a user.
    Type: Grant
    Filed: April 22, 2011
    Date of Patent: November 26, 2013
    Assignee: Simple Emotion, Inc.
    Inventors: Akash Krishnan, Matthew Fernandez
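The scoring step in the abstract above, comparing extracted features against per-emotion reference samples, can be sketched with a simple distance-based score. Negative squared distance as the score and the toy two-dimensional features are illustrative assumptions; the patent's actual statistical and MFCC features and scoring system are richer:

```python
def score_emotions(features: list[float],
                   references: dict[str, list[float]]) -> str:
    """Score the utterance's feature vector against per-emotion reference
    vectors (negative squared distance) and return the most probable
    emotional state."""
    def score(ref: list[float]) -> float:
        return -sum((f - r) ** 2 for f, r in zip(features, ref))
    return max(references, key=lambda e: score(references[e]))
```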
  • Patent number: 8595006
    Abstract: A speech recognition method and system, includes receiving in a first noise environment a speech input having a sequence of observations; determining a likelihood of a sequence of words arising from the sequence of observations using an acoustic model trained to recognize speech in a second noise environment, the model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; and adapting the model trained in the second environment to that of the first environment.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: November 26, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales
  • Patent number: 8589152
    Abstract: A voice detection device includes a band-based power calculation unit that calculates, for each preset frequency width (sub-band), the total signal power (sub-band power) of the signals entered from the microphones. The voice detection device also includes a band-based noise estimation unit that estimates the sub-band noise power, and a sub-band SNR calculation unit. The sub-band SNR calculation unit calculates an SNR for each sub-band and outputs the largest of the sub-band SNRs as the SNR for the microphone of interest. The voice detection device further includes a voice/non-voice decision unit that makes the voice/non-voice decision using the SNR for the microphone of interest.
    Type: Grant
    Filed: May 26, 2009
    Date of Patent: November 19, 2013
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Masanori Tsujikawa
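The max-over-sub-bands SNR computation above can be sketched directly; the decision threshold is an illustrative assumption, and noise estimation (the band-based noise estimation unit) is taken as already done:

```python
def microphone_snr(subband_power: list[float],
                   subband_noise: list[float]) -> float:
    """Per-sub-band SNR; the largest one is reported as the SNR for
    the microphone of interest."""
    return max(p / n for p, n in zip(subband_power, subband_noise))

def is_voice(subband_power: list[float], subband_noise: list[float],
             threshold: float = 2.0) -> bool:
    """Voice/non-voice decision on the microphone-level SNR."""
    return microphone_snr(subband_power, subband_noise) >= threshold
```

Taking the maximum rather than an average lets a single high-SNR sub-band (e.g., a voiced harmonic) trigger detection even when broadband noise masks the rest.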
  • Publication number: 20130304468
    Abstract: A method for contextual voice query dilation in a Spoken Web search includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is also provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: August 8, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
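The dilation step in the abstract above, applying operators to a query-term set to produce a dilated set, can be sketched generically. The two toy operators below (lowercasing, prefix truncation) are stand-ins; the patent's dilation operators are context-driven and not reproduced here:

```python
def dilate(query_terms: list[str], operators) -> set[str]:
    """Apply each dilation operator to every query term and collect the
    union of the results as the dilated query set."""
    dilated = set(query_terms)
    for op in operators:
        for term in query_terms:
            dilated.update(op(term))
    return dilated

# Two toy dilation operators: lowercasing and prefix truncation.
ops = [lambda t: {t.lower()},
       lambda t: {t[:4]} if len(t) > 4 else set()]
```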
  • Patent number: 8583436
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: November 12, 2013
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
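The weighted long-term speech curve described above blends two predetermined curves according to the intelligibility measurement. A minimal sketch, assuming the intelligibility measurement is normalized to [0, 1] and used as a linear mixing weight (the patent does not fix the weighting function):

```python
def weighted_curve(curve_a: list[float], curve_b: list[float],
                   intelligibility: float) -> list[float]:
    """Blend two predetermined long-term average speech curves; the
    intelligibility measurement sets the per-band mixing weight."""
    w = min(max(intelligibility, 0.0), 1.0)
    return [w * a + (1.0 - w) * b for a, b in zip(curve_a, curve_b)]
```

The adaptive equalizer would then steer its coefficients toward this blended target curve.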
  • Patent number: 8577678
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: November 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
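The soft mask above takes continuous values in (0, 1) per frequency component, driven by separation reliability. A minimal sketch using a sigmoid mapping (one common choice; the patent derives the mask from distributions of speech and noise against reliability, which is not reproduced here):

```python
import math

def soft_mask(reliability: list[float], slope: float = 1.0) -> list[float]:
    """Map per-frequency separation reliability to continuous (0, 1)
    mask values via a sigmoid; a hard binary mask is the 0/1 limit."""
    return [1.0 / (1.0 + math.exp(-slope * r)) for r in reliability]

def apply_mask(spectrum: list[float], mask: list[float]) -> list[float]:
    """Weight each frequency component of the separated speech signal."""
    return [s * m for s, m in zip(spectrum, mask)]
```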
  • Publication number: 20130289987
    Abstract: A system and method are presented for negative example based performance improvements for speech recognition. The presently disclosed embodiments address identified false positives and the identification of negative examples of keywords in an Automatic Speech Recognition (ASR) system. Various methods may be used to identify negative examples of keywords. Such methods may include, for example, human listening and learning possible negative examples from a large domain specific text source. In at least one embodiment, negative examples of keywords may be used to improve the performance of an ASR system by reducing false positives.
    Type: Application
    Filed: April 26, 2013
    Publication date: October 31, 2013
    Applicant: Interactive Intelligence, Inc.
    Inventors: Aravind Ganapathiraju, Ananth Nagaraja Iyer, Felix Immanuel Wyss
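One way negative examples can reduce false positives, sketched minimally: reject a putative keyword hit when it resembles a known negative example more than the keyword itself. String-similarity scoring via `difflib` is an illustrative stand-in for the acoustic/confidence scoring an ASR system would actually use:

```python
import difflib

def accept_keyword(hit: str, keyword: str,
                   negative_examples: list[str]) -> bool:
    """Reject a keyword hit that matches a learned negative example
    (a known false-positive phrase) better than the keyword itself."""
    kw_sim = difflib.SequenceMatcher(a=hit, b=keyword).ratio()
    return all(difflib.SequenceMatcher(a=hit, b=n).ratio() <= kw_sim
               for n in negative_examples)
```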
  • Patent number: 8571865
    Abstract: Systems, methods performed by data processing apparatus, and computer storage media encoded with computer programs for: receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance; comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device; attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group; and, based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: October 29, 2013
    Assignee: Google Inc.
    Inventor: Philip Hewinson
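Restricting matching to a comparison group, as described above, can be sketched with any vector-similarity measure over voice signatures. Cosine similarity and the acceptance threshold below are illustrative assumptions; the patent does not specify the signature representation:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def identify_speaker(voice: list[float], signatures: dict,
                     threshold: float = 0.9):
    """Compare the received voice vector only against signatures of
    individuals in the comparison group; return the best match above
    the threshold, or None."""
    best, best_sim = None, threshold
    for name, sig in signatures.items():
        sim = cosine(voice, sig)
        if sim >= best_sim:
            best, best_sim = name, sim
    return best
```

Returning None (no match) would deny the communication device access to the matched individual's resources.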
  • Publication number: 20130282374
    Abstract: A speech recognition device has: hypothesis search means, which searches for an optimal solution to inputted speech data by generating a hypothesis, i.e., a bundle of words searched for as recognition result candidates; self-repair decision means, which calculates a self-repair likelihood for a word or word sequence included in the hypothesis being searched by the hypothesis search means, and decides whether or not self-repair of the word or word sequence is performed; and transparent word hypothesis generation means which, when the self-repair decision means decides that self-repair is performed, generates a transparent word hypothesis, i.e., a hypothesis that regards as a transparent word a word or word sequence included in the un-repaired interval related to that word or word sequence. The hypothesis search means then searches for an optimal solution among hypotheses that include, as search targets, the transparent word hypothesis generated by the transparent word hypothesis generation means.
    Type: Application
    Filed: January 5, 2012
    Publication date: October 24, 2013
    Applicant: NEC CORPORATION
    Inventors: Koji Okabe, Ken Hanazawa, Seiya Osada
  • Patent number: 8566091
    Abstract: A speech recognition system is provided for selecting, via a speech input, an item from a list of items. The speech recognition system detects a first speech input, recognizes the first speech input, compares the recognized first speech input with the list of items and generates a first candidate list of best matching items based on the comparison result. The system then informs the speaker of at least one of the best matching items of the first candidate list for a selection of an item by the speaker. If the intended item is not one of the best matching items presented to the speaker, the system then detects a second speech input, recognizes the second speech input, and generates a second candidate list of best matching items taking into account the comparison result obtained with the first speech input.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: October 22, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Andreas Löw, Lars König, Christian Hillebrecht
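The two-pass candidate-list flow above can be sketched as repeated fuzzy matching against the item list, excluding items the speaker has already rejected. `difflib.get_close_matches` stands in for the recognizer's comparison of the recognized input against the list; the item names are hypothetical:

```python
import difflib

def candidates(heard: str, items: list[str], exclude=()):
    """Rank list items by string similarity to the recognized speech input,
    dropping items already rejected in an earlier pass."""
    pool = [i for i in items if i not in exclude]
    return difflib.get_close_matches(heard, pool, n=3, cutoff=0.0)

items = ["Main Street", "Maple Street", "Main Avenue"]
first = candidates("Main Stret", items)              # first candidate list
second = candidates("Main Stret", items, exclude={first[0]})  # after rejection
```

The second pass takes the first comparison into account in the simplest possible way, by removing the presented-and-rejected best match from the pool.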