Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 8150678
    Abstract: The present invention provides a spoken document retrieval system capable of high-speed, high-accuracy retrieval of the positions where a user-specified keyword is uttered in spoken documents, even when the amount of spoken documents is large. Candidate periods are narrowed down in advance on the basis of a sequence of subwords generated from the keyword, and the count values of the candidate periods containing the subwords are then each calculated by adding up certain values. Through this simple process, the candidate periods are prioritized and selected as retrieval results. In addition, the sequence of subwords generated from the keyword is complemented on the assumption that speech recognition errors occur, and candidate period generation and selection are then performed on the basis of the complemented sequence of subwords.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: April 3, 2012
    Assignee: Hitachi, Ltd.
    Inventor: Hirohiko Sagawa
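The candidate narrowing described in the abstract above can be sketched in a few lines: split the keyword into subword units, look each unit up in an inverted index of candidate periods, and add up hit counts to prioritize periods. The character-bigram subword unit and toy index below are illustrative assumptions, not the patented design:

```python
from collections import defaultdict

def keyword_to_subwords(keyword, n=2):
    """Split a keyword into overlapping n-character subword units (assumed granularity)."""
    return [keyword[i:i + n] for i in range(len(keyword) - n + 1)]

def score_periods(subwords, inverted_index):
    """Add up, per candidate period, how many query subwords it contains,
    then rank periods by that count (higher count = higher retrieval priority)."""
    counts = defaultdict(int)
    for sw in subwords:
        for period in inverted_index.get(sw, ()):
            counts[period] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

# toy inverted index: subword -> candidate periods (speech segments) containing it
index = {"he": [1, 3], "el": [1], "ll": [1, 2], "lo": [1, 2]}
print(score_periods(keyword_to_subwords("hello"), index))
```

A real system would also expand the subword sequence with likely recognizer confusions before the lookup, as the abstract describes.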
  • Patent number: 8150690
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, avoiding excessive enhancement or subtraction so that the operation on the cepstral feature vector is performed properly and the noise robustness of speech recognition is improved. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: April 3, 2012
    Assignee: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
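The limiting scheme in the abstract above can be illustrated with a minimal sketch: subtract a scaled noise estimate from each cepstral coefficient, but fall back to a scaled floor when the subtraction would overshoot. The specific roles of the two coefficients and the determining condition here are assumptions for illustration, not the claimed formulation:

```python
def cepstral_noise_subtract(frame, noise_mean, alpha=1.0, beta=0.1):
    """Subtract an estimated noise cepstrum from a cepstral feature vector,
    limiting the result so the subtraction is never excessive."""
    out = []
    for c, n in zip(frame, noise_mean):
        v = c - alpha * n      # first scalar coefficient scales the subtraction
        floor = beta * c       # second scalar coefficient sets a floor
        # determining condition (illustrative): keep the floored value when
        # subtraction would shrink the coefficient past the floor
        out.append(v if abs(v) > abs(floor) else floor)
    return out

print(cepstral_noise_subtract([1.0, 2.0], [0.5, 0.5]))
```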
  • Patent number: 8145274
    Abstract: Systems and methods for automatically setting reminders. A method for automatically setting reminders includes receiving utterances, determining whether the utterances match a stored phrase, and in response to determining that there is a match, automatically setting a reminder in a mobile communication device. Various filters can be applied to determine whether or not to set a reminder. Examples of suitable filters include location, date/time, callee's phone number, etc.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Salil P. Gandhi, Saidas T. Kottawar, Mike V. Macias, Sandip D. Mahajan
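The filter chain in the entry above (match a stored phrase, then pass location, date/time, and callee-number filters before setting the reminder) can be sketched as follows; the phrase set, filter predicates, and context fields are hypothetical examples, not the patented pipeline:

```python
def should_set_reminder(utterance, stored_phrases, filters, context):
    """Set a reminder only if the utterance matches a stored phrase
    and every configured filter (location, time, callee number, ...) passes."""
    if utterance.lower() not in {p.lower() for p in stored_phrases}:
        return False
    return all(f(context) for f in filters)

phrases = ["remind me to call back"]
filters = [
    lambda ctx: ctx["hour"] < 22,                # illustrative time filter
    lambda ctx: ctx["callee"].startswith("+1"),  # illustrative callee filter
]
print(should_set_reminder("Remind me to call back", phrases, filters,
                          {"hour": 9, "callee": "+15551234"}))
```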
  • Patent number: 8145483
    Abstract: The invention can recognize several languages at the same time without using training samples. The key technique is that features of known words in any language are extracted from unknown words or continuous speech. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word in any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filtering and without overlap, to normalize the signal waveform of a word of variable length (one to several syllables) into a 12×12 matrix serving as the feature of the word. The invention can improve this feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize languages such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, and Taiwanese without samples.
    Type: Grant
    Filed: August 5, 2009
    Date of Patent: March 27, 2012
    Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 8145482
    Abstract: Methods and apparatus for enhancing speech-to-text engines by providing indications of the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications draw on sources of data such as acoustic features, CTI features, phonetic search, and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, enabling more efficient usages such as further processing, transfer of interactions to relevant agents, or escalation of issues. The methods and apparatus employ a training phase in which a word model and a key-phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
    Type: Grant
    Filed: May 25, 2008
    Date of Patent: March 27, 2012
    Inventors: Ezra Daya, Oren Pereg, Yuval Lubowich, Moshe Wasserblat
  • Publication number: 20120072214
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Application
    Filed: November 29, 2011
    Publication date: March 22, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 8140329
    Abstract: A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created from audio features extracted from the observed audio data, and the observed audio data is recognized from the observation vector. The audio features are selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing the logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to the outputs of a mel-scale filter bank.
    Type: Grant
    Filed: April 5, 2004
    Date of Patent: March 20, 2012
    Assignee: Sony Corporation
    Inventors: Jian Zhang, Wei Lu, Xiaobing Sun
  • Patent number: 8140069
    Abstract: The present invention provides a method and system for defining the mean opinion score (MOS) as a function of frame error rate (FER) and pilot signal strength. In an embodiment of the invention, an entity receives MOS scores that have been obtained using subjective tests for certain calls made within the network. Next, the entity receives FER and pilot signal strength samples that have been obtained for the calls for which MOS scores have been subjectively obtained. Finally, the entity calculates an equation for the MOS as a function of FER and pilot signal strength using a non-linear regression analysis.
    Type: Grant
    Filed: June 12, 2008
    Date of Patent: March 20, 2012
    Assignee: Sprint Spectrum L.P.
    Inventors: Abhishek Lall, Ashish Bhan, Sachin Vargantwar, Robert Stedman, Mark Yarkosky
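The calibration step in the entry above, fitting MOS as a function of FER and pilot signal strength, can be illustrated with a toy least-squares fit. The linear model and brute-force grid search below are stand-ins for the patent's non-linear regression, and all coefficients and sample values are invented for illustration:

```python
def fit_mos(samples):
    """Fit MOS ~ a + b*FER + c*pilot by brute-force least squares over a grid.
    samples: iterable of (fer, pilot, mos) triples from subjective tests."""
    best = None
    grid = [x / 10 for x in range(-50, 51)]  # candidate slopes, step 0.1
    for a in [3.0, 3.5, 4.0, 4.5]:           # candidate intercepts
        for b in grid:
            for c in grid:
                sse = sum((mos - (a + b * fer + c * pil)) ** 2
                          for fer, pil, mos in samples)
                if best is None or sse < best[0]:
                    best = (sse, (a, b, c))
    return best[1]

# synthetic calls: MOS drops with frame error rate, rises with pilot strength
data = [(f, p, 4.0 - 2.0 * f + 1.0 * p)
        for f in (0.0, 0.1, 0.2) for p in (0.0, 0.5)]
print(fit_mos(data))
```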
  • Patent number: 8140328
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: March 20, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
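The intention selection in the entry above can be sketched as accumulating a belief distribution over intentions across the current N-best list and a contextually similar one, then picking the max-belief intention. The normalization and the flat confidence-to-belief mapping are simplifying assumptions, not the patented belief update:

```python
from collections import defaultdict

def select_intention(nbest_lists):
    """Each N-best list maps hypothesized user intentions to confidence scores.
    Belief in an intention is accumulated across the lists (each list
    normalized to sum to 1); the max-belief intention is selected."""
    belief = defaultdict(float)
    for nbest in nbest_lists:
        total = sum(nbest.values()) or 1.0
        for intent, conf in nbest.items():
            belief[intent] += conf / total
    return max(belief, key=belief.get)

turn = {"book_flight": 0.6, "book_hotel": 0.4}      # current turn's N-best
similar = {"book_flight": 0.3, "cancel": 0.7}       # contextually similar turn
print(select_intention([turn, similar]))
```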
  • Patent number: 8131543
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal, determining an energy-independent component of a portion of the audio signal associated with a spectral shape of the portion, and determining an energy-dependent component of the portion associated with a gain level of the portion. The method also comprises comparing the energy-independent and energy-dependent components to a speech model, comparing the energy-independent and energy-dependent components to a noise model, and outputting an indication whether the portion of the audio signal more closely corresponds to the speech model or to the noise model based on the comparisons.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: March 6, 2012
    Assignee: Google Inc.
    Inventors: Ron J. Weiss, Trausti Kristjansson
  • Patent number: 8131544
    Abstract: A system distinguishes a primary audio source and background noise to improve the quality of an audio signal. A speech signal from a microphone may be improved by identifying and dampening background noise to enhance speech. Stochastic models may be used to model speech and to model background noise. The models may determine which portions of the signal are speech and which portions are noise. The distinction may be used to improve the signal's quality, and for speaker identification or verification.
    Type: Grant
    Filed: November 12, 2008
    Date of Patent: March 6, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Tobias Herbig, Oliver Gaupp, Franz Gerl
  • Patent number: 8126162
    Abstract: An audio signal interpolation apparatus is configured to perform interpolation processing on the basis of audio signals preceding and/or following a predetermined segment on a time axis so as to obtain an audio signal corresponding to the predetermined segment. The audio signal interpolation apparatus includes a waveform formation unit configured to form a waveform for the predetermined segment on the basis of time-domain samples of the preceding and/or the following audio signals and a power control unit configured to control power of the waveform for the predetermined segment formed by the waveform formation unit using a non-linear model selected on the basis of the preceding audio signal when the power of the preceding audio signal is larger than that of the following audio signal, or the following audio signal when the power of the preceding audio signal is smaller than that of the following audio signal.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: February 28, 2012
    Assignee: Sony Corporation
    Inventors: Chunmao Zhang, Toru Chinen
  • Publication number: 20120046945
    Abstract: In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input include interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.
    Type: Application
    Filed: September 23, 2011
    Publication date: February 23, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Faisman, Dimitri Kanevsky, David Nahamoo, Roberto Sicconi, Mahesh Viswanathan
  • Patent number: 8121835
    Abstract: Automatic level control of speech portions of an audio signal is provided. An audio signal is received in the form of a sequence of samples and may contain speech portion and non-speech portions. The sequence of samples is divided into a sequence of sub-frames. Multiple sub-frames adjacent to a present sub-frame are examined to determine a peak value of samples in the sub-frames. A gain factor is computed for the present sub-frame based on the peak value and a desired maximum value for said speech portion, and each sample in the present sub-frame is amplified by the gain factor. In an embodiment, variations in filtered energy values of multiple sub-frames enable determination of whether a sub-frame corresponds to a speech or non-speech/noise portion.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: February 21, 2012
    Assignee: Texas Instruments Incorporated
    Inventor: Fitzgerald John Archibald
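The gain computation in the entry above can be sketched directly: look at sub-frames adjacent to the present one, take the peak sample magnitude, and scale the present sub-frame toward a desired maximum. The look-around width and target level below are illustrative parameters:

```python
def level_control(subframes, index, target=0.5, lookaround=2):
    """Amplify the present sub-frame by a gain computed from the peak sample
    magnitude over the present and adjacent sub-frames."""
    lo = max(0, index - lookaround)
    hi = min(len(subframes), index + lookaround + 1)
    peak = max(abs(s) for sf in subframes[lo:hi] for s in sf)
    gain = target / peak if peak > 0 else 1.0
    return [s * gain for s in subframes[index]]

frames = [[0.1, -0.2], [0.25, 0.1], [0.05, -0.1]]
print(level_control(frames, 1))
```

In the patent, the gain is applied only to sub-frames classified as speech; a separate energy-variation test distinguishes speech from noise sub-frames.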
  • Patent number: 8121837
    Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: February 21, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Paritosh D. Patel
  • Patent number: 8116486
    Abstract: An apparatus for mixing a plurality of input data streams is described, which has a processing unit adapted to compare the frames of the plurality of input data streams and determine, based on the comparison, exactly one input data stream of the plurality for a spectral component of an output frame of an output data stream. The output data stream is generated by copying at least part of the information of the corresponding spectral component of the frame of the determined data stream. Further, or alternatively, the control values of the frames of the first and second input data streams are compared and, based on the comparison, a control value is adopted.
    Type: Grant
    Filed: March 4, 2009
    Date of Patent: February 14, 2012
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Schnell, Manfred Lutzky, Markus Multrus
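The per-component selection described in the abstract above can be sketched as choosing, for each spectral bin, exactly one input stream and copying its value; using magnitude as the comparison criterion is an illustrative assumption:

```python
def mix_frames(frame_a, frame_b):
    """For each spectral component of the output frame, determine exactly one
    input stream (here: the one with larger magnitude) and copy its component."""
    return [a if abs(a) >= abs(b) else b for a, b in zip(frame_a, frame_b)]

print(mix_frames([0.9, 0.1, -0.4], [0.2, 0.8, 0.3]))
```

Copying rather than summing is the point: the dominant stream's coded spectral data can pass through without a full decode/re-encode of all inputs.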
  • Publication number: 20120016672
    Abstract: Computer-implemented systems and methods are provided for assessing non-native speech proficiency. A non-native speech sample is processed to identify a plurality of vowel sound boundaries in the non-native speech sample. Portions of the non-native speech sample are analyzed within the vowel sound boundaries to extract vowel characteristics. The vowel characteristics are used to identify a plurality of vowel space metrics for the non-native speech sample, and the vowel space metrics are used to determine a non-native speech proficiency score for the non-native speech sample.
    Type: Application
    Filed: July 14, 2011
    Publication date: January 19, 2012
    Inventors: Lei Chen, Keelan Evanini, Xie Sun
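One common vowel space metric of the kind the entry above relies on is the area of the polygon spanned by vowel centroids in the (F1, F2) formant plane; a larger area often tracks clearer vowel articulation. The shoelace computation below and the formant values are illustrative, not the patent's specific metrics:

```python
def vowel_space_area(formants):
    """Shoelace area of the polygon spanned by (F1, F2) vowel centroids."""
    n = len(formants)
    s = 0.0
    for i in range(n):
        x1, y1 = formants[i]
        x2, y2 = formants[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# corner vowels /i/, /a/, /u/ with illustrative (F1, F2) values in Hz
print(vowel_space_area([(300, 2300), (750, 1300), (350, 800)]))
```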
  • Patent number: 8095373
    Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, a last output time of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and updates the last output time of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the last output time. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus.
    Type: Grant
    Filed: August 19, 2008
    Date of Patent: January 10, 2012
    Assignee: Hon Hai Precision Industry Co., Ltd.
    Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh
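The variability mechanism in the entry above, weighting each candidate reply by the time since it was last output, can be sketched as follows; the linear recency weight is an assumption, the patent only requires that weights depend on last output time:

```python
import time

def pick_response(candidates, now=None):
    """candidates: {response: last_output_time}. Weight grows with time since
    a response was last used, so repeated identical vocal inputs tend to get
    different replies. Updates the chosen response's last output time."""
    now = time.time() if now is None else now
    weights = {r: now - t for r, t in candidates.items()}
    choice = max(weights, key=weights.get)
    candidates[choice] = now
    return choice

replies = {"Hello!": 100.0, "Hi there!": 50.0, "Good to see you!": 80.0}
print(pick_response(replies, now=120.0))  # least recently used reply wins
```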
  • Patent number: 8095363
    Abstract: A method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications in a task classification system. If the user's input communication cannot be understood and a task classification decision cannot be made, then further dialog may be conducted with the user if a probability of understanding the user's input communication exceeds a first threshold. Otherwise, the user may be directed to a human for assistance. In another possible embodiment, the method operates as above except that if the probability exceeds a second threshold, then further dialog may be conducted with the user using the current dialog strategy. However, if the probability falls between a first threshold and a second threshold, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: January 10, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
  • Patent number: 8090581
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: January 3, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 8082148
    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: December 20, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Michael H. Mirt
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter, indicating a parameter of a speaker model, that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each of the speakers, a distribution of the clean feature corresponding to the identification information of the speaker into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to the identification information of each speaker by using the transformation parameter, and calculates a second model parameter, indicating a parameter of the speaker model, that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 8069041
    Abstract: Candidates for channels of television programs to be displayed are determined in accordance with a result of voice recognition of a voice input by a user. The channel candidates are assigned to a limited number of tuners and television programs received by the tuners are displayed to allow the user to make a selection.
    Type: Grant
    Filed: October 13, 2006
    Date of Patent: November 29, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideo Kuboyama, Masayuki Yamada
  • Publication number: 20110288865
    Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
    Type: Application
    Filed: August 1, 2011
    Publication date: November 24, 2011
    Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
  • Publication number: 20110270610
    Abstract: Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as the objective function for optimization. The estimation uses only acoustic data and no intermediate estimates of hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Application
    Filed: July 14, 2011
    Publication date: November 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Patent number: 8050929
    Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: November 1, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
  • Patent number: 8041566
    Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic specific statistical models. A text document which may be obtained from a first speech recognition pass is subject to segmentation and to an assignment of topic specific models for each obtained section. Each model of the set of models provides statistic information about language model probabilities, about text processing or formatting rules, as e.g. the interpretation of commands for punctuation, formatting, text highlighting or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary being characteristic for each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as e.g. settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding.
    Type: Grant
    Filed: November 12, 2004
    Date of Patent: October 18, 2011
    Assignee: Nuance Communications Austria GmbH
    Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
  • Patent number: 8036893
    Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one lexicon model.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: October 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventor: David E. Reich
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Publication number: 20110231188
    Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
    Type: Application
    Filed: June 1, 2011
    Publication date: September 22, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Publication number: 20110224982
    Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
    Type: Application
    Filed: March 12, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
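The IR step in the entry above, using TF-IDF to map decoded acoustic units to a target word, can be sketched with a toy phone lexicon; the unigram phone features and the scoring details are illustrative assumptions, not the patented feature set:

```python
import math
from collections import Counter

def tfidf_retrieve(decoded_phones, lexicon):
    """lexicon: word -> phone sequence. Score each word by TF-IDF overlap
    between its phones and the decoded acoustic units; return the best word."""
    docs = {w: Counter(ph) for w, ph in lexicon.items()}
    n = len(docs)
    df = Counter(p for c in docs.values() for p in c)  # document frequency
    query = Counter(decoded_phones)

    def score(word):
        return sum(query[p] * docs[word][p] * math.log(n / df[p])
                   for p in query if p in docs[word])

    return max(lexicon, key=score)

lexicon = {"cat": ["k", "ae", "t"], "cut": ["k", "ah", "t"], "bat": ["b", "ae", "t"]}
print(tfidf_retrieve(["k", "ae", "t"], lexicon))
```

Note how the IDF term discounts phones shared by every word (here "t"), so retrieval is driven by the discriminative units, which is the point of bringing IR weighting into ASR.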
  • Patent number: 8015005
    Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme latent lattice model generated by the voice query to identify matching terms within the audio tags and corresponding stored content. The retrieved content(s) associated with the identified audio tags having uniterms that score within the phoneme lattice model are outputted in an order corresponding to an order in which the uniterms are structured within the voice query.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: September 6, 2011
    Assignee: Motorola Mobility, Inc.
    Inventor: Changxue Ma
  • Patent number: 8015006
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing them. The systems and methods overcome the deficiencies of prior-art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation, and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: September 6, 2011
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
  • Publication number: 20110213614
    Abstract: A method of analysing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analysing the audio signal, based on the determined property of the first output function.
    Type: Application
    Filed: September 11, 2009
    Publication date: September 1, 2011
    Applicant: NEWSOUTH INNOVATIONS PTY LIMITED
    Inventors: Wenliang Lu, Dipanjan Sen
  • Patent number: 7996215
    Abstract: A method and an apparatus for Voice Activity Detection (VAD) and an encoder are provided. The method for VAD includes: acquiring a fluctuant feature value of a background noise when an input signal is the background noise, in which the fluctuant feature value is used to represent fluctuation of the background noise; performing adaptive adjustment on a VAD decision criterion related parameter according to the fluctuant feature value; and performing VAD decision on the input signal by using the decision criterion related parameter on which the adaptive adjustment is performed. The method, the apparatus, and the encoder can be adaptive to fluctuation of the background noise to perform VAD decision, so as to enhance the VAD decision performance, save limited channel bandwidth resources, and use the channel bandwidth efficiently.
    Type: Grant
    Filed: April 13, 2011
    Date of Patent: August 9, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zhe Wang, Qing Zhang
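The adaptive-decision idea above can be sketched as follows; the energy-based fluctuant feature, the linear threshold adjustment, and the `alpha` constant are all assumptions for illustration, not the patented formulas:

```python
import statistics

def noise_fluctuation(noise_frame_energies):
    """Fluctuant feature of the background noise: here, the standard
    deviation of recent noise-frame energies (an assumed choice)."""
    return statistics.pstdev(noise_frame_energies)

def adapted_threshold(base_threshold, fluctuation, alpha=0.5):
    """Raise the VAD decision threshold as noise fluctuation grows
    (alpha is a hypothetical tuning constant)."""
    return base_threshold + alpha * fluctuation

def is_speech(frame_energy, noise_energies, base_threshold=1.0):
    """VAD decision using the adaptively adjusted criterion."""
    return frame_energy > adapted_threshold(base_threshold,
                                            noise_fluctuation(noise_energies))
```

With steady background noise the threshold stays at its base value; a fluctuating noise floor pushes it up, making the detector more conservative.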
  • Patent number: 7996218
    Abstract: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to a user. The user adaptive speech recognition method includes calculating a confidence score of a recognition candidate according to the result of speech recognition, setting a new threshold value adapted to the user based on a result of user confirmation of the recognition candidate and the confidence score of the recognition candidate, and outputting a corresponding recognition candidate as a result of the speech recognition if the calculated confidence score is higher than the new threshold value. Thus, the need for user confirmation of the result of speech recognition is reduced and the probability of speech recognition success is increased.
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: August 9, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Kim, Jeong-su Kim
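A toy version of the feedback loop above; the update rule and step size are invented for illustration and are not the patented method:

```python
def update_threshold(threshold, confidence, user_confirmed, step=0.05):
    """Adapt the confirmation threshold from user feedback (a simple
    assumed update rule): if the user confirms a candidate that scored
    below the threshold, lower the threshold toward that score; if the
    user rejects one that scored above it, raise the threshold."""
    if user_confirmed and confidence < threshold:
        threshold -= step * (threshold - confidence)
    elif not user_confirmed and confidence > threshold:
        threshold += step * (confidence - threshold)
    return threshold

def accept_candidate(confidence, threshold):
    """Output the candidate without asking for confirmation when its
    confidence score exceeds the adapted threshold."""
    return confidence > threshold
```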
  • Patent number: 7996220
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a communications device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that are based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Publication number: 20110191104
    Abstract: A method for measuring a disparity between two speech samples is disclosed that may include deciding upon a speech granularity level at which to compare the rhythm of a student speech sample and a reference speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the student speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the reference speech sample; and calculating the difference between the student speech-unit duration disparity and the reference speech-unit duration disparity.
    Type: Application
    Filed: January 29, 2010
    Publication date: August 4, 2011
    Applicant: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 7983911
    Abstract: The invention relates to a speech recognition process implemented in at least one terminal (114), the speech recognition process using a language model (311), comprising the following steps: detection (502) of at least one unrecognized expression in one of the terminals; recording (503) in the terminal of data representative of the unrecognized expression (309); transmission (603) by the terminal of the recorded data to a remote server (116); analysis, (803) at the level of the remote server, of the data and generation (805) of information for correcting the said language model taking account of at least one part of the unrecognized expression; and transmission (806) from the server to at least one terminal (114, 117, 118) of the correcting information, so as to allow future recognition of at least certain of the unrecognized expressions. The invention also relates to corresponding modules, devices (102) and a remote server (116).
    Type: Grant
    Filed: February 12, 2002
    Date of Patent: July 19, 2011
    Assignee: Thomson Licensing
    Inventors: Frédéric Soufflet, Nour-Eddine Tazine
  • Publication number: 20110166857
    Abstract: A human voice distinguishing method and device are provided. The method involves: taking every n sampling points of the current frame of audio signals as one subsection, wherein n is a positive integer; judging whether two adjacent subsections have a transition relative to a distinguishing threshold, i.e. whether the sliding maximum absolute values of the two adjacent subsections are above and below the distinguishing threshold respectively; and, if so, determining the current frame to be human voice. The sliding maximum absolute value of a subsection is obtained as follows: take the maximum absolute intensity over the sampling points in the subsection as its initial maximum absolute value, and take the maximum of the initial maximum absolute values of this subsection and the m subsections following it as its sliding maximum absolute value, wherein m is a positive integer.
    Type: Application
    Filed: September 15, 2009
    Publication date: July 7, 2011
    Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
    Inventors: Xiangyong Xie, Zhan Chen
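The subsection and sliding-maximum computation above is concrete enough to sketch directly (`n`, `m`, and the threshold below are arbitrary example values):

```python
def sliding_max_abs(samples, n, m):
    """Split `samples` into subsections of n points; for each subsection,
    take the maximum absolute sample value, then take the maximum over
    this subsection and the m subsections following it."""
    initial = [max(abs(s) for s in samples[i:i + n])
               for i in range(0, len(samples), n)]
    return [max(initial[i:i + m + 1]) for i in range(len(initial))]

def is_human_voice(frame, n, m, threshold):
    """Declare the frame human voice if two adjacent subsections transition
    across the threshold: one sliding maximum above it, the other below."""
    sm = sliding_max_abs(frame, n, m)
    return any((a > threshold) != (b > threshold) for a, b in zip(sm, sm[1:]))

# A quiet-then-loud frame transitions across the threshold; silence does not.
print(is_human_voice([0.01] * 8 + [0.5] * 8, n=4, m=1, threshold=0.1))  # → True
print(is_human_voice([0.01] * 16, n=4, m=1, threshold=0.1))             # → False
```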
  • Patent number: 7974392
    Abstract: A communication device and method are provided for audibly outputting a received text message to a user, the text message being received from a sender. A text message to present audibly is received. An output voice to present the text message is retrieved, wherein the output voice is synthesized using predefined voice characteristic information to represent the sender's voice. The output voice is used to audibly present the text message to the user.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: July 5, 2011
    Assignee: Research In Motion Limited
    Inventor: Eric Ng
  • Patent number: 7966176
    Abstract: A system includes an acoustic input engine configured to accept a speech input, to recognize phonemes of the speech input, and to create word strings based on the recognized phonemes. The system includes a semantic engine coupled to the acoustic engine and operable to identify actions and to identify objects by parsing the word strings. The system also includes an action-object pairing system to identify a dominant entry from the identified actions and the identified objects, to select a complement to the dominant entry from the identified actions and the identified objects, and to form an action-object pair that includes the dominant entry and the complement. The system further includes an action-object routing table operable to provide a routing destination based on the action-object pair. The system also includes a call routing module to route a call to the routing destination.
    Type: Grant
    Filed: October 22, 2009
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Robert R. Bushey, Michael Sabourin, Carl Potvin, Benjamin Anthony Knott, John Mills Martin
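A minimal sketch of the final pairing-and-routing step; the table entries and destination names are invented:

```python
# Hypothetical action-object routing table (the abstract does not give one).
ROUTING_TABLE = {
    ("pay", "bill"): "billing-queue",
    ("report", "outage"): "repair-queue",
}

def route_call(actions, objects):
    """Form the first action-object pair found in the routing table from
    the identified actions and objects, and return its routing destination
    (None when no pair matches)."""
    for action in actions:
        for obj in objects:
            destination = ROUTING_TABLE.get((action, obj))
            if destination:
                return destination
    return None

print(route_call(["pay"], ["bill", "phone"]))  # → billing-queue
```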
  • Patent number: 7966183
    Abstract: Automatic speech recognition verification using a combination of two or more confidence scores based on UV features which reuse computations of the original recognition.
    Type: Grant
    Filed: May 4, 2007
    Date of Patent: June 21, 2011
    Assignee: Texas Instruments Incorporated
    Inventors: Kaisheng Yao, Lorin Paul Netsch, Vishu Viswanathan
  • Publication number: 20110137650
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej LJOLJE
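The feature-adaptation step (moving features toward an overall centroid) can be sketched as a simple interpolation; the `weight` factor is a hypothetical choice, since the abstract states only that features move closer to the distribution center:

```python
import numpy as np

def adapt_features(features, centroid, weight=0.3):
    """Move each feature vector a fraction of the way toward the overall
    feature-distribution center (weight is an assumed interpolation
    factor, not a value from the patent)."""
    return features + weight * (centroid - features)

# Frames drift toward the centroid computed from the data itself.
frames = np.array([[1.0, 2.0], [3.0, 4.0]])
centroid = frames.mean(axis=0)  # overall centroid: [2.0, 3.0]
print(adapt_features(frames, centroid, weight=0.5))
```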
  • Patent number: 7957971
    Abstract: Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: June 7, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi, Gokhan Tur, Jeremy Huntley Wright
  • Patent number: 7949525
    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: May 24, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
  • Patent number: 7949524
    Abstract: At least one recognized keyword is presented to a user, and the keyword is corrected appropriately upon receipt of a correction for the presented result. A standby-word dictionary used for recognition of uttered speech is generated according to a result of the correction to recognize the uttered speech. Therefore, even if recognized keywords contain an error, the error can be corrected and uttered speech can be accurately recognized.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: May 24, 2011
    Assignee: Nissan Motor Co., Ltd.
    Inventors: Daisuke Saitoh, Keiko Katsuragawa, Minoru Tomikashi, Takeshi Ono, Eiji Tozuka
  • Publication number: 20110109539
    Abstract: A behavior recognition system and method by combining an image and a speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and a speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
    Type: Application
    Filed: December 9, 2009
    Publication date: May 12, 2011
    Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
  • Patent number: 7941317
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 10, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Patent number: 7930181
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: November 21, 2002
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar