Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 8150678
    Abstract: The present invention provides a spoken document retrieval system capable of high-speed, high-accuracy retrieval of the positions where a user-specified keyword is uttered in spoken documents, even when the amount of spoken documents is large. Candidate periods are narrowed down in advance on the basis of a sequence of subwords generated from the keyword, and the count values of the candidate periods containing the subwords are then each calculated by adding up certain values. Through this simple process, the candidate periods are prioritized and selected as retrieval results. In addition, the sequence of subwords generated from the keyword is complemented on the assumption that speech recognition errors occur, and candidate period generation and selection are then performed on the basis of the complemented sequence of subwords.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: April 3, 2012
    Assignee: Hitachi, Ltd.
    Inventor: Hirohiko Sagawa
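The candidate narrowing described in the abstract above can be sketched in a few lines: split the keyword into subword units, look each unit up in an inverted index of candidate periods, and add up hit counts to prioritize periods. The character-bigram subword unit and toy index below are illustrative assumptions, not the patented design:

```python
from collections import defaultdict

def keyword_to_subwords(keyword, n=2):
    """Split a keyword into overlapping n-character subword units (assumed granularity)."""
    return [keyword[i:i + n] for i in range(len(keyword) - n + 1)]

def score_periods(subwords, inverted_index):
    """Add up, per candidate period, how many query subwords it contains,
    then rank periods by that count (higher count = higher retrieval priority)."""
    counts = defaultdict(int)
    for sw in subwords:
        for period in inverted_index.get(sw, ()):
            counts[period] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

# toy inverted index: subword -> candidate periods (speech segments) containing it
index = {"he": [1, 3], "el": [1], "ll": [1, 2], "lo": [1, 2]}
print(score_periods(keyword_to_subwords("hello"), index))
```

A real system would also expand the subword sequence with likely recognizer confusions before the lookup, as the abstract describes.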
  • Patent number: 8150690
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, avoiding excessive enhancement or subtraction so that the operation on the cepstral feature vector is performed properly and the noise robustness of speech recognition is improved. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: April 3, 2012
    Assignee: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
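The limiting scheme in the abstract above can be illustrated with a minimal sketch: subtract a scaled noise estimate from each cepstral coefficient, but fall back to a scaled floor when the subtraction would overshoot. The specific roles of the two coefficients and the determining condition here are assumptions for illustration, not the claimed formulation:

```python
def cepstral_noise_subtract(frame, noise_mean, alpha=1.0, beta=0.1):
    """Subtract an estimated noise cepstrum from a cepstral feature vector,
    limiting the result so the subtraction is never excessive."""
    out = []
    for c, n in zip(frame, noise_mean):
        v = c - alpha * n      # first scalar coefficient scales the subtraction
        floor = beta * c       # second scalar coefficient sets a floor
        # determining condition (illustrative): keep the floored value when
        # subtraction would shrink the coefficient past the floor
        out.append(v if abs(v) > abs(floor) else floor)
    return out

print(cepstral_noise_subtract([1.0, 2.0], [0.5, 0.5]))
```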
  • Patent number: 8145274
    Abstract: Systems and methods for automatically setting reminders. A method for automatically setting reminders includes receiving utterances, determining whether the utterances match a stored phrase, and in response to determining that there is a match, automatically setting a reminder in a mobile communication device. Various filters can be applied to determine whether or not to set a reminder. Examples of suitable filters include location, date/time, callee's phone number, etc.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Salil P. Gandhi, Saidas T. Kottawar, Mike V. Macias, Sandip D. Mahajan
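The filter chain in the entry above (match a stored phrase, then pass location, date/time, and callee-number filters before setting the reminder) can be sketched as follows; the phrase set, filter predicates, and context fields are hypothetical examples, not the patented pipeline:

```python
def should_set_reminder(utterance, stored_phrases, filters, context):
    """Set a reminder only if the utterance matches a stored phrase
    and every configured filter (location, time, callee number, ...) passes."""
    if utterance.lower() not in {p.lower() for p in stored_phrases}:
        return False
    return all(f(context) for f in filters)

phrases = ["remind me to call back"]
filters = [
    lambda ctx: ctx["hour"] < 22,                # illustrative time filter
    lambda ctx: ctx["callee"].startswith("+1"),  # illustrative callee filter
]
print(should_set_reminder("Remind me to call back", phrases, filters,
                          {"hour": 9, "callee": "+15551234"}))
```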
  • Patent number: 8145483
    Abstract: The invention can recognize several languages at the same time without using training samples. The key technique is that features of known words in any language are extracted from unknown words or continuous speech. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word in any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filtering and without overlap, to normalize the signal waveform of a word of variable length (one to several syllables) into a 12×12 matrix serving as the feature of the word. The invention can improve this feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize languages such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, and Taiwanese without samples.
    Type: Grant
    Filed: August 5, 2009
    Date of Patent: March 27, 2012
    Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 8145482
    Abstract: Methods and apparatus for enhancing speech-to-text engines by providing indications of the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications draw on sources of data such as acoustic features, CTI features, phonetic search, and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, enabling more efficient usages such as further processing, transfer of interactions to relevant agents, or escalation of issues. The methods and apparatus employ a training phase in which a word model and a key-phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
    Type: Grant
    Filed: May 25, 2008
    Date of Patent: March 27, 2012
    Inventors: Ezra Daya, Oren Pereg, Yuval Lubowich, Moshe Wasserblat
  • Publication number: 20120072214
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Application
    Filed: November 29, 2011
    Publication date: March 22, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 8140329
    Abstract: A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created from audio features extracted from the observed audio data, and the observed audio data is recognized from the observation vector. The audio features are selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing the logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to the outputs of a mel-scale filter bank.
    Type: Grant
    Filed: April 5, 2004
    Date of Patent: March 20, 2012
    Assignee: Sony Corporation
    Inventors: Jian Zhang, Wei Lu, Xiaobing Sun
  • Patent number: 8140069
    Abstract: The present invention provides a method and system for defining the mean opinion score (MOS) as a function of frame error rate (FER) and pilot signal strength. In an embodiment of the invention, an entity receives MOS scores that have been obtained using subjective tests for certain calls made within the network. Next, the entity receives FER and pilot signal strength samples that have been obtained for the calls for which MOS scores have been subjectively obtained. Finally, the entity calculates an equation for the MOS as a function of FER and pilot signal strength using a non-linear regression analysis.
    Type: Grant
    Filed: June 12, 2008
    Date of Patent: March 20, 2012
    Assignee: Sprint Spectrum L.P.
    Inventors: Abhishek Lall, Ashish Bhan, Sachin Vargantwar, Robert Stedman, Mark Yarkosky
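The calibration step in the entry above, fitting MOS as a function of FER and pilot signal strength, can be illustrated with a toy least-squares fit. The linear model and brute-force grid search below are stand-ins for the patent's non-linear regression, and all coefficients and sample values are invented for illustration:

```python
def fit_mos(samples):
    """Fit MOS ~ a + b*FER + c*pilot by brute-force least squares over a grid.
    samples: iterable of (fer, pilot, mos) triples from subjective tests."""
    best = None
    grid = [x / 10 for x in range(-50, 51)]  # candidate slopes, step 0.1
    for a in [3.0, 3.5, 4.0, 4.5]:           # candidate intercepts
        for b in grid:
            for c in grid:
                sse = sum((mos - (a + b * fer + c * pil)) ** 2
                          for fer, pil, mos in samples)
                if best is None or sse < best[0]:
                    best = (sse, (a, b, c))
    return best[1]

# synthetic calls: MOS drops with frame error rate, rises with pilot strength
data = [(f, p, 4.0 - 2.0 * f + 1.0 * p)
        for f in (0.0, 0.1, 0.2) for p in (0.0, 0.5)]
print(fit_mos(data))
```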
  • Patent number: 8140328
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: March 20, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
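The intention selection in the entry above can be sketched as accumulating a belief distribution over intentions across the current N-best list and a contextually similar one, then picking the max-belief intention. The normalization and the flat confidence-to-belief mapping are simplifying assumptions, not the patented belief update:

```python
from collections import defaultdict

def select_intention(nbest_lists):
    """Each N-best list maps hypothesized user intentions to confidence scores.
    Belief in an intention is accumulated across the lists (each list
    normalized to sum to 1); the max-belief intention is selected."""
    belief = defaultdict(float)
    for nbest in nbest_lists:
        total = sum(nbest.values()) or 1.0
        for intent, conf in nbest.items():
            belief[intent] += conf / total
    return max(belief, key=belief.get)

turn = {"book_flight": 0.6, "book_hotel": 0.4}      # current turn's N-best
similar = {"book_flight": 0.3, "cancel": 0.7}       # contextually similar turn
print(select_intention([turn, similar]))
```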
  • Patent number: 8131543
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal, determining an energy-independent component of a portion of the audio signal associated with a spectral shape of the portion, and determining an energy-dependent component of the portion associated with a gain level of the portion. The method also comprises comparing the energy-independent and energy-dependent components to a speech model, comparing the energy-independent and energy-dependent components to a noise model, and outputting an indication whether the portion of the audio signal more closely corresponds to the speech model or to the noise model based on the comparisons.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: March 6, 2012
    Assignee: Google Inc.
    Inventors: Ron J. Weiss, Trausti Kristjansson
  • Patent number: 8131544
    Abstract: A system distinguishes a primary audio source and background noise to improve the quality of an audio signal. A speech signal from a microphone may be improved by identifying and dampening background noise to enhance speech. Stochastic models may be used to model speech and to model background noise. The models may determine which portions of the signal are speech and which portions are noise. The distinction may be used to improve the signal's quality, and for speaker identification or verification.
    Type: Grant
    Filed: November 12, 2008
    Date of Patent: March 6, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Tobias Herbig, Oliver Gaupp, Franz Gerl
  • Patent number: 8126162
    Abstract: An audio signal interpolation apparatus is configured to perform interpolation processing on the basis of audio signals preceding and/or following a predetermined segment on a time axis so as to obtain an audio signal corresponding to the predetermined segment. The audio signal interpolation apparatus includes a waveform formation unit configured to form a waveform for the predetermined segment on the basis of time-domain samples of the preceding and/or the following audio signals and a power control unit configured to control power of the waveform for the predetermined segment formed by the waveform formation unit using a non-linear model selected on the basis of the preceding audio signal when the power of the preceding audio signal is larger than that of the following audio signal, or the following audio signal when the power of the preceding audio signal is smaller than that of the following audio signal.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: February 28, 2012
    Assignee: Sony Corporation
    Inventors: Chunmao Zhang, Toru Chinen
  • Publication number: 20120046945
    Abstract: In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input include interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.
    Type: Application
    Filed: September 23, 2011
    Publication date: February 23, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Faisman, Dimitri Kanevsky, David Nahamoo, Roberto Sicconi, Mahesh Viswanathan
  • Patent number: 8121835
    Abstract: Automatic level control of speech portions of an audio signal is provided. An audio signal is received in the form of a sequence of samples and may contain speech portion and non-speech portions. The sequence of samples is divided into a sequence of sub-frames. Multiple sub-frames adjacent to a present sub-frame are examined to determine a peak value of samples in the sub-frames. A gain factor is computed for the present sub-frame based on the peak value and a desired maximum value for said speech portion, and each sample in the present sub-frame is amplified by the gain factor. In an embodiment, variations in filtered energy values of multiple sub-frames enable determination of whether a sub-frame corresponds to a speech or non-speech/noise portion.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: February 21, 2012
    Assignee: Texas Instruments Incorporated
    Inventor: Fitzgerald John Archibald
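The gain computation in the entry above can be sketched directly: look at sub-frames adjacent to the present one, take the peak sample magnitude, and scale the present sub-frame toward a desired maximum. The look-around width and target level below are illustrative parameters:

```python
def level_control(subframes, index, target=0.5, lookaround=2):
    """Amplify the present sub-frame by a gain computed from the peak sample
    magnitude over the present and adjacent sub-frames."""
    lo = max(0, index - lookaround)
    hi = min(len(subframes), index + lookaround + 1)
    peak = max(abs(s) for sf in subframes[lo:hi] for s in sf)
    gain = target / peak if peak > 0 else 1.0
    return [s * gain for s in subframes[index]]

frames = [[0.1, -0.2], [0.25, 0.1], [0.05, -0.1]]
print(level_control(frames, 1))
```

In the patent, the gain is applied only to sub-frames classified as speech; a separate energy-variation test distinguishes speech from noise sub-frames.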
  • Patent number: 8121837
    Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: February 21, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Paritosh D. Patel
  • Patent number: 8116486
    Abstract: An apparatus for mixing a plurality of input data streams is described, which has a processing unit adapted to compare the frames of the plurality of input data streams and determine, based on the comparison, exactly one input data stream of the plurality for a spectral component of an output frame of an output data stream. The output data stream is generated by copying at least part of the information of the corresponding spectral component of the frame of the determined data stream. Further, or alternatively, the control values of the frames of the first and second input data streams are compared and, based on the comparison, a control value is adopted.
    Type: Grant
    Filed: March 4, 2009
    Date of Patent: February 14, 2012
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Schnell, Manfred Lutzky, Markus Multrus
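The per-component selection described in the abstract above can be sketched as choosing, for each spectral bin, exactly one input stream and copying its value; using magnitude as the comparison criterion is an illustrative assumption:

```python
def mix_frames(frame_a, frame_b):
    """For each spectral component of the output frame, determine exactly one
    input stream (here: the one with larger magnitude) and copy its component."""
    return [a if abs(a) >= abs(b) else b for a, b in zip(frame_a, frame_b)]

print(mix_frames([0.9, 0.1, -0.4], [0.2, 0.8, 0.3]))
```

Copying rather than summing is the point: the dominant stream's coded spectral data can pass through without a full decode/re-encode of all inputs.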
  • Publication number: 20120016672
    Abstract: Computer-implemented systems and methods are provided for assessing non-native speech proficiency. A non-native speech sample is processed to identify a plurality of vowel sound boundaries in the non-native speech sample. Portions of the non-native speech sample are analyzed within the vowel sound boundaries to extract vowel characteristics. The vowel characteristics are used to identify a plurality of vowel space metrics for the non-native speech sample, and the vowel space metrics are used to determine a non-native speech proficiency score for the non-native speech sample.
    Type: Application
    Filed: July 14, 2011
    Publication date: January 19, 2012
    Inventors: Lei Chen, Keelan Evanini, Xie Sun
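One common vowel space metric of the kind the entry above relies on is the area of the polygon spanned by vowel centroids in the (F1, F2) formant plane; a larger area often tracks clearer vowel articulation. The shoelace computation below and the formant values are illustrative, not the patent's specific metrics:

```python
def vowel_space_area(formants):
    """Shoelace area of the polygon spanned by (F1, F2) vowel centroids."""
    n = len(formants)
    s = 0.0
    for i in range(n):
        x1, y1 = formants[i]
        x2, y2 = formants[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# corner vowels /i/, /a/, /u/ with illustrative (F1, F2) values in Hz
print(vowel_space_area([(300, 2300), (750, 1300), (350, 800)]))
```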
  • Patent number: 8095373
    Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, a last output time of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and updates the last output time of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the last output time. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus.
    Type: Grant
    Filed: August 19, 2008
    Date of Patent: January 10, 2012
    Assignee: Hon Hai Precision Industry Co., Ltd.
    Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh
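The variability mechanism in the entry above, weighting each candidate reply by the time since it was last output, can be sketched as follows; the linear recency weight is an assumption, the patent only requires that weights depend on last output time:

```python
import time

def pick_response(candidates, now=None):
    """candidates: {response: last_output_time}. Weight grows with time since
    a response was last used, so repeated identical vocal inputs tend to get
    different replies. Updates the chosen response's last output time."""
    now = time.time() if now is None else now
    weights = {r: now - t for r, t in candidates.items()}
    choice = max(weights, key=weights.get)
    candidates[choice] = now
    return choice

replies = {"Hello!": 100.0, "Hi there!": 50.0, "Good to see you!": 80.0}
print(pick_response(replies, now=120.0))  # least recently used reply wins
```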
  • Patent number: 8095363
    Abstract: A method and system for monitoring an automated dialog system for the automatic recognition of language understanding errors based on a user's input communications in a task classification system. If the user's input communication cannot be understood and a task classification decision cannot be made, then further dialog may be conducted with the user if a probability of understanding the user's input communication exceeds a first threshold. Otherwise, the user may be directed to a human for assistance. In another possible embodiment, the method operates as above except that if the probability exceeds a second threshold, then further dialog may be conducted with the user using the current dialog strategy. However, if the probability falls between a first threshold and a second threshold, the dialog strategy may be adapted in order to improve the chances of conducting a successful dialog with the user.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: January 10, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Irene Langkilde Geary, Marilyn Ann Walker, Jeremy H. Wright
  • Patent number: 8090581
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: January 3, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 8082148
    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: December 20, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Michael H. Mirt
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter, indicating a parameter of a speaker model, that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each of the speakers, a distribution of the clean feature corresponding to the identification information of the speaker into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to the identification information of each speaker by using the transformation parameter, and calculates a second model parameter, indicating a parameter of the speaker model, that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 8069041
    Abstract: Candidates for channels of television programs to be displayed are determined in accordance with a result of voice recognition of a voice input by a user. The channel candidates are assigned to a limited number of tuners and television programs received by the tuners are displayed to allow the user to make a selection.
    Type: Grant
    Filed: October 13, 2006
    Date of Patent: November 29, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideo Kuboyama, Masayuki Yamada
  • Publication number: 20110288865
    Abstract: A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
    Type: Application
    Filed: August 1, 2011
    Publication date: November 24, 2011
    Inventors: Wai-Yip Chan, Tiago H. Falk, Qingfeng Xu
  • Publication number: 20110270610
    Abstract: Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as the objective function for optimization. The estimation uses only acoustic data and no intermediate estimates of hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Application
    Filed: July 14, 2011
    Publication date: November 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Patent number: 8050929
    Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: November 1, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
  • Patent number: 8041566
    Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic specific statistical models. A text document which may be obtained from a first speech recognition pass is subject to segmentation and to an assignment of topic specific models for each obtained section. Each model of the set of models provides statistic information about language model probabilities, about text processing or formatting rules, as e.g. the interpretation of commands for punctuation, formatting, text highlighting or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary being characteristic for each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as e.g. settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding.
    Type: Grant
    Filed: November 12, 2004
    Date of Patent: October 18, 2011
    Assignee: Nuance Communications Austria GmbH
    Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
  • Patent number: 8036893
    Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one lexicon model.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: October 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventor: David E. Reich
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Publication number: 20110231188
    Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
    Type: Application
    Filed: June 1, 2011
    Publication date: September 22, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Publication number: 20110224982
    Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words).
    Type: Application
    Filed: March 12, 2010
    Publication date: September 15, 2011
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James Garnet Droppo, III, Xiaoqiang Xiao, Geoffrey G. Zweig
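The IR step in the entry above, using TF-IDF to map decoded acoustic units to a target word, can be sketched with a toy phone lexicon; the unigram phone features and the scoring details are illustrative assumptions, not the patented feature set:

```python
import math
from collections import Counter

def tfidf_retrieve(decoded_phones, lexicon):
    """lexicon: word -> phone sequence. Score each word by TF-IDF overlap
    between its phones and the decoded acoustic units; return the best word."""
    docs = {w: Counter(ph) for w, ph in lexicon.items()}
    n = len(docs)
    df = Counter(p for c in docs.values() for p in c)  # document frequency
    query = Counter(decoded_phones)

    def score(word):
        return sum(query[p] * docs[word][p] * math.log(n / df[p])
                   for p in query if p in docs[word])

    return max(lexicon, key=score)

lexicon = {"cat": ["k", "ae", "t"], "cut": ["k", "ah", "t"], "bat": ["b", "ae", "t"]}
print(tfidf_retrieve(["k", "ae", "t"], lexicon))
```

Note how the IDF term discounts phones shared by every word (here "t"), so retrieval is driven by the discriminative units, which is the point of bringing IR weighting into ASR.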
  • Patent number: 8015005
    Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme latent lattice model generated by the voice query to identify matching terms within the audio tags and corresponding stored content. The retrieved content(s) associated with the identified audio tags having uniterms that score within the phoneme lattice model are outputted in an order corresponding to an order in which the uniterms are structured within the voice query.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: September 6, 2011
    Assignee: Motorola Mobility, Inc.
    Inventor: Changxue Ma
  • Patent number: 8015006
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing them. The systems and methods overcome the deficiencies of prior-art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation, and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: September 6, 2011
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
  • Publication number: 20110213614
    Abstract: A method of analysing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analysing the audio signal, based on the determined property of the first output function.
    Type: Application
    Filed: September 11, 2009
    Publication date: September 1, 2011
    Applicant: NEWSOUTH INNOVATIONS PTY LIMITED
    Inventors: Wenliang Lu, Dipanjan Sen
  • Patent number: 7996215
    Abstract: A method and an apparatus for Voice Activity Detection (VAD) and an encoder are provided. The method for VAD includes: acquiring a fluctuant feature value of a background noise when an input signal is the background noise, in which the fluctuant feature value is used to represent fluctuation of the background noise; performing adaptive adjustment on a VAD decision criterion related parameter according to the fluctuant feature value; and performing VAD decision on the input signal by using the decision criterion related parameter on which the adaptive adjustment is performed. The method, the apparatus, and the encoder can be adaptive to fluctuation of the background noise to perform VAD decision, so as to enhance the VAD decision performance, save limited channel bandwidth resources, and use the channel bandwidth efficiently.
    Type: Grant
    Filed: April 13, 2011
    Date of Patent: August 9, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zhe Wang, Qing Zhang
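The adaptive-decision idea above can be sketched as follows; the energy-based fluctuant feature, the linear threshold adjustment, and the `alpha` constant are all assumptions for illustration, not the patented formulas:

```python
import statistics

def noise_fluctuation(noise_frame_energies):
    """Fluctuant feature of the background noise: here, the standard
    deviation of recent noise-frame energies (an assumed choice)."""
    return statistics.pstdev(noise_frame_energies)

def adapted_threshold(base_threshold, fluctuation, alpha=0.5):
    """Raise the VAD decision threshold as noise fluctuation grows
    (alpha is a hypothetical tuning constant)."""
    return base_threshold + alpha * fluctuation

def is_speech(frame_energy, noise_energies, base_threshold=1.0):
    """VAD decision using the adaptively adjusted criterion."""
    return frame_energy > adapted_threshold(base_threshold,
                                            noise_fluctuation(noise_energies))
```

With steady background noise the threshold stays at its base value; a fluctuating noise floor pushes it up, making the detector more conservative.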
  • Patent number: 7996218
    Abstract: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to a user. The user adaptive speech recognition method includes calculating a confidence score of a recognition candidate according to the result of speech recognition, setting a new threshold value adapted to the user based on a result of user confirmation of the recognition candidate and the confidence score of the recognition candidate, and outputting a corresponding recognition candidate as a result of the speech recognition if the calculated confidence score is higher than the new threshold value. Thus, the need for user confirmation of the result of speech recognition is reduced and the probability of speech recognition success is increased.
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: August 9, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Kim, Jeong-su Kim
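A toy version of the feedback loop above; the update rule and step size are invented for illustration and are not the patented method:

```python
def update_threshold(threshold, confidence, user_confirmed, step=0.05):
    """Adapt the confirmation threshold from user feedback (a simple
    assumed update rule): if the user confirms a candidate that scored
    below the threshold, lower the threshold toward that score; if the
    user rejects one that scored above it, raise the threshold."""
    if user_confirmed and confidence < threshold:
        threshold -= step * (threshold - confidence)
    elif not user_confirmed and confidence > threshold:
        threshold += step * (confidence - threshold)
    return threshold

def accept_candidate(confidence, threshold):
    """Output the candidate without asking for confirmation when its
    confidence score exceeds the adapted threshold."""
    return confidence > threshold
```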
  • Patent number: 7996220
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a communications device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that are based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Publication number: 20110191104
    Abstract: A method for measuring a disparity between two speech samples is disclosed that may include deciding upon a speech granularity level at which to compare the rhythm of a student speech sample and a reference speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the student speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the reference speech sample; and calculating the difference between the student speech-unit duration disparity and the reference speech-unit duration disparity.
    Type: Application
    Filed: January 29, 2010
    Publication date: August 4, 2011
    Applicant: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 7983911
    Abstract: The invention relates to a speech recognition process implemented in at least one terminal (114), the speech recognition process using a language model (311), comprising the following steps: detection (502) of at least one unrecognized expression in one of the terminals; recording (503) in the terminal of data representative of the unrecognized expression (309); transmission (603) by the terminal of the recorded data to a remote server (116); analysis, (803) at the level of the remote server, of the data and generation (805) of information for correcting the said language model taking account of at least one part of the unrecognized expression; and transmission (806) from the server to at least one terminal (114, 117, 118) of the correcting information, so as to allow future recognition of at least certain of the unrecognized expressions. The invention also relates to corresponding modules, devices (102) and a remote server (116).
    Type: Grant
    Filed: February 12, 2002
    Date of Patent: July 19, 2011
    Assignee: Thomson Licensing
    Inventors: Frédéric Soufflet, Nour-Eddine Tazine
  • Publication number: 20110166857
    Abstract: A human voice distinguishing method and device are provided. The method involves: taking every n sampling points of the current frame of audio signals as one subsection, wherein n is a positive integer; judging whether two adjacent subsections have a transition relative to a distinguishing threshold, i.e. whether the sliding maximum absolute values of the two adjacent subsections are above and below the distinguishing threshold respectively; and, if so, determining the current frame to be human voice. The sliding maximum absolute value of a subsection is obtained as follows: take the maximum absolute intensity over the sampling points in the subsection as its initial maximum absolute value, and take the maximum of the initial maximum absolute values of this subsection and the m subsections following it as its sliding maximum absolute value, wherein m is a positive integer.
    Type: Application
    Filed: September 15, 2009
    Publication date: July 7, 2011
    Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
    Inventors: Xiangyong Xie, Zhan Chen
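The subsection and sliding-maximum computation above is concrete enough to sketch directly (`n`, `m`, and the threshold below are arbitrary example values):

```python
def sliding_max_abs(samples, n, m):
    """Split `samples` into subsections of n points; for each subsection,
    take the maximum absolute sample value, then take the maximum over
    this subsection and the m subsections following it."""
    initial = [max(abs(s) for s in samples[i:i + n])
               for i in range(0, len(samples), n)]
    return [max(initial[i:i + m + 1]) for i in range(len(initial))]

def is_human_voice(frame, n, m, threshold):
    """Declare the frame human voice if two adjacent subsections transition
    across the threshold: one sliding maximum above it, the other below."""
    sm = sliding_max_abs(frame, n, m)
    return any((a > threshold) != (b > threshold) for a, b in zip(sm, sm[1:]))

# A quiet-then-loud frame transitions across the threshold; silence does not.
print(is_human_voice([0.01] * 8 + [0.5] * 8, n=4, m=1, threshold=0.1))  # → True
print(is_human_voice([0.01] * 16, n=4, m=1, threshold=0.1))             # → False
```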
  • Patent number: 7974392
    Abstract: A communication device and method are provided for audibly outputting a received text message to a user, the text message being received from a sender. A text message to present audibly is received. An output voice to present the text message is retrieved, wherein the output voice is synthesized using predefined voice characteristic information to represent the sender's voice. The output voice is used to audibly present the text message to the user.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: July 5, 2011
    Assignee: Research In Motion Limited
    Inventor: Eric Ng
  • Patent number: 7966176
    Abstract: A system includes an acoustic input engine configured to accept a speech input, to recognize phonemes of the speech input, and to create word strings based on the recognized phonemes. The system includes a semantic engine coupled to the acoustic engine and operable to identify actions and to identify objects by parsing the word strings. The system also includes an action-object pairing system to identify a dominant entry from the identified actions and the identified objects, to select a complement to the dominant entry from the identified actions and the identified objects, and to form an action-object pair that includes the dominant entry and the complement. The system further includes an action-object routing table operable to provide a routing destination based on the action-object pair. The system also includes a call routing module to route a call to the routing destination.
    Type: Grant
    Filed: October 22, 2009
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Robert R. Bushey, Michael Sabourin, Carl Potvin, Benjamin Anthony Knott, John Mills Martin
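A minimal sketch of the final pairing-and-routing step; the table entries and destination names are invented:

```python
# Hypothetical action-object routing table (the abstract does not give one).
ROUTING_TABLE = {
    ("pay", "bill"): "billing-queue",
    ("report", "outage"): "repair-queue",
}

def route_call(actions, objects):
    """Form the first action-object pair found in the routing table from
    the identified actions and objects, and return its routing destination
    (None when no pair matches)."""
    for action in actions:
        for obj in objects:
            destination = ROUTING_TABLE.get((action, obj))
            if destination:
                return destination
    return None

print(route_call(["pay"], ["bill", "phone"]))  # → billing-queue
```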
  • Patent number: 7966183
    Abstract: Automatic speech recognition verification using a combination of two or more confidence scores based on UV features which reuse computations of the original recognition.
    Type: Grant
    Filed: May 4, 2007
    Date of Patent: June 21, 2011
    Assignee: Texas Instruments Incorporated
    Inventors: Kaisheng Yao, Lorin Paul Netsch, Vishu Viswanathan
  • Publication number: 20110137650
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej LJOLJE
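The feature-adaptation step (moving features toward an overall centroid) can be sketched as a simple interpolation; the `weight` factor is a hypothetical choice, since the abstract states only that features move closer to the distribution center:

```python
import numpy as np

def adapt_features(features, centroid, weight=0.3):
    """Move each feature vector a fraction of the way toward the overall
    feature-distribution center (weight is an assumed interpolation
    factor, not a value from the patent)."""
    return features + weight * (centroid - features)

# Frames drift toward the centroid computed from the data itself.
frames = np.array([[1.0, 2.0], [3.0, 4.0]])
centroid = frames.mean(axis=0)  # overall centroid: [2.0, 3.0]
print(adapt_features(frames, centroid, weight=0.5))
```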
  • Patent number: 7957971
    Abstract: Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: June 7, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi, Gokhan Tur, Jeremy Huntley Wright
  • Patent number: 7949525
    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: May 24, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
  • Patent number: 7949524
    Abstract: At least one recognized keyword is presented to a user, and the keyword is corrected appropriately upon receipt of a correction for the presented result. A standby-word dictionary used for recognition of uttered speech is generated according to a result of the correction to recognize the uttered speech. Therefore, even if recognized keywords contain an error, the error can be corrected and uttered speech can be accurately recognized.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: May 24, 2011
    Assignee: Nissan Motor Co., Ltd.
    Inventors: Daisuke Saitoh, Keiko Katsuragawa, Minoru Tomikashi, Takeshi Ono, Eiji Tozuka
  • Publication number: 20110109539
    Abstract: A behavior recognition system and method by combining an image and a speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and a speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
    Type: Application
    Filed: December 9, 2009
    Publication date: May 12, 2011
    Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
  • Patent number: 7941317
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 10, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Patent number: 7930181
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: November 21, 2002
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar