Subportions Patents (Class 704/254)
  • Patent number: 8364486
Abstract: A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes, supplemental non-verbal information not based upon or derived from the speech, and measures of uncertainty are transmitted to the server; the phonemes are processed for speech understanding remotely or locally, depending on whether the audible speech has high or low recognition uncertainty; and a text of the speech (or the context or understanding of the speech) is transmitted back to the mobile device.
    Type: Grant
    Filed: March 12, 2009
    Date of Patent: January 29, 2013
    Assignee: Intelligent Mechatronic Systems Inc.
    Inventors: Otman A. Basir, William Ben Miners
  • Patent number: 8364477
Abstract: A method (400, 500) and apparatus (220) seeks to improve the intelligibility of speech emitted into a noisy environment. Formants are identified (426) and a perceptual frequency scale band is selected (502) that includes at least one of the identified formants. The SNR in each band is compared (504) to a threshold and, if the SNR for that band is less than the threshold, the method increases a formant enhancement gain for that band. A set of high pass filter gains (338) is combined (516) with the formant enhancement gains yielding combined gains that are then clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
    Type: Grant
    Filed: August 30, 2012
    Date of Patent: January 29, 2013
    Assignee: Motorola Mobility LLC
    Inventors: Jianming J Song, John C Johnson
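The band-wise SNR test and gain boost described in this abstract can be illustrated with a toy sketch. All band powers, thresholds, and function names below are hypothetical, and the patent's later stages (combining with high-pass gains, scaling by total SNR, normalization, smoothing) are omitted:

```python
import numpy as np

def formant_enhancement_gains(speech_bands, noise_bands, formant_bands,
                              snr_threshold_db=6.0, boost_db=4.0, max_gain_db=10.0):
    """Boost bands that contain a formant and whose SNR falls below a
    threshold, then clip the resulting gains (in dB)."""
    snr_db = 10.0 * np.log10(speech_bands / noise_bands)
    gains_db = np.zeros_like(snr_db)
    for b in formant_bands:                      # only formant-bearing bands
        if snr_db[b] < snr_threshold_db:         # intelligibility at risk
            gains_db[b] += boost_db              # formant enhancement gain
    return np.clip(gains_db, 0.0, max_gain_db)   # clipping step

# toy band powers: band 1 holds a formant but is buried in noise
speech = np.array([4.0, 2.0, 8.0])
noise = np.array([1.0, 2.0, 1.0])
g = formant_enhancement_gains(speech, noise, formant_bands=[1])
```

Only band 1 receives a boost: its 0 dB SNR is below the threshold, while the other bands carry no formant.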
  • Patent number: 8359201
    Abstract: Systems, methods, and apparatuses including computer program products are provided for encoding and using a language model. In one implementation, a method is provided. The method includes generating a compact language model, including receiving a collection of n-grams, each n-gram having one or more associated parameter values, determining a fingerprint for each n-gram of the collection of n-grams, identifying locations in an array for each n-gram using a plurality of hash functions, and encoding the one or more parameter values associated with each n-gram in the identified array locations as a function of corresponding array values and the fingerprint for the n-gram.
    Type: Grant
    Filed: June 5, 2012
    Date of Patent: January 22, 2013
    Assignee: Google, Inc.
    Inventors: David Talbot, Thorsten Brants
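The fingerprint-plus-multiple-hash idea in this abstract can be sketched with a simplified fingerprint table. This is not the patent's actual encoding scheme; sizes, the fingerprint width, and all names are illustrative assumptions:

```python
import hashlib

class HashedNgramStore:
    """Toy compact n-gram table: each n-gram is reduced to a short
    fingerprint and stored at one of k hashed array locations, so the
    n-gram strings themselves are never kept."""
    def __init__(self, size=1024, num_hashes=3):
        self.cells = [None] * size          # holds (fingerprint, value) pairs
        self.size = size
        self.num_hashes = num_hashes

    def _digest(self, ngram, salt):
        data = (str(salt) + "\x00".join(ngram)).encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, ngram):
        return self._digest(ngram, "fp") & 0xFFFF   # 16-bit fingerprint

    def _locations(self, ngram):
        return [self._digest(ngram, i) % self.size for i in range(self.num_hashes)]

    def put(self, ngram, value):
        fp = self._fingerprint(ngram)
        for loc in self._locations(ngram):
            if self.cells[loc] is None or self.cells[loc][0] == fp:
                self.cells[loc] = (fp, value)
                return
        raise RuntimeError("all candidate cells occupied; grow the array")

    def get(self, ngram, default=None):
        fp = self._fingerprint(ngram)
        for loc in self._locations(ngram):
            if self.cells[loc] is not None and self.cells[loc][0] == fp:
                return self.cells[loc][1]
        return default

store = HashedNgramStore()
store.put(("new", "york", "city"), -2.5)   # e.g. a log-probability parameter
value = store.get(("new", "york", "city"))
```

Lookup recomputes the same hash locations and matches on the fingerprint, trading a small false-positive probability for a much smaller table.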
  • Patent number: 8359200
    Abstract: Generating a profile of a word, including: receiving the word including at least one phoneme; determining the at least one phoneme in the word; selecting at least one characteristic for each of the determined at least one phoneme in the word using a dictionary of sound symbolism rules; and generating the profile of the word by combining all of the selected at least one characteristic for the at least one phoneme.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: January 22, 2013
    Assignee: Sony Online Entertainment LLC
    Inventor: Patrick McCuller
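The profile-generation steps in this abstract can be sketched directly. The rule dictionary below is invented for illustration; real sound-symbolism rules and phoneme inventories would differ:

```python
# Hypothetical sound-symbolism rules: each phoneme maps to weighted characteristics.
SOUND_SYMBOLISM_RULES = {
    "g": {"hard": 0.8, "large": 0.6},
    "l": {"soft": 0.7, "flowing": 0.9},
    "i": {"small": 0.8, "bright": 0.5},
}

def word_profile(phonemes):
    """Combine the selected characteristics of every phoneme in the word
    into a single profile."""
    profile = {}
    for p in phonemes:
        for trait, weight in SOUND_SYMBOLISM_RULES.get(p, {}).items():
            profile[trait] = profile.get(trait, 0.0) + weight
    return profile

profile = word_profile(["g", "l", "i"])   # phonemes determined from the word
```

The profile sums each phoneme's contributions, so repeated phonemes reinforce their characteristics.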
  • Patent number: 8352266
    Abstract: The invention provides a system and method for improving speech recognition. A computer software system is provided for implementing the system and method. A user of the computer software system may speak to the system directly and the system may respond, in spoken language, with an appropriate response. Grammar rules may be generated automatically from sample utterances when implementing the system for a particular application. Dynamic grammar rules may also be generated during interaction between the user and the system. In addition to arranging searching order of grammar files based on a predetermined hierarchy, a dynamically generated searching order based on history of contexts of a single conversation may be provided for further improved speech recognition.
    Type: Grant
    Filed: March 8, 2011
    Date of Patent: January 8, 2013
    Assignee: Inago Corporation
    Inventors: Gary Farmaner, Ron DiCarlantonio, Huw Leonard
  • Patent number: 8345830
    Abstract: An audio indication of a recipient for a message is received, the message to be sent by a user to the recipient. An electronic database is searched for the recipient. When the recipient is found in the electronic database, information is determined from the electronic database concerning the recipient. An audio prompt is formed including at least some of the information concerning the recipient that was obtained from the electronic database.
    Type: Grant
    Filed: June 25, 2009
    Date of Patent: January 1, 2013
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Kyoko Takeda, Theodore R Booth, III, Jason Clement
  • Patent number: 8346548
Abstract: The aural similarity measuring system and method provides a measure of the aural similarity between a target text (10) and one or more reference texts (11). Both the target text (10) and the reference texts (11) are converted into a string of phonemes (15), and then one or the other of the phoneme strings is adjusted (16) so that both are equal in length. The phoneme strings are compared (12) and a score generated representative of the degree of similarity of the two phoneme strings. Finally, where there is a plurality of reference texts, the similarity scores for each of the reference texts are ranked (13). With this aural similarity measuring system the analysis is automated, thereby reducing the risk of errors and omissions. Moreover, the system provides an objective measure of aural similarity, enabling consistency of comparison in results and reproducibility of results.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: January 1, 2013
    Assignee: Mongoose Ventures Limited
    Inventor: Mark Owen
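The pipeline in this abstract (convert to phonemes, equalize lengths, compare, rank) can be sketched as follows. The padding symbol and the position-wise scoring rule are simplifying assumptions; the patent does not specify this particular comparison:

```python
def aural_similarity(target_phonemes, reference_phonemes):
    """Pad the shorter phoneme string so both are equal in length, then
    score position-by-position agreement in [0, 1]."""
    a, b = list(target_phonemes), list(reference_phonemes)
    n = max(len(a), len(b))
    a += ["-"] * (n - len(a))          # "-" marks a padded position
    b += ["-"] * (n - len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n

def rank_references(target, references):
    """Rank reference phoneme strings by similarity to the target."""
    scored = [(ref, aural_similarity(target, ref)) for ref in references]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranked = rank_references(["k", "ae", "t"],
                         [["k", "ae", "b"], ["d", "ao", "g", "z"]])
```

The closest reference appears first, giving the objective, reproducible ranking the abstract describes.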
  • Publication number: 20120323577
    Abstract: Methods of automatic speech recognition for premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then the begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio, after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
    Type: Application
    Filed: June 16, 2011
    Publication date: December 20, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
  • Publication number: 20120323574
    Abstract: Event audio data that is based on verbal utterances associated with a medical event associated with a patient is received. A list of a plurality of candidate text strings that match interpretations of the event audio data is obtained, based on information included in a medical speech repository, information included in a speech accent repository, and a matching function. A selection of at least one of the candidate text strings included in the list is obtained. A population of at least one field of an electronic medical form is initiated, based on the obtained selection.
    Type: Application
    Filed: June 17, 2011
    Publication date: December 20, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Tao Wang, Bin Zhou
  • Publication number: 20120316880
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20120316879
    Abstract: A continuous speech recognition system to recognize continuous speech smoothly in a noisy environment. The system selects call commands, configures a minimum recognition network in token, which consists of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of speech recognition continuously and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system for detecting the speech interval and recognizing continuous speech in a noisy environment through the real-time recognition of call commands measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment when the system recognizes the call command.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: KOREAPOWERVOICE CO., LTD.
    Inventors: Heui-Suck JUNG, Se-Hoon CHIN, Tae-Young ROH
  • Patent number: 8332215
    Abstract: The invention provides a dynamic range control module installed in a speech processing apparatus. In one embodiment, the dynamic range control module comprises a buffer, a voice activity detector, a peak calculation module, and an amplitude adjusting module. The buffer buffers a speech signal to obtain a delayed speech signal. The voice activity detector determines a syllable from the delayed speech signal. The peak calculation module calculates peak amplitude of the syllable. The amplitude adjusting module determines an attenuation factor corresponding to the syllable according to the peak amplitude in the syllable, and adjusts amplitude of the whole syllable with the same gain according to the attenuation factor to obtain an adjusted speech signal.
    Type: Grant
    Filed: October 31, 2008
    Date of Patent: December 11, 2012
    Assignee: Fortemedia, Inc.
    Inventors: Ming Zhang, Wan-Chieh Pai
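The per-syllable gain step in this abstract can be sketched in a few lines, once a voice activity detector has delimited the syllable. The target peak and the attenuation rule are illustrative assumptions:

```python
import numpy as np

def attenuate_syllable(syllable, target_peak=0.5):
    """Find the syllable's peak amplitude and, if it exceeds the target,
    scale the whole syllable by one attenuation factor (the same gain
    for every sample in the syllable)."""
    peak = np.max(np.abs(syllable))
    factor = min(1.0, target_peak / peak) if peak > 0 else 1.0
    return syllable * factor, factor

syllable = np.array([0.1, 0.8, -1.0, 0.3])   # toy waveform samples
adjusted, factor = attenuate_syllable(syllable)
```

Applying one gain to the whole syllable, rather than sample-by-sample compression, avoids distorting the syllable's internal envelope.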
  • Patent number: 8332225
    Abstract: Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data and the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by the TTS component. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: December 11, 2012
    Assignee: Microsoft Corporation
    Inventors: Sheng Zhao, Zhi Li, Shenghao Qin, Chiwei Che, Jingyang Xu, Binggong Ding
  • Patent number: 8332223
    Abstract: In one aspect, a method for determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word is provided.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: December 11, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
  • Publication number: 20120296653
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Application
    Filed: July 30, 2012
    Publication date: November 22, 2012
    Applicant: Nuance Communications, Inc.
    Inventor: Kenneth D. White
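The character-weighted scoring in this abstract can be illustrated with a toy scorer. The weight table and the normalization are hypothetical; the patent only requires that individual characters contribute with different weights:

```python
def score_candidate(recognized, candidate, char_weights, default_weight=1.0):
    """Score a known character sequence against recognized characters,
    weighting each position by the character's weight (e.g. acoustically
    confusable characters like 'm'/'n' can be down-weighted)."""
    total = matched = 0.0
    for r, c in zip(recognized, candidate):
        w = char_weights.get(c, default_weight)
        total += w
        if r == c:
            matched += w
    total += default_weight * abs(len(recognized) - len(candidate))
    return matched / total if total else 0.0

weights = {"m": 0.5, "n": 0.5}                 # hypothetical confusability weights
score = score_candidate("smith", "smyth", weights)
```

Each known sequence is scored this way and the best-scoring candidate is selected.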
  • Patent number: 8315857
    Abstract: Systems and methods for modification of an audio input signal are provided. In exemplary embodiments, an adaptive multiple-model optimizer is configured to generate at least one source model parameter for facilitating modification of an analyzed signal. The adaptive multiple-model optimizer comprises a segment grouping engine and a source grouping engine. The segment grouping engine is configured to group simultaneous feature segments to generate at least one segment model. The at least one segment model is used by the source grouping engine to generate at least one source model, which comprises the at least one source model parameter. Control signals for modification of the analyzed signal may then be generated based on the at least one source model parameter.
    Type: Grant
    Filed: May 30, 2006
    Date of Patent: November 20, 2012
    Assignee: Audience, Inc.
    Inventors: David Klein, Stephen Malinowski, Lloyd Watts, Bernard Mont-Reynaud
  • Patent number: 8315869
    Abstract: A speech recognition apparatus for recognizing a plurality of sequential words contained in a speech includes an acoustic model reading part for reading an acoustic model, a dictionary management part for reading required data from dictionary data, and a recognition part for successively recognizing the sequential words by matching a group of words represented by the dictionary data with the inputted speech, using the acoustic model, wherein the dictionary data contains a beginning part dictionary representing beginning parts of words, and a group of ending part dictionaries storing data representing ending parts, the ending part dictionary and/or the beginning part dictionary are read in accordance with the word recognized by the recognition part, and the recognition part matches a subsequent speech with the beginning parts of words contained in the beginning part dictionary while the dictionary management part is reading the ending part dictionary and/or the beginning part dictionary.
    Type: Grant
    Filed: July 19, 2006
    Date of Patent: November 20, 2012
    Assignee: Fujitsu Limited
    Inventor: Shouji Harada
  • Patent number: 8315870
    Abstract: A distance calculation unit (16) obtains the acoustic distance between the feature amount of input speech and each phonetic model. A word search unit (17) performs a word search based on the acoustic distance and a language model including the phoneme and prosodic label of a word, and outputs a word hypothesis and a first score representing the likelihood of the word hypothesis. The word search unit (17) also outputs a vowel interval and its tone label in the input speech, when assuming that the recognition result of the input speech is the word hypothesis. A tone recognition unit (21) outputs a second score representing the likelihood of the tone label output from the word search unit (17) based on a feature amount corresponding to the vowel interval output from the word search unit (17). A rescore unit (22) corrects the first score of the word hypothesis output from the word search unit (17) using the second score output from the tone recognition unit (21).
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: November 20, 2012
    Assignee: NEC Corporation
    Inventor: Ken Hanazawa
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method is disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for the rear-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Patent number: 8311828
Abstract: In some aspects, a wordspotter is used to locate occurrences in an audio corpus of each of a set of predetermined subword units, which may be phoneme sequences. To locate a query (e.g., a keyword or phrase) in the audio corpus, constituent subword units in the query are identified and then locations of those subwords are determined based on the locations of those subword units determined earlier by the wordspotter, for example, using a pre-built inverted index that maps subword units to their locations.
    Type: Grant
    Filed: August 27, 2008
    Date of Patent: November 13, 2012
    Assignee: Nexidia Inc.
    Inventors: Jon A. Arrowood, Robert W. Morris, Mark Finlay, Scott A. Judy
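The inverted-index lookup in this abstract can be sketched as follows. The subword unit names, timing model, and gap constraint are illustrative assumptions, not the patent's actual search procedure:

```python
from collections import defaultdict

def build_inverted_index(spotted):
    """Map each predetermined subword unit (phoneme sequence) to the list
    of locations where the wordspotter found it."""
    index = defaultdict(list)
    for unit, location in spotted:
        index[unit].append(location)
    return index

def locate_query(index, query_units, max_gap=1.0):
    """Find places where the query's constituent subword units occur in
    order, each within max_gap seconds of the previous one."""
    hits = []
    first, rest = query_units[0], query_units[1:]
    for start in index.get(first, []):
        t = start
        for unit in rest:
            nxt = [loc for loc in index.get(unit, []) if t < loc <= t + max_gap]
            if not nxt:
                break
            t = min(nxt)
        else:
            hits.append(start)
    return hits

# toy wordspotter output: (phoneme-sequence unit, time in seconds)
spotted = [("k_ae", 1.0), ("t_ax", 1.4), ("k_ae", 7.0), ("s_ih", 7.3)]
index = build_inverted_index(spotted)
hits = locate_query(index, ["k_ae", "t_ax"])
```

Because the corpus is spotted once up front, each new query costs only index lookups rather than a fresh pass over the audio.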
  • Patent number: 8311824
    Abstract: In a multi-lingual environment, a method and apparatus for determining a language spoken in a speech utterance. The method and apparatus test acoustic feature vectors extracted from the utterances against acoustic models associated with one or more of the languages. Speech to text is then performed for the language indicated by the acoustic testing, followed by textual verification of the resulting text. During verification, the resulting text is processed by language specific NLP and verified against textual models associated with the language. The system is self-learning, i.e., once a language is verified or rejected, the relevant feature vectors are used for enhancing one or more acoustic models associated with one or more languages, so that acoustic determination may improve.
    Type: Grant
    Filed: October 27, 2008
    Date of Patent: November 13, 2012
    Assignee: Nice-Systems Ltd
    Inventors: Yuval Lubowich, Moshe Wasserblat, Dimitri Volsky, Oren Pereg
  • Publication number: 20120278079
    Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
    Type: Application
    Filed: April 29, 2011
    Publication date: November 1, 2012
    Inventors: Jon A. Arrowood, Robert W. Morris, Peter S. Cardillo, Marsal Gavalda
  • Patent number: 8301447
    Abstract: The present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources. The phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived. By associating the phonemes with a corresponding source, the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source.
    Type: Grant
    Filed: October 10, 2008
    Date of Patent: October 30, 2012
    Assignee: Avaya Inc.
    Inventors: John H. Yoakum, Stephen Whynot
  • Patent number: 8301454
Abstract: A method is provided for providing cues from an electronic communication device to a user while capturing an utterance. A plurality of cues associated with the user utterance are provided by the device to the user in at least near real-time. For each of a plurality of portions of the utterance, data representative of the respective portion of the user utterance is communicated from the electronic communication device to a remote electronic device. In response to this communication, data, representative of at least one parameter associated with the respective portion of the user utterance, is received at the electronic communication device. The electronic communication device provides one or more cues to the user based on the at least one parameter. At least one of the cues is provided by the electronic communication device to the user prior to completion of the step of capturing the user utterance.
    Type: Grant
    Filed: August 24, 2009
    Date of Patent: October 30, 2012
    Assignee: Canyon IP Holdings LLC
    Inventor: Scott Edward Paden
  • Publication number: 20120271635
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Application
    Filed: July 2, 2012
    Publication date: October 25, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8296141
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: October 23, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Mazin Gilbert, Alistair D. Conkie, Andrej Ljolje
  • Patent number: 8296154
Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 28, 2008
    Date of Patent: October 23, 2012
    Assignee: Hearworks Pty Limited
    Inventors: Andrew E. Vandali, Graeme M. Clark
  • Publication number: 20120265531
Abstract: An intelligent query system for processing voice-based queries is disclosed, which uses semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
    Type: Application
    Filed: June 18, 2012
    Publication date: October 18, 2012
    Inventor: Ian M. Bennett
  • Publication number: 20120259632
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Application
    Filed: February 22, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel Willett
  • Patent number: 8285546
    Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.
    Type: Grant
    Filed: September 9, 2011
    Date of Patent: October 9, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: David E. Reich
  • Publication number: 20120253813
    Abstract: A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion and a determination portion. The frame division portion divides an input signal in units of frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the frames that have been divided. The power spectrum operation portion adds a value of the calculated power spectrum to a value of power spectrum in each of frequency bins. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose value has been increased. The determination portion determines, based on a value of the spectral entropy, whether the input signal is a signal in a speech segment.
    Type: Application
    Filed: February 17, 2012
    Publication date: October 4, 2012
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Kazuhiro KATAGIRI
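The spectral-entropy computation at the core of this abstract can be sketched compactly. Frame lengths, sampling rate, and the decision rule below are illustrative; the patent's additional power-spectrum accumulation across frames is omitted:

```python
import numpy as np

def spectral_entropy(frame, n_fft=256):
    """Power spectrum of one frame, normalized to a probability
    distribution over frequency bins, then Shannon entropy. Speech
    concentrates energy in a few bins (low entropy); broadband noise
    spreads it across bins (high entropy)."""
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    p = spectrum / (np.sum(spectrum) + 1e-12)
    return float(-np.sum(p * np.log2(p + 1e-12)))

fs = 8000
t = np.arange(512) / fs
tone = np.sin(2 * np.pi * 440 * t)              # tonal, speech-like frame
noise = np.random.default_rng(0).standard_normal(512)
is_speech = spectral_entropy(tone[:256]) < spectral_entropy(noise[:256])
```

A determination portion would compare the entropy of each frame against a threshold to label it as a speech or non-speech segment.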
  • Publication number: 20120253812
    Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Applicant: Sony Computer Entertainment Inc.
    Inventors: OZLEM KALINLI, Ruxin Chen
  • Patent number: 8280730
    Abstract: A method (400, 600, 700) and apparatus (220) for enhancing the intelligibility of speech emitted into a noisy environment. After filtering (408) ambient noise with a filter (304) that simulates the physical blocking of noise by a at least a part of a voice communication device (102) a frequency dependent SNR of received voice audio relative to ambient noise is computed (424) on a perceptual (e.g. Bark) frequency scale. Formants are identified (426, 600, 700) and the SNR in bands including certain formants are modified (508, 510) with formant enhancement gain factors in order to improve intelligibility. A set of high pass filter gains (338) is combined (516) with the formant enhancement gains factors yielding combined gains which are clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532) and used to reconstruct (532, 534) an audio signal.
    Type: Grant
    Filed: May 25, 2005
    Date of Patent: October 2, 2012
    Assignee: Motorola Mobility LLC
    Inventors: Jianming J. Song, John C. Johnson
  • Publication number: 20120245942
    Abstract: Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric.
    Type: Application
    Filed: March 20, 2012
    Publication date: September 27, 2012
    Inventors: Klaus Zechner, Xiaoming Xi
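The comparison of detected versus model prosodic-event locations in this abstract can be illustrated with a toy agreement metric. The tolerance and the F1-style formulation are assumptions for illustration, not the patent's actual prosodic event metric:

```python
def prosodic_event_metric(detected, model, tolerance=1):
    """Compare detected prosodic-event locations (e.g. stressed-syllable
    indices) against a fluent native speaker's model locations, allowing
    a small alignment tolerance; returns an F1-style agreement score."""
    matched_model = set()
    hits = 0
    for d in detected:
        for m in model:
            if m not in matched_model and abs(d - m) <= tolerance:
                matched_model.add(m)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(model) if model else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

metric = prosodic_event_metric(detected=[2, 5, 11], model=[2, 6, 9])
```

A scoring model would then map this agreement metric (possibly with other features) onto a fluency score for the speech sample.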
  • Publication number: 20120239403
    Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
    Type: Application
    Filed: September 28, 2009
    Publication date: September 20, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Daniel Andrés Vásquez Cano, Guillermo Aradilla, Rainer Gruhn
  • Patent number: 8271283
Abstract: Disclosed herein is a method and apparatus to recognize speech by measuring the confidence levels of respective frames. The method includes the operations of obtaining frequency features of a received speech signal for the respective frames having a predetermined length, calculating a keyword model-based likelihood and a filler model-based likelihood for each of the frames, calculating a confidence score based on the two types of likelihoods, and deciding whether the received speech signal corresponds to a keyword or a non-keyword based on the confidence scores. Also, the method includes the operation of transforming the confidence scores by applying transform functions of clusters, which include the confidence scores or are close to the confidence scores, to the confidence scores.
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: September 18, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-hoon Jeong, Sang-bae Jeong, Jeong-su Kim, In-Jeong Choi
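One common way to realize a confidence score from the two likelihoods in this abstract is a per-frame log-likelihood ratio averaged over frames and compared against a threshold; this sketch is an assumed formulation, not the patent's exact computation:

```python
import math

def frame_confidence(keyword_likelihood, filler_likelihood):
    """Log-likelihood ratio confidence for one frame (illustrative)."""
    return math.log(keyword_likelihood) - math.log(filler_likelihood)

def is_keyword(frame_pairs, threshold=0.0):
    """Average the per-frame confidences and compare to a decision
    threshold; `frame_pairs` is a list of (keyword, filler) likelihoods."""
    scores = [frame_confidence(k, f) for k, f in frame_pairs]
    return sum(scores) / len(scores) >= threshold
```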
  • Publication number: 20120232904
    Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
  • Publication number: 20120232898
    Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Giuseppe Di Fabbrizio, Dilek Z. Hakkani-Tur, Mazin G. Rahim, Bernard S. Renger, Gokhan Tur
  • Patent number: 8265930
    Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
    Type: Grant
    Filed: April 13, 2005
    Date of Patent: September 11, 2012
    Assignee: Sprint Communications Company L.P.
    Inventors: Bryce A. Jones, Raymond Edward Dickensheets
  • Publication number: 20120215539
    Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
    Type: Application
    Filed: February 22, 2012
    Publication date: August 23, 2012
    Inventor: Ajay Juneja
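The segment-routing idea in this abstract can be sketched as a simple dispatcher; the callables and the routing policy are illustrative placeholders, not the patent's mechanism:

```python
def dispatch_segments(segments, local_recognize, remote_recognize, use_remote):
    """Route each utterance segment to a local or a remote recognizer
    according to a per-segment policy callable (all names illustrative)."""
    results = []
    for seg in segments:
        recognize = remote_recognize if use_remote(seg) else local_recognize
        results.append(recognize(seg))
    return results
```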
  • Patent number: 8249873
    Abstract: Tonal correction of speech is provided. Received speech is analyzed and compared to a table of commonly mispronounced phrases. These phrases are mapped to the phrase likely intended by the speaker. The phrase determined to be the one the user likely intended can be suggested to the user. If the user approves of the suggestion, tonal correction can be applied to the speech before that speech is delivered to a recipient.
    Type: Grant
    Filed: August 12, 2005
    Date of Patent: August 21, 2012
    Assignee: Avaya Inc.
    Inventors: Colin Blair, Kevin Chan, Christopher R. Gentle, Neil Hepworth, Andrew W. Lang, Paul R. Michaelis
  • Patent number: 8244539
    Abstract: Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: August 14, 2012
    Assignee: Adobe Systems Incorporated
    Inventor: James A. Moorer
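One plausible reading of "substantially similar" phoneme sequences is a normalized edit-distance similarity compared against a threshold; this is an assumption for illustration, not the matching criterion the patent defines:

```python
def phoneme_similarity(a, b):
    """Normalized Levenshtein similarity between two phoneme sequences
    (1.0 = identical). A match could then be declared above some
    threshold; both the metric and threshold idea are illustrative."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    dist = prev[-1]
    return 1.0 - dist / max(len(a), len(b), 1)
```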
  • Patent number: 8239197
    Abstract: A system and method for efficiently transcribing verbal messages transmitted over the Internet (or other network) into text. The verbal messages are initially checked to ensure that they are in a valid format and include a return network address, and if so, are processed either as whole verbal messages or split into segments. These whole verbal messages and segments are processed by an automated speech recognition (ASR) program, which produces automatically recognized text. The automatically recognized text messages or segments are assigned to selected workbenches for manual editing and transcription, producing edited text. The segments of edited text are reassembled to produce whole edited text messages, undergo post processing to correct minor errors, and are output as an email, an SMS message, a file, or an input to a program. The automatically recognized text and manual edits thereof are returned as feedback to the ASR program to improve its accuracy.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: August 7, 2012
    Assignee: Intellisist, Inc.
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
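The split / recognize / edit / reassemble flow in this abstract can be sketched as a small pipeline; the stage callables are illustrative stand-ins for the ASR program and the manual-editing workbenches:

```python
def transcribe_message(segments, asr, edit):
    """Pipeline sketch: automatic recognition per segment, an editing
    pass per segment, then reassembly into one whole text message."""
    recognized = [asr(seg) for seg in segments]   # ASR stage
    edited = [edit(txt) for txt in recognized]    # workbench/editing stage
    return " ".join(edited)                       # reassembly
```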
  • Patent number: 8239199
    Abstract: A method includes identifying a first syllable in a first audio of a first word and a second syllable in a second audio of a second word, the first syllable having a first set of properties and the second syllable having a second set of properties; detecting the first syllable in a first instance of the first word in an audio file, the first syllable in the first instance having a third set of properties; determining one or more transformations for transforming the first set of properties to the third set of properties; applying the one or more transformations to the second set of properties of the second syllable to yield a transformed second syllable; and replacing the first syllable in the first instance of the first word with the transformed second syllable in the audio file.
    Type: Grant
    Filed: October 16, 2009
    Date of Patent: August 7, 2012
    Assignee: Yahoo! Inc.
    Inventor: Narayan Lakshmi Bhamidipati
  • Publication number: 20120197644
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 2, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
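The fluctuation step in this abstract can be sketched as picking, among sufficiently frequent words, the one whose prosodic feature values vary most; using variance as the "degree of fluctuation" is an illustrative assumption:

```python
def key_phrase(word_features, freq_threshold=2):
    """Return the high-frequency word whose prosodic feature values
    fluctuate most. `word_features` maps each word to the list of its
    per-occurrence feature values; variance stands in for the patent's
    unspecified fluctuation measure."""
    best, best_var = None, -1.0
    for word, values in word_features.items():
        if len(values) < freq_threshold:   # not a high-frequency word
            continue
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        if var > best_var:
            best, best_var = word, var
    return best
```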
  • Publication number: 20120191456
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Application
    Filed: February 1, 2012
    Publication date: July 26, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Patent number: 8229733
    Abstract: There are provided methods and apparatus for linguistic independent parsing and for dynamic learning in natural language systems. A parsing method for a natural language system includes the step of parsing an input phrase to identify at least one source phrase within the input phrase for which replacement phrase synonyms exist. The method further includes the step of substituting the replacement phrase synonyms in place of the identified source phrase, in descending order by text length, to provide a modified input phrase. The method also includes the step of repeating the parsing and substituting steps until no more source phrases exist or a pre-specified number of repetitions has been reached.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: July 24, 2012
    Inventors: John Harney, Janet Dwyer
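The repeated longest-first substitution described in this abstract can be sketched as follows; the dictionary shape, the fixed-point check, and the pass limit are illustrative assumptions:

```python
def normalize_phrase(phrase, synonyms, max_passes=10):
    """Repeatedly substitute replacement synonyms for source phrases,
    longest source phrase first, until no source phrase remains or the
    pass limit (the patent's pre-specified repetition count) is hit."""
    for _ in range(max_passes):
        changed = False
        for src in sorted(synonyms, key=len, reverse=True):
            if src in phrase:
                phrase = phrase.replace(src, synonyms[src])
                changed = True
        if not changed:   # fixed point: no source phrases left
            break
    return phrase
```

Substituting longer source phrases first prevents a shorter synonym from clobbering part of a longer match (e.g. replacing "light" inside "light switch").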
  • Patent number: 8229729
    Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described that takes advantage of gender information in training data to obtain female, male, and gender-independent models from that information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
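The probability-weighted averaging of the two gender models can be sketched as a convex combination of their likelihoods; the function names and the scalar gender probability are illustrative, not the patent's formulation:

```python
def mixed_gaussian_score(x, male_pdf, female_pdf, p_male):
    """Blend the male- and female-model likelihoods for observation `x`
    by the estimated probability `p_male` that the speaker is male.
    With p_male near 0 or 1 this reduces to a single gender model,
    avoiding a hard (and possibly wrong) gender decision."""
    return p_male * male_pdf(x) + (1.0 - p_male) * female_pdf(x)
```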
  • Patent number: 8223374
    Abstract: A maintenance system of an image forming apparatus is composed of an image forming apparatus and a central management apparatus connected to it via a communication line. The image forming apparatus is composed of an image forming unit for forming an image, an information output unit for outputting intra-machine information of the image forming unit, a voice information input unit for inputting voice information of a user of the image forming unit, and a first communication controller for outputting the intra-machine information and voice information via the communication line.
    Type: Grant
    Filed: May 5, 2009
    Date of Patent: July 17, 2012
    Assignees: Kabushiki Kaisha Toshiba, Toshiba Tec Kabushiki Kaisha
    Inventor: Hiroyo Katou