Subportions Patents (Class 704/254)
  • Patent number: 8069042
    Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the model pool. If the model pool is not empty, the existing model that best matches the training segment is determined from the model pool. The existing model is then scored against the training segment. If the score is less than a predefined threshold, a new model for the training segment is created and added to the pool. If the score is equal to or greater than the threshold, the training segment is used to improve or re-estimate the model.
    Type: Grant
    Filed: September 21, 2007
    Date of Patent: November 29, 2011
    Assignee: Honda Research Institute Europe GmbH
    Inventors: Frank Joublin, Holger Brandl
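    The pooling procedure described in the abstract above can be sketched as follows; the function names, stub interfaces, and threshold handling are illustrative assumptions, not details from the patent:

    ```python
    # Hypothetical sketch of the model-pool update described above.
    # Model training/scoring are passed in as callables; only the pooling
    # logic is shown.

    def update_pool(pool, segment, threshold, train_model, score, reestimate):
        """Add or refine a syllable model for one detected training segment."""
        if not pool:
            # Empty pool: train the first model from this segment.
            pool.append(train_model(segment))
            return pool
        # Find the existing model that best matches the segment.
        best = max(pool, key=lambda m: score(m, segment))
        if score(best, segment) < threshold:
            # Poor match: treat the segment as a new syllable.
            pool.append(train_model(segment))
        else:
            # Good match: use the segment to re-estimate the model.
            reestimate(best, segment)
        return pool
    ```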
  • Patent number: 8063809
    Abstract: A transient signal encoding method and device, decoding method and device, and processing system, where the transient signal encoding method includes: obtaining a reference sub-frame where a maximal time envelope having a maximal amplitude value is located from time envelopes of all sub-frames of an input transient signal; adjusting an amplitude value of the time envelope of each sub-frame before the reference sub-frame in such a way that a first difference is greater than a preset first threshold, in which the first difference is a difference between the amplitude value of the time envelope of each sub-frame before the reference sub-frame and the amplitude value of the maximal time envelope; and writing the adjusted time envelope into a bitstream.
    Type: Grant
    Filed: June 29, 2011
    Date of Patent: November 22, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Longyin Chen, Lei Miao, Chen Hu, Wei Xiao, Herve Marcel Taddei, Qing Zhang
  • Publication number: 20110282650
    Abstract: A common problem is that when people speak a language other than the one to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese speakers to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
    Type: Application
    Filed: May 17, 2010
    Publication date: November 17, 2011
    Applicant: AVAYA INC.
    Inventors: Terry Jennings, Paul Roller Michaelis
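    The decision the abstract describes (replace a token only when it is not a word in the listener's language but is a known L1-typical mispronunciation) can be sketched as a table lookup; the table contents, function names, and lexicon here are hypothetical:

    ```python
    # Hypothetical mispronunciation table keyed by (speaker L1, listener language).
    # The single entry mirrors the "orenji" -> "orange" example in the abstract.
    CORRECTIONS = {
        ("japanese", "english"): {"orenji": "orange"},
    }

    def correct_utterance(words, speaker_l1, listener_lang, lexicon):
        """Replace tokens that are not words in the listener's language but
        are known typical mispronunciations for this speaker/listener pair."""
        table = CORRECTIONS.get((speaker_l1, listener_lang), {})
        out = []
        for w in words:
            if w not in lexicon and w in table:
                out.append(table[w])  # known accent-driven mispronunciation
            else:
                out.append(w)
        return out
    ```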
  • Publication number: 20110282667
    Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Gustavo A. Hernandez-Abrego
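    A minimal sketch of separating alignment regions from potential confusion zones between two statements, using `difflib.SequenceMatcher` as a stand-in for the patent's word-set alignment (the phonetic confusion-probability analysis is not shown):

    ```python
    import difflib

    def confusion_zones(stmt_a, stmt_b):
        """Align two statements word-by-word; stretches that fail to align
        are returned as potential confusion zones (word spans at the
        corresponding positions in each statement)."""
        sm = difflib.SequenceMatcher(a=stmt_a, b=stmt_b)
        zones = []
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op != "equal":  # outside an alignment region
                zones.append((stmt_a[i1:i2], stmt_b[j1:j2]))
        return zones
    ```

    For example, "call my agent now" versus "call my lawyer now" aligns on "call my" and "now", leaving one confusion zone containing "agent" and "lawyer".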
  • Patent number: 8060365
    Abstract: A dialog processing system which includes: a target expression data extraction unit for extracting, from among a plurality of utterance data input by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, the utterance pattern being input by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern matching portions from the extracted target expression data and extracting a feature quantity common to the pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the extracted feature quantities.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: November 15, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Patent number: 8050924
    Abstract: Methods and apparatus for implementing the generation of names. In one implementation, a system for generating a name includes: a user interface that receives user input including values for corresponding characteristics and name lengths; a rule dictionary that indicates one or more rules, each rule indicating a relationship between a phoneme and a characteristic; a phoneme selector that selects a phoneme using a value for a characteristic received through said user interface and a rule corresponding to that characteristic; a phoneme compiler that combines selected phonemes to form a name, wherein said name includes a number of letters based on said name length; storage storing data, including data representing said user input and said rule dictionary; and a processor for executing instructions providing said user interface, said first phoneme selector, said second phoneme selector, and said phoneme compiler.
    Type: Grant
    Filed: April 8, 2005
    Date of Patent: November 1, 2011
    Assignee: Sony Online Entertainment LLC
    Inventor: Patrick McCuller
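    The phoneme-selector/compiler pipeline in the abstract might look roughly like this; the rule dictionary, characteristic names, and vowel-interleaving scheme are invented for illustration:

    ```python
    import random

    # Hypothetical rule dictionary mapping a (characteristic, value) pair to
    # phonemes that evoke it; the patent does not publish actual rules.
    RULES = {
        ("hardness", "high"): ["k", "g", "t"],
        ("hardness", "low"): ["l", "m", "s"],
    }
    VOWELS = ["a", "e", "i", "o", "u"]

    def generate_name(characteristic, value, length, rng=random):
        """Select consonants via the rule for the requested characteristic,
        interleave vowels, and compile a name of the requested length."""
        consonants = RULES[(characteristic, value)]
        letters = []
        while len(letters) < length:
            letters.append(rng.choice(consonants))  # phoneme selector
            if len(letters) < length:
                letters.append(rng.choice(VOWELS))
        return "".join(letters[:length]).capitalize()  # phoneme compiler
    ```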
  • Patent number: 8046224
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 8046218
    Abstract: A system and method for phone detection. The system includes a microphone configured to receive a speech signal in an acoustic domain and convert the speech signal from the acoustic domain to an electrical domain, and a filter bank coupled to the microphone and configured to receive the converted speech signal and generate a plurality of channel speech signals corresponding to a plurality of channels respectively. Additionally, the system includes a plurality of onset enhancement devices configured to receive the plurality of channel speech signals and generate a plurality of onset enhanced signals. Each of the plurality of onset enhancement devices is configured to receive one of the plurality of channel speech signals, enhance one or more onsets of one or more signal pulses for the received one of the plurality of channel speech signals, and generate one of the plurality of onset enhanced signals.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: October 25, 2011
    Assignee: The Board of Trustees of the University of Illinois
    Inventors: Jont B. Allen, Marion Regnier
  • Publication number: 20110251844
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: June 20, 2011
    Publication date: October 13, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Patent number: 8036348
    Abstract: A method of presenting instructions to a user sending an incoming communication to a service center includes presenting a menu to the user. The menu includes a plurality of procedure descriptors to the user. The user is presented, according to a selection of one of the procedure descriptors by the user, a sequence of instructions which enable completion of a procedure described by the selected procedure descriptor. The incoming communication is transferred at a position in the sequence of instructions to a representative. The incoming communication is also transferred back to the same position in the sequence of instructions.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: October 11, 2011
    Assignee: AT&T Labs, Inc.
    Inventors: Philip Ted Kortum, Robert R. Bushey
  • Patent number: 8032374
    Abstract: Provided are an apparatus and method for recognizing continuous speech using search space restriction based on phoneme recognition. In the apparatus and method, a search space can be primarily reduced by restricting connection words to be shifted at a boundary between words based on the phoneme recognition result. In addition, the search space can be secondarily reduced by rapidly calculating a degree of similarity between the connection word to be shifted and the phoneme recognition result using a phoneme code and shifting the corresponding phonemes to only connection words having degrees of similarity equal to or higher than a predetermined reference value. Therefore, the speed and performance of the speech recognition process can be improved in various speech recognition services.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: October 4, 2011
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Hyung Bae Jeon, Jun Park, Seung Hi Kim, Kyu Woong Hwang
  • Patent number: 8027834
    Abstract: The present invention discloses a method for training an exception-limited phonetic decision tree. An initial subset of data can be selected and used for creating an initial phonetic decision tree. Additional terms can then be incorporated into the subset. The enlarged subset can be used to evaluate the phonetic decision tree, with the results being categorized as either correctly or incorrectly phonetized. An exception-limited phonetic tree can be generated from the set of correctly phonetized terms. If the termination conditions for the method are determined not to have been met, the steps of the method can be repeated.
    Type: Grant
    Filed: June 25, 2007
    Date of Patent: September 27, 2011
    Assignee: Nuance Communications, Inc.
    Inventor: Steven M. Hancock
  • Patent number: 8024191
    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 20, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
  • Patent number: 8019602
    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
    Type: Grant
    Filed: January 20, 2004
    Date of Patent: September 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
  • Patent number: 8019604
    Abstract: A method, system and communication device for enabling uniterm discovery from audio content and voice-to-voice searching of audio content stored on a device using discovered uniterms. Received audio/voice input signal is sent to a uniterm discovery and search (UDS) engine within the device. The audio data may be associated with other content that is also stored within the device. The UDS engine retrieves a number of uniterms from the audio data and associates the uniterms with the stored content. When a voice search is initiated at the device, the UDS engine generates a statistical latent lattice model from the voice query and scores the uniterms from the audio database against the latent lattice model. Following a further refinement, the best group of uniterms is then determined and segments of the stored audio data and/or other content corresponding to the best group of uniterms are outputted.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: September 13, 2011
    Assignee: Motorola Mobility, Inc.
    Inventor: Changxue Ma
  • Publication number: 20110218802
    Abstract: A computerized method for continuous speech recognition using a speech recognition engine and a phoneme model. The computerized method inputs a speech signal into the speech recognition engine. Based on the phoneme model, the speech signal is indexed by scoring for the phonemes of the phoneme model and a time-ordered list of phoneme candidates and respective scores resulting from the scoring are produced. The phoneme candidates are input with the scores from the time-ordered list. Word transcription candidates are typically input from a dictionary and words are built by selecting from the word transcription candidates based on the scores. A stream of transcriptions is outputted corresponding to the input speech signal. The stream of transcriptions is re-scored by searching for and detecting anomalous word transcriptions in the stream of transcriptions to produce second scores.
    Type: Application
    Filed: March 8, 2010
    Publication date: September 8, 2011
    Inventors: Shlomi Hai Bouganim, Boris Levant
  • Patent number: 8015008
    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
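    The consonant-inventory expansion described in the abstract above can be sketched as follows, assuming a single-syllable input in place of the patent's detected syllable-boundary positions (function names and label suffixes are illustrative):

    ```python
    def expand_inventory(consonants):
        """Split each consonant into pre-vocalic and post-vocalic variants,
        doubling the consonant phoneme inventory."""
        return [c + "_pre" for c in consonants] + [c + "_post" for c in consonants]

    def relabel(phonemes, vowels):
        """Relabel consonants by position relative to the syllable's vowel:
        consonants before the first vowel become pre-vocalic, consonants
        after it become post-vocalic. Assumes `phonemes` is one syllable."""
        seen_vowel = False
        out = []
        for p in phonemes:
            if p in vowels:
                seen_vowel = True
                out.append(p)
            else:
                out.append(p + ("_post" if seen_vowel else "_pre"))
        return out
    ```

    Relabeling the lexicon with these position-tagged consonants is what the abstract calls reformulating the lexicon to reflect the expanded inventory.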
  • Patent number: 8010361
    Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 8005676
    Abstract: Included are embodiments for providing speech analysis. At least one embodiment of a method includes receiving audio data associated with a communication and providing at least one phoneme in a phonetic transcript, the phonetic transcript including at least one character from a phonetic alphabet.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: August 23, 2011
    Assignee: Verint Americas, Inc.
    Inventors: Gary Duke, Joseph Watson
  • Patent number: 7996224
    Abstract: Systems and methods relate to generating a language model for use in, for example, a spoken dialog system or some other application. The method comprises building a class-based language model, generating at least one sequence network and replacing class labels in the class-based language model with the at least one sequence network. In this manner, placeholders or tokens associated with classes can be inserted into the models at training time and word/phone networks can be built based on meta-data information at test time. Finally, the placeholder token can be replaced with the word/phone networks at run time to improve recognition of difficult words such as proper names.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Sameer Raj Maskey, Brian E. Roark, Richard William Sproat
  • Patent number: 7991615
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: August 2, 2011
    Assignee: Microsoft Corporation
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20110184737
    Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
    Type: Application
    Filed: January 27, 2011
    Publication date: July 28, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Mikio NAKANO, Naoto IWAHASHI, Kotaro FUNAKOSHI, Taisuke SUMII
  • Patent number: 7983914
    Abstract: A speech recognition system or method can include a speech input device and a processor coupled to the speech input device. The processor can be programmed to identify a plurality of words that are members of confusable pairs of words, where each pair includes a target word and a substituted word. The processor can degrade a pronunciation of the substituted word to provide a worse pronunciation of the substituted word. The processor can further compare the pronunciation of the target word with the worse pronunciation of the substituted word. The processor can be further programmed to reduce confusion between the substituted word and other words in a recognition grammar of the speech recognition engine, and can also narrow the scope within which the substituted word is recognized.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: July 19, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: John W. Eckhart, Harvey M. Ruback
  • Patent number: 7983915
    Abstract: A method of generating an audio content index for use by a search engine includes determining a phoneme sequence based on recognized speech from an audio content time segment. The method also includes identifying k-phonemes which occur within the phoneme sequence. The identified k-phonemes are stored within a data structure such that the identified k-phonemes are capable of being compared with k-phonemes from a search query.
    Type: Grant
    Filed: April 30, 2007
    Date of Patent: July 19, 2011
    Assignee: Sonic Foundry, Inc.
    Inventors: Michael J. Knight, Jonathan Scott, Steven J. Yurick, John Hancock
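    A k-phoneme index of the kind the abstract describes can be sketched as a sliding-window inverted index; the data structure and function names are illustrative assumptions:

    ```python
    from collections import defaultdict

    def index_k_phonemes(phonemes, k, segment_id, index=None):
        """Add every k-phoneme (sliding window of k consecutive phonemes)
        from one audio segment's phoneme sequence to an inverted index."""
        if index is None:
            index = defaultdict(set)
        for i in range(len(phonemes) - k + 1):
            index[tuple(phonemes[i:i + k])].add(segment_id)
        return index

    def search(index, query_phonemes, k):
        """Return segment ids sharing at least one k-phoneme with the query."""
        hits = set()
        for i in range(len(query_phonemes) - k + 1):
            hits |= index.get(tuple(query_phonemes[i:i + k]), set())
        return hits
    ```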
  • Patent number: 7979718
    Abstract: An operator recognition device is provided that eliminates the registration of data such as HMM data having a characteristic amount for which error in recognition occurs easily when recognizing an operator, and thus reduces the possibility of errors in recognition, and has stable recognition performance. When registering HMM data that is used when performing recognition processing, a speaker recognition device 100 eliminates the registration of HMM data of a password having a characteristic amount of the spoken voice component that is similar to a characteristic amount that is indicated by HMM data that is already registered, and does not allow the registration of HMM data for which it is estimated that error in recognition will occur easily during the recognition process.
    Type: Grant
    Filed: March 24, 2006
    Date of Patent: July 12, 2011
    Assignees: Pioneer Corporation, Tech Experts Incorporation
    Inventors: Soichi Toyama, Ikuo Fujita, Mitsuya Komamura
  • Publication number: 20110166860
    Abstract: Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine, analyzing at the engine music clip or video in a multimedia data stream and sending an analysis wirelessly to the mobile device.
    Type: Application
    Filed: July 12, 2010
    Publication date: July 7, 2011
    Inventor: Bao Q. Tran
  • Patent number: 7974843
    Abstract: The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each relevant phoneme set of the mother tongue are then specifically mapped to determine phoneme sequences that correspond to pronunciation variants.
    Type: Grant
    Filed: January 2, 2003
    Date of Patent: July 5, 2011
    Assignee: Siemens Aktiengesellschaft
    Inventor: Tobias Schneider
  • Publication number: 20110153329
    Abstract: Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected.
    Type: Application
    Filed: February 28, 2011
    Publication date: June 23, 2011
    Inventor: James A. Moorer
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Patent number: 7961851
    Abstract: A system and method to selectively retrieve stored messages are disclosed. The method comprises receiving a voice command from a user, the voice command comprising at least one spoken search identifier, determining at least one stored message that corresponds to the spoken search identifier, and presenting the at least one stored message to the user. The plurality of stored messages may comprise content data for allowing messages with matching content data to be retrieved. The plurality of stored messages may also comprise caller data, for allowing messages with matching caller data to be retrieved. The stored messages may either be voice messages or text messages.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: June 14, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: Cary Arnold Bran, Alan D. Gatzke, Jim Kerr
  • Patent number: 7957969
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: June 7, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Patent number: 7949534
    Abstract: A system is disclosed for facilitating speech recognition and transcription among users employing incompatible protocols for generating, transcribing, and exchanging speech. The system includes a system transaction manager that receives a speech information request from at least one of the users. The speech information request includes formatted spoken text generated using a first protocol. The system also includes a speech recognition and transcription engine, which communicates with the system transaction manager. The speech recognition and transcription engine receives the speech information request from the system transaction manager and generates a transcribed response, which includes a formatted transcription of the formatted speech. The system transmits the response to the system transaction manager, which routes the response to one or more of the users. The latter users employ a second protocol to handle the response, which may be the same as or different than the first protocol.
    Type: Grant
    Filed: July 5, 2009
    Date of Patent: May 24, 2011
    Assignee: Advanced Voice Recognition Systems, Inc.
    Inventors: Michael K. Davis, Joseph Miglietta, Douglas Holt
  • Patent number: 7949527
    Abstract: This invention relates to processing of audio files, and more specifically, to an improved technique of searching audio. More particularly, a method and system for processing audio using a multi-stage searching process is disclosed.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: May 24, 2011
    Assignee: Nexidia, Inc.
    Inventors: Jon A. Arrowood, Robert W. Morris, Kenneth K. Griggs
  • Patent number: 7949536
    Abstract: Intelligent speech recognition is used to provide users with the ability to utter more user friendly commands. Satisfaction is increased when a user can vocalize a subset of a formal command name and still have the intended command identified and processed. Moreover, greater accuracy in identifying a command application from a user's utterance can be achieved by ignoring command choices associated with unlikely user utterances. An intelligent speech recognition system can identify differing acceptable verbal command phrase forms, e.g., but not limited to, complete commands, command subsequences and command subsets, for different commands supported by the system. Subset blocking words are identified for assistance in reducing the ambiguity in matching user verbal command phrases with valid commands supported by the intelligent speech recognition system.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: May 24, 2011
    Assignee: Microsoft Corporation
    Inventors: David Mowatt, Ricky Loynd, Robert Edward Dewar, Rachel Imogen Morton, Qiang Wu, Robert Ian Brown, Michael D. Plumpe, Philipp Heinz Schmid
  • Patent number: 7945445
    Abstract: Methods and apparatus for speech recognition based on a hidden Markov model are disclosed. A disclosed method of speech recognition is based on a hidden Markov model in which words to be recognized are modeled as chains of states and trained using predefined speech data material. Known vocabulary is divided into first and second partial vocabularies, where the first partial vocabulary is trained and transcribed using a whole word model and the second partial vocabulary is trained and transcribed using a phoneme-based model in order to obtain a mixed hidden Markov model. The transcriptions from the two models are stored in a single pronunciation lexicon and the mixed hidden Markov model is stored in a single search space. Apparatus are disclosed that also employ a hidden Markov model.
    Type: Grant
    Filed: July 4, 2001
    Date of Patent: May 17, 2011
    Assignee: SVOX AG
    Inventors: Erwin Marschall, Meinrad Niemoeller, Ralph Wilhelm
  • Publication number: 20110109539
    Abstract: A behavior recognition system and method by combining an image and a speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and a speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
    Type: Application
    Filed: December 9, 2009
    Publication date: May 12, 2011
    Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
  • Publication number: 20110106792
    Abstract: The invention provides a method for retrieving similar-sounding words from an electronic database. An input or query word is first converted to a string of corresponding phonemes. The string of phonemes is then used to generate a key, with the key made up of elements corresponding to the phonemes. In a preferred embodiment, the key elements correspond to classes of phonemes. The electronic database comprises a plurality of words, each of which has a corresponding phoneme-based key. Words in the database having a key identical to the key of the input word are retrieved and output. The use of phonemes in generating the search key results in the retrieval of similar-sounding words. In another aspect, the invention provides a method of providing a similarity score for an output word or a list of output words compared to an input word. All of the output words are converted into phonemes, and the score is based on a comparison of the phonemes in the input word with the phonemes in each output word.
    Type: Application
    Filed: November 5, 2010
    Publication date: May 5, 2011
    Applicant: I2 LIMITED
    Inventor: Ian Robertson
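The phoneme-class key retrieval described in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the patented method: the phoneme classes, the dropping of vowels, and the tiny hand-built lexicon are all assumptions made for the example.

```python
# Illustrative sketch of phoneme-class key retrieval (not the patented method).
# Phoneme classes and the vowel-dropping rule are assumptions for this example.
PHONEME_CLASSES = {
    "P": "1", "B": "1", "F": "1", "V": "1",  # labials
    "T": "2", "D": "2", "S": "2", "Z": "2",  # alveolars
    "K": "3", "G": "3",                      # velars
    "M": "4", "N": "4",                      # nasals
    "L": "5", "R": "5",                      # liquids
}

def key_from_phonemes(phonemes):
    """Collapse a phoneme sequence into a class-based key; vowels are dropped."""
    return "".join(PHONEME_CLASSES.get(p, "") for p in phonemes)

def retrieve_similar(query_phonemes, database):
    """Return all database words whose stored key matches the query's key."""
    qkey = key_from_phonemes(query_phonemes)
    return [word for word, phonemes in database.items()
            if key_from_phonemes(phonemes) == qkey]

db = {
    "cat": ["K", "AE", "T"],
    "kit": ["K", "IH", "T"],
    "dog": ["D", "AO", "G"],
}
print(retrieve_similar(["K", "AH", "T"], db))  # ['cat', 'kit']
```

Because vowels do not contribute to the key, words differing only in vowel sounds share a key and are retrieved together, which is the "similar-sounding" behavior the abstract describes.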
  • Patent number: 7933766
    Abstract: A method of generating a natural language understanding (NLU) model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand-crafted rules for each call type defined in a labeling guide. A first NLU model is generated and tested using the hand-crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand-crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models is built by adding each previous batch of labeled data to the training data and using the new batch of labeled data as test data, so that the series of NLU models is generated with constantly increasing training data. If not all the labeled data has been received, the method repeats the step of building a series of NLU models until all labeled data is received.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Narendra K. Gupta, Mazin G. Rahim, Gokhan Tur, Antony Van der Mude
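The incremental batch loop in the abstract above (each new batch of labeled data first serves as test data, then joins the training set) can be sketched schematically. The `train` and `evaluate` callables here are hypothetical placeholders, not the patent's actual components.

```python
# Schematic of the incremental training loop described above: each new batch
# is first used as test data, then added to the training data.
# `train` and `evaluate` are hypothetical stand-ins for real NLU components.
def incremental_models(batches, train, evaluate):
    training_data, results = [], []
    for batch in batches:
        model = train(training_data)            # build a model on data so far
        results.append(evaluate(model, batch))  # the new batch is test data
        training_data.extend(batch)             # then it joins the training data
    return results

# Toy stand-ins: the "model" is just the training-set size,
# and "evaluate" simply reports it.
sizes = incremental_models([[1, 2], [3], [4, 5]],
                           train=len,
                           evaluate=lambda model, batch: model)
print(sizes)  # [0, 2, 3]
```

The printed sizes show the training data growing with each batch, which is the "training data that increases constantly" the abstract refers to.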
  • Publication number: 20110093270
    Abstract: A method includes identifying a first syllable in a first audio of a first word and a second syllable in a second audio of a second word, the first syllable having a first set of properties and the second syllable having a second set of properties; detecting the first syllable in a first instance of the first word in an audio file, the first syllable in the first instance having a third set of properties; determining one or more transformations for transforming the first set of properties to the third set of properties; applying the one or more transformations to the second set of properties of the second syllable to yield a transformed second syllable; and replacing the first syllable in the first instance of the first word with the transformed second syllable in the audio file.
    Type: Application
    Filed: October 16, 2009
    Publication date: April 21, 2011
    Applicant: Yahoo! Inc.
    Inventor: Narayan Lakshmi BHAMIDIPATI
  • Patent number: 7925506
    Abstract: The invention provides a system and method for improving speech recognition. A computer software system is provided for implementing the system and method. A user of the computer software system may speak to the system directly and the system may respond, in spoken language, with an appropriate response. Grammar rules may be generated automatically from sample utterances when implementing the system for a particular application. Dynamic grammar rules may also be generated during interaction between the user and the system. In addition to arranging searching order of grammar files based on a predetermined hierarchy, a dynamically generated searching order based on history of contexts of a single conversation may be provided for further improved speech recognition.
    Type: Grant
    Filed: October 5, 2004
    Date of Patent: April 12, 2011
    Assignee: Inago Corporation
    Inventors: Gary Farmaner, Ron Dicarlantonio, Huw Leonard
  • Patent number: 7921011
    Abstract: Methods for optimizing grammar structure for a set of phrases to be used in speech recognition during a computing event are provided. One method includes receiving a set of phrases, the set of phrases being relevant for the computing event and having a node and link structure. Also included is identifying redundant nodes by examining the node and link structures of each of the set of phrases so as to generate a single node for the redundant nodes. The method further includes examining the node and link structures to identify nodes that are capable of being vertically grouped, and grouping the identified nodes to define vertical word groups. The method continues with fusing nodes of the set of phrases that are not vertically grouped into fused word groups. The vertical word groups and the fused word groups are then linked to define an optimized grammar structure.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: April 5, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez Abrego, Ruxin Chen
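The redundant-node merging in the abstract above can be illustrated with a prefix trie: phrases that share leading words reuse the same nodes instead of duplicating them. This is a simplified sketch of the general idea, not the patented optimization; the phrase set is invented for the example.

```python
# Toy sketch of redundant-node merging: phrases sharing a prefix reuse nodes,
# so "play the song" and "play the movie" share the "play" and "the" nodes.
# This illustrates the general idea only, not the patented grammar optimizer.
def build_merged_graph(phrases):
    """Build a word-level prefix trie; shared prefixes collapse into one path."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
    return root

def count_nodes(node):
    """Count nodes in the trie (excluding the root)."""
    return sum(1 + count_nodes(child) for child in node.values())

phrases = ["play the song", "play the movie", "pause the song"]
graph = build_merged_graph(phrases)
print(count_nodes(graph))  # 7 nodes instead of 9 unmerged words
```

The three phrases contain nine words in total, but merging the shared `play`/`the` prefix leaves only seven nodes, shrinking the search space the recognizer must traverse.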
  • Patent number: 7917361
    Abstract: A method for training a spoken language identification system to identify an unknown language as one of a plurality of known candidate languages includes the process of creating a sound inventory comprising a plurality of sound tokens, the collective plurality of sound tokens provided from a subset of the known candidate languages. The method further includes providing a plurality of training samples, each training sample composed within one of the known candidate languages. Further included is the process of generating one or more training vectors from each training sample, wherein each training vector is defined as a function of said plurality of sound tokens provided from said subset of the known candidate languages. The method further includes associating each training vector with the candidate language of the corresponding training sample.
    Type: Grant
    Filed: September 19, 2005
    Date of Patent: March 29, 2011
    Assignee: Agency for Science, Technology and Research
    Inventors: Haizhou Li, Bin Ma, George M. White
  • Patent number: 7912716
    Abstract: Generating words and/or names, comprising: receiving at least one corpus based on a given language; generating a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using the corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair; generating a phoneme tree using the plurality of N-grams of phonemes and the plurality of frequencies of occurrence; performing a random walk on the phoneme tree using the frequencies of occurrence to generate a sequence of phonemes; and mapping the sequence of phonemes into a sequence of graphemes.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: March 22, 2011
    Assignee: Sony Online Entertainment LLC
    Inventor: Patrick McCuller
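The name-generation pipeline in the abstract above (bigram frequencies of phonemes, then a weighted random walk) can be sketched as follows. For simplicity this toy uses letters as stand-in "phonemes" and an invented four-word corpus; the real method operates on phoneme N-grams and maps the result to graphemes.

```python
import random
from collections import defaultdict

# Sketch of bigram-frequency name generation via a weighted random walk,
# loosely following the abstract. Letters stand in for phonemes, and the
# corpus is toy data; both are assumptions made for this example.
def bigram_counts(corpus):
    """Count how often each symbol follows each other symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        padded = ["<s>"] + list(word) + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=10):
    """Random walk over the bigram table, weighted by observed frequency."""
    out, cur = [], "<s>"
    while len(out) < max_len:
        nxt = rng.choices(list(counts[cur]),
                          weights=list(counts[cur].values()))[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        cur = nxt
    return "".join(out)

corpus = ["mara", "mira", "tara", "tina"]
counts = bigram_counts(corpus)
print(generate(counts, random.Random(0)))
```

Every adjacent pair in a generated name was observed in the corpus, so the output tends to be pronounceable in the style of the training words even when the name itself is new.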
  • Patent number: 7912717
    Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: March 22, 2011
    Inventor: Albert Galick
  • Patent number: 7912724
    Abstract: Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: March 22, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: James Moorer
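The "substantially similar" phoneme matching in the abstract above can be approximated with a sequence-similarity score against per-category reference pronunciations. This is a rough sketch, not Adobe's method: the category lexicon, the use of `difflib.SequenceMatcher`, and the 0.8 threshold are all assumptions for the example.

```python
from difflib import SequenceMatcher

# Toy sketch of phoneme-sequence matching for product categorization.
# The reference pronunciations, similarity measure, and threshold are
# illustrative assumptions, not taken from the patent.
CATEGORY_PHONEMES = {
    "camera": ["K", "AE", "M", "ER", "AH"],
    "laptop": ["L", "AE", "P", "T", "AA", "P"],
}

def categorize(phonemes, threshold=0.8):
    """Return the best-matching category, or None if nothing is similar enough."""
    best, best_score = None, 0.0
    for category, ref in CATEGORY_PHONEMES.items():
        score = SequenceMatcher(None, phonemes, ref).ratio()
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= threshold else None

print(categorize(["K", "AE", "M", "ER", "AH"]))  # camera (exact match)
```

A file whose audio yields a close-enough phoneme sequence gets associated with a category; sequences that match nothing above the threshold are left uncategorized.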
  • Publication number: 20110066437
    Abstract: Methods and apparatus to construct and transmit content-aware watermarks are disclosed herein. An example method of creating a content-aware watermark includes selecting at least one word associated with a media composition; representing the word with at least one phonetic notation; obtaining a proxy code for each phonetic notation; and locating the proxy code in the content-aware watermark.
    Type: Application
    Filed: December 11, 2009
    Publication date: March 17, 2011
    Inventor: Robert Luff
  • Patent number: 7904296
    Abstract: An approach to wordspotting (180) using query data from one or more spoken instances of a query (140). The query data is processed to determine a representation of the query (160) that defines multiple sequences of subword (130) units, each representing the query. Then putative instances of the query (190) are located in input data from an audio signal using the determined representation of the query.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: March 8, 2011
    Assignee: Nexidia Inc.
    Inventor: Robert W. Morris
  • Publication number: 20110054892
    Abstract: The present invention relates to a continuous speech recognition system that is robust in noisy environments. To recognize continuous speech smoothly in a noisy environment, the system selects call commands, configures a minimal recognition network of tokens consisting of the call commands and mute intervals (including noise), recognizes the input speech continuously in real time, continuously analyzes the reliability of the recognition, and recognizes the continuous speech from a speaker. When a speaker utters a call command, the system, which detects the speech interval and recognizes continuous speech in a noisy environment through real-time recognition of call commands, measures the reliability of the speech after recognizing the call command, and then recognizes the speaker's speech by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment the call command is recognized.
    Type: Application
    Filed: April 22, 2009
    Publication date: March 3, 2011
    Applicant: KOREAPOWERVOICE CO., LTD.
    Inventors: Heui-Suck Jung, Se-Hoon Chin, Tae-Young Roh
  • Publication number: 20110054901
    Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and to automatically searching a multimedia resource.
    Type: Application
    Filed: August 27, 2010
    Publication date: March 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
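Word-level alignment by phoneme similarity, as in the abstract above, can be sketched by comparing words through a phonetic proxy rather than exact spelling. This toy uses a crude consonant skeleton in place of real phonemes and aligns with `difflib`; both choices are assumptions for the example, not the patented algorithm.

```python
from difflib import SequenceMatcher

# Illustrative word-level alignment: words are compared by a crude phonetic
# proxy (their consonant skeleton) rather than exact spelling, so spelling
# variants like "color"/"colour" still align. The proxy and the use of
# difflib are assumptions for this sketch, not the patented method.
def skeleton(word):
    """A rough stand-in for a phoneme string: lowercase consonants only."""
    return "".join(c for c in word.lower() if c not in "aeiou")

def align(target, reference):
    """Return (target_index, reference_index) pairs for matching words."""
    t = [skeleton(w) for w in target]
    r = [skeleton(w) for w in reference]
    pairs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, t, r).get_opcodes():
        if tag == "equal":
            pairs.extend(zip(range(i1, i2), range(j1, j2)))
    return pairs

tgt = "the color of speech".split()
ref = "a colour of speech".split()
print(align(tgt, ref))  # [(1, 1), (2, 2), (3, 3)]
```

Here "color" and "colour" share the skeleton `clr`, so they align even though their spellings differ, which is the benefit of matching on pronunciation-like features instead of surface text.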
  • Patent number: 7899891
    Abstract: A network mobility server includes a target device inventory module, a data collection module, a data management module, and a distribution module. The data management module includes at least one data storage module, in which at least a portion of the stored data are identical data items kept in different selected formats suitable for use on mobile computing and telecommunication devices. The network also includes network agents resident on a number of the network members.
    Type: Grant
    Filed: July 9, 2010
    Date of Patent: March 1, 2011
    Assignee: Soonr Corporation
    Inventors: Martin Frid-Nielsen, Steven Ray Boye, Lars Gunnersen, Song Zun Huang