Subportions Patents (Class 704/254)
  • Patent number: 8069042
    Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the model pool. If the model pool is not empty, the existing model that best matches the training segment is determined from the model pool. The existing model is then scored against the training segment. If the score is less than a predefined threshold, a new model for the training segment is created and added to the pool. If the score is equal to or greater than the threshold, the training segment is used to improve or re-estimate the model.
    Type: Grant
    Filed: September 21, 2007
    Date of Patent: November 29, 2011
    Assignee: Honda Research Institute Europe GmbH
    Inventors: Frank Joublin, Holger Brandl
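    The pooling procedure described in the abstract above can be sketched as follows; the function names, stub interfaces, and threshold handling are illustrative assumptions, not details from the patent:

    ```python
    # Hypothetical sketch of the model-pool update described above.
    # Model training/scoring are passed in as callables; only the pooling
    # logic is shown.

    def update_pool(pool, segment, threshold, train_model, score, reestimate):
        """Add or refine a syllable model for one detected training segment."""
        if not pool:
            # Empty pool: train the first model from this segment.
            pool.append(train_model(segment))
            return pool
        # Find the existing model that best matches the segment.
        best = max(pool, key=lambda m: score(m, segment))
        if score(best, segment) < threshold:
            # Poor match: treat the segment as a new syllable.
            pool.append(train_model(segment))
        else:
            # Good match: use the segment to re-estimate the model.
            reestimate(best, segment)
        return pool
    ```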
  • Patent number: 8063809
    Abstract: A transient signal encoding method and device, decoding method and device, and processing system, where the transient signal encoding method includes: obtaining a reference sub-frame where a maximal time envelope having a maximal amplitude value is located from time envelopes of all sub-frames of an input transient signal; adjusting an amplitude value of the time envelope of each sub-frame before the reference sub-frame in such a way that a first difference is greater than a preset first threshold, in which the first difference is a difference between the amplitude value of the time envelope of each sub-frame before the reference sub-frame and the amplitude value of the maximal time envelope; and writing the adjusted time envelope into a bitstream.
    Type: Grant
    Filed: June 29, 2011
    Date of Patent: November 22, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Longyin Chen, Lei Miao, Chen Hu, Wei Xiao, Herve Marcel Taddei, Qing Zhang
  • Publication number: 20110282650
    Abstract: A common problem is that when people speak a language other than the one to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese speakers to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
    Type: Application
    Filed: May 17, 2010
    Publication date: November 17, 2011
    Applicant: AVAYA INC.
    Inventors: Terry Jennings, Paul Roller Michaelis
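    The decision the abstract describes (replace a token only when it is not a word in the listener's language but is a known L1-typical mispronunciation) can be sketched as a table lookup; the table contents, function names, and lexicon here are hypothetical:

    ```python
    # Hypothetical mispronunciation table keyed by (speaker L1, listener language).
    # The single entry mirrors the "orenji" -> "orange" example in the abstract.
    CORRECTIONS = {
        ("japanese", "english"): {"orenji": "orange"},
    }

    def correct_utterance(words, speaker_l1, listener_lang, lexicon):
        """Replace tokens that are not words in the listener's language but
        are known typical mispronunciations for this speaker/listener pair."""
        table = CORRECTIONS.get((speaker_l1, listener_lang), {})
        out = []
        for w in words:
            if w not in lexicon and w in table:
                out.append(table[w])  # known accent-driven mispronunciation
            else:
                out.append(w)
        return out
    ```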
  • Publication number: 20110282667
    Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Gustavo A. Hernandez-Abrego
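    A minimal sketch of separating alignment regions from potential confusion zones between two statements, using `difflib.SequenceMatcher` as a stand-in for the patent's word-set alignment (the phonetic confusion-probability analysis is not shown):

    ```python
    import difflib

    def confusion_zones(stmt_a, stmt_b):
        """Align two statements word-by-word; stretches that fail to align
        are returned as potential confusion zones (word spans at the
        corresponding positions in each statement)."""
        sm = difflib.SequenceMatcher(a=stmt_a, b=stmt_b)
        zones = []
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op != "equal":  # outside an alignment region
                zones.append((stmt_a[i1:i2], stmt_b[j1:j2]))
        return zones
    ```

    For example, "call my agent now" versus "call my lawyer now" aligns on "call my" and "now", leaving one confusion zone containing "agent" and "lawyer".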
  • Patent number: 8060365
    Abstract: A dialog processing system which includes: a target expression data extraction unit for extracting, from among a plurality of utterance data input by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, the utterance pattern being input by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern matching portions from the extracted target expression data and extracting a feature quantity common to the pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the extracted feature quantities.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: November 15, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Patent number: 8050924
    Abstract: Methods and apparatus for implementing the generation of names. In one implementation, a system for generating a name includes: a user interface that receives user input including values for corresponding characteristics and name lengths; a rule dictionary that indicates one or more rules, each rule indicating a relationship between a phoneme and a characteristic; a phoneme selector that selects a phoneme using a value for a characteristic received through said user interface and a rule corresponding to that characteristic; a phoneme compiler that combines selected phonemes to form a name, wherein said name includes a number of letters based on said name length; storage storing data, including data representing said user input and said rule dictionary; and a processor for executing instructions providing said user interface, said first phoneme selector, said second phoneme selector, and said phoneme compiler.
    Type: Grant
    Filed: April 8, 2005
    Date of Patent: November 1, 2011
    Assignee: Sony Online Entertainment LLC
    Inventor: Patrick McCuller
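    The phoneme-selector/compiler pipeline in the abstract might look roughly like this; the rule dictionary, characteristic names, and vowel-interleaving scheme are invented for illustration:

    ```python
    import random

    # Hypothetical rule dictionary mapping a (characteristic, value) pair to
    # phonemes that evoke it; the patent does not publish actual rules.
    RULES = {
        ("hardness", "high"): ["k", "g", "t"],
        ("hardness", "low"): ["l", "m", "s"],
    }
    VOWELS = ["a", "e", "i", "o", "u"]

    def generate_name(characteristic, value, length, rng=random):
        """Select consonants via the rule for the requested characteristic,
        interleave vowels, and compile a name of the requested length."""
        consonants = RULES[(characteristic, value)]
        letters = []
        while len(letters) < length:
            letters.append(rng.choice(consonants))  # phoneme selector
            if len(letters) < length:
                letters.append(rng.choice(VOWELS))
        return "".join(letters[:length]).capitalize()  # phoneme compiler
    ```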
  • Patent number: 8046224
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 8046218
    Abstract: A system and method for phone detection. The system includes a microphone configured to receive a speech signal in an acoustic domain and convert the speech signal from the acoustic domain to an electrical domain, and a filter bank coupled to the microphone and configured to receive the converted speech signal and generate a plurality of channel speech signals corresponding to a plurality of channels respectively. Additionally, the system includes a plurality of onset enhancement devices configured to receive the plurality of channel speech signals and generate a plurality of onset enhanced signals. Each of the plurality of onset enhancement devices is configured to receive one of the plurality of channel speech signals, enhance one or more onsets of one or more signal pulses for the received one of the plurality of channel speech signals, and generate one of the plurality of onset enhanced signals.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: October 25, 2011
    Assignee: The Board of Trustees of the University of Illinois
    Inventors: Jont B. Allen, Marion Regnier
  • Publication number: 20110251844
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: June 20, 2011
    Publication date: October 13, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Patent number: 8036348
    Abstract: A method of presenting instructions to a user sending an incoming communication to a service center includes presenting a menu to the user. The menu includes a plurality of procedure descriptors to the user. The user is presented, according to a selection of one of the procedure descriptors by the user, a sequence of instructions which enable completion of a procedure described by the selected procedure descriptor. The incoming communication is transferred at a position in the sequence of instructions to a representative. The incoming communication is also transferred back to the same position in the sequence of instructions.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: October 11, 2011
    Assignee: AT&T Labs, Inc.
    Inventors: Philip Ted Kortum, Robert R. Bushey
  • Patent number: 8032374
    Abstract: Provided are an apparatus and method for recognizing continuous speech using search space restriction based on phoneme recognition. In the apparatus and method, a search space can be primarily reduced by restricting connection words to be shifted at a boundary between words based on the phoneme recognition result. In addition, the search space can be secondarily reduced by rapidly calculating a degree of similarity between the connection word to be shifted and the phoneme recognition result using a phoneme code and shifting the corresponding phonemes to only connection words having degrees of similarity equal to or higher than a predetermined reference value. Therefore, the speed and performance of the speech recognition process can be improved in various speech recognition services.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: October 4, 2011
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Hyung Bae Jeon, Jun Park, Seung Hi Kim, Kyu Woong Hwang
  • Patent number: 8027834
    Abstract: The present invention discloses a method for training an exception-limited phonetic decision tree. An initial subset of data can be selected and used for creating an initial phonetic decision tree. Additional terms can then be incorporated into the subset. The enlarged subset can be used to evaluate the phonetic decision tree, with the results being categorized as either correctly or incorrectly phonetized. An exception-limited phonetic tree can be generated from the set of correctly phonetized terms. If the termination conditions for the method are determined not to have been met, the steps of the method can be repeated.
    Type: Grant
    Filed: June 25, 2007
    Date of Patent: September 27, 2011
    Assignee: Nuance Communications, Inc.
    Inventor: Steven M. Hancock
  • Patent number: 8024191
    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 20, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
  • Patent number: 8019602
    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
    Type: Grant
    Filed: January 20, 2004
    Date of Patent: September 13, 2011
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
  • Patent number: 8019604
    Abstract: A method, system and communication device for enabling uniterm discovery from audio content and voice-to-voice searching of audio content stored on a device using discovered uniterms. Received audio/voice input signal is sent to a uniterm discovery and search (UDS) engine within the device. The audio data may be associated with other content that is also stored within the device. The UDS engine retrieves a number of uniterms from the audio data and associates the uniterms with the stored content. When a voice search is initiated at the device, the UDS engine generates a statistical latent lattice model from the voice query and scores the uniterms from the audio database against the latent lattice model. Following a further refinement, the best group of uniterms is then determined and segments of the stored audio data and/or other content corresponding to the best group of uniterms are outputted.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: September 13, 2011
    Assignee: Motorola Mobility, Inc.
    Inventor: Changxue Ma
  • Publication number: 20110218802
    Abstract: A computerized method for continuous speech recognition using a speech recognition engine and a phoneme model. The computerized method inputs a speech signal into the speech recognition engine. Based on the phoneme model, the speech signal is indexed by scoring for the phonemes of the phoneme model and a time-ordered list of phoneme candidates and respective scores resulting from the scoring are produced. The phoneme candidates are input with the scores from the time-ordered list. Word transcription candidates are typically input from a dictionary and words are built by selecting from the word transcription candidates based on the scores. A stream of transcriptions is outputted corresponding to the input speech signal. The stream of transcriptions is re-scored by searching for and detecting anomalous word transcriptions in the stream of transcriptions to produce second scores.
    Type: Application
    Filed: March 8, 2010
    Publication date: September 8, 2011
    Inventors: Shlomi Hai Bouganim, Boris Levant
  • Patent number: 8015008
    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
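    The consonant-inventory expansion described in the abstract above can be sketched as follows, assuming a single-syllable input in place of the patent's detected syllable-boundary positions (function names and label suffixes are illustrative):

    ```python
    def expand_inventory(consonants):
        """Split each consonant into pre-vocalic and post-vocalic variants,
        doubling the consonant phoneme inventory."""
        return [c + "_pre" for c in consonants] + [c + "_post" for c in consonants]

    def relabel(phonemes, vowels):
        """Relabel consonants by position relative to the syllable's vowel:
        consonants before the first vowel become pre-vocalic, consonants
        after it become post-vocalic. Assumes `phonemes` is one syllable."""
        seen_vowel = False
        out = []
        for p in phonemes:
            if p in vowels:
                seen_vowel = True
                out.append(p)
            else:
                out.append(p + ("_post" if seen_vowel else "_pre"))
        return out
    ```

    Relabeling the lexicon with these position-tagged consonants is what the abstract calls reformulating the lexicon to reflect the expanded inventory.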
  • Patent number: 8010361
    Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 8005676
    Abstract: Included are embodiments for providing speech analysis. At least one embodiment of a method includes receiving audio data associated with a communication and providing at least one phoneme in a phonetic transcript, the phonetic transcript including at least one character from a phonetic alphabet.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: August 23, 2011
    Assignee: Verint Americas, Inc.
    Inventors: Gary Duke, Joseph Watson
  • Patent number: 7996224
    Abstract: Systems and methods relate to generating a language model for use in, for example, a spoken dialog system or some other application. The method comprises building a class-based language model, generating at least one sequence network and replacing class labels in the class-based language model with the at least one sequence network. In this manner, placeholders or tokens associated with classes can be inserted into the models at training time and word/phone networks can be built based on meta-data information at test time. Finally, the placeholder token can be replaced with the word/phone networks at run time to improve recognition of difficult words such as proper names.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Sameer Raj Maskey, Brian E. Roark, Richard William Sproat
  • Patent number: 7991615
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: August 2, 2011
    Assignee: Microsoft Corporation
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20110184737
    Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
    Type: Application
    Filed: January 27, 2011
    Publication date: July 28, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Mikio NAKANO, Naoto IWAHASHI, Kotaro FUNAKOSHI, Taisuke SUMII
  • Patent number: 7983914
    Abstract: A speech recognition system or method can include a speech input device and a processor coupled to the speech input device. The processor can be programmed to identify a plurality of words that are members of confusable pairs of words, where each pair includes a target word and a substituted word. The processor can degrade a pronunciation of the substituted word to provide a worse pronunciation of the substituted word. The processor can further compare the pronunciation of the target word with the worse pronunciation of the substituted word. The processor can be further programmed to reduce confusion between the substituted word and other words in a recognition grammar of the speech recognition engine, and can also narrow the scope within which the substituted word is recognized.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: July 19, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: John W. Eckhart, Harvey M. Ruback
  • Patent number: 7983915
    Abstract: A method of generating an audio content index for use by a search engine includes determining a phoneme sequence based on recognized speech from an audio content time segment. The method also includes identifying k-phonemes which occur within the phoneme sequence. The identified k-phonemes are stored within a data structure such that the identified k-phonemes are capable of being compared with k-phonemes from a search query.
    Type: Grant
    Filed: April 30, 2007
    Date of Patent: July 19, 2011
    Assignee: Sonic Foundry, Inc.
    Inventors: Michael J. Knight, Jonathan Scott, Steven J. Yurick, John Hancock
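    A k-phoneme index of the kind the abstract describes can be sketched as a sliding-window inverted index; the data structure and function names are illustrative assumptions:

    ```python
    from collections import defaultdict

    def index_k_phonemes(phonemes, k, segment_id, index=None):
        """Add every k-phoneme (sliding window of k consecutive phonemes)
        from one audio segment's phoneme sequence to an inverted index."""
        if index is None:
            index = defaultdict(set)
        for i in range(len(phonemes) - k + 1):
            index[tuple(phonemes[i:i + k])].add(segment_id)
        return index

    def search(index, query_phonemes, k):
        """Return segment ids sharing at least one k-phoneme with the query."""
        hits = set()
        for i in range(len(query_phonemes) - k + 1):
            hits |= index.get(tuple(query_phonemes[i:i + k]), set())
        return hits
    ```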
  • Patent number: 7979718
    Abstract: An operator recognition device is provided that eliminates the registration of data such as HMM data having a characteristic amount for which error in recognition occurs easily when recognizing an operator, and thus reduces the possibility of errors in recognition, and has stable recognition performance. When registering HMM data that is used when performing recognition processing, a speaker recognition device 100 eliminates the registration of HMM data of a password having a characteristic amount of the spoken voice component that is similar to a characteristic amount that is indicated by HMM data that is already registered, and does not allow the registration of HMM data for which it is estimated that error in recognition will occur easily during the recognition process.
    Type: Grant
    Filed: March 24, 2006
    Date of Patent: July 12, 2011
    Assignees: Pioneer Corporation, Tech Experts Incorporation
    Inventors: Soichi Toyama, Ikuo Fujita, Mitsuya Komamura
  • Publication number: 20110166860
    Abstract: Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine, analyzing at the engine music clip or video in a multimedia data stream and sending an analysis wirelessly to the mobile device.
    Type: Application
    Filed: July 12, 2010
    Publication date: July 7, 2011
    Inventor: Bao Q. Tran
  • Patent number: 7974843
    Abstract: The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each relevant phoneme set of the mother tongue are then specifically mapped to determine phoneme sequences that correspond to pronunciation variants.
    Type: Grant
    Filed: January 2, 2003
    Date of Patent: July 5, 2011
    Assignee: Siemens Aktiengesellschaft
    Inventor: Tobias Schneider
  • Publication number: 20110153329
    Abstract: Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected.
    Type: Application
    Filed: February 28, 2011
    Publication date: June 23, 2011
    Inventor: James A. Moorer
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Patent number: 7961851
    Abstract: A system and method to selectively retrieve stored messages are disclosed. The method comprises receiving a voice command from a user, the voice command comprising at least one spoken search identifier, determining at least one stored message that corresponds to the spoken search identifier, and presenting the at least one stored message to the user. The plurality of stored messages may comprise content data for allowing messages with matching content data to be retrieved. The plurality of stored messages may also comprise caller data, for allowing messages with matching caller data to be retrieved. The stored messages may either be voice messages or text messages.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: June 14, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: Cary Arnold Bran, Alan D. Gatzke, Jim Kerr
  • Patent number: 7957969
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: June 7, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Patent number: 7949534
    Abstract: A system is disclosed for facilitating speech recognition and transcription among users employing incompatible protocols for generating, transcribing, and exchanging speech. The system includes a system transaction manager that receives a speech information request from at least one of the users. The speech information request includes formatted spoken text generated using a first protocol. The system also includes a speech recognition and transcription engine, which communicates with the system transaction manager. The speech recognition and transcription engine receives the speech information request from the system transaction manager and generates a transcribed response, which includes a formatted transcription of the formatted speech. The system transmits the response to the system transaction manager, which routes the response to one or more of the users. The latter users employ a second protocol to handle the response, which may be the same as or different than the first protocol.
    Type: Grant
    Filed: July 5, 2009
    Date of Patent: May 24, 2011
    Assignee: Advanced Voice Recognition Systems, Inc.
    Inventors: Michael K. Davis, Joseph Miglietta, Douglas Holt
  • Patent number: 7949527
    Abstract: This invention relates to processing of audio files, and more specifically, to an improved technique of searching audio. More particularly, a method and system for processing audio using a multi-stage searching process is disclosed.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: May 24, 2011
    Assignee: Nexidia, Inc.
    Inventors: Jon A. Arrowood, Robert W. Morris, Kenneth K. Griggs
  • Patent number: 7949536
    Abstract: Intelligent speech recognition is used to provide users with the ability to utter more user friendly commands. Satisfaction is increased when a user can vocalize a subset of a formal command name and still have the intended command identified and processed. Moreover, greater accuracy in identifying a command application from a user's utterance can be achieved by ignoring command choices associated with unlikely user utterances. An intelligent speech recognition system can identify differing acceptable verbal command phrase forms, e.g., but not limited to, complete commands, command subsequences and command subsets, for different commands supported by the system. Subset blocking words are identified for assistance in reducing the ambiguity in matching user verbal command phrases with valid commands supported by the intelligent speech recognition system.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: May 24, 2011
    Assignee: Microsoft Corporation
    Inventors: David Mowatt, Ricky Loynd, Robert Edward Dewar, Rachel Imogen Morton, Qiang Wu, Robert Ian Brown, Michael D. Plumpe, Philipp Heinz Schmid
  • Patent number: 7945445
    Abstract: Methods and apparatus for speech recognition based on a hidden Markov model are disclosed. A disclosed method of speech recognition is based on a hidden Markov model in which words to be recognized are modeled as chains of states and trained using predefined speech data material. Known vocabulary is divided into first and second partial vocabularies, where the first partial vocabulary is trained and transcribed using a whole word model and the second partial vocabulary is trained and transcribed using a phoneme-based model in order to obtain a mixed hidden Markov model. The transcriptions from the two models are stored in a single pronunciation lexicon and the mixed hidden Markov model is stored in a single search space. Apparatus are disclosed that also employ a hidden Markov model.
    Type: Grant
    Filed: July 4, 2001
    Date of Patent: May 17, 2011
    Assignee: SVOX AG
    Inventors: Erwin Marschall, Meinrad Niemoeller, Ralph Wilhelm
  • Publication number: 20110109539
    Abstract: A behavior recognition system and method by combining an image and a speech are provided. The system includes a data analyzing module, a database, and a calculating module. A plurality of image-and-speech relation modules is stored in the database. Each image-and-speech relation module includes a feature extraction parameter and an image-and-speech relation parameter. The data analyzing module obtains a gesture image and a speech data corresponding to each other, and substitutes the gesture image and the speech data into each feature extraction parameter to generate image feature sequences and speech feature sequences. The data analyzing module uses each image-and-speech relation parameter to calculate image-and-speech status parameters.
    Type: Application
    Filed: December 9, 2009
    Publication date: May 12, 2011
    Inventors: Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei, Chia-Te Chu, Red-Tom Lin, Chin-Shun Hsu
  • Publication number: 20110106792
    Abstract: The invention provides a method for retrieving similar-sounding words from an electronic database. An input or query word is first converted to a string of corresponding phonemes. The string of phonemes is then used to generate a key, with the key made up of elements corresponding to the phonemes. In a preferred embodiment, the key elements correspond to classes of phonemes. The electronic database comprises a plurality of words, each of which has a corresponding phoneme-based key. Words in the database having a key identical to the key of the input word are retrieved and output. The use of phonemes in generating the search key results in the retrieval of similar-sounding words. In another aspect, the invention provides a method of providing a similarity score for an output word or a list of output words compared to an input word. All of the output words are converted into phonemes, and the score is based on a comparison of the phonemes in the input word with the phonemes in each output word.
    Type: Application
    Filed: November 5, 2010
    Publication date: May 5, 2011
    Applicant: I2 LIMITED
    Inventor: Ian Robertson
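The phoneme-class key retrieval described in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the patented method: the phoneme classes, the dropping of vowels, and the tiny hand-built lexicon are all assumptions made for the example.

```python
# Illustrative sketch of phoneme-class key retrieval (not the patented method).
# Phoneme classes and the vowel-dropping rule are assumptions for this example.
PHONEME_CLASSES = {
    "P": "1", "B": "1", "F": "1", "V": "1",  # labials
    "T": "2", "D": "2", "S": "2", "Z": "2",  # alveolars
    "K": "3", "G": "3",                      # velars
    "M": "4", "N": "4",                      # nasals
    "L": "5", "R": "5",                      # liquids
}

def key_from_phonemes(phonemes):
    """Collapse a phoneme sequence into a class-based key; vowels are dropped."""
    return "".join(PHONEME_CLASSES.get(p, "") for p in phonemes)

def retrieve_similar(query_phonemes, database):
    """Return all database words whose stored key matches the query's key."""
    qkey = key_from_phonemes(query_phonemes)
    return [word for word, phonemes in database.items()
            if key_from_phonemes(phonemes) == qkey]

db = {
    "cat": ["K", "AE", "T"],
    "kit": ["K", "IH", "T"],
    "dog": ["D", "AO", "G"],
}
print(retrieve_similar(["K", "AH", "T"], db))  # ['cat', 'kit']
```

Because vowels do not contribute to the key, words differing only in vowel sounds share a key and are retrieved together, which is the "similar-sounding" behavior the abstract describes.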
  • Patent number: 7933766
    Abstract: A method of generating a natural language understanding (NLU) model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand-crafted rules for each call type defined in a labeling guide. A first NLU model is generated and tested using the hand-crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand-crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models is built by adding each previous batch of labeled data to the training data and using the new batch of labeled data as test data, so that the series of NLU models is generated with constantly increasing training data. If not all the labeled data has been received, the method repeats the step of building a series of NLU models until all labeled data is received.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Narendra K. Gupta, Mazin G. Rahim, Gokhan Tur, Antony Van der Mude
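The incremental batch loop in the abstract above (each new batch of labeled data first serves as test data, then joins the training set) can be sketched schematically. The `train` and `evaluate` callables here are hypothetical placeholders, not the patent's actual components.

```python
# Schematic of the incremental training loop described above: each new batch
# is first used as test data, then added to the training data.
# `train` and `evaluate` are hypothetical stand-ins for real NLU components.
def incremental_models(batches, train, evaluate):
    training_data, results = [], []
    for batch in batches:
        model = train(training_data)            # build a model on data so far
        results.append(evaluate(model, batch))  # the new batch is test data
        training_data.extend(batch)             # then it joins the training data
    return results

# Toy stand-ins: the "model" is just the training-set size,
# and "evaluate" simply reports it.
sizes = incremental_models([[1, 2], [3], [4, 5]],
                           train=len,
                           evaluate=lambda model, batch: model)
print(sizes)  # [0, 2, 3]
```

The printed sizes show the training data growing with each batch, which is the "training data that increases constantly" the abstract refers to.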
  • Publication number: 20110093270
    Abstract: A method includes identifying a first syllable in a first audio of a first word and a second syllable in a second audio of a second word, the first syllable having a first set of properties and the second syllable having a second set of properties; detecting the first syllable in a first instance of the first word in an audio file, the first syllable in the first instance having a third set of properties; determining one or more transformations for transforming the first set of properties to the third set of properties; applying the one or more transformations to the second set of properties of the second syllable to yield a transformed second syllable; and replacing the first syllable in the first instance of the first word with the transformed second syllable in the audio file.
    Type: Application
    Filed: October 16, 2009
    Publication date: April 21, 2011
    Applicant: Yahoo! Inc.
    Inventor: Narayan Lakshmi BHAMIDIPATI
  • Patent number: 7925506
    Abstract: The invention provides a system and method for improving speech recognition. A computer software system is provided for implementing the system and method. A user of the computer software system may speak to the system directly and the system may respond, in spoken language, with an appropriate response. Grammar rules may be generated automatically from sample utterances when implementing the system for a particular application. Dynamic grammar rules may also be generated during interaction between the user and the system. In addition to arranging searching order of grammar files based on a predetermined hierarchy, a dynamically generated searching order based on history of contexts of a single conversation may be provided for further improved speech recognition.
    Type: Grant
    Filed: October 5, 2004
    Date of Patent: April 12, 2011
    Assignee: Inago Corporation
    Inventors: Gary Farmaner, Ron Dicarlantonio, Huw Leonard
  • Patent number: 7921011
    Abstract: Methods for optimizing grammar structure for a set of phrases to be used in speech recognition during a computing event are provided. One method includes receiving a set of phrases, the set of phrases being relevant for the computing event and having a node and link structure. Also included is identifying redundant nodes by examining the node and link structures of each of the set of phrases so as to generate a single node for the redundant nodes. The method further includes examining the node and link structures to identify nodes that are capable of being vertically grouped, and grouping the identified nodes to define vertical word groups. The method continues with fusing nodes of the set of phrases that are not vertically grouped into fused word groups. The vertical word groups and the fused word groups are then linked to define an optimized grammar structure.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: April 5, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez Abrego, Ruxin Chen
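The redundant-node merging in the abstract above can be illustrated with a prefix trie: phrases that share leading words reuse the same nodes instead of duplicating them. This is a simplified sketch of the general idea, not the patented optimization; the phrase set is invented for the example.

```python
# Toy sketch of redundant-node merging: phrases sharing a prefix reuse nodes,
# so "play the song" and "play the movie" share the "play" and "the" nodes.
# This illustrates the general idea only, not the patented grammar optimizer.
def build_merged_graph(phrases):
    """Build a word-level prefix trie; shared prefixes collapse into one path."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
    return root

def count_nodes(node):
    """Count nodes in the trie (excluding the root)."""
    return sum(1 + count_nodes(child) for child in node.values())

phrases = ["play the song", "play the movie", "pause the song"]
graph = build_merged_graph(phrases)
print(count_nodes(graph))  # 7 nodes instead of 9 unmerged words
```

The three phrases contain nine words in total, but merging the shared `play`/`the` prefix leaves only seven nodes, shrinking the search space the recognizer must traverse.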
  • Patent number: 7917361
    Abstract: A method for training a spoken language identification system to identify an unknown language as one of a plurality of known candidate languages includes the process of creating a sound inventory comprising a plurality of sound tokens, the collective plurality of sound tokens provided from a subset of the known candidate languages. The method further includes providing a plurality of training samples, each training sample composed within one of the known candidate languages. Further included is the process of generating one or more training vectors from each training sample, wherein each training vector is defined as a function of said plurality of sound tokens provided from said subset of the known candidate languages. The method further includes associating each training vector with the candidate language of the corresponding training sample.
    Type: Grant
    Filed: September 19, 2005
    Date of Patent: March 29, 2011
    Assignee: Agency for Science, Technology and Research
    Inventors: Haizhou Li, Bin Ma, George M. White
  • Patent number: 7912716
    Abstract: Generating words and/or names, comprising: receiving at least one corpus based on a given language; generating a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using the corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair; generating a phoneme tree using the plurality of N-grams of phonemes and the plurality of frequencies of occurrence; performing a random walk on the phoneme tree using the frequencies of occurrence to generate a sequence of phonemes; and mapping the sequence of phonemes into a sequence of graphemes.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: March 22, 2011
    Assignee: Sony Online Entertainment LLC
    Inventor: Patrick McCuller
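The name-generation pipeline in the abstract above (bigram frequencies of phonemes, then a weighted random walk) can be sketched as follows. For simplicity this toy uses letters as stand-in "phonemes" and an invented four-word corpus; the real method operates on phoneme N-grams and maps the result to graphemes.

```python
import random
from collections import defaultdict

# Sketch of bigram-frequency name generation via a weighted random walk,
# loosely following the abstract. Letters stand in for phonemes, and the
# corpus is toy data; both are assumptions made for this example.
def bigram_counts(corpus):
    """Count how often each symbol follows each other symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        padded = ["<s>"] + list(word) + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=10):
    """Random walk over the bigram table, weighted by observed frequency."""
    out, cur = [], "<s>"
    while len(out) < max_len:
        nxt = rng.choices(list(counts[cur]),
                          weights=list(counts[cur].values()))[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        cur = nxt
    return "".join(out)

corpus = ["mara", "mira", "tara", "tina"]
counts = bigram_counts(corpus)
print(generate(counts, random.Random(0)))
```

Every adjacent pair in a generated name was observed in the corpus, so the output tends to be pronounceable in the style of the training words even when the name itself is new.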
  • Patent number: 7912717
    Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: March 22, 2011
    Inventor: Albert Galick
  • Patent number: 7912724
    Abstract: Audio comparison using phoneme matching is described, including evaluating audio data associated with a file, identifying a sequence of phonemes in the audio data, associating the file with a product category based on a match indicating the sequence of phonemes is substantially similar to another sequence of phonemes, the file being stored, and accessing the file when a request associated with the product category is detected.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: March 22, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: James Moorer
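The "substantially similar" phoneme matching in the abstract above can be approximated with a sequence-similarity score against per-category reference pronunciations. This is a rough sketch, not Adobe's method: the category lexicon, the use of `difflib.SequenceMatcher`, and the 0.8 threshold are all assumptions for the example.

```python
from difflib import SequenceMatcher

# Toy sketch of phoneme-sequence matching for product categorization.
# The reference pronunciations, similarity measure, and threshold are
# illustrative assumptions, not taken from the patent.
CATEGORY_PHONEMES = {
    "camera": ["K", "AE", "M", "ER", "AH"],
    "laptop": ["L", "AE", "P", "T", "AA", "P"],
}

def categorize(phonemes, threshold=0.8):
    """Return the best-matching category, or None if nothing is similar enough."""
    best, best_score = None, 0.0
    for category, ref in CATEGORY_PHONEMES.items():
        score = SequenceMatcher(None, phonemes, ref).ratio()
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= threshold else None

print(categorize(["K", "AE", "M", "ER", "AH"]))  # camera (exact match)
```

A file whose audio yields a close-enough phoneme sequence gets associated with a category; sequences that match nothing above the threshold are left uncategorized.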
  • Publication number: 20110066437
    Abstract: Methods and apparatus to construct and transmit content-aware watermarks are disclosed herein. An example method of creating a content-aware watermark includes selecting at least one word associated with a media composition; representing the word with at least one phonetic notation; obtaining a proxy code for each phonetic notation; and locating the proxy code in the content-aware watermark.
    Type: Application
    Filed: December 11, 2009
    Publication date: March 17, 2011
    Inventor: Robert Luff
  • Patent number: 7904296
    Abstract: An approach to wordspotting (180) using query data from one or more spoken instances of a query (140). The query data is processed to determine a representation of the query (160) that defines multiple sequences of subword (130) units, each representing the query. Then putative instances of the query (190) are located in input data from an audio signal using the determined representation of the query.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: March 8, 2011
    Assignee: Nexidia Inc.
    Inventor: Robert W. Morris
  • Publication number: 20110054892
    Abstract: The present invention relates to a continuous speech recognition system that is robust in noisy environments. To recognize continuous speech smoothly in a noisy environment, the system selects call commands, configures a minimal recognition network of tokens consisting of the call commands and mute intervals (including noise), recognizes the input speech continuously in real time, continuously analyzes the reliability of the recognition, and recognizes the continuous speech from a speaker. When a speaker utters a call command, the system, which detects the speech interval and recognizes continuous speech in a noisy environment through real-time recognition of call commands, measures the reliability of the speech after recognizing the call command, and then recognizes the speaker's speech by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment the call command is recognized.
    Type: Application
    Filed: April 22, 2009
    Publication date: March 3, 2011
    Applicant: KOREAPOWERVOICE CO., LTD.
    Inventors: Heui-Suck Jung, Se-Hoon Chin, Tae-Young Roh
  • Publication number: 20110054901
    Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and to automatically searching a multimedia resource.
    Type: Application
    Filed: August 27, 2010
    Publication date: March 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
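Word-level alignment by phoneme similarity, as in the abstract above, can be sketched by comparing words through a phonetic proxy rather than exact spelling. This toy uses a crude consonant skeleton in place of real phonemes and aligns with `difflib`; both choices are assumptions for the example, not the patented algorithm.

```python
from difflib import SequenceMatcher

# Illustrative word-level alignment: words are compared by a crude phonetic
# proxy (their consonant skeleton) rather than exact spelling, so spelling
# variants like "color"/"colour" still align. The proxy and the use of
# difflib are assumptions for this sketch, not the patented method.
def skeleton(word):
    """A rough stand-in for a phoneme string: lowercase consonants only."""
    return "".join(c for c in word.lower() if c not in "aeiou")

def align(target, reference):
    """Return (target_index, reference_index) pairs for matching words."""
    t = [skeleton(w) for w in target]
    r = [skeleton(w) for w in reference]
    pairs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, t, r).get_opcodes():
        if tag == "equal":
            pairs.extend(zip(range(i1, i2), range(j1, j2)))
    return pairs

tgt = "the color of speech".split()
ref = "a colour of speech".split()
print(align(tgt, ref))  # [(1, 1), (2, 2), (3, 3)]
```

Here "color" and "colour" share the skeleton `clr`, so they align even though their spellings differ, which is the benefit of matching on pronunciation-like features instead of surface text.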
  • Patent number: 7899891
    Abstract: A network mobility server includes a target device inventory module, a data collection module, a data management module, and a distribution module. The data management module includes at least one data storage module, in which at least a portion of the stored data are identical data items kept in different selected formats suitable for use on mobile computing and telecommunication devices. The network also includes network agents resident on a number of the network members.
    Type: Grant
    Filed: July 9, 2010
    Date of Patent: March 1, 2011
    Assignee: Soonr Corporation
    Inventors: Martin Frid-Nielsen, Steven Ray Boye, Lars Gunnersen, Song Zun Huang