Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations for some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels, and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule, which represents a phenomenon of pronunciation transduction between recognition units, as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer with one or more other weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130110492
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
Type: Application
Filed: May 1, 2012
Publication date: May 2, 2013
Applicant: GOOGLE INC.
Inventors: Ian C. McGraw, Alexander H. Gruenstein
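The stability idea in this abstract can be sketched briefly. The counting rule below (consecutive incremental results in which a word has held its position in the top hypothesis) and the function name are illustrative assumptions, not the patent's exact metric:

```python
def stability_scores(incremental_top_hypotheses):
    """Assign each word position in the latest top hypothesis a stability
    metric: the number of consecutive incremental results (ending with the
    latest) in which the same word appeared at that position."""
    latest = incremental_top_hypotheses[-1]
    scores = {}
    for i, word in enumerate(latest):
        count = 0
        # Walk backwards through earlier incremental results while the
        # word persists at the same position.
        for hyp in reversed(incremental_top_hypotheses):
            if i < len(hyp) and hyp[i] == word:
                count += 1
            else:
                break
        scores[i] = count
    return scores

# "the" has survived three incremental results; "cat" only the latest one.
hyps = [["the"], ["the", "cap"], ["the", "cat"]]
print(stability_scores(hyps))  # {0: 3, 1: 1}
```

A downstream consumer could then treat words whose count exceeds some threshold as stable enough to display or act on.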
-
Publication number: 20130103402
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Application
Filed: October 25, 2011
Publication date: April 25, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
-
Publication number: 20130096918
Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including: comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
Type: Application
Filed: August 15, 2012
Publication date: April 18, 2013
Applicant: FUJITSU LIMITED
Inventor: Shouji HARADA
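A minimal sketch of the two-score combination described above. The decay-based proximity formula and the combination weight are assumptions for illustration; the abstract only requires that a connection score reflect positional proximity and be combined with acoustic similarity:

```python
def connection_score(positions, decay=0.5):
    """Score how 'connected' a candidate word sequence is, given each
    word's stored position in the original sentence.  Adjacent positions
    score highest; the score decays with positional distance."""
    score = 0.0
    for a, b in zip(positions, positions[1:]):
        score += decay ** (abs(b - a) - 1)
    return score

def combined_score(acoustic_similarity, positions, weight=1.0):
    """Combine acoustic similarity with the connection score to rank
    candidate character strings."""
    return acoustic_similarity + weight * connection_score(positions)

# Words adjacent in the stored sentence (positions 3, 4) connect more
# strongly than distant ones (positions 3, 9).
print(connection_score([3, 4]))  # 1.0
print(connection_score([3, 9]))  # 0.03125
```

The candidate string with the highest combined score would then be returned as the recognition result.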
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes: a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device; a selection unit configured to select, utilizing a signal from one or more sensors embedded in the device, a trigger detection unit appropriate to the usage environment of the device; and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
-
Publication number: 20130066635
Abstract: An apparatus and a method, which set a remote control command for controlling a home network service in a portable terminal, are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
Type: Application
Filed: September 10, 2012
Publication date: March 14, 2013
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-Seok KIM, Jin Park
-
Publication number: 20130060572
Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes: receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording; forming a plurality of search terms from the terms of the transcript; determining possible time locations of the search terms in the audio recording; determining a correspondence between the time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording; and aligning the audio recording and the transcript, including updating the time locations associated with terms of the transcript based on the determined correspondence.
Type: Application
Filed: September 4, 2012
Publication date: March 7, 2013
Applicant: Nexidia Inc.
Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
-
Publication number: 20130041667
Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
Type: Application
Filed: October 12, 2012
Publication date: February 14, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: NUANCE COMMUNICATIONS, INC.
-
Publication number: 20130035939
Abstract: Disclosed herein is a method for speech recognition. The method includes: receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
Type: Application
Filed: October 11, 2012
Publication date: February 7, 2013
Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
Inventor: AT&T INTELLECTUAL PROPERTY I, L.P.
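The normalization requirement in this abstract (weights for one unit of speech summing to 1) is straightforward to illustrate. The phoneme strings and raw weight values below are made up for the example:

```python
def normalize_pronunciation_weights(weights):
    """Normalize the raw weights of the alternative pronunciations of one
    unit of speech so that they sum to 1, as the method requires."""
    total = sum(weights.values())
    return {pron: w / total for pron, w in weights.items()}

# Two hypothetical pronunciations of "either" with raw (unnormalized) weights.
raw = {"iy dh er": 3.0, "ay dh er": 1.0}
print(normalize_pronunciation_weights(raw))
# {'iy dh er': 0.75, 'ay dh er': 0.25}
```

Discriminative adaptation would then adjust these normalized weights (re-normalizing afterwards) to reduce classification errors on the observed alignments.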
-
Publication number: 20130035938
Abstract: The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word-unit-based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to derive an optimal sentence recognition result with respect to the candidate words determined for each area. The present invention may improve voice recognition performance, and particularly sentence-unit continuous voice recognition performance.
Type: Application
Filed: July 2, 2012
Publication date: February 7, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventor: Ho Young Jung
-
Publication number: 20130013310
Abstract: A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes speech data corresponding to the speech section by using the recognition dictionary, and displays, in the form of a list, a recognition result of the recognition process and a correspondence item that corresponds to the recognition result. The correspondence item displayed in the list is manually operable.
Type: Application
Filed: July 5, 2012
Publication date: January 10, 2013
Applicant: DENSO CORPORATION
Inventors: Yuki Fujisawa, Katsushi Asami
-
Publication number: 20130006637
Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
Type: Application
Filed: August 1, 2012
Publication date: January 3, 2013
Applicant: Nuance Communications, Inc.
Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Robert Sicconi, Mahesh Viswanathan
-
Publication number: 20130006631
Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
Type: Application
Filed: June 28, 2012
Publication date: January 3, 2013
Applicant: UTAH STATE UNIVERSITY
Inventors: Jacob Gunther, Todd Moon
-
Publication number: 20130006639
Abstract: A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.
Type: Application
Filed: September 14, 2012
Publication date: January 3, 2013
Applicant: Nuance Communications, Inc.
Inventors: Per-Ola Kristensson, Shumin Zhai
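The core/extended lexicon policy above can be sketched as a small class. The class and method names are hypothetical; the abstract specifies only the behavior (direct output from the core lexicon, and promotion of extended-lexicon words upon user selection):

```python
class ShorthandLexicon:
    """Core-lexicon words are output directly; extended-lexicon words are
    only offered as candidates, and are promoted into the core lexicon
    when the user selects one."""

    def __init__(self, core, extended):
        self.core = set(core)
        self.extended = set(extended)

    def resolve(self, word, user_confirms=False):
        if word in self.core:
            return word          # direct output
        if word in self.extended and user_confirms:
            self.core.add(word)  # simultaneous admission to the core lexicon
            return word
        return None              # not output without user selection

lex = ShorthandLexicon(core={"the", "word"}, extended={"lexeme"})
print(lex.resolve("lexeme", user_confirms=True))  # 'lexeme'
print("lexeme" in lex.core)                       # True
```

After promotion, later occurrences of the word would be output directly like any other core-lexicon word.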
-
Publication number: 20120323577
Abstract: Methods of automatic speech recognition that address premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then a begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
Type: Application
Filed: June 16, 2011
Publication date: December 20, 2012
Applicant: GENERAL MOTORS LLC
Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
-
Publication number: 20120316880
Abstract: An information processing apparatus, information processing method, and computer-readable non-transitory storage medium for analyzing words that reflect information not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the word; acquiring the frequency of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high-frequency words, where the high-frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
Type: Application
Filed: August 22, 2012
Publication date: December 13, 2012
Applicant: International Business Machines Corporation
Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
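A compact sketch of the key-phrase step: among words that occur often enough, pick the one whose prosodic feature values fluctuate most. Using the population standard deviation as the "degree of fluctuation" is an assumption; the abstract does not fix the measure, and the feature values below are invented pitch observations:

```python
from statistics import pstdev

def key_phrase(prosody, min_freq=2):
    """prosody maps each recognized word to the list of prosodic feature
    values (e.g. pitch in Hz) observed at its occurrences.  Among words
    meeting the frequency threshold, return the word whose feature values
    fluctuate the most."""
    candidates = {w: vals for w, vals in prosody.items() if len(vals) >= min_freq}
    return max(candidates, key=lambda w: pstdev(candidates[w]))

observations = {
    "project": [220.0, 180.0, 260.0],  # frequent and highly varied
    "meeting": [200.0, 201.0, 199.0],  # frequent but prosodically flat
    "maybe":   [210.0],                # below the frequency threshold
}
print(key_phrase(observations))  # 'project'
```

The intuition matches the abstract: a word spoken often but with unusually varied prosody is likely to carry emphasis the speaker never states explicitly.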
-
Publication number: 20120316878
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
Type: Application
Filed: August 23, 2012
Publication date: December 13, 2012
Applicant: Google Inc.
Inventors: David Singleton, Debajit Ghosh
-
Publication number: 20120316879
Abstract: A continuous speech recognition system recognizes continuous speech smoothly in a noisy environment. The system selects call commands, configures a minimal recognition network of tokens consisting of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of the speech recognition continuously, and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system, which detects the speech interval and recognizes continuous speech in a noisy environment through real-time recognition of call commands, measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment the call command is recognized.
Type: Application
Filed: August 22, 2012
Publication date: December 13, 2012
Applicant: KOREAPOWERVOICE CO., LTD.
Inventors: Heui-Suck JUNG, Se-Hoon CHIN, Tae-Young ROH
-
Publication number: 20120303267
Abstract: A method for speech recognition includes providing a source of geographical information within a vehicle. The geographical information pertains to a current location of the vehicle, a planned travel route of the vehicle, a map displayed within the vehicle, and/or a gesture marked by a user on a map. Words spoken within the vehicle are recognized by use of a speech recognition module. The recognizing is dependent upon the geographical information.
Type: Application
Filed: August 6, 2012
Publication date: November 29, 2012
Applicant: Robert Bosch GmbH
Inventors: Zhongnan Shen, Fuliang Weng, Zhe Feng
-
Publication number: 20120299824
Abstract: To take security into account and increase user friendliness, an information processing device includes: an input unit to which information is input; an extracting unit extracting predetermined words from the information input to the input unit; a classifying unit classifying the words extracted by the extracting unit into first words and second words; and a converting unit converting the first words by a first conversion method and converting the second words by a second conversion method, the second conversion method being different from the first conversion method.
Type: Application
Filed: February 4, 2011
Publication date: November 29, 2012
Applicant: NIKON CORPORATION
Inventors: Hideo Hoshuyama, Hiroyuki Akiya, Kazuya Umeyama, Keiichi Nitta, Hiroki Uwai, Masakazu Sekiguchi
-
Publication number: 20120296653
Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based, at least in part, on a weighting of the individual characters that comprise the known character sequence.
Type: Application
Filed: July 30, 2012
Publication date: November 22, 2012
Applicant: Nuance Communications, Inc.
Inventor: Kenneth D. White
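The per-character weighting can be illustrated with a short scoring function. The confidence values, the specific weights (down-weighting acoustically confusable letters like "M"/"N"), and the part-number example are all invented for the sketch:

```python
def score_candidate(confidences, candidate, weights):
    """confidences[i] maps characters to the recognizer's confidence for
    the i-th spoken character; weights gives each character's weight
    (confusable characters may contribute less to the total)."""
    score = 0.0
    for conf, ch in zip(confidences, candidate):
        score += weights.get(ch, 1.0) * conf.get(ch, 0.0)
    return score

# Spoken spelling of a code: is it "MN10" or "NM10"?
confs = [{"M": 0.6, "N": 0.4}, {"M": 0.5, "N": 0.5}, {"1": 0.9}, {"0": 0.9}]
w = {"M": 0.5, "N": 0.5}  # nasals weighted down as easily confused
print(score_candidate(confs, "MN10", w))  # 2.35
print(score_candidate(confs, "NM10", w))  # 2.25
```

Because the unambiguous digits dominate the score, a weak preference on the confusable letters is enough to rank "MN10" above "NM10" without letting the nasal confusion swamp the decision.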
-
Publication number: 20120290302
Abstract: A Chinese speech recognition system and method are disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model, and a factored language model, so as to output a language tag, a prosodic tag, and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag, and phonetic segmentation tag to provide prosodic structure and language information for rear-stage voice conversion and voice synthesis.
Type: Application
Filed: April 13, 2012
Publication date: November 15, 2012
Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
-
Publication number: 20120290303
Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
Type: Application
Filed: May 8, 2012
Publication date: November 15, 2012
Applicant: NHN CORPORATION
Inventors: Sang Ho LEE, Hoon KIM, Dong Ook KOO, Dae Sung JUNG
-
Publication number: 20120278079
Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
Type: Application
Filed: April 29, 2011
Publication date: November 1, 2012
Inventors: Jon A. Arrowood, Robert W. Morris, Peter S. Cardillo, Marsal Gavalda
-
Publication number: 20120271635
Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
Type: Application
Filed: July 2, 2012
Publication date: October 25, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventor: Andrej Ljolje
-
Publication number: 20120271634
Abstract: A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt to reflect a context-based probability of user barge-in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt.
Type: Application
Filed: March 26, 2010
Publication date: October 25, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Nils Lenke
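The threshold adjustment can be sketched with a simple time-windowed rule: near a transition relevance place (TRP) in the prompt, where barge-in is more likely, the detection threshold is lowered. The window width, reduction factor, and TRP times below are hypothetical values, not taken from the patent:

```python
def vad_threshold(base, t, trp_times, window=0.5, drop=0.4):
    """Return the voice-activity-detection threshold at playback time t
    (seconds) of the prompt.  Within `window` seconds of a transition
    relevance place, the threshold is reduced by `drop` so quiet barge-in
    speech is more readily accepted."""
    near_trp = any(abs(t - trp) <= window for trp in trp_times)
    return base * (1.0 - drop) if near_trp else base

# Prompt with TRPs at the ends of its two clauses (2.0 s and 4.5 s).
print(vad_threshold(1.0, 2.1, [2.0, 4.5]))  # 0.6 (near a TRP)
print(vad_threshold(1.0, 3.5, [2.0, 4.5]))  # 1.0 (mid-clause)
```

The effect is that the detector stays conservative mid-sentence (where spurious noise would otherwise cut off the prompt) and becomes sensitive exactly where a cooperative user is likely to start speaking.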
-
Publication number: 20120265531
Abstract: An intelligent query system for processing voice-based queries is disclosed, which uses semantic-based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
Type: Application
Filed: June 18, 2012
Publication date: October 18, 2012
Inventor: Ian M. Bennett
-
Publication number: 20120262533
Abstract: A method is provided in one example and includes identifying a particular word recited by an active speaker in a conference involving a plurality of endpoints in a network environment; evaluating a profile associated with the active speaker in order to identify contextual information associated with the particular word; and providing augmented data associated with the particular word to at least some of the plurality of endpoints. In more specific examples, the active speaker is identified using a facial detection protocol, or a speech recognition protocol. Data from the active speaker can be converted from speech to text.
Type: Application
Filed: April 18, 2011
Publication date: October 18, 2012
Inventors: Satish K. Gannu, Leon A. Frazier, Didier R. Moretti
-
Publication number: 20120259632
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
Type: Application
Filed: February 22, 2010
Publication date: October 11, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Daniel Willett
-
Publication number: 20120245934
Abstract: A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models, using a conversational context associated with the text message, to decode the acoustic data. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
Type: Application
Filed: March 25, 2011
Publication date: September 27, 2012
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Xufang Zhao
-
Publication number: 20120239403
Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
Type: Application
Filed: September 28, 2009
Publication date: September 20, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Daniel Andrés Vásquez Cano, Guillermo Aradilla, Rainer Gruhn
-
Publication number: 20120232904
Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
Type: Application
Filed: March 12, 2012
Publication date: September 13, 2012
Applicant: Samsung Electronics Co., Ltd.
Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
-
Publication number: 20120232898
Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine, and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human, as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
Type: Application
Filed: May 21, 2012
Publication date: September 13, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Giuseppe Di Fabbrizio, Dilek Z. Hakkani-Tur, Mazin G. Rahim, Bernard S. Renger, Gokhan Tur
-
Publication number: 20120232901
Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
Type: Application
Filed: May 24, 2012
Publication date: September 13, 2012
Applicant: Autonomy Corporation Ltd.
Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
-
Publication number: 20120215538
Abstract: In one embodiment, a method includes identifying a first communication from a customer, identifying a second communication from the customer following a response to the first communication from a contact center, and analyzing the first and second communications at a contact center network device to determine a change in sentiment from the first communication to the second communication. An apparatus for contact center performance measurement is also disclosed.
Type: Application
Filed: February 17, 2011
Publication date: August 23, 2012
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Andrew Cleasby, Robert Zacher
-
Publication number: 20120215539
Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
Type: Application
Filed: February 22, 2012
Publication date: August 23, 2012
Inventor: Ajay Juneja
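A routing plan for utterance segments can be sketched as follows. The routing rule used here (first segment to the on-device recognizer, the rest to the network recognizer) is one hypothetical policy; the abstract only requires that different segments may be assigned to different recognizers:

```python
def route_segments(segments):
    """Assign each utterance segment to a recognizer.  Sketch policy: the
    first segment (often a short, closed-vocabulary command) is handled by
    the on-device ('local') recognizer; later, open-vocabulary segments go
    to the network ('remote') recognizer."""
    plan = []
    for i, seg in enumerate(segments):
        recognizer = "local" if i == 0 else "remote"
        plan.append((seg, recognizer))
    return plan

utterance_segments = ["send a text to", "Alex, running ten minutes late"]
print(route_segments(utterance_segments))
# [('send a text to', 'local'), ('Alex, running ten minutes late', 'remote')]
```

In a real system the segment results would be merged back in order once the remote recognizer's response arrives over the network.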
-
Publication number: 20120209610
Abstract: Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.
Type: Application
Filed: April 24, 2012
Publication date: August 16, 2012
Inventor: Lunis ORCUTT
-
Publication number: 20120185252Abstract: A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features.Type: ApplicationFiled: March 23, 2012Publication date: July 19, 2012Applicant: Microsoft CorporationInventors: Ye-Yi Wang, Yun-Cheng Ju, Dong Yu
-
Publication number: 20120173240Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.Type: ApplicationFiled: December 30, 2010Publication date: July 5, 2012Applicant: MICROSOFT CORPORATIONInventors: Daniel Povey, Kaisheng YAO, Yifan Gong
-
Publication number: 20120166197Abstract: The present invention discloses converting a text form into a speech form. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.Type: ApplicationFiled: March 4, 2012Publication date: June 28, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
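The parallel compile-then-combine flow might look like this sketch, with a tiny invented grapheme-to-phoneme table standing in for real phoneme-graph compilation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical grapheme-to-phoneme table standing in for real compilation.
G2P = {"cat": ["k", "ae", "t"], "sit": ["s", "ih", "t"], "dog": ["d", "ao", "g"]}

def compile_partial(word_list):
    """Compile one partial word list into a toy phoneme graph: word -> phonemes."""
    return {w: G2P[w] for w in word_list}

def combine(graphs):
    """Merge the partial phoneme graphs into one graph for recognition."""
    merged = {}
    for g in graphs:
        merged.update(g)
    return merged

partials = [["cat", "sit"], ["dog"]]
with ThreadPoolExecutor() as pool:  # compile the partial lists in parallel
    graphs = list(pool.map(compile_partial, partials))
lexicon = combine(graphs)
print(sorted(lexicon))  # ['cat', 'dog', 'sit']
```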
-
Publication number: 20120166194Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.Type: ApplicationFiled: December 22, 2011Publication date: June 28, 2012Applicant: Electronics and Telecommunications Research InstituteInventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
-
Publication number: 20120166196Abstract: This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker.Type: ApplicationFiled: December 23, 2010Publication date: June 28, 2012Applicant: Microsoft CorporationInventors: Yun-Cheng Ju, Ivan J. Tashev, Chad R. Heinemann
-
Publication number: 20120158405Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that, according to the link information (LI), relates to the speech data (SD) just played back; the currently marked word indicates the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) makes it possible to synchronize the text cursor (TC) with the audio cursor (AC) or the audio cursor (AC) with the text cursor (TC), so that positioning the respective cursor (AC, TC) is simplified considerably.Type: ApplicationFiled: February 13, 2012Publication date: June 21, 2012Applicant: Nuance Communications Austria GmbHInventor: Wolfgang Gschwendtner
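The cursor synchronization enabled by the link information (LI) can be illustrated with a toy word-to-time mapping; the words and timings below are invented, and real link information would come from the recognizer:

```python
import bisect

# Link information: each recognized word paired with (start, end) time in ms.
LINKS = [("please", 0, 400), ("send", 400, 700),
         ("the", 700, 850), ("report", 850, 1400)]
STARTS = [start for _, start, _ in LINKS]

def audio_to_text_cursor(audio_ms):
    """Find the word index whose time span covers the audio cursor position."""
    return bisect.bisect_right(STARTS, audio_ms) - 1

def text_to_audio_cursor(word_index):
    """Move the audio cursor to the start time of the word at the text cursor."""
    return LINKS[word_index][1]

print(audio_to_text_cursor(750))  # 2 -> "the" is marked during playback
print(text_to_audio_cursor(3))    # 850 -> playback jumps to "report"
```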
-
Publication number: 20120150541Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.Type: ApplicationFiled: December 10, 2010Publication date: June 14, 2012Applicant: GENERAL MOTORS LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Publication number: 20120143607Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.Type: ApplicationFiled: December 6, 2011Publication date: June 7, 2012Inventors: Michael LONGÉ, Richard Eyraud, Keith C. Hullfish
-
Publication number: 20120143610Abstract: A sound event detecting module detects whether a sound event with a repeating characteristic is generated. A sound end recognizing unit recognizes ends of sounds according to a sound signal to generate sound sections and, correspondingly, multiple sets of feature vectors of the sound sections. A storage unit stores at least M sets of feature vectors. A similarity comparing unit compares the at least M sets of feature vectors with each other and correspondingly generates a similarity score matrix, which stores similarity scores of any two of the at least M sound sections. A correlation arbitrating unit determines the number of sound sections with high correlations to each other according to the similarity score matrix. When the number is greater than a threshold value, the correlation arbitrating unit indicates that the sound event with the repeating characteristic is generated.Type: ApplicationFiled: December 30, 2010Publication date: June 7, 2012Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTEInventors: Yuh-Ching Wang, Kuo-Yuan Li
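A minimal sketch of the similarity-matrix idea follows; the feature vectors, similarity function, and thresholds are invented, and a real system would compare acoustic features such as MFCCs:

```python
def similarity(a, b):
    """Toy similarity of two feature vectors: 1 / (1 + Euclidean distance)."""
    d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + d)

def detect_repeating(sections, sim_threshold=0.5, count_threshold=1):
    """Compare all section pairs (the similarity score matrix) and count
    highly correlated pairs; more than count_threshold => repeating event."""
    high = 0
    for i in range(len(sections)):
        for j in range(i + 1, len(sections)):
            if similarity(sections[i], sections[j]) > sim_threshold:
                high += 1
    return high > count_threshold

sections = [[1.0, 2.0], [1.1, 2.0], [1.0, 2.1], [9.0, 9.0]]
print(detect_repeating(sections))  # True: the first three sections repeat
```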
-
Publication number: 20120136662Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.Type: ApplicationFiled: February 3, 2012Publication date: May 31, 2012Applicant: Nuance Communications Austria GMBHInventor: Zsolt Saffer
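The best-path search over a scored word graph can be sketched on a toy graph; the graph, words, and scores below are invented for illustration, with word scores standing in for what a phonemic language model would assign:

```python
# Toy word graph: node -> list of (next_node, word, word_score).
GRAPH = {
    "start": [("a", "recognize", 0.9), ("b", "wreck a nice", 0.4)],
    "a": [("end", "speech", 0.8)],
    "b": [("end", "beach", 0.7)],
    "end": [],
}

def best_path(node):
    """Return (score, words) of the highest-scoring path from node to the end."""
    if not GRAPH[node]:
        return 1.0, []
    best = (0.0, [])
    for nxt, word, score in GRAPH[node]:
        sub_score, sub_words = best_path(nxt)
        cand = (score * sub_score, [word] + sub_words)
        if cand[0] > best[0]:
            best = cand
    return best

score, words = best_path("start")
print(words)  # ['recognize', 'speech'] (0.9 * 0.8 beats 0.4 * 0.7)
```

Real word graphs are large lattices, so production decoders use dynamic programming with memoization rather than this naive recursion.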
-
Publication number: 20120136661Abstract: The present invention discloses converting a text form into a speech form. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.Type: ApplicationFiled: November 2, 2011Publication date: May 31, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Publication number: 20120136660Abstract: A voice-estimation device that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. The waves reflected by the vocal tract are detected and converted into a digital signal, which is then processed segment-by-segment. Based on the processing, a set of formant frequencies is determined for each segment. Each such set is then analyzed to assign a phoneme to the corresponding segment of the digital signal. The resulting sequence of phonemes is converted into a digital audio signal or text representing the user's estimated voice.Type: ApplicationFiled: November 30, 2010Publication date: May 31, 2012Applicant: ALCATEL-LUCENT USA INC.Inventors: Dale D. Harman, Lothar Benedikt Moeller
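The formant-to-phoneme assignment step could be sketched as a nearest-neighbor lookup against reference formant values; the numbers below are rough textbook-style vowel formants used only for illustration, not values from the disclosure:

```python
# Hypothetical reference formants (F1, F2 in Hz) per vowel phoneme.
REFERENCE = {"iy": (270, 2290), "aa": (730, 1090), "uw": (300, 870)}

def assign_phoneme(formants):
    """Assign the phoneme whose reference formants are nearest (Euclidean)."""
    def dist(ref):
        return sum((a - b) ** 2 for a, b in zip(formants, ref)) ** 0.5
    return min(REFERENCE, key=lambda p: dist(REFERENCE[p]))

def estimate_voice(segments):
    """Map each segment's measured formant set to a phoneme, giving the
    phoneme sequence that represents the user's estimated voice."""
    return [assign_phoneme(f) for f in segments]

print(estimate_voice([(280, 2250), (700, 1100)]))  # ['iy', 'aa']
```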