Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations for some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels, and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule, which represents a phenomenon of pronunciation transduction between recognition units, as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer with one or more other weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130110492
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
Type: Application
Filed: May 1, 2012
Publication date: May 2, 2013
Applicant: GOOGLE INC.
Inventors: Ian C. McGraw, Alexander H. Gruenstein
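The stability idea in this abstract can be sketched briefly. The counting rule below (consecutive incremental results in which a word has held its position in the top hypothesis) and the function name are illustrative assumptions, not the patent's exact metric:

```python
def stability_scores(incremental_top_hypotheses):
    """Assign each word position in the latest top hypothesis a stability
    metric: the number of consecutive incremental results (ending with the
    latest) in which the same word appeared at that position."""
    latest = incremental_top_hypotheses[-1]
    scores = {}
    for i, word in enumerate(latest):
        count = 0
        # Walk backwards through earlier incremental results while the
        # word persists at the same position.
        for hyp in reversed(incremental_top_hypotheses):
            if i < len(hyp) and hyp[i] == word:
                count += 1
            else:
                break
        scores[i] = count
    return scores

# "the" has survived three incremental results; "cat" only the latest one.
hyps = [["the"], ["the", "cap"], ["the", "cat"]]
print(stability_scores(hyps))  # {0: 3, 1: 1}
```

A downstream consumer could then treat words whose count exceeds some threshold as stable enough to display or act on.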
-
Publication number: 20130103402
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Application
Filed: October 25, 2011
Publication date: April 25, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
-
Publication number: 20130096918
Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including: comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
Type: Application
Filed: August 15, 2012
Publication date: April 18, 2013
Applicant: FUJITSU LIMITED
Inventor: Shouji HARADA
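A minimal sketch of the two-score combination described above. The decay-based proximity formula and the combination weight are assumptions for illustration; the abstract only requires that a connection score reflect positional proximity and be combined with acoustic similarity:

```python
def connection_score(positions, decay=0.5):
    """Score how 'connected' a candidate word sequence is, given each
    word's stored position in the original sentence.  Adjacent positions
    score highest; the score decays with positional distance."""
    score = 0.0
    for a, b in zip(positions, positions[1:]):
        score += decay ** (abs(b - a) - 1)
    return score

def combined_score(acoustic_similarity, positions, weight=1.0):
    """Combine acoustic similarity with the connection score to rank
    candidate character strings."""
    return acoustic_similarity + weight * connection_score(positions)

# Words adjacent in the stored sentence (positions 3, 4) connect more
# strongly than distant ones (positions 3, 9).
print(connection_score([3, 4]))  # 1.0
print(connection_score([3, 9]))  # 0.03125
```

The candidate string with the highest combined score would then be returned as the recognition result.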
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes: a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device; a selection unit configured to select, utilizing a signal from one or more sensors embedded in the device, a trigger detection unit appropriate to the usage environment of the device; and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
-
Publication number: 20130066635
Abstract: An apparatus and a method, which set a remote control command for controlling a home network service in a portable terminal, are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
Type: Application
Filed: September 10, 2012
Publication date: March 14, 2013
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-Seok KIM, Jin Park
-
Publication number: 20130060572
Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes: receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording; forming a plurality of search terms from the terms of the transcript; determining possible time locations of the search terms in the audio recording; determining a correspondence between the time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording; and aligning the audio recording and the transcript, including updating the time locations associated with terms of the transcript based on the determined correspondence.
Type: Application
Filed: September 4, 2012
Publication date: March 7, 2013
Applicant: Nexidia Inc.
Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
-
Publication number: 20130041667
Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
Type: Application
Filed: October 12, 2012
Publication date: February 14, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: NUANCE COMMUNICATIONS, INC.
-
Publication number: 20130035939
Abstract: Disclosed herein is a method for speech recognition. The method includes: receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
Type: Application
Filed: October 11, 2012
Publication date: February 7, 2013
Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
Inventor: AT&T INTELLECTUAL PROPERTY I, L.P.
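The normalization requirement in this abstract (weights for one unit of speech summing to 1) is straightforward to illustrate. The phoneme strings and raw weight values below are made up for the example:

```python
def normalize_pronunciation_weights(weights):
    """Normalize the raw weights of the alternative pronunciations of one
    unit of speech so that they sum to 1, as the method requires."""
    total = sum(weights.values())
    return {pron: w / total for pron, w in weights.items()}

# Two hypothetical pronunciations of "either" with raw (unnormalized) weights.
raw = {"iy dh er": 3.0, "ay dh er": 1.0}
print(normalize_pronunciation_weights(raw))
# {'iy dh er': 0.75, 'ay dh er': 0.25}
```

Discriminative adaptation would then adjust these normalized weights (re-normalizing afterwards) to reduce classification errors on the observed alignments.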
-
Publication number: 20130035938
Abstract: The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word-unit-based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to derive an optimal sentence recognition result with respect to the candidate words determined for each area. The present invention may improve voice recognition performance, and particularly sentence-unit continuous voice recognition performance.
Type: Application
Filed: July 2, 2012
Publication date: February 7, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventor: Ho Young Jung
-
Publication number: 20130013310
Abstract: A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes speech data corresponding to the speech section by using the recognition dictionary, and displays, in the form of a list, a recognition result of the recognition process and a correspondence item that corresponds to the recognition result. The correspondence item displayed in the list is manually operable.
Type: Application
Filed: July 5, 2012
Publication date: January 10, 2013
Applicant: DENSO CORPORATION
Inventors: Yuki Fujisawa, Katsushi Asami
-
Publication number: 20130006637
Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
Type: Application
Filed: August 1, 2012
Publication date: January 3, 2013
Applicant: Nuance Communications, Inc.
Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Robert Sicconi, Mahesh Viswanathan
-
Publication number: 20130006631
Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
Type: Application
Filed: June 28, 2012
Publication date: January 3, 2013
Applicant: UTAH STATE UNIVERSITY
Inventors: Jacob Gunther, Todd Moon
-
Publication number: 20130006639
Abstract: A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.
Type: Application
Filed: September 14, 2012
Publication date: January 3, 2013
Applicant: Nuance Communications, Inc.
Inventors: Per-Ola Kristensson, Shumin Zhai
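The core/extended lexicon policy above can be sketched as a small class. The class and method names are hypothetical; the abstract specifies only the behavior (direct output from the core lexicon, and promotion of extended-lexicon words upon user selection):

```python
class ShorthandLexicon:
    """Core-lexicon words are output directly; extended-lexicon words are
    only offered as candidates, and are promoted into the core lexicon
    when the user selects one."""

    def __init__(self, core, extended):
        self.core = set(core)
        self.extended = set(extended)

    def resolve(self, word, user_confirms=False):
        if word in self.core:
            return word          # direct output
        if word in self.extended and user_confirms:
            self.core.add(word)  # simultaneous admission to the core lexicon
            return word
        return None              # not output without user selection

lex = ShorthandLexicon(core={"the", "word"}, extended={"lexeme"})
print(lex.resolve("lexeme", user_confirms=True))  # 'lexeme'
print("lexeme" in lex.core)                       # True
```

After promotion, later occurrences of the word would be output directly like any other core-lexicon word.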
-
Publication number: 20120323577
Abstract: Methods of automatic speech recognition that address premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then a begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
Type: Application
Filed: June 16, 2011
Publication date: December 20, 2012
Applicant: GENERAL MOTORS LLC
Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
-
Publication number: 20120316880
Abstract: An information processing apparatus, information processing method, and computer-readable non-transitory storage medium for analyzing words that reflect information not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the word; acquiring the frequency of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high-frequency words, where the high-frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
Type: Application
Filed: August 22, 2012
Publication date: December 13, 2012
Applicant: International Business Machines Corporation
Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
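A compact sketch of the key-phrase step: among words that occur often enough, pick the one whose prosodic feature values fluctuate most. Using the population standard deviation as the "degree of fluctuation" is an assumption; the abstract does not fix the measure, and the feature values below are invented pitch observations:

```python
from statistics import pstdev

def key_phrase(prosody, min_freq=2):
    """prosody maps each recognized word to the list of prosodic feature
    values (e.g. pitch in Hz) observed at its occurrences.  Among words
    meeting the frequency threshold, return the word whose feature values
    fluctuate the most."""
    candidates = {w: vals for w, vals in prosody.items() if len(vals) >= min_freq}
    return max(candidates, key=lambda w: pstdev(candidates[w]))

observations = {
    "project": [220.0, 180.0, 260.0],  # frequent and highly varied
    "meeting": [200.0, 201.0, 199.0],  # frequent but prosodically flat
    "maybe":   [210.0],                # below the frequency threshold
}
print(key_phrase(observations))  # 'project'
```

The intuition matches the abstract: a word spoken often but with unusually varied prosody is likely to carry emphasis the speaker never states explicitly.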
-
Publication number: 20120316878
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
Type: Application
Filed: August 23, 2012
Publication date: December 13, 2012
Applicant: Google Inc.
Inventors: David Singleton, Debajit Ghosh
-
Publication number: 20120316879
Abstract: A continuous speech recognition system recognizes continuous speech smoothly in a noisy environment. The system selects call commands, configures a minimal recognition network of tokens consisting of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of the speech recognition continuously, and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system, which detects the speech interval and recognizes continuous speech in a noisy environment through real-time recognition of call commands, measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment the call command is recognized.
Type: Application
Filed: August 22, 2012
Publication date: December 13, 2012
Applicant: KOREAPOWERVOICE CO., LTD.
Inventors: Heui-Suck JUNG, Se-Hoon CHIN, Tae-Young ROH
-
Publication number: 20120303267
Abstract: A method for speech recognition includes providing a source of geographical information within a vehicle. The geographical information pertains to a current location of the vehicle, a planned travel route of the vehicle, a map displayed within the vehicle, and/or a gesture marked by a user on a map. Words spoken within the vehicle are recognized by use of a speech recognition module. The recognizing is dependent upon the geographical information.
Type: Application
Filed: August 6, 2012
Publication date: November 29, 2012
Applicant: Robert Bosch GmbH
Inventors: Zhongnan Shen, Fuliang Weng, Zhe Feng
-
Publication number: 20120299824
Abstract: To take security into account and increase user friendliness, an information processing device includes: an input unit to which information is input; an extracting unit extracting predetermined words from the information input to the input unit; a classifying unit classifying the words extracted by the extracting unit into first words and second words; and a converting unit converting the first words by a first conversion method and converting the second words by a second conversion method, the second conversion method being different from the first conversion method.
Type: Application
Filed: February 4, 2011
Publication date: November 29, 2012
Applicant: NIKON CORPORATION
Inventors: Hideo Hoshuyama, Hiroyuki Akiya, Kazuya Umeyama, Keiichi Nitta, Hiroki Uwai, Masakazu Sekiguchi
-
Publication number: 20120296653
Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based, at least in part, on a weighting of the individual characters that comprise the known character sequence.
Type: Application
Filed: July 30, 2012
Publication date: November 22, 2012
Applicant: Nuance Communications, Inc.
Inventor: Kenneth D. White
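The per-character weighting can be illustrated with a short scoring function. The confidence values, the specific weights (down-weighting acoustically confusable letters like "M"/"N"), and the part-number example are all invented for the sketch:

```python
def score_candidate(confidences, candidate, weights):
    """confidences[i] maps characters to the recognizer's confidence for
    the i-th spoken character; weights gives each character's weight
    (confusable characters may contribute less to the total)."""
    score = 0.0
    for conf, ch in zip(confidences, candidate):
        score += weights.get(ch, 1.0) * conf.get(ch, 0.0)
    return score

# Spoken spelling of a code: is it "MN10" or "NM10"?
confs = [{"M": 0.6, "N": 0.4}, {"M": 0.5, "N": 0.5}, {"1": 0.9}, {"0": 0.9}]
w = {"M": 0.5, "N": 0.5}  # nasals weighted down as easily confused
print(score_candidate(confs, "MN10", w))  # 2.35
print(score_candidate(confs, "NM10", w))  # 2.25
```

Because the unambiguous digits dominate the score, a weak preference on the confusable letters is enough to rank "MN10" above "NM10" without letting the nasal confusion swamp the decision.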
-
Publication number: 20120290302
Abstract: A Chinese speech recognition system and method are disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model, and a factored language model, so as to output a language tag, a prosodic tag, and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag, and phonetic segmentation tag to provide prosodic structure and language information for rear-stage voice conversion and voice synthesis.
Type: Application
Filed: April 13, 2012
Publication date: November 15, 2012
Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
-
Publication number: 20120290303
Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
Type: Application
Filed: May 8, 2012
Publication date: November 15, 2012
Applicant: NHN CORPORATION
Inventors: Sang Ho LEE, Hoon KIM, Dong Ook KOO, Dae Sung JUNG
-
Publication number: 20120278079
Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
Type: Application
Filed: April 29, 2011
Publication date: November 1, 2012
Inventors: Jon A. Arrowood, Robert W. Morris, Peter S. Cardillo, Marsal Gavalda
-
Publication number: 20120271635
Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
Type: Application
Filed: July 2, 2012
Publication date: October 25, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventor: Andrej Ljolje
-
Publication number: 20120271634
Abstract: A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt to reflect a context-based probability of user barge-in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt.
Type: Application
Filed: March 26, 2010
Publication date: October 25, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Nils Lenke
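The threshold adjustment can be sketched with a simple time-windowed rule: near a transition relevance place (TRP) in the prompt, where barge-in is more likely, the detection threshold is lowered. The window width, reduction factor, and TRP times below are hypothetical values, not taken from the patent:

```python
def vad_threshold(base, t, trp_times, window=0.5, drop=0.4):
    """Return the voice-activity-detection threshold at playback time t
    (seconds) of the prompt.  Within `window` seconds of a transition
    relevance place, the threshold is reduced by `drop` so quiet barge-in
    speech is more readily accepted."""
    near_trp = any(abs(t - trp) <= window for trp in trp_times)
    return base * (1.0 - drop) if near_trp else base

# Prompt with TRPs at the ends of its two clauses (2.0 s and 4.5 s).
print(vad_threshold(1.0, 2.1, [2.0, 4.5]))  # 0.6 (near a TRP)
print(vad_threshold(1.0, 3.5, [2.0, 4.5]))  # 1.0 (mid-clause)
```

The effect is that the detector stays conservative mid-sentence (where spurious noise would otherwise cut off the prompt) and becomes sensitive exactly where a cooperative user is likely to start speaking.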
-
Publication number: 20120265531
Abstract: An intelligent query system for processing voice-based queries is disclosed, which uses semantic-based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
Type: Application
Filed: June 18, 2012
Publication date: October 18, 2012
Inventor: Ian M. Bennett
-
Publication number: 20120262533
Abstract: A method is provided in one example and includes identifying a particular word recited by an active speaker in a conference involving a plurality of endpoints in a network environment; evaluating a profile associated with the active speaker in order to identify contextual information associated with the particular word; and providing augmented data associated with the particular word to at least some of the plurality of endpoints. In more specific examples, the active speaker is identified using a facial detection protocol, or a speech recognition protocol. Data from the active speaker can be converted from speech to text.
Type: Application
Filed: April 18, 2011
Publication date: October 18, 2012
Inventors: Satish K. Gannu, Leon A. Frazier, Didier R. Moretti
-
Publication number: 20120259632
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
Type: Application
Filed: February 22, 2010
Publication date: October 11, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Daniel Willett
-
Publication number: 20120245934
Abstract: A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models, using a conversational context associated with the text message, to decode the acoustic data. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
Type: Application
Filed: March 25, 2011
Publication date: September 27, 2012
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Xufang Zhao
-
Publication number: 20120239403
Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
Type: Application
Filed: September 28, 2009
Publication date: September 20, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Daniel Andrés Vásquez Cano, Guillermo Aradilla, Rainer Gruhn
-
Publication number: 20120232904
Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
Type: Application
Filed: March 12, 2012
Publication date: September 13, 2012
Applicant: Samsung Electronics Co., Ltd.
Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
-
Publication number: 20120232898
Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine, and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human, as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
Type: Application
Filed: May 21, 2012
Publication date: September 13, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Giuseppe Di Fabbrizio, Dilek Z. Hakkani-Tur, Mazin G. Rahim, Bernard S. Renger, Gokhan Tur
-
Publication number: 20120232901
Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
Type: Application
Filed: May 24, 2012
Publication date: September 13, 2012
Applicant: Autonomy Corporation Ltd.
Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
-
Publication number: 20120215538
Abstract: In one embodiment, a method includes identifying a first communication from a customer, identifying a second communication from the customer following a response to the first communication from a contact center, and analyzing the first and second communications at a contact center network device to determine a change in sentiment from the first communication to the second communication. An apparatus for contact center performance measurement is also disclosed.
Type: Application
Filed: February 17, 2011
Publication date: August 23, 2012
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Andrew Cleasby, Robert Zacher
-
Publication number: 20120215539
Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
Type: Application
Filed: February 22, 2012
Publication date: August 23, 2012
Inventor: Ajay Juneja
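A routing plan for utterance segments can be sketched as follows. The routing rule used here (first segment to the on-device recognizer, the rest to the network recognizer) is one hypothetical policy; the abstract only requires that different segments may be assigned to different recognizers:

```python
def route_segments(segments):
    """Assign each utterance segment to a recognizer.  Sketch policy: the
    first segment (often a short, closed-vocabulary command) is handled by
    the on-device ('local') recognizer; later, open-vocabulary segments go
    to the network ('remote') recognizer."""
    plan = []
    for i, seg in enumerate(segments):
        recognizer = "local" if i == 0 else "remote"
        plan.append((seg, recognizer))
    return plan

utterance_segments = ["send a text to", "Alex, running ten minutes late"]
print(route_segments(utterance_segments))
# [('send a text to', 'local'), ('Alex, running ten minutes late', 'remote')]
```

In a real system the segment results would be merged back in order once the remote recognizer's response arrives over the network.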
-
Publication number: 20120209610
Abstract: Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.
Type: Application
Filed: April 24, 2012
Publication date: August 16, 2012
Inventor: Lunis ORCUTT
-
Publication number: 20120185252Abstract: A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features.Type: ApplicationFiled: March 23, 2012Publication date: July 19, 2012Applicant: Microsoft CorporationInventors: Ye-Yi Wang, Yun-Cheng Ju, Dong Yu
-
Publication number: 20120173240Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.Type: ApplicationFiled: December 30, 2010Publication date: July 5, 2012Applicant: MICROSOFT CORPORATIONInventors: Daniel Povey, Kaisheng YAO, Yifan Gong
-
Publication number: 20120166197Abstract: The present invention discloses converting a text form into a speech form. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.Type: ApplicationFiled: March 4, 2012Publication date: June 28, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
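The parallel compile-then-combine flow might look like this sketch, with a tiny invented grapheme-to-phoneme table standing in for real phoneme-graph compilation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical grapheme-to-phoneme table standing in for real compilation.
G2P = {"cat": ["k", "ae", "t"], "sit": ["s", "ih", "t"], "dog": ["d", "ao", "g"]}

def compile_partial(word_list):
    """Compile one partial word list into a toy phoneme graph: word -> phonemes."""
    return {w: G2P[w] for w in word_list}

def combine(graphs):
    """Merge the partial phoneme graphs into one graph for recognition."""
    merged = {}
    for g in graphs:
        merged.update(g)
    return merged

partials = [["cat", "sit"], ["dog"]]
with ThreadPoolExecutor() as pool:  # compile the partial lists in parallel
    graphs = list(pool.map(compile_partial, partials))
lexicon = combine(graphs)
print(sorted(lexicon))  # ['cat', 'dog', 'sit']
```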
-
Publication number: 20120166194Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.Type: ApplicationFiled: December 22, 2011Publication date: June 28, 2012Applicant: Electronics and Telecommunications Research InstituteInventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
-
Publication number: 20120166196Abstract: This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker.Type: ApplicationFiled: December 23, 2010Publication date: June 28, 2012Applicant: Microsoft CorporationInventors: Yun-Cheng Ju, Ivan J. Tashev, Chad R. Heinemann
-
Publication number: 20120158405Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that, according to the link information (LI), relates to the speech data (SD) just played back; the currently marked word indicates the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) makes it possible to synchronize the text cursor (TC) with the audio cursor (AC) or the audio cursor (AC) with the text cursor (TC), so that positioning the respective cursor (AC, TC) is simplified considerably.Type: ApplicationFiled: February 13, 2012Publication date: June 21, 2012Applicant: Nuance Communications Austria GmbHInventor: Wolfgang Gschwendtner
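The cursor synchronization enabled by the link information (LI) can be illustrated with a toy word-to-time mapping; the words and timings below are invented, and real link information would come from the recognizer:

```python
import bisect

# Link information: each recognized word paired with (start, end) time in ms.
LINKS = [("please", 0, 400), ("send", 400, 700),
         ("the", 700, 850), ("report", 850, 1400)]
STARTS = [start for _, start, _ in LINKS]

def audio_to_text_cursor(audio_ms):
    """Find the word index whose time span covers the audio cursor position."""
    return bisect.bisect_right(STARTS, audio_ms) - 1

def text_to_audio_cursor(word_index):
    """Move the audio cursor to the start time of the word at the text cursor."""
    return LINKS[word_index][1]

print(audio_to_text_cursor(750))  # 2 -> "the" is marked during playback
print(text_to_audio_cursor(3))    # 850 -> playback jumps to "report"
```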
-
Publication number: 20120150541Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.Type: ApplicationFiled: December 10, 2010Publication date: June 14, 2012Applicant: GENERAL MOTORS LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Publication number: 20120143607Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.Type: ApplicationFiled: December 6, 2011Publication date: June 7, 2012Inventors: Michael LONGÉ, Richard Eyraud, Keith C. Hullfish
-
Publication number: 20120143610Abstract: A sound event detecting module detects whether a sound event with a repeating characteristic is generated. A sound end recognizing unit recognizes ends of sounds according to a sound signal to generate sound sections and, correspondingly, multiple sets of feature vectors of the sound sections. A storage unit stores at least M sets of feature vectors. A similarity comparing unit compares the at least M sets of feature vectors with each other and correspondingly generates a similarity score matrix, which stores similarity scores of any two of the at least M sound sections. A correlation arbitrating unit determines the number of sound sections with high correlations to each other according to the similarity score matrix. When the number is greater than a threshold value, the correlation arbitrating unit indicates that the sound event with the repeating characteristic is generated.Type: ApplicationFiled: December 30, 2010Publication date: June 7, 2012Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTEInventors: Yuh-Ching Wang, Kuo-Yuan Li
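A minimal sketch of the similarity-matrix idea follows; the feature vectors, similarity function, and thresholds are invented, and a real system would compare acoustic features such as MFCCs:

```python
def similarity(a, b):
    """Toy similarity of two feature vectors: 1 / (1 + Euclidean distance)."""
    d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + d)

def detect_repeating(sections, sim_threshold=0.5, count_threshold=1):
    """Compare all section pairs (the similarity score matrix) and count
    highly correlated pairs; more than count_threshold => repeating event."""
    high = 0
    for i in range(len(sections)):
        for j in range(i + 1, len(sections)):
            if similarity(sections[i], sections[j]) > sim_threshold:
                high += 1
    return high > count_threshold

sections = [[1.0, 2.0], [1.1, 2.0], [1.0, 2.1], [9.0, 9.0]]
print(detect_repeating(sections))  # True: the first three sections repeat
```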
-
Publication number: 20120136662Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.Type: ApplicationFiled: February 3, 2012Publication date: May 31, 2012Applicant: Nuance Communications Austria GMBHInventor: Zsolt Saffer
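The best-path search over a scored word graph can be sketched on a toy graph; the graph, words, and scores below are invented for illustration, with word scores standing in for what a phonemic language model would assign:

```python
# Toy word graph: node -> list of (next_node, word, word_score).
GRAPH = {
    "start": [("a", "recognize", 0.9), ("b", "wreck a nice", 0.4)],
    "a": [("end", "speech", 0.8)],
    "b": [("end", "beach", 0.7)],
    "end": [],
}

def best_path(node):
    """Return (score, words) of the highest-scoring path from node to the end."""
    if not GRAPH[node]:
        return 1.0, []
    best = (0.0, [])
    for nxt, word, score in GRAPH[node]:
        sub_score, sub_words = best_path(nxt)
        cand = (score * sub_score, [word] + sub_words)
        if cand[0] > best[0]:
            best = cand
    return best

score, words = best_path("start")
print(words)  # ['recognize', 'speech'] (0.9 * 0.8 beats 0.4 * 0.7)
```

Real word graphs are large lattices, so production decoders use dynamic programming with memoization rather than this naive recursion.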
-
Publication number: 20120136661Abstract: The present invention discloses converting a text form into a speech form. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.Type: ApplicationFiled: November 2, 2011Publication date: May 31, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Publication number: 20120136660Abstract: A voice-estimation device that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. The waves reflected by the vocal tract are detected and converted into a digital signal, which is then processed segment-by-segment. Based on the processing, a set of formant frequencies is determined for each segment. Each such set is then analyzed to assign a phoneme to the corresponding segment of the digital signal. The resulting sequence of phonemes is converted into a digital audio signal or text representing the user's estimated voice.Type: ApplicationFiled: November 30, 2010Publication date: May 31, 2012Applicant: ALCATEL-LUCENT USA INC.Inventors: Dale D. Harman, Lothar Benedikt Moeller
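The formant-to-phoneme assignment step could be sketched as a nearest-neighbor lookup against reference formant values; the numbers below are rough textbook-style vowel formants used only for illustration, not values from the disclosure:

```python
# Hypothetical reference formants (F1, F2 in Hz) per vowel phoneme.
REFERENCE = {"iy": (270, 2290), "aa": (730, 1090), "uw": (300, 870)}

def assign_phoneme(formants):
    """Assign the phoneme whose reference formants are nearest (Euclidean)."""
    def dist(ref):
        return sum((a - b) ** 2 for a, b in zip(formants, ref)) ** 0.5
    return min(REFERENCE, key=lambda p: dist(REFERENCE[p]))

def estimate_voice(segments):
    """Map each segment's measured formant set to a phoneme, giving the
    phoneme sequence that represents the user's estimated voice."""
    return [assign_phoneme(f) for f in segments]

print(estimate_voice([(280, 2250), (700, 1100)]))  # ['iy', 'aa']
```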