Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
  • Publication number: 20130159000
    Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels, and updates a classification model associated with the classifier using the pseudo-semantic label.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Publication number: 20130138441
    Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
    Type: Application
    Filed: August 14, 2012
    Publication date: May 30, 2013
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
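The composition step described in this abstract is standard weighted finite-state transducer (WFST) composition. A minimal tropical-semiring sketch follows; the dict-based representation is an illustrative assumption (real systems use libraries such as OpenFst), and epsilon transitions are not handled:

```python
def compose(t1, t2):
    """Compose two weighted finite-state transducers in the tropical
    semiring (weights add along a path). Each transducer is a dict:
      {'start': s, 'finals': {state}, 'arcs': {state: [(in, out, w, next)]}}
    Epsilon transitions are omitted in this simplified sketch."""
    start = (t1['start'], t2['start'])
    arcs, finals, stack, seen = {}, set(), [start], {start}
    while stack:
        s1, s2 = stack.pop()
        out_arcs = []
        for (i1, o1, w1, n1) in t1['arcs'].get(s1, []):
            for (i2, o2, w2, n2) in t2['arcs'].get(s2, []):
                if o1 == i2:  # output of T1 must match input of T2
                    nxt = (n1, n2)
                    out_arcs.append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        arcs[(s1, s2)] = out_arcs
        if s1 in t1['finals'] and s2 in t2['finals']:
            finals.add((s1, s2))
    return {'start': start, 'finals': finals, 'arcs': arcs}
```

Composing a pronunciation-transduction WFST with a lexicon or grammar WFST in this way yields a single search network, as the abstract describes.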
  • Publication number: 20130110492
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
    Type: Application
    Filed: May 1, 2012
    Publication date: May 2, 2013
    Applicant: GOOGLE INC.
    Inventors: Ian C. McGraw, Alexander H. Gruenstein
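The stability idea above can be sketched: count how many consecutive partial results have kept a word at the same position in the incremental recognizer's top hypothesis, and map that count to a stability score. The threshold `k` and the linear mapping are illustrative assumptions, not details from the patent:

```python
def stability_scores(partial_hypotheses, k=3):
    """Assign each word in the latest top hypothesis a stability metric
    based on how many consecutive partial results (occasions) it has
    survived at the same position.  A word saturates at 1.0 once it has
    persisted for `k` occasions (hypothetical threshold)."""
    counts = {}
    for hyp in partial_hypotheses:
        for pos, word in enumerate(hyp):
            counts[(pos, word)] = counts.get((pos, word), 0) + 1
        # reset the streak for any (position, word) pair that this
        # hypothesis no longer contains
        counts = {key: n for key, n in counts.items()
                  if key[0] < len(hyp) and hyp[key[0]] == key[1]}
    latest = partial_hypotheses[-1]
    return [(w, min(counts[(i, w)] / k, 1.0)) for i, w in enumerate(latest)]
```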
  • Publication number: 20130103402
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
  • Publication number: 20130096918
    Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
    Type: Application
    Filed: August 15, 2012
    Publication date: April 18, 2013
    Applicant: FUJITSU LIMITED
    Inventor: Shouji HARADA
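The connection score in this abstract rewards word pairs whose stored positions are close together in the source sentence. A toy version, with an illustrative reciprocal-distance form that is not taken from the patent:

```python
def connection_score(positions, words):
    """Score how naturally a candidate sequence of words connects, based
    on how close their stored positions are in the source sentence.
    Adjacent positions score 1.0 per pair; distant pairs decay as 1/gap.
    A decoder would combine this with the acoustic similarity score."""
    score = 0.0
    for a, b in zip(words, words[1:]):
        gap = max(abs(positions[b] - positions[a]), 1)
        score += 1.0 / gap
    return score / max(len(words) - 1, 1)
```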
  • Publication number: 20130085757
    Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device, a selection unit, utilizing a signal from one or more sensors embedded on the device, configured to select a selected trigger detection unit among the trigger detection units, the selected trigger detection unit being appropriate to a usage environment of the device, and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
    Type: Application
    Filed: June 29, 2012
    Publication date: April 4, 2013
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
  • Publication number: 20130066635
    Abstract: An apparatus and a method, which set a remote control command for controlling a home network service in a portable terminal are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 14, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jong-Seok KIM, Jin Park
  • Publication number: 20130060572
    Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence.
    Type: Application
    Filed: September 4, 2012
    Publication date: March 7, 2013
    Applicant: Nexidia Inc.
    Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
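A drastically simplified version of the alignment step: use terms found in both the transcript and the audio as anchors, and shift the transcript's timestamps by the median discrepancy. A single global offset is a simplifying assumption; the patented method determines a fuller correspondence between the two versions:

```python
from statistics import median

def realign(transcript_times, audio_hits):
    """Realign transcript timestamps (taken from a different version of
    the recording) to the actual audio.  `transcript_times` maps each
    term to its old time; `audio_hits` maps terms located by audio
    search to their observed times."""
    diffs = [audio_hits[t] - transcript_times[t]
             for t in transcript_times if t in audio_hits]
    offset = median(diffs) if diffs else 0.0
    return {t: time + offset for t, time in transcript_times.items()}
```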
  • Publication number: 20130041667
    Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
    Type: Application
    Filed: October 12, 2012
    Publication date: February 14, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: NUANCE COMMUNICATIONS, INC.
  • Publication number: 20130035939
    Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
    Type: Application
    Filed: October 11, 2012
    Publication date: February 7, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: AT&T INTELLECTUAL PROPERTY I, L.P.
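The normalization constraint in this abstract (pronunciation weights summing to 1 per unit of speech) is easy to illustrate; here the unit is a word, and the lexicon layout is an illustrative assumption:

```python
def normalize_pron_weights(lexicon):
    """Normalize pronunciation weights so that, for each unit of speech
    (here: each word), the weights of its pronunciation variants sum to 1.
    `lexicon` maps word -> {pronunciation: raw weight}."""
    out = {}
    for word, prons in lexicon.items():
        total = sum(prons.values())
        out[word] = {p: w / total for p, w in prons.items()}
    return out
```

Discriminative adaptation, as described in the abstract, would then nudge these normalized weights to reduce classification errors under an objective such as minimum classification error.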
  • Publication number: 20130035938
    Abstract: The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance.
    Type: Application
    Filed: July 2, 2012
    Publication date: February 7, 2013
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventor: Ho Young Jung
  • Publication number: 20130013310
    Abstract: A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes a speech data corresponding to the speech section by using the recognition dictionary, and displays a recognition result of the recognition process and a correspondence item that corresponds to the recognition result in the form of a list. The correspondence item displayed in the form of a list is manually operable.
    Type: Application
    Filed: July 5, 2012
    Publication date: January 10, 2013
    Applicant: DENSO CORPORATION
    Inventors: Yuki Fujisawa, Katsushi Asami
  • Publication number: 20130006637
    Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
    Type: Application
    Filed: August 1, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Robert Sicconi, Mahesh Viswanathan
  • Publication number: 20130006631
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Applicant: UTAH STATE UNIVERSITY
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20130006639
    Abstract: A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Per-Ola Kristensson, Shumin Zhai
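The core/extended lexicon behavior in this abstract can be sketched directly: core words are output immediately, extended words are only offered as candidates, and a selected candidate is promoted into the core. Class and method names are illustrative, not from the patent:

```python
class ShorthandLexicon:
    """Core lexicon words are output directly; extended-lexicon words are
    offered as candidates and admitted to the core on user selection."""

    def __init__(self, core, extended):
        self.core = set(core)
        self.extended = set(extended)

    def candidates(self, word):
        if word in self.core:
            return [word]   # direct output
        if word in self.extended:
            return [word]   # offered as a candidate only
        return []

    def select(self, word):
        """User picked an extended-lexicon candidate: promote it."""
        if word in self.extended:
            self.extended.discard(word)
            self.core.add(word)
```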
  • Publication number: 20120323577
    Abstract: Methods of automatic speech recognition for premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then a begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio, after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
    Type: Application
    Filed: June 16, 2011
    Publication date: December 20, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
  • Publication number: 20120316880
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
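The key-phrase step in this abstract — finding high-frequency words whose prosodic feature values fluctuate most — can be sketched with variance as the (assumed) fluctuation measure; the frequency threshold is also an illustrative choice:

```python
from statistics import pvariance

def key_phrase(word_features, min_freq=3):
    """Pick the key phrase as the high-frequency word whose prosodic
    feature (e.g. pitch) fluctuates most across its occurrences.
    `word_features` maps each word to a list of per-occurrence values;
    only words occurring at least `min_freq` times are considered."""
    frequent = {w: vals for w, vals in word_features.items()
                if len(vals) >= min_freq}
    return max(frequent, key=lambda w: pvariance(frequent[w]))
```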
  • Publication number: 20120316878
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
    Type: Application
    Filed: August 23, 2012
    Publication date: December 13, 2012
    Applicant: Google Inc.
    Inventors: David Singleton, Debajit Ghosh
  • Publication number: 20120316879
    Abstract: A continuous speech recognition system that recognizes continuous speech smoothly in a noisy environment. The system selects call commands; configures a minimum recognition network of tokens consisting of the call commands and mute intervals, including noises; recognizes the input speech continuously in real time; and continuously analyzes the reliability of the recognition. When a speaker delivers a call command, the system measures the reliability of the speech after recognizing the call command, and recognizes the speaker's speech by transferring the speech interval that follows the call command to a continuous speech-recognition engine at the moment the call command is recognized.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: KOREAPOWERVOICE CO., LTD.
    Inventors: Heui-Suck JUNG, Se-Hoon CHIN, Tae-Young ROH
  • Publication number: 20120303267
    Abstract: A method for speech recognition includes providing a source of geographical information within a vehicle. The geographical information pertains to a current location of the vehicle, a planned travel route of the vehicle, a map displayed within the vehicle, and/or a gesture marked by a user on a map. Words spoken within the vehicle are recognized by use of a speech recognition module. The recognizing is dependent upon the geographical information.
    Type: Application
    Filed: August 6, 2012
    Publication date: November 29, 2012
    Applicant: Robert Bosch GmbH
    Inventors: Zhongnan Shen, Fuliang Weng, Zhe Feng
  • Publication number: 20120299824
    Abstract: To take security into account and increase user friendliness, an information processing device includes: an input unit to which information is input; an extracting unit extracting predetermined words from the information input to the input unit; a classifying unit classifying the words extracted by the extracting unit into first words and second words; and a converting unit converting the first words by a first conversion method and converting the second words by a second conversion method, the second conversion method being different from the first conversion method.
    Type: Application
    Filed: February 4, 2011
    Publication date: November 29, 2012
    Applicant: NIKON CORPORATION
    Inventors: Hideo Hoshuyama, Hiroyuki Akiya, Kazuya Umeyama, Keiichi Nitta, Hiroki Uwai, Masakazu Sekiguchi
  • Publication number: 20120296653
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Application
    Filed: July 30, 2012
    Publication date: November 22, 2012
    Applicant: Nuance Communications, Inc.
    Inventor: Kenneth D. White
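The per-character weighting described above can be illustrated with a toy scorer: exact character matches score 1, acoustically confusable pairs (for example "m"/"n") score 0.5, and everything else scores 0. The weights and confusability sets are illustrative assumptions, not the patent's values:

```python
def score_sequences(heard, known_sequences, confusable=None):
    """Score known character sequences against a spoken character string,
    weighting individual characters by match quality, and return them
    ranked best-first."""
    confusable = confusable or {frozenset("mn"), frozenset("bp"),
                                frozenset("dt")}

    def char_score(a, b):
        if a == b:
            return 1.0
        return 0.5 if frozenset((a, b)) in confusable else 0.0

    results = []
    for seq in known_sequences:
        if len(seq) != len(heard):
            continue  # only compare same-length sequences in this sketch
        s = sum(char_score(a, b) for a, b in zip(heard, seq)) / len(seq)
        results.append((seq, s))
    return sorted(results, key=lambda x: -x[1])
```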
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method is disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for the rear-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Publication number: 20120290303
    Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
    Type: Application
    Filed: May 8, 2012
    Publication date: November 15, 2012
    Applicant: NHN CORPORATION
    Inventors: Sang Ho LEE, Hoon KIM, Dong Ook KOO, Dae Sung JUNG
  • Publication number: 20120278079
    Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
    Type: Application
    Filed: April 29, 2011
    Publication date: November 1, 2012
    Inventors: Jon A. Arrowood, Robert W. Morris, Peter S. Cardillo, Marsal Gavalda
  • Publication number: 20120271635
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Application
    Filed: July 2, 2012
    Publication date: October 25, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Publication number: 20120271634
    Abstract: A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt to reflect a context-based probability of user barge-in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt.
    Type: Application
    Filed: March 26, 2010
    Publication date: October 25, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Nils Lenke
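The threshold adjustment described above can be sketched as a simple function of prompt time: lower the voice-activity-detection threshold near transition relevance places (TRPs), where barge-in is more likely. All constants are illustrative assumptions:

```python
def vad_threshold(t, base=0.5, trp_times=(), window=0.5, dip=0.2):
    """Return the voice-activity-detection threshold at prompt time `t`.
    Near a transition relevance place (within `window` seconds) the
    threshold is lowered by `dip`, making barge-in easier to detect;
    elsewhere it stays at `base` to suppress spurious triggers."""
    near_trp = any(abs(t - trp) <= window for trp in trp_times)
    return base - dip if near_trp else base
```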
  • Publication number: 20120265531
    Abstract: An intelligent query system for processing voiced-based queries is disclosed, which uses semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
    Type: Application
    Filed: June 18, 2012
    Publication date: October 18, 2012
    Inventor: Ian M. Bennett
  • Publication number: 20120262533
    Abstract: A method is provided in one example and includes identifying a particular word recited by an active speaker in a conference involving a plurality of endpoints in a network environment; evaluating a profile associated with the active speaker in order to identify contextual information associated with the particular word; and providing augmented data associated with the particular word to at least some of the plurality of endpoints. In more specific examples, the active speaker is identified using a facial detection protocol, or a speech recognition protocol. Data from the active speaker can be converted from speech to text.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 18, 2012
    Inventors: Satish K. Gannu, Leon A. Frazier, Didier R. Moretti
  • Publication number: 20120259632
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Application
    Filed: February 22, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel Willett
  • Publication number: 20120245934
    Abstract: A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models to decode the acoustic data, using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
    Type: Application
    Filed: March 25, 2011
    Publication date: September 27, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao
  • Publication number: 20120239403
    Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
    Type: Application
    Filed: September 28, 2009
    Publication date: September 20, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Daniel Andrés Vásquez Cano, Guillermo Aradilla, Rainer Gruhn
  • Publication number: 20120232904
    Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
  • Publication number: 20120232898
    Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Giuseppe Di Fabbrizio, Dilek Z. Hakkani-Tur, Mazin G. Rahim, Bernard S. Renger, Gokhan Tur
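The accept / re-prompt / transfer logic in this abstract reduces to two confidence thresholds. A minimal sketch, with illustrative threshold values:

```python
def route_utterance(confidence, accept=0.8, reject=0.3):
    """Route a classified user utterance as the abstract describes:
    accept it above the acceptance threshold, re-prompt the user below
    it, and transfer to a human below the rejection threshold (which
    may imply a task-specific utterance)."""
    if confidence >= accept:
        return "accept"
    if confidence >= reject:
        return "reprompt"
    return "transfer_to_human"
```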
  • Publication number: 20120232901
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files, across the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: May 24, 2012
    Publication date: September 13, 2012
    Applicant: Autonomy Corporation Ltd.
    Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
  • Publication number: 20120215538
    Abstract: In one embodiment, a method includes identifying a first communication from a customer, identifying a second communication from the customer following a response to the first communication from a contact center, and analyzing the first and second communications at a contact center network device to determine a change in sentiment from the first communication to the second communication. An apparatus for contact center performance measurement is also disclosed.
    Type: Application
    Filed: February 17, 2011
    Publication date: August 23, 2012
    Applicant: CISCO TECHNOLOGY, INC.
    Inventors: Andrew Cleasby, Robert Zacher
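A toy version of the sentiment-change analysis compares a lexicon score for each of the two communications; the word lists and unit scoring here are illustrative stand-ins for a real sentiment analyzer:

```python
POSITIVE = {"great", "thanks", "resolved", "happy", "helpful"}
NEGATIVE = {"broken", "angry", "refund", "terrible", "waiting"}

def sentiment(text):
    """Crude lexicon score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_change(first, second):
    """Positive delta = the customer's mood improved after the response."""
    return sentiment(second) - sentiment(first)
```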
  • Publication number: 20120215539
    Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be routed to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
    Type: Application
    Filed: February 22, 2012
    Publication date: August 23, 2012
    Inventor: Ajay Juneja
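The split-and-route scheme above can be sketched as follows, with the recognizers modeled as plain callables (a real remote recognizer would be a network call; all names here are illustrative):

```python
def segment_utterance(samples, n):
    """Split an utterance into n contiguous, near-equal segments."""
    size, rem = divmod(len(samples), n)
    segments, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        segments.append(samples[start:end])
        start = end
    return segments

def recognize(samples, recognizers):
    """Assign each segment to one recognizer (e.g. a remote one, then a local one)."""
    segments = segment_utterance(samples, len(recognizers))
    return [recognizer(seg) for recognizer, seg in zip(recognizers, segments)]
```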
  • Publication number: 20120209610
    Abstract: Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.
    Type: Application
    Filed: April 24, 2012
    Publication date: August 16, 2012
    Inventor: Lunis ORCUTT
  • Publication number: 20120185252
    Abstract: A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method also includes training a model, using a computer processor, to generate the voice search confidence measure based on the selected voice search features.
    Type: Application
    Filed: March 23, 2012
    Publication date: July 19, 2012
    Applicant: Microsoft Corporation
    Inventors: Ye-Yi Wang, Yun-Cheng Ju, Dong Yu
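Such a confidence model can be pictured as a small logistic regression over features drawn from the ASR, dialog and search components. The feature choice, learning rate and SGD loop below are illustrative assumptions, not the patent's training procedure:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_confidence_model(examples, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights over voice search features by SGD."""
    n = len(examples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y                      # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def confidence(x, w, b):
    """Voice search confidence measure in [0, 1] for one query."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```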
  • Publication number: 20120173240
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Application
    Filed: December 30, 2010
    Publication date: July 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Daniel Povey, Kaisheng YAO, Yifan Gong
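The core idea of a matrix basis can be illustrated by forming a transform W as a weighted sum of basis matrices and applying it to a feature vector; this sketch ignores the preconditioning step and uses toy 2x2 matrices:

```python
def apply_basis(bases, coeffs, features):
    """Form W = sum_k coeffs[k] * bases[k] and apply it to a feature vector.

    The point of the basis: a short utterance only needs to estimate a few
    coefficients instead of a full transform matrix.
    """
    n = len(features)
    W = [[sum(c * B[i][j] for c, B in zip(coeffs, bases)) for j in range(n)]
         for i in range(n)]
    return [sum(W[i][j] * features[j] for j in range(n)) for i in range(n)]
```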
  • Publication number: 20120166197
    Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
    Type: Application
    Filed: March 4, 2012
    Publication date: June 28, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
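The parse-in-parallel, compile, then combine pipeline can be sketched as below. The "phoneme graph" here is stood in for by a trivial word-to-symbol-list map; a real system would use a pronunciation lexicon:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_chunk(text):
    """Derive a partial word list from one chunk of the data source."""
    return sorted(set(text.lower().split()))

def compile_graph(words):
    """Stand-in for phoneme-graph compilation: map each word to a symbol list."""
    return {w: list(w) for w in words}

def build_combined_graph(chunks):
    """Parse chunks in parallel, compile each partial list, then combine."""
    with ThreadPoolExecutor() as pool:
        partial_lists = list(pool.map(parse_chunk, chunks))
    combined = {}
    for graph in map(compile_graph, partial_lists):
        combined.update(graph)
    return combined
```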
  • Publication number: 20120166194
    Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 28, 2012
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
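The combination step can be pictured as a weighted sum of per-hypothesis log scores from the frame-based and segment-based models; the interpolation weight and score values below are illustrative assumptions:

```python
def combine_hypotheses(frame_scores, segment_scores, alpha=0.7):
    """Pick the hypothesis with the best weighted combination of the
    frame-based and segment-based model log scores."""
    shared = set(frame_scores) & set(segment_scores)
    return max(shared,
               key=lambda h: alpha * frame_scores[h]
                             + (1 - alpha) * segment_scores[h])
```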
  • Publication number: 20120166196
    Abstract: This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker.
    Type: Application
    Filed: December 23, 2010
    Publication date: June 28, 2012
    Applicant: Microsoft Corporation
    Inventors: Yun-Cheng Ju, Ivan J. Tashev, Chad R. Heinemann
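A word-dependent model for phrase verification can be pictured as a per-position word distribution; the probabilities and the verification threshold below are illustrative, not the patented model:

```python
def word_dependent_model(phrase):
    """Per-position model: the expected word gets high probability; any
    other word falls back to a small floor probability."""
    return [{word: 0.9} for word in phrase.lower().split()]

def phrase_score(model, utterance, floor=0.01):
    """Probability of the utterance under the per-position model."""
    words = utterance.lower().split()
    if len(words) != len(model):
        return 0.0
    prob = 1.0
    for slot, word in zip(model, words):
        prob *= slot.get(word, floor)
    return prob

def verify(model, utterance, threshold=0.5):
    """True when the utterance matches the expected phrase closely enough."""
    return phrase_score(model, utterance) >= threshold
```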
  • Publication number: 20120158405
    Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation, a correction device (10) synchronously marks the word of the recognized text information (ETI) that the link information (LI) relates to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that the positioning of the respective cursor (AC, TC) is simplified considerably.
    Type: Application
    Filed: February 13, 2012
    Publication date: June 21, 2012
    Applicant: Nuance Communications Austria GmbH
    Inventor: Wolfgang Gschwendtner
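The link information can be pictured as a word-to-audio-interval table that supports lookups in both directions; the class name and time units below are illustrative:

```python
import bisect

class LinkInfo:
    """Maps each recognized word to the audio interval it was decoded from."""

    def __init__(self, word_times):
        # word_times: list of (start, end) seconds, one pair per word, in order
        self.starts = [s for s, _ in word_times]
        self.word_times = word_times

    def audio_to_word(self, t):
        """Audio cursor position -> index of the word being played back."""
        return max(0, bisect.bisect_right(self.starts, t) - 1)

    def word_to_audio(self, i):
        """Text cursor position (word index) -> start time of that word's audio."""
        return self.word_times[i][0]
```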
  • Publication number: 20120150541
    Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
    Type: Application
    Filed: December 10, 2010
    Publication date: June 14, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
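The adaptation step can be crudely pictured as interpolating source-model Gaussian means toward statistics from the adaptation data; the interpolation weight and the mean-only view are simplifying assumptions, not the patent's full acoustic-model adaptation:

```python
def adapt_means(source_means, adaptation_means, weight=0.3):
    """Interpolate male-trained model means toward statistics from the
    language-independent female adaptation data, yielding proxy-model means."""
    return [(1 - weight) * m + weight * a
            for m, a in zip(source_means, adaptation_means)]
```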
  • Publication number: 20120143607
    Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 7, 2012
    Inventors: Michael LONGÉ, Richard Eyraud, Keith C. Hullfish
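The orthogonal-ambiguity idea can be sketched by intersecting the ASR N-best list with the candidate list from the alternate modality; the fallback behavior below is an illustrative assumption:

```python
def resolve(speech_nbest, keypad_candidates):
    """Intersect the ASR N-best list with the candidates from the alternate
    input modality. Because the two ambiguity sources are largely orthogonal,
    the intersection is usually small, often a single word."""
    keypad = set(keypad_candidates)
    ranked = [w for w in speech_nbest if w in keypad]
    return ranked[0] if ranked else speech_nbest[0]
```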
  • Publication number: 20120143610
    Abstract: A sound event detecting module detects whether a sound event with a repeating characteristic is generated. A sound end recognizing unit recognizes ends of sounds according to a sound signal to correspondingly generate sound sections and multiple sets of feature vectors of the sound sections. A storage unit stores at least M sets of feature vectors. A similarity comparing unit compares the at least M sets of feature vectors with each other and correspondingly generates a similarity score matrix, which stores the similarity scores of any two of the at least M sound sections. A correlation arbitrating unit determines the number of sound sections with high correlations to each other according to the similarity score matrix. When the number is greater than a threshold value, the correlation arbitrating unit indicates that the sound event with the repeating characteristic is generated.
    Type: Application
    Filed: December 30, 2010
    Publication date: June 7, 2012
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Yuh-Ching Wang, Kuo-Yuan Li
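The similarity-matrix step can be sketched with cosine similarity over the stored feature vectors; the similarity measure, thresholds and pair-counting below are illustrative assumptions:

```python
def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def repeating_event_detected(feature_sets, sim_threshold=0.95, count_threshold=2):
    """Build the pairwise similarity score matrix over the stored sections,
    then count section pairs whose similarity exceeds the threshold."""
    n = len(feature_sets)
    matrix = [[cosine(feature_sets[i], feature_sets[j]) for j in range(n)]
              for i in range(n)]
    high = sum(1 for i in range(n) for j in range(i + 1, n)
               if matrix[i][j] >= sim_threshold)
    return high > count_threshold
```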
  • Publication number: 20120136662
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, with words assigned to the speech based on the best path; the word scores are obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer-readable code for implementing the method.
    Type: Application
    Filed: February 3, 2012
    Publication date: May 31, 2012
    Applicant: Nuance Communications Austria GMBH
    Inventor: Zsolt Saffer
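Best-path search through a word graph can be sketched with memoized recursion over a directed acyclic graph whose edges carry a word and a log-probability score; the graph encoding below is an illustrative assumption:

```python
from functools import lru_cache

def best_word_sequence(graph, start, end):
    """Best-scoring path through a word graph.

    graph maps a node to a list of (word, log_score, next_node) edges;
    path scores are summed edge log-scores, and the highest-scoring word
    sequence from start to end is returned.
    """
    @lru_cache(maxsize=None)
    def best(node):
        if node == end:
            return (0.0, ())
        options = []
        for word, score, nxt in graph.get(node, []):
            tail_score, tail_words = best(nxt)
            options.append((score + tail_score, (word,) + tail_words))
        return max(options) if options else (float("-inf"), ())

    score, words = best(start)
    return list(words), score
```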
  • Publication number: 20120136661
    Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
    Type: Application
    Filed: November 2, 2011
    Publication date: May 31, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
  • Publication number: 20120136660
    Abstract: A voice-estimation device that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. The waves reflected by the vocal tract are detected and converted into a digital signal, which is then processed segment-by-segment. Based on the processing, a set of formant frequencies is determined for each segment. Each such set is then analyzed to assign a phoneme to the corresponding segment of the digital signal. The resulting sequence of phonemes is converted into a digital audio signal or text representing the user's estimated voice.
    Type: Application
    Filed: November 30, 2010
    Publication date: May 31, 2012
    Applicant: ALCATEL-LUCENT USA INC.
    Inventors: Dale D. Harman, Lothar Benedikt Moeller
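The formant-to-phoneme step can be sketched as nearest-neighbor lookup in a reference formant table; the three-vowel table below uses rough published average values and is purely illustrative (real tables cover many more phonemes and vary by speaker):

```python
# Illustrative average formant frequencies (F1, F2) in Hz for three vowels.
FORMANTS = {"iy": (270, 2290), "ae": (660, 1720), "uw": (300, 870)}

def classify_segment(f1, f2):
    """Assign a segment the phoneme whose reference formants are nearest
    in squared Euclidean distance."""
    return min(FORMANTS,
               key=lambda p: (FORMANTS[p][0] - f1) ** 2 + (FORMANTS[p][1] - f2) ** 2)

def decode(segments):
    """Turn a sequence of per-segment formant pairs into a phoneme sequence."""
    return [classify_segment(f1, f2) for f1, f2 in segments]
```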