Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
  • Publication number: 20150106089
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: December 30, 2010
    Publication date: April 16, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
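    Illustrative sketch (not part of the patent record): the two-mode flow in the abstract above can be pictured as a small state machine. Everything here is an assumption made for illustration: audio is stood in for by text strings, the names `PowerMode` and `NameTriggeredListener` are hypothetical, and returning to the low-power mode after one command is a design choice the abstract does not state.

```python
from enum import Enum
from typing import Optional


class PowerMode(Enum):
    LOW_POWER = "low_power"        # only listens for the device name
    RECOGNITION = "recognition"    # full speech recognition is active


class NameTriggeredListener:
    """Hypothetical listener; text strings stand in for audio input."""

    def __init__(self, device_name: str) -> None:
        self.device_name = device_name.strip().lower()
        self.mode = PowerMode.LOW_POWER

    def hear(self, utterance: str) -> Optional[str]:
        """Return a recognized command, or None if nothing was recognized."""
        text = utterance.strip()
        if self.mode is PowerMode.LOW_POWER:
            # In low-power mode only the device name is detected.
            if text.lower() == self.device_name:
                self.mode = PowerMode.RECOGNITION
            return None
        # Assumption: drop back to low power after a single command.
        self.mode = PowerMode.LOW_POWER
        return text
```

    In this sketch, anything heard before the name is ignored, the name switches modes, and the next utterance is returned as the command.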
  • Publication number: 20140129226
    Abstract: Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
    Type: Application
    Filed: November 5, 2012
    Publication date: May 8, 2014
    Inventors: Antonio R. Lee, Petr Novak, Peder A. Olsen, Vaibhava Goel
  • Publication number: 20140114646
    Abstract: A system receives vocal input from one or more persons, and extracts one or more keywords from the vocal input. The system then generates a query using the one or more keywords, searches a database of products and services using the query, and identifies a product or service as a function of the query.
    Type: Application
    Filed: October 24, 2012
    Publication date: April 24, 2014
    Applicant: SAP AG
    Inventor: Oleg Figlin
  • Publication number: 20140108010
    Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands.
    Type: Application
    Filed: October 11, 2012
    Publication date: April 17, 2014
    Applicant: INTERMEC IP CORP.
    Inventors: Paul Maltseff, Roger Byford, Jim Logan
  • Publication number: 20140067395
    Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform text of a response to speech that is to be played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
  • Publication number: 20140058732
    Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modify the initial UI response after receiving secondary recognition results. Since an initial response begins immediately, instead of waiting for results from all recognizers, it reduces the perceived delay by the user before complete results get rendered to the user.
    Type: Application
    Filed: August 21, 2012
    Publication date: February 27, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
  • Publication number: 20140025380
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: July 18, 2012
    Publication date: January 23, 2014
    Applicant: International Business Machines Corporation
    Inventors: Fernando Luiz Koch, Julio Nogima
  • Publication number: 20130332167
    Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to and identification of a subsequent frame and/or a subsequent animation is provided. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method and some aspects include a computer readable medium storing instructions that perform the above method when executed by at least one processor.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 12, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Robert M. Kilgore
  • Publication number: 20130325474
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130304471
    Abstract: A method, an apparatus and an article of manufacture for contextual voice query dilation in a Spoken Web search. The method includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
  • Publication number: 20130289994
    Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
  • Publication number: 20130262116
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Application
    Filed: March 27, 2012
    Publication date: October 3, 2013
    Applicant: NOVOSPEECH
    Inventor: Yossef Ben-Ezra
  • Publication number: 20130262114
    Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
    Type: Application
    Filed: April 3, 2012
    Publication date: October 3, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
  • Publication number: 20130226576
    Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer then can produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word, as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entry, as text, from a keyboard or the like for storage in the phoneme dictionary such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string to improve intelligibility, particularly for conference calls.
    Type: Application
    Filed: February 23, 2012
    Publication date: August 29, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
  • Publication number: 20130191128
    Abstract: A continuous phonetic recognition method using semi-Markov model, a system for processing the method, and a recording medium for storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
    Type: Application
    Filed: August 28, 2012
    Publication date: July 25, 2013
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Chang Dong Yoo, Sung Woong Kim
  • Publication number: 20130179169
    Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model.
    Type: Application
    Filed: July 5, 2012
    Publication date: July 11, 2013
    Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
    Inventors: Yao-Ting Sung, Ju-Ling Chen
  • Publication number: 20130166302
    Abstract: Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Applicant: NCR Corporation
    Inventor: Brennan Eul I. Mercado
  • Publication number: 20130159000
    Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human-assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Publication number: 20130138441
    Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
    Type: Application
    Filed: August 14, 2012
    Publication date: May 30, 2013
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
  • Publication number: 20130110492
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
    Type: Application
    Filed: May 1, 2012
    Publication date: May 2, 2013
    Applicant: GOOGLE INC.
    Inventors: Ian C. McGraw, Alexander H. Gruenstein
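    Illustrative sketch (not part of the patent record): the "number of occasions" variant of the stability metric in the abstract above can be approximated by counting how many consecutive partial hypotheses kept the same word. The function name `stability_counts` is hypothetical, and matching words by position in the hypothesis is a simplifying assumption; the abstract does not say how words are tracked across hypotheses.

```python
from typing import Dict, List


def stability_counts(partial_hypotheses: List[List[str]]) -> Dict[int, int]:
    """Map each word position in the latest top hypothesis to the number of
    consecutive most-recent hypotheses that had the same word there."""
    if not partial_hypotheses:
        return {}
    latest = partial_hypotheses[-1]
    counts: Dict[int, int] = {}
    for pos, word in enumerate(latest):
        run = 0
        # Walk backwards until the word at this position changes or is absent.
        for hyp in reversed(partial_hypotheses):
            if pos < len(hyp) and hyp[pos] == word:
                run += 1
            else:
                break
        counts[pos] = run
    return counts


hypotheses = [
    ["I"],
    ["I", "want"],
    ["I", "won't", "to"],
    ["I", "want", "to"],
]
print(stability_counts(hypotheses))  # {0: 4, 1: 1, 2: 2}
```

    A higher count suggests the word is less likely to change in later hypotheses; a real system could equally use elapsed time instead of a count, as the abstract notes.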
  • Publication number: 20130103402
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
  • Publication number: 20130096918
    Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
    Type: Application
    Filed: August 15, 2012
    Publication date: April 18, 2013
    Applicant: FUJITSU LIMITED
    Inventor: Shouji HARADA
  • Publication number: 20130085757
    Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device, a selection unit, utilizing a signal from one or more sensors embedded on the device, configured to select a selected trigger detection unit among the trigger detection units, the selected trigger detection unit being appropriate to a usage environment of the device, and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
    Type: Application
    Filed: June 29, 2012
    Publication date: April 4, 2013
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
  • Publication number: 20130066635
    Abstract: An apparatus and a method, which set a remote control command for controlling a home network service in a portable terminal are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 14, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jong-Seok KIM, Jin Park
  • Publication number: 20130060572
    Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence.
    Type: Application
    Filed: September 4, 2012
    Publication date: March 7, 2013
    Applicant: Nexidia Inc.
    Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
  • Publication number: 20130041667
    Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
    Type: Application
    Filed: October 12, 2012
    Publication date: February 14, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: NUANCE COMMUNICATIONS, INC.
  • Publication number: 20130035938
    Abstract: The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance.
    Type: Application
    Filed: July 2, 2012
    Publication date: February 7, 2013
    Applicant: ELECTRONICS AND COMMUNICATIONS RESEARCH INSTITUTE
    Inventor: Ho Young Jung
  • Publication number: 20130035939
    Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
    Type: Application
    Filed: October 11, 2012
    Publication date: February 7, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: AT&T INTELLECTUAL PROPERTY I, L.P.
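    Illustrative sketch (not part of the patent record): the abstract above requires pronunciation weights to be normalized so that they sum to 1 within each unit of speech. The snippet shows only that normalization step; the function name and the example counts are hypothetical, and the discriminative adaptation described in the abstract is not attempted here.

```python
from typing import Dict


def normalize_pronunciation_weights(
    raw: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Scale the weights of each unit's pronunciation variants to sum to 1."""
    normalized: Dict[str, Dict[str, float]] = {}
    for unit, variants in raw.items():
        total = sum(variants.values())
        if total <= 0:
            raise ValueError(f"weights for {unit!r} must have a positive sum")
        normalized[unit] = {pron: w / total for pron, w in variants.items()}
    return normalized


# Hypothetical raw scores for two pronunciations of one word.
raw_weights = {"tomato": {"t ah m ey t ow": 3.0, "t ah m aa t ow": 1.0}}
print(normalize_pronunciation_weights(raw_weights))
# {'tomato': {'t ah m ey t ow': 0.75, 't ah m aa t ow': 0.25}}
```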
  • Publication number: 20130013310
    Abstract: A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes speech data corresponding to the speech section by using the recognition dictionary, and displays a recognition result of the recognition process and a correspondence item that corresponds to the recognition result in the form of a list. The correspondence item displayed in the form of a list is manually operable.
    Type: Application
    Filed: July 5, 2012
    Publication date: January 10, 2013
    Applicant: DENSO CORPORATION
    Inventors: Yuki Fujisawa, Katsushi Asami
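    Illustrative sketch (not part of the patent record): detecting a speech section from signal level, as the abstract above mentions, is commonly done by thresholding per-frame levels. The function name, the fixed threshold, and the `min_frames` filter are assumptions for illustration; the abstract does not specify how the level is compared.

```python
from typing import List, Tuple


def detect_speech_sections(
    levels: List[float], threshold: float, min_frames: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the level
    stays at or above the threshold for at least min_frames frames."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i  # a candidate section begins here
        elif start is not None:
            if i - start >= min_frames:
                sections.append((start, i))
            start = None
    # Close a section that runs to the end of the signal.
    if start is not None and len(levels) - start >= min_frames:
        sections.append((start, len(levels)))
    return sections


frame_levels = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1, 0.9, 0.9]
print(detect_speech_sections(frame_levels, threshold=0.5, min_frames=2))
# [(2, 4), (7, 9)]
```

    The single loud frame at index 5 is discarded because it is shorter than `min_frames`, which is one simple way to ignore brief noise bursts.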
  • Publication number: 20130006631
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Applicant: UTAH STATE UNIVERSITY
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20130006637
    Abstract: Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
    Type: Application
    Filed: August 1, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, Joseph Simon Reisinger, Robert Sicconi, Mahesh Viswanathan
  • Publication number: 20130006639
    Abstract: A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Per-Ola Kristensson, Shumin Zhai
  • Publication number: 20120323577
    Abstract: Methods of automatic speech recognition for premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then the begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator to the user, b) a microphone is activated to monitor audio, after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
    Type: Application
    Filed: June 16, 2011
    Publication date: December 20, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
  • Publication number: 20120316878
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
    Type: Application
    Filed: August 23, 2012
    Publication date: December 13, 2012
    Applicant: Google Inc.
    Inventors: David Singleton, Debajit Ghosh
  • Publication number: 20120316880
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
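The selection step in the abstract above can be illustrated with a minimal sketch. It assumes the words and one prosodic feature value per occurrence (for example, pitch in Hz) have already been extracted, uses population variance as the "degree of fluctuation", and picks the frequent word with the largest variance. The function name `key_phrase`, the `min_count` threshold, and both of those choices are assumptions for illustration, not details taken from the application.

```python
from collections import defaultdict
from statistics import pvariance


def key_phrase(observations, min_count=2):
    """Pick a key word from (word, prosodic_value) pairs.

    Hypothetical helper: a word is "high frequency" when it occurs at
    least min_count times; fluctuation is the population variance of
    its prosodic values (both are assumptions of this sketch).
    """
    values = defaultdict(list)
    for word, value in observations:
        values[word].append(value)

    # Keep only words whose frequency of occurrence meets the threshold.
    frequent = {w: v for w, v in values.items() if len(v) >= min_count}
    if not frequent:
        return None

    # Choose the frequent word whose prosodic values fluctuate the most.
    return max(frequent, key=lambda w: pvariance(frequent[w]))


observations = [
    ("refund", 180.0), ("refund", 260.0),
    ("the", 120.0), ("the", 122.0),
    ("urgent", 300.0),
]
print(key_phrase(observations))
```

Here "urgent" is excluded because it occurs only once, and "the" is frequent but prosodically flat, so the sketch returns "refund".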
  • Publication number: 20120316879
    Abstract: A continuous speech recognition system that recognizes continuous speech smoothly in a noisy environment. The system selects call commands; configures a minimum recognition network, in token form, consisting of the call commands and mute intervals that include noise; recognizes the input speech continuously in real time; continuously analyzes the reliability of the speech recognition; and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system recognizes the call command in real time, measures the reliability of the speech, and, at the moment the call command is recognized, transfers the speech interval following the call command to a continuous speech-recognition engine, which recognizes the speech from the speaker.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: KOREAPOWERVOICE CO., LTD.
    Inventors: Heui-Suck JUNG, Se-Hoon CHIN, Tae-Young ROH
  • Publication number: 20120299824
    Abstract: To take security into account and increase user friendliness, an information processing device includes: an input unit to which information is input; an extracting unit extracting predetermined words from the information input to the input unit; a classifying unit classifying the words extracted by the extracting unit into first words and second words; and a converting unit converting the first words by a first conversion method and converting the second words by a second conversion method, the second conversion method being different from the first conversion method.
    Type: Application
    Filed: February 4, 2011
    Publication date: November 29, 2012
    Applicant: NIKON CORPORATION
    Inventors: Hideo Hoshuyama, Hiroyuki Akiya, Kazuya Umeyama, Keiichi Nitta, Hiroki Uwai, Masakazu Sekiguchi
  • Publication number: 20120303267
    Abstract: A method for speech recognition includes providing a source of geographical information within a vehicle. The geographical information pertains to a current location of the vehicle, a planned travel route of the vehicle, a map displayed within the vehicle, and/or a gesture marked by a user on a map. Words spoken within the vehicle are recognized by use of a speech recognition module. The recognizing is dependent upon the geographical information.
    Type: Application
    Filed: August 6, 2012
    Publication date: November 29, 2012
    Applicant: Robert Bosch GmbH
    Inventors: Zhongnan Shen, Fuliang Weng, Zhe Feng
  • Publication number: 20120296653
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Application
    Filed: July 30, 2012
    Publication date: November 22, 2012
    Applicant: Nuance Communications, Inc.
    Inventor: Kenneth D. White
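As a rough illustration of the scoring idea in the abstract above, the sketch below compares a recognized character string against each known sequence position by position and weights each matching character. The weight table, the default weight of 1.0, the equal-length restriction, and the names `score_sequence` and `best_match` are all assumptions made for this example; the application does not specify them here.

```python
def score_sequence(recognized, candidate, weights, default=1.0):
    """Return a weighted match score in [0, 1] for one known sequence.

    Hypothetical scoring: each candidate character contributes its
    weight when the recognized character at that position agrees.
    """
    if len(recognized) != len(candidate):
        return 0.0  # simplification: only equal-length sequences compete
    total = sum(weights.get(c, default) for c in candidate)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(c, default)
        for r, c in zip(recognized, candidate)
        if r == c
    )
    return matched / total


def best_match(recognized, known_sequences, weights):
    """Select the known sequence with the highest weighted score."""
    return max(known_sequences,
               key=lambda k: score_sequence(recognized, k, weights))


# Assumed weights: easily confused letters such as "B" and "D" count less.
weights = {"B": 0.5, "D": 0.5}
print(best_match("AB12", ["AD12", "XB12"], weights))
```

With these assumed weights, a mismatch on a confusable letter costs less than a mismatch on a distinctive one, so "AD12" outranks "XB12" for the input "AB12".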
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method are disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and its word arcs are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag corresponding to the speech signal. The present invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag and phonetic segmentation tag to provide prosodic structure and language information for subsequent voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Publication number: 20120290303
    Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
    Type: Application
    Filed: May 8, 2012
    Publication date: November 15, 2012
    Applicant: NHN CORPORATION
    Inventors: Sang Ho LEE, Hoon KIM, Dong Ook KOO, Dae Sung JUNG
  • Publication number: 20120278079
    Abstract: An audio processing system makes use of a number of levels of compression or data reduction, thereby providing reduced storage requirements while maintaining a high accuracy of keyword detection in the original audio input.
    Type: Application
    Filed: April 29, 2011
    Publication date: November 1, 2012
    Inventors: Jon A. Arrowood, Robert W. Morris, Peter S. Cardillo, Marsal Gavalda
  • Publication number: 20120271635
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Application
    Filed: July 2, 2012
    Publication date: October 25, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Publication number: 20120271634
    Abstract: A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt to reflect a context-based probability of user barge-in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt.
    Type: Application
    Filed: March 26, 2010
    Publication date: October 25, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Nils Lenke
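A minimal sketch of the threshold adjustment described in the abstract above follows. It assumes that barge-in becomes more likely close to a transition relevance place (TRP) in the prompt and that a higher barge-in probability should lower the voice activity detection threshold, making detection more sensitive. The linear ramp, every numeric constant, and the names `barge_in_probability` and `vad_threshold` are hypothetical; the application may use a different probability model and mapping.

```python
def barge_in_probability(t, trps, width=0.5, base=0.05, peak=0.6):
    """Assumed context model: probability of barge-in at prompt time t.

    trps holds the times (in seconds) of transition relevance places.
    The probability ramps linearly from `peak` at a TRP down to `base`
    at `width` seconds away from the nearest TRP.
    """
    nearest = min((abs(t - p) for p in trps), default=float("inf"))
    if nearest >= width:
        return base
    return base + (peak - base) * (1.0 - nearest / width)


def vad_threshold(t, trps, low=0.2, high=0.8):
    """Map barge-in probability to a detection threshold.

    Assumption of this sketch: likelier barge-in gives a lower
    (more sensitive) threshold.
    """
    p = barge_in_probability(t, trps)
    return high - (high - low) * p


trps = [2.0, 5.5]                # hypothetical TRP times within a prompt
print(vad_threshold(2.0, trps))  # at a TRP: more sensitive
print(vad_threshold(1.0, trps))  # mid-phrase: less sensitive
```

Under these assumptions the detector is hardest to trigger in the middle of a phrase and easiest to trigger at points where a user would naturally take the turn.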
  • Publication number: 20120262533
    Abstract: A method is provided in one example and includes identifying a particular word recited by an active speaker in a conference involving a plurality of endpoints in a network environment; evaluating a profile associated with the active speaker in order to identify contextual information associated with the particular word; and providing augmented data associated with the particular word to at least some of the plurality of endpoints. In more specific examples, the active speaker is identified using a facial detection protocol, or a speech recognition protocol. Data from the active speaker can be converted from speech to text.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 18, 2012
    Inventors: Satish K. Gannu, Leon A. Frazier, Didier R. Moretti
  • Publication number: 20120265531
    Abstract: An intelligent query system for processing voice-based queries is disclosed, which uses semantic-based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
    Type: Application
    Filed: June 18, 2012
    Publication date: October 18, 2012
    Inventor: Ian M. Bennett
  • Publication number: 20120259632
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Application
    Filed: February 22, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel Willett
  • Publication number: 20120245934
    Abstract: A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models to decode the acoustic data, using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
    Type: Application
    Filed: March 25, 2011
    Publication date: September 27, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao
  • Publication number: 20120239403
    Abstract: An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
    Type: Application
    Filed: September 28, 2009
    Publication date: September 20, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Daniel Andrés Vásquez Cano, Guillermo Aradilla, Rainer Gruhn
  • Publication number: 20120232904
    Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee