Abstract: A method for establishing paraphrasing data for a machine translation system includes selecting a paraphrasing target sentence through application of an object language model to a translated sentence that is obtained by machine-translating a source language sentence, extracting paraphrasing candidates that can be paraphrased with the paraphrasing target sentence from a source language corpus DB, performing machine translation with respect to the paraphrasing candidates, selecting a final paraphrasing candidate by applying the object language model to the result of the machine translation with respect to the paraphrasing candidates, and confirming the paraphrasing target sentence and the final paraphrasing candidate as paraphrasing lexical patterns using a bilingual corpus and storing the paraphrasing lexical patterns in a paraphrasing DB. According to the present invention, the consistent paraphrasing data can be established since the paraphrasing data is automatically established.
Type:
Grant
Filed:
October 31, 2012
Date of Patent:
May 19, 2015
Assignee:
ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors:
Chang Hyun Kim, Young-Ae Seo, Seong Il Yang, Jinxia Huang, Jong Hun Shin, Young Kil Kim, Sang Kyu Park
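The pipeline in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `translate`, `lm_score`, and `corpus_db` callables/lookups are hypothetical stand-ins for the machine translator, the object (target) language model, and the source-language corpus DB.

```python
def build_paraphrase_db(source_sentences, translate, lm_score, corpus_db,
                        lm_threshold=-5.0):
    """Sketch of the paraphrasing-data pipeline: score machine translations
    with a target-language model, gather paraphrase candidates for
    low-scoring sentences, re-translate the candidates, and keep the best."""
    paraphrase_db = {}
    for src in source_sentences:
        translation = translate(src)
        # Step 1: a low LM score marks this sentence as a paraphrasing target.
        if lm_score(translation) >= lm_threshold:
            continue
        # Step 2: pull candidate paraphrases of the source from the corpus DB.
        candidates = corpus_db.get(src, [])
        if not candidates:
            continue
        # Steps 3-4: translate each candidate and select the one whose
        # translation the target-language model scores highest.
        best = max(candidates, key=lambda c: lm_score(translate(c)))
        # Step 5: store the confirmed paraphrasing lexical pattern pair.
        paraphrase_db[src] = best
    return paraphrase_db
```

With toy lookups, a sentence whose translation scores poorly is replaced by the candidate whose re-translation the language model prefers.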
Abstract: A voice gesture is determined from characteristics of an audio signal based on sound uttered by a user. The voice gesture may represent a command or parameters of a command, and may be context sensitive. Upon determining a command and parameters of the command based on the received voice gesture, the command is executed in accordance with the determined parameters. The command may modify any number of attributes within an environment including, but not limited to, an image projected within the environment.
Abstract: A system can receive text. The text can be divided into various portions. One or more significance indicators can be associated with each portion of text: these significance indicators can also be received by the system. The system can then display a portion of text and the associated significance indicators to the user.
Abstract: Methods and apparatus, including computer program products, for using speech to text for detecting commercials and aligning edited episodes with transcripts. A method includes, receiving an original video or audio having a transcript, receiving an edited video or audio of the original video or audio, applying a speech-to-text process to the received original video or audio having a transcript, applying a speech-to-text process to the received edited video or audio, and applying an alignment to determine locations of the edits.
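The alignment step in the abstract above can be illustrated with a standard sequence-alignment utility. A minimal sketch, assuming both recordings have already been run through speech-to-text and tokenized into word lists; the real method's recognizer and alignment algorithm are not specified here:

```python
import difflib

def find_edit_locations(original_words, edited_words):
    """Compare the speech-to-text output of the original and the edited
    audio to locate spans present in the original but cut from the edit
    (e.g. commercials removed from an episode)."""
    matcher = difflib.SequenceMatcher(a=original_words, b=edited_words)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":  # words i1:i2 of the original were removed
            edits.append((i1, i2))
    return edits
```

For example, aligning the transcript `a b c d e f` against the edited transcript `a b e f` locates the cut at word indices 2 through 4 of the original.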
Abstract: An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, and a TNS stage or a quantizer encoder; the window function controller, the time warper, the TNS stage, or an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate depending on a harmonic or speech characteristic of the audio signal.
Type:
Grant
Filed:
January 11, 2011
Date of Patent:
April 21, 2015
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Stefan Bayer, Sascha Disch, Ralf Geiger, Guillaume Fuchs, Max Neuendorf, Gerald Schuller, Bernd Edler
Abstract: A method, a computer readable medium and a system for reporting automatic speech recognition that comprise collecting an utterance, analyzing the utterance, receiving a translation of the utterance, and determining a difference between the analyzed utterance and the translated utterance. An embodiment of the disclosure includes updating the utterance analysis based upon the determined difference, correlating the analyzed utterance to the translated utterance, and tracking the determined difference by a translator. In another embodiment the disclosure includes reporting, categorizing, sorting, and grouping the determined difference.
Abstract: A custom dictionary is generated for an e-book. A dictionary management system receives a custom dictionary request from a user client operated by a user, the custom dictionary request identifying the e-book and including dictionary management information describing the user. The dictionary management system chooses a group reader profile that has an associated group reading score for the user based on the dictionary management information, and candidate words are identified in the identified e-book for inclusion in the custom dictionary. The dictionary management system selects words for inclusion in the custom dictionary from among the candidate words responsive to the associated group reading score for the chosen group reader profile. The dictionary management system generates the custom dictionary using the selected words, and provides the generated custom dictionary to the user client.
Abstract: Computer-based speech recognition can be improved by recognizing words with an accurate accent model. In order to provide a large number of possible accents, while providing real-time speech recognition, a language tree data structure of possible accents is provided in one embodiment such that a computerized speech recognition system can benefit from choosing among accent categories when searching for an appropriate accent model for speech recognition.
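The tree-structured accent lookup described above can be sketched as a greedy descent. This is an illustration only: the node layout and the `score` classifier are hypothetical stand-ins for whatever accent-matching measure the actual system uses.

```python
class AccentNode:
    """Node in a hypothetical language/accent tree: broad categories near
    the root, specific accent models at the leaves."""
    def __init__(self, name, model=None, children=()):
        self.name = name
        self.model = model              # acoustic-model id at leaf accents
        self.children = list(children)

def choose_accent(root, score):
    """Greedy descent: at each level pick the child category that best
    matches the speaker, stopping at a leaf accent model."""
    node = root
    while node.children:
        node = max(node.children, key=score)
    return node.model
```

Choosing among a handful of categories per level, rather than scoring every leaf accent model directly, is what lets a large accent inventory stay searchable in real time.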
Abstract: An information processing apparatus includes: a plurality of information input units; an event detection unit that generates event information including estimated position information and estimated identification information of users present in the real space based on analysis of the information from the information input unit; and an information integration processing unit that inputs the event information, and generates target information including a position of each user and user identification information based on the input event information, and signal information representing a probability value of the event generation source, wherein the information integration processing unit includes an utterance source probability calculation unit, and wherein the utterance source probability calculation unit performs a process of calculating an utterance source score as an index value representing an utterance source probability of each target by multiplying weights based on utterance situations by a plurality of d
Abstract: Speech recognition may be improved using data derived from an utterance. In some embodiments, audio data is received by a user device. Adaptation data may be retrieved from a data store accessible by the user device. The audio data and the adaptation data may be transmitted to a server device. The server device may use the audio data to calculate second adaptation data. The second adaptation data may be transmitted to the user device. Synchronously or asynchronously, the server device may perform speech recognition using the audio data and the second adaptation data and transmit speech recognition results back to the user device.
Type:
Grant
Filed:
October 30, 2012
Date of Patent:
March 31, 2015
Assignee:
Amazon Technologies, Inc.
Inventors:
Hugh Secker-Walker, Bjorn Hoffmeister, Ryan Thomas, Stan Salvador, Karthik Ramakrishnan
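The client-side round trip in the abstract above can be sketched in a few lines. The `local_store` and `server` interfaces here are hypothetical, not the patented protocol:

```python
def recognize_with_adaptation(audio, local_store, server):
    """Sketch of the client flow: send audio plus cached adaptation data
    to the server, cache the second (refreshed) adaptation data it
    returns, and hand back the recognition result."""
    adaptation = local_store.get("adaptation")       # may be None on first use
    result, new_adaptation = server.recognize(audio, adaptation)
    local_store["adaptation"] = new_adaptation       # used on the next utterance
    return result
```

The cache update can happen asynchronously in the real system; it is shown inline here for brevity.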
Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments enable multi-lingual communications through different modes of communication including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments implement communication systems and methods that translate text between two or more languages. Users of the systems and methods may be incentivized to submit corrections for inaccurate or erroneous translations, and may receive a reward for these submissions. Systems and methods for assessing the accuracy of translations are described.
Abstract: Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model.
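The single-read scoring loop described above can be sketched as follows. This is a toy illustration: each "senone" is reduced to a mean vector and the score to a negative squared distance, whereas a real acoustic model would use Gaussian mixture (or similar) parameters.

```python
def score_senones(acoustic_model, feature_vectors):
    """Read each senone's parameters once and score every concurrent
    stream's feature vector against it before moving to the next senone,
    so the model is traversed a single time for all streams."""
    scores = [dict() for _ in feature_vectors]
    for senone_id, mean in acoustic_model.items():   # one read per senone
        for stream, vec in enumerate(feature_vectors):
            scores[stream][senone_id] = -sum(
                (v - m) ** 2 for v, m in zip(vec, mean))
    return scores
```

Ordering the loops senone-first is the point of the technique: the expensive model read is amortized across all concurrent input streams.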
Abstract: A technique for authenticating a user is described. During this authentication technique, an electronic device (such as a cellular telephone) captures multiple images of the user while the user moves the electronic device in a pre-defined manner (for example, along a path in 3-dimensional space), and determines positions of the electronic device when the multiple images were captured. Then, the electronic device compares the images at the positions with corresponding pre-existing images of the user captured at different points of view. If the comparisons achieve a match condition, the electronic device authenticates the user. In this way, the authentication technique may be used to prevent successful replay attacks.
Type:
Grant
Filed:
January 10, 2013
Date of Patent:
March 17, 2015
Assignee:
Intuit Inc.
Inventors:
Alexander S. Ran, Christopher Z. Lesner, Cynthia J. Osmon
Abstract: A system and method for extracting and reusing metadata to analyze messages is provided. A stream of messages is monitored. Those messages with a predetermined message component pointing to a referent are identified. Words that are related to the referent are extracted from each of the messages. A local similarity of the identified messages is determined by comparing the extracted words of each message. A global similarity of the identified messages is determined by combining the extracted words from all the identified messages and by comparing the combined extracted words with extracted words from all messages that include a different referent. A determination is made as to whether one or more of the extracted words from the identified messages are descriptive of the referent based on the local and global comparisons.
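The local/global comparison described above can be sketched with simple set arithmetic. This is an illustration under strong simplifications (word sets instead of weighted features, a fixed majority threshold), not the patented scoring:

```python
def descriptive_words(messages_for_referent, messages_for_others):
    """Words frequent across messages pointing to the same referent
    (local similarity) but absent from messages about other referents
    (global comparison) are taken as descriptive of the referent."""
    word_sets = [set(m.split()) for m in messages_for_referent]
    combined = set().union(*word_sets)
    other_words = set().union(*(set(m.split()) for m in messages_for_others))
    descriptive = set()
    for word in combined:
        local_freq = sum(word in ws for ws in word_sets) / len(word_sets)
        # Keep words in a majority of local messages and no other-referent message.
        if local_freq > 0.5 and word not in other_words:
            descriptive.add(word)
    return descriptive
```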
Abstract: A method for linguistical analytic consolidation is described. The method includes displaying a user interface on a mobile device. The method also includes receiving source text content to display in the user interface. The method also includes scanning the source text content for a specific element. The method also includes flagging the specific element of the source text content to be modified according to a set of linguistic rules. Modifying the specific element according to the set of linguistic rules results in a consolidated form of the source text content.
Type:
Grant
Filed:
September 25, 2012
Date of Patent:
March 17, 2015
Assignee:
International Business Machines Corporation
Abstract: Advanced machine learning or unsupervised machine learning techniques are provided that relate to self-learning processes by which a machine generates a sensible automated summary, extracts knowledge, and extracts contextually related topics along with the justification that explains “why they are related,” automatically, without any human intervention or guidance (backed ontologies) during the process. Such processes also relate to generating a 360-Degree Contextual Result (360-DCR) using auto-summary, knowledge extraction, and contextual mapping.
Abstract: A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
Type:
Grant
Filed:
June 21, 2012
Date of Patent:
March 3, 2015
Assignee:
Soundhound, Inc.
Inventors:
Timothy P. Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
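The arbitration logic in the abstract above can be sketched directly. A minimal illustration, assuming each recognizer is a callable returning a `(transcription, confidence)` pair or `None` on failure or latency cutoff; the vocabulary-update rule is simplified to word membership:

```python
def dual_mode_recognize(query_audio, local_recognizer, remote_recognizer,
                        client_vocabulary):
    """Run both recognizers, keep the higher-confidence transcription,
    and grow the client vocabulary with words only the remote knew."""
    local = local_recognizer(query_audio)
    remote = remote_recognizer(query_audio)
    if remote is not None:
        # Remote succeeded: add out-of-vocabulary words to the client.
        client_vocabulary.update(
            w for w in remote[0].split() if w not in client_vocabulary)
    if local is None and remote is None:
        return None
    if local is None:
        return remote[0]
    if remote is None:
        return local[0]
    # Both succeeded: accept the result with the higher confidence score.
    return max(local, remote, key=lambda r: r[1])[0]
```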
Abstract: A system and method provide bidirectional context-based text disambiguation. In one implementation, a processor receives an input text comprising a set of string objects, which may include ambiguous objects such as incomplete or unrecognizable words of a selected language. The processor identifies a set of candidate word objects corresponding to at least a first one of the string objects and a second one of the string objects. Each candidate word object represents, for example, a complete or recognizable word of the selected language. The processor outputs a selected word object in place of a first one of the string objects, as a function of a contextual comparison between one or more candidate word objects corresponding to the first string object and one or more candidate word objects corresponding to the second string object.
Type:
Grant
Filed:
April 30, 2012
Date of Patent:
March 3, 2015
Assignee:
BlackBerry Limited
Inventors:
Jerome Pasquero, Noel John Orland Stonehouse, Daniel James Legg, Jason Tyler Griffin
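The contextual comparison in the abstract above can be sketched as a joint search over candidate pairs. The `pair_score` callable is a hypothetical stand-in for a real language-model compatibility score between two candidate words:

```python
def disambiguate(first_candidates, second_candidates, pair_score):
    """Choose the candidate for the first ambiguous string that pairs
    best with some candidate for the second string, so context flows in
    both directions between the two positions."""
    best_first, _ = max(
        ((first, max(pair_score(first, second)
                     for second in second_candidates))
         for first in first_candidates),
        key=lambda pair: pair[1])
    return best_first
```

For example, between candidates "whether"/"weather" followed by "forecast"/"report", a score favoring the pair ("weather", "forecast") selects "weather" for the first position.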
Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit and a presentation unit. The acquisition unit acquires a feature amount for respective tags, each of the tags being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for respective character strings, a variance of feature amounts of the tags which are assigned to the character string in a markup text. The detection unit detects a first character string assigned a first tag having the variance not less than a first threshold value as a first candidate including the tag to be corrected. The presentation unit presents the first candidate.
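The detection step in the abstract above can be sketched with a plain variance test. This is a simplified illustration (population variance over scalar feature amounts), not the apparatus's actual computation:

```python
from statistics import pvariance

def flag_tags_to_correct(tag_feature_amounts, threshold):
    """For each character string, compute the variance of the feature
    amounts of the tags assigned to it across the markup text, and flag
    strings whose variance reaches the threshold as correction candidates."""
    return [string for string, amounts in tag_feature_amounts.items()
            if pvariance(amounts) >= threshold]
```

A high variance means the same string was tagged inconsistently across the text, which is the signal for presenting it as a candidate to correct.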
Abstract: Disclosed is subject matter that provides a technique and a device that may include an accelerometer, a display device, an input device and a processor. The input device may receive textual information in a first language. The processor may be configured to generate a plurality of probable translation alternatives for a translation result. Each probable translation alternative may be a translation of the textual information into a second language. The processor may present a first of the plurality of probable translation alternatives on the display device in an alternate translation result dialog screen. Based on an accelerometer signal, the processor may determine whether the device is being shaken. In response to a determination that the device is being shaken, the processor may present a second of the plurality of probable translation alternatives on the display device in place of the first of the plurality of probable translation alternatives.