Abstract: A method and an apparatus for selecting an answer to a natural language question. The method includes: detecting a named entity in the natural language question; extracting information related to an answer from the natural language question; searching linked data according to the detected named entity; generating a candidate answer according to a search result; parsing the candidate answer according to the information related to the answer; obtaining a value of a feature of the candidate answer; and evaluating each candidate answer by synthesizing the value of the feature of the candidate answer.
Type:
Grant
Filed:
April 23, 2010
Date of Patent:
October 30, 2012
Assignee:
International Business Machines Corporation
Inventors:
David Angelo Ferrucci, Li Ma, Yue Pan, Zhao Ming Qiu, Chen Wang, Christopher Welty, Lei Zhang
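The last step, synthesizing feature values into one evaluation per candidate, can be sketched as a weighted linear combination. The linear form, the feature names (`type_match`, `entity_overlap`), and the weights are assumptions for illustration; the abstract does not say how features are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    answer: str
    # Feature name -> value, as produced by parsing the candidate against the
    # answer-related information (feature names here are hypothetical).
    features: dict[str, float] = field(default_factory=dict)


def score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Synthesize feature values into one score (assumed: a weighted sum)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[Candidate]:
    """Order candidates from best to worst score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)
```

With weights `{"type_match": 2.0, "entity_overlap": 1.0}`, a candidate "Paris" with features `{"type_match": 1.0, "entity_overlap": 0.5}` scores 2.5 and ranks ahead of "France" with `{"type_match": 0.0, "entity_overlap": 0.9}`, which scores 0.9. A learned combiner could replace the fixed weights without changing the interface.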
Abstract: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
Type:
Grant
Filed:
February 16, 2007
Date of Patent:
October 23, 2012
Assignee:
Google Inc.
Inventors:
Ignacio E. Thayer, Franz Josef Och, Alexander Mark Franz, Jeffrey Dean, Thorsten Brants, Jay M. Ponte, Peng Xu, Sha-Mayn Teh, Jeffrey Chin, Anton Carver, Daniel Rosart, John S. Hawkins, Karel Driesen
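One common way to access a language model too large for one machine is to partition its n-grams across shards and batch lookups per shard. The sketch below assumes hash partitioning on the full n-gram and uses in-process dictionaries as stand-ins for servers; neither detail comes from the abstract.

```python
import zlib
from collections import defaultdict

Ngram = tuple[str, ...]


def shard_of(ngram: Ngram, num_shards: int) -> int:
    """Map an n-gram to a shard with a stable hash (crc32), not Python's salted hash()."""
    key = " ".join(ngram).encode("utf-8")
    return zlib.crc32(key) % num_shards


class ShardedLM:
    """Toy stand-in: each shard is a dict; in practice each would be a server."""

    def __init__(self, counts: dict[Ngram, int], num_shards: int):
        self.num_shards = num_shards
        self.shards: list[dict[Ngram, int]] = [dict() for _ in range(num_shards)]
        for ngram, count in counts.items():
            self.shards[shard_of(ngram, num_shards)][ngram] = count

    def batch_lookup(self, ngrams: list[Ngram]) -> dict[Ngram, int]:
        """Group requests by shard so each shard is contacted once per batch."""
        by_shard: dict[int, list[Ngram]] = defaultdict(list)
        for ngram in ngrams:
            by_shard[shard_of(ngram, self.num_shards)].append(ngram)
        result: dict[Ngram, int] = {}
        for shard_id, group in by_shard.items():
            shard = self.shards[shard_id]
            for ngram in group:
                result[ngram] = shard.get(ngram, 0)  # unseen n-grams count as 0
        return result
```

Batching matters because a decoder issues many lookups per sentence; grouping them by shard turns many round trips into one per shard.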
Abstract: A grammatical inference system for inferring a grammar from a plurality of example sentences. The system selects sentences having a common suffix or prefix component; identifies the other component (prefix or suffix) of each selected sentence; generates rules for generating the example sentences and the other components; reduces the right-hand side of each rule on the basis of the right-hand sides of the other rules; and generates a grammar on the basis of the reduced rules.
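The first three steps (select sentences sharing a prefix, identify the remaining suffix, generate rules) can be sketched as follows. Treating the first token as the shared prefix and naming nonterminals `X_<prefix>` are assumptions; the rule-reduction and final grammar-generation steps are not shown.

```python
from collections import defaultdict


def group_by_prefix(sentences: list[list[str]]) -> dict[str, list[list[str]]]:
    """Group tokenized sentences by their first token (the assumed prefix unit)."""
    groups: dict[str, list[list[str]]] = defaultdict(list)
    for tokens in sentences:
        if len(tokens) > 1:
            groups[tokens[0]].append(tokens[1:])  # the remaining suffix component
    return dict(groups)


def rules_from_groups(groups: dict[str, list[list[str]]]) -> list[tuple[str, tuple[str, ...]]]:
    """Emit S -> prefix X_prefix, plus X_prefix -> suffix for each suffix."""
    rules: list[tuple[str, tuple[str, ...]]] = []
    for prefix, suffixes in sorted(groups.items()):
        nonterminal = f"X_{prefix}"
        rules.append(("S", (prefix, nonterminal)))
        for suffix in suffixes:
            rules.append((nonterminal, tuple(suffix)))
    return rules
```

For "from london to paris" and "from rome to paris", this yields `S -> from X_from` with two `X_from` rules; a reduction step would then notice the shared "to paris" tail in their right-hand sides.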
Abstract: A system and method for determining a set of attributes for a communication includes a decision engine, a monitoring module, and application software. The decision engine receives communications and assigns a set of attributes to each received communication. Each communication and its associated set of attributes is sent to the communication's corresponding application, which processes the set of attributes to perform an action, such as display. The monitoring module monitors an item selected by the system user and may feed the selected item and associated communication back to the decision engine. The decision engine may process the feedback on-line or in real time. The decision engine is a learning system that updates its classification criteria using the feedback. Classification scores associated with each set of attributes may represent an estimate of the statistical likelihood that each attribute is the proper response to the communication.
Type:
Grant
Filed:
March 27, 2002
Date of Patent:
October 16, 2012
Assignee:
International Business Machines Corporation
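A minimal sketch of the feedback loop, assuming the decision engine scores attributes with add-one-smoothed word counts and the monitoring module reports which attribute the user chose. The normalized scores sum to one but are only a rough stand-in for the statistical likelihood the abstract describes, not a calibrated probability.

```python
from collections import defaultdict


class DecisionEngine:
    """Toy online learner: per-attribute word counts, updated from user feedback."""

    def __init__(self, attributes: list[str]):
        self.attributes = attributes
        # counts[attribute][word] starts at 1 (add-one smoothing).
        self.counts = {a: defaultdict(lambda: 1) for a in attributes}

    def classify(self, text: str) -> dict[str, float]:
        """Return attribute -> score, normalized so the scores sum to 1."""
        words = text.lower().split()
        raw = {a: sum(self.counts[a][w] for w in words) for a in self.attributes}
        total = sum(raw.values()) or 1
        return {a: v / total for a, v in raw.items()}

    def feedback(self, text: str, chosen: str) -> None:
        """On-line update: the user's selection reinforces the chosen attribute."""
        for w in text.lower().split():
            self.counts[chosen][w] += 1
```

Before any feedback, "server down" scores 0.5 for both `urgent` and `later`; after one `feedback("server down", "urgent")`, `urgent` rises to 4/6.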
Abstract: A system offers potential completions for fragments of text. The system may obtain a text fragment and identify documents that include the text fragment. The system may locate sentences within the documents that include at least a portion of the text fragment, identify sentence endings associated with the located sentences, and present the sentence endings as potential completions for the text fragment.
Type:
Grant
Filed:
September 16, 2011
Date of Patent:
October 2, 2012
Assignee:
Google Inc.
Inventors:
Georges R. Harik, Simon Tong, David R. Cheng
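The completion flow (find sentences containing the fragment, take what follows it, offer those endings) can be sketched over an in-memory list of documents. The regex sentence splitter, frequency ranking, and the requirement that the whole fragment match (the abstract allows a partial match) are simplifying assumptions; the case-insensitive offset logic assumes ASCII text.

```python
import re


def completions(fragment: str, documents: list[str], limit: int = 5) -> list[str]:
    """Return sentence endings that follow `fragment` in the given documents."""
    frag = fragment.lower()
    counts: dict[str, int] = {}
    for doc in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            idx = sentence.lower().find(frag)
            if idx == -1:
                continue
            ending = sentence[idx + len(fragment):].strip()
            if ending:
                counts[ending] = counts.get(ending, 0) + 1
    # Most frequent endings first; ties broken alphabetically.
    return sorted(counts, key=lambda e: (-counts[e], e))[:limit]
```

Given "The quick brown fox jumps. The quick brown dog sleeps." and a second document "The quick brown fox jumps.", the fragment "quick brown" yields `["fox jumps.", "dog sleeps."]`.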
Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions.
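Filtering compound solutions with N-gram data can be sketched as: combine the language objects proposed for each input segment, score each combination with bigram probabilities, and drop combinations below a threshold. The bigram model, product scoring, and threshold value are assumptions; how key presses on the reduced keyboard yield the per-segment candidates is not shown.

```python
from itertools import product


def compound_solutions(segment_candidates: list[list[str]],
                       bigram_prob: dict[tuple[str, str], float],
                       threshold: float = 1e-4) -> list[tuple[tuple[str, ...], float]]:
    """Combine per-segment language objects and drop low-probability compounds."""
    results = []
    for combo in product(*segment_candidates):
        p = 1.0
        # Multiply the probabilities of adjacent pairs; unknown pairs score 0.
        for first, second in zip(combo, combo[1:]):
            p *= bigram_prob.get((first, second), 0.0)
        if p >= threshold:
            results.append((combo, p))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

With segments `[["get", "her"], ["up", "to"]]` and bigrams `("get", "up"): 0.01`, `("get", "to"): 0.005`, `("her", "to"): 0.00001`, only `("get", "up")` and `("get", "to")` survive, in that order.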
Abstract: A statistical machine translation (SMT) system employs a conditional translation probability conditioned on the source language content. A model parameters optimization engine is configured to optimize values of parameters of the conditional translation probability using a translation pool comprising candidate aligned translations for source language sentences having reference translations. The model parameters optimization engine adds candidate aligned translations to the translation pool by sampling available candidate aligned translations in accordance with the conditional translation probability.
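Growing the translation pool by sampling can be sketched as below. The log-linear form of the conditional probability, the feature vectors, and de-duplicating the pool are assumptions; the abstract says only that candidates are sampled in accordance with the conditional translation probability.

```python
import math
import random


def conditional_probs(feature_vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Assumed log-linear model: p(e | f) is proportional to exp(w . h(e, f))."""
    scores = [sum(w * h for w, h in zip(weights, fv)) for fv in feature_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_into_pool(pool: list[str], candidates: list[str], probs: list[float],
                     k: int, rng: random.Random) -> list[str]:
    """Add up to k candidates drawn according to the conditional probability."""
    for c in rng.choices(candidates, weights=probs, k=k):
        if c not in pool:  # keep the pool free of duplicates
            pool.append(c)
    return pool
```

Sampling, rather than always taking the top-scoring candidates, lets lower-probability alignments enter the pool so the optimizer sees more varied evidence.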
Abstract: The present invention relates to a method and system for textual exploration and discovery. More specifically, the method and system provide a text-driven and grammar based tool for textual exploration and textual navigation. The facilities for textual exploration and textual navigation are based on a system of index entries that are connected to the underlying text segments from which the index entries are derived. Text units with particular grammatical, semantic, and/or pragmatic features constitute bundles of sentences or text zones.
Abstract: String-oriented web queries are utilized as a tool to examine the fabric of how words, phrases and/or n-grams alternate in a language. This fabric is exploited in order to build up a matrix of semantically equivalent pieces of language. In one embodiment, the Distributional Hypothesis is utilized, along with strategies for confirming synonymy, to systematically build up a picture of what words/phrases can be legitimately substituted for one another.
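The Distributional Hypothesis step can be sketched by building context-count vectors and comparing them with cosine similarity. A local tokenized corpus stands in for string-oriented web queries here, and the window size is an assumption; high similarity only suggests substitutability, which the patent confirms with additional strategies not shown.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """Count the words appearing within `window` positions of each word."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In the corpus "buy a car", "buy a automobile", "eat a pear", "car" and "automobile" share all their contexts (cosine 1.0), while "car" and "pear" share only "a" (cosine 0.5).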
Abstract: The teachings described herein generally relate to a system and method for multilingual teaching of numeric or language skills through an electronic translation of a source phrase to a destination language selected from multiple languages. The electronic translation can occur as a spoken translation, can be in real-time, and can mimic the voice of the user of the system.
Type:
Grant
Filed:
March 13, 2007
Date of Patent:
August 7, 2012
Assignee:
NewTalk, Inc.
Inventors:
Bruce W. Nash, Craig A. Robinson, Martha P. Robinson, Robert H. Clemons
Abstract: A method for enabling input into a handheld electronic device having at least three selectable languages available thereon includes detecting a predetermined input a number of times and switching the selected language from one of the three selectable languages to another of the three selectable languages, wherein the other language is the immediately preceding selected language.
Type:
Grant
Filed:
December 20, 2010
Date of Patent:
May 29, 2012
Assignee:
Research In Motion Limited
Inventors:
Vadim Fux, Carlo Chiarello, Andrew D. Bocking, Harry R. Major
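The switching behavior amounts to remembering the previously selected language and swapping back to it when the predetermined input is detected. The required press count of 2 and the initial "previous" language are assumptions.

```python
class LanguageSelector:
    """Track the selected input language and the one selected immediately before it."""

    def __init__(self, languages: list[str]):
        if len(languages) < 3:
            raise ValueError("expected at least three selectable languages")
        self.languages = languages
        self.current = languages[0]
        self.previous = languages[1]  # assumed initial 'previous' language

    def select(self, language: str) -> None:
        """Explicit selection, e.g. from a menu."""
        if language not in self.languages:
            raise ValueError(f"unknown language: {language}")
        if language != self.current:
            self.previous, self.current = self.current, language

    def handle_input(self, presses: int, required: int = 2) -> str:
        """Swap to the immediately preceding language once the predetermined
        input has been detected `required` times (2 is an assumption)."""
        if presses >= required:
            self.previous, self.current = self.current, self.previous
        return self.current
```

After `select("German")` from an English start, repeated `handle_input(2)` calls toggle between English and German and never jump to French, because only the immediately preceding language is remembered.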
Abstract: A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay based on the measured semantic similarities.
Type:
Grant
Filed:
May 10, 2010
Date of Patent:
May 22, 2012
Assignee:
Educational Testing Service
Inventors:
Jill Burstein, Derrick Higgins, Claudia Gentile, Daniel Marcu
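A sketch of segment vectors built from sparse random index vectors, with coherence measured as mean cosine similarity to one essay dimension (for example, the prompt text). This is a simplification under stated assumptions: without context vectors trained on a corpus, summing index vectors approximates word overlap rather than semantic relatedness, and the mapping from score to a coherence level is not shown. The dimension 512 and 8 nonzeros are arbitrary choices.

```python
import math
import random


def index_vector(word: str, dim: int = 512, nonzeros: int = 8) -> list[float]:
    """Sparse ternary random vector; seeding with the word makes it reproducible."""
    rng = random.Random(word)
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def segment_vector(text: str, dim: int = 512) -> list[float]:
    """Text segment vector: the sum of its words' index vectors."""
    total = [0.0] * dim
    for word in text.lower().split():
        for i, v in enumerate(index_vector(word, dim)):
            total[i] += v
    return total


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence(segments: list[str], dimension: str) -> float:
    """Mean similarity of the segments to one essay dimension."""
    target = segment_vector(dimension)
    sims = [cosine(segment_vector(s), target) for s in segments]
    return sum(sims) / len(sims) if sims else 0.0
```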
Abstract: In the method of extraction, the words of the text are encoded by comparing them with the contents of a lexicon of tool words (essentially articles, prepositions, conjunctions, and verbal auxiliaries), and nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules.
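The extraction can be sketched by encoding each word as a one-letter code (a tool-word category, or `W` for any other word) and matching syntactic rules as regular expressions over the code string. The tiny lexicon, the letter codes, and the single rule (article, content words, optional preposition plus article plus content words) are assumptions; with so small a lexicon, content verbs are also coded `W`, so realistic rules need more categories.

```python
import re

# Illustrative tool-word lexicon -> one-letter codes (assumed codes).
TOOL_WORDS = {
    "the": "D", "a": "D", "an": "D",                # articles
    "of": "P", "in": "P", "on": "P", "for": "P",    # prepositions
    "and": "C", "or": "C",                          # conjunctions
    "is": "V", "are": "V", "has": "V", "was": "V",  # verbal auxiliaries
}

# Assumed rule: article + content words, optionally + preposition + article + content words.
NOMINAL_GROUP = re.compile(r"DW+(?:PDW+)?")


def encode(words: list[str]) -> str:
    """One code letter per word, so string offsets equal word offsets."""
    return "".join(TOOL_WORDS.get(w.lower(), "W") for w in words)


def nominal_groups(text: str) -> list[str]:
    words = text.split()
    codes = encode(words)
    return [" ".join(words[m.start():m.end()]) for m in NOMINAL_GROUP.finditer(codes)]
```

"the price of a ticket and the date" encodes as `DWPDWCDW` and yields "the price of a ticket" and "the date".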
Abstract: A translation apparatus includes: an image acquiring unit that acquires an original document image read from an original document including an original sentence in a first language; a translating unit that translates the original sentence into a second language; a line-space specifying unit that specifies a line-space region for each line of the original sentence; a first translation document creating unit that creates a first translation document by arranging a translation sentence in each line-space region of the original document image; a second translation document creating unit that creates a second translation document; a determining unit that determines whether a non-interference condition is satisfied on the basis of each line-space region; and an output unit that outputs the first translation document in a case where the non-interference condition is satisfied, or that outputs the second translation document in a case where the non-interference condition is not satisfied.
Abstract: A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on a response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing a refined CFG from preceding iterations.
Type:
Grant
Filed:
December 1, 2006
Date of Patent:
January 31, 2012
Assignee:
Microsoft Corporation
Inventors:
Timothy Paek, Max Chickering, Eric Badger
Abstract: To enter Chinese text, a user enters the corresponding phonetic spelling via a telephone-style keypad. Some or all keys represent multiple phonetic letters. In disambiguating entered key presses to yield a valid phonetic spelling, a computer divides the key presses into segments while preserving key press order. Each segment must correspond to an entry in a dictionary of Chinese characters, character phrases, and/or character components such as radicals or other predetermined stroke groupings. Upon arrival of a new key press that cannot form a valid entry when appended to the current segment, key presses are incrementally reallocated from the previous segment. Already-resolved segments that precede the previous and current segments are left intact. After each shifting attempt, the computer reinterprets the key presses of the last two segments and accepts the new segmentation if the segments form valid dictionary entries.
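One reading of the reallocation step is sketched below, using pinyin syllables as the dictionary entries. The assumptions: the current (unfinished) segment only needs to be a prefix of some entry's key sequence, while a segment left behind must be a complete entry; and when a new key can neither extend the current segment nor open a new one, trailing keys of the last segment are moved, one at a time, into the segment the new key opens. Only the last two segments are ever rewritten.

```python
# Standard telephone keypad: letter -> digit.
KEYPAD = {letter: digit
          for digit, letters in {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                                 "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items()
          for letter in letters}


def encode(syllable: str) -> str:
    """Key presses needed to spell a phonetic entry."""
    return "".join(KEYPAD[c] for c in syllable)


class Segmenter:
    """Incrementally splits key presses into segments backed by dictionary entries."""

    def __init__(self, dictionary: list[str]):
        self.entries = {encode(w) for w in dictionary}  # complete entries
        # Every non-empty prefix of an entry: an unfinished segment may stop here.
        self.prefixes = {e[:i] for e in self.entries for i in range(1, len(e) + 1)}
        self.segments: list[str] = []

    def press(self, key: str) -> list[str]:
        segs = self.segments
        if segs and segs[-1] + key in self.prefixes:
            segs[-1] += key  # the key extends the current segment
        elif key in self.prefixes or not segs:
            segs.append(key)  # the key opens a new segment
        else:
            # Reallocate trailing keys of the last segment, one at a time.
            prev = segs[-1]
            for shift in range(1, len(prev)):
                head, moved = prev[:-shift], prev[-shift:]
                if head in self.entries and moved + key in self.prefixes:
                    segs[-1:] = [head, moved + key]
                    break
            else:
                segs.append(key)  # no valid split; leave the key unresolved
        return segs
```

With the dictionary `["xian", "xiang", "ge"]`, typing "xiange" first absorbs the "g" into "xiang" (94264); the final "e" (3) fits nowhere, so the "g" key is moved forward, giving segments `9426` (xian) and `43` (ge).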
Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
Type:
Grant
Filed:
November 9, 2009
Date of Patent:
January 10, 2012
Assignee:
Xerox Corporation
Inventors:
Andre Kempe, Franck Guingne, Florent Nicart
Abstract: A method of prioritizing the automated translation of communications relating to a predetermined topic includes capturing and inputting into a data processing system a translation-candidate communication rendered in a first human language. A first data set representative of the translation-candidate communication is stored in computer memory and parsed into communication sub-portions. Communication sub-portions are algorithmically selected for translation depending on their relatedness to the predetermined topic as determined by first-language extraction rules. Each selected communication sub-portion is translated to a translated-data-set sub-portion representative of that selected communication sub-portion in the second human language. Translated-data-set sub-portions are subjected to a secondary filtration process in accordance with which their relatedness to the predetermined topic is determined by second-language extraction rules.
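The prioritization pipeline (parse into sub-portions, select by first-language rules, translate, then filter again by second-language rules) can be sketched with regular-expression extraction rules. Splitting into sentences, keyword patterns as the "extraction rules", and the `translate` callable are assumptions; any machine-translation backend could be passed in.

```python
import re
from typing import Callable


def prioritize_and_translate(text: str,
                             first_lang_rules: list[str],
                             second_lang_rules: list[str],
                             translate: Callable[[str], str]) -> list[str]:
    """Translate only on-topic sub-portions, then re-filter the translations."""
    # Parse the communication into sub-portions (assumed: sentences).
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    # First-language extraction rules decide which parts are worth translating.
    selected = [p for p in parts
                if any(re.search(r, p, re.IGNORECASE) for r in first_lang_rules)]
    translated = [translate(p) for p in selected]
    # Secondary filtration with second-language extraction rules.
    return [t for t in translated
            if any(re.search(r, t, re.IGNORECASE) for r in second_lang_rules)]
```

With a toy French-to-English lookup as `translate`, the text "Le navire arrive demain. Il fait beau. Le navire transporte du blé." and rules `navire` / `ship`: the weather sentence is never translated, and "The vessel carries wheat." is dropped by the second filter, leaving only "The ship arrives tomorrow.".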
Abstract: A system for indexing displayed elements that is useful for accessing and understanding new or difficult materials, in which a user highlights unknown words or characters or other displayed elements encountered while viewing displayed materials. In a language learning application, the system displays the meaning of a word in context; and the user may include the word in a personal vocabulary to build a database of words and phrases. In a Japanese language application, one or more Japanese language books are read on an electronic display. Readings (‘yomi’) for all words are readily viewable for any selected word or phrase, as well as an English reference to the selected word or phrase. Extensive notes are provided for difficult phrases and words not normally found in a dictionary. A unique indexing scheme allows word-by-word access to any of several external multi-media references.