Patents by Inventor Daniel Marcu

Daniel Marcu has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070016400
    Abstract: A special notation that extends the notion of IDL by weighted operators. The Weighted IDL or WIDL can be intersected with a language model, for example an n-gram language model or a syntax-based language model. The intersection is carried out by converting the IDL to a graph, and unfolding the graph in a way which maximizes its compactness.
    Type: Application
    Filed: June 21, 2005
    Publication date: January 18, 2007
    Inventors: Radu Soricut, Daniel Marcu
  • Patent number: 7127208
    Abstract: To automatically annotate an essay, a sentence of the essay is identified and a feature associated with the sentence is determined. In addition, a probability of the sentence being a discourse element is determined by mapping the feature to a model. The model having been generated by a machine learning application based on at least one annotated essay. Furthermore, the essay is annotated based on the probability.
    Type: Grant
    Filed: June 24, 2002
    Date of Patent: October 24, 2006
    Assignee: Educational Testing Service
    Inventors: Jill Burstein, Daniel Marcu
  • Publication number: 20060142995
    Abstract: Training and translation using trees and/or subtrees as parts of the rules. A target language is word aligned with a source language, and at least one of the languages is parsed into trees. The trees are used for training, by aligning conversion steps, forming a reduced set of information representing the conversion steps and then learning rules from that reduced set. The rules include subtrees as parts thereof, and are used for decoding, along with an n-gram language model and a syntax-based language model.
    Type: Application
    Filed: October 12, 2005
    Publication date: June 29, 2006
    Inventors: Kevin Knight, Michel Galley, Mark Hopkins, Daniel Marcu, Ignacio Thayer
  • Publication number: 20050228643
    Abstract: A translation training device which extracts from two nonparallel corpora a set of parallel sentences. The system finds parameters between different sentences or phrases, in order to find parallel sentences. The parallel sentences are then used for training a data-driven machine translation system. The process can be applied repetitively until sufficient data is collected or until the performance of the translation system stops improving.
    Type: Application
    Filed: March 22, 2005
    Publication date: October 13, 2005
    Inventors: Dragos Munteanu, Daniel Marcu
  • Publication number: 20050143971
    Abstract: A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay based on the measured semantic similarities.
    Type: Application
    Filed: October 26, 2004
    Publication date: June 30, 2005
    Inventors: Jill Burstein, Derrick Higgins, Claudia Gentile, Daniel Marcu
  • Publication number: 20050042592
    Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.
    Type: Application
    Filed: September 22, 2004
    Publication date: February 24, 2005
    Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Chodorow, Claudia Leacock
  • Patent number: 6796800
    Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.
    Type: Grant
    Filed: January 23, 2002
    Date of Patent: September 28, 2004
    Assignee: Educational Testing Service
    Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Sanford Chodorow, Claudia Leacock
  • Publication number: 20040030551
    Abstract: A machine translation (MT) system may utilize a phrase-based joint probability model. The model may be used to generate source and target language sentences simultaneously. In an embodiment, the model may learn phrase-to-phrase alignments from word-to-word alignments generated by a word-to-word statistical MT system. The system may utilize the joint probability model for both source-to-target and target-to-source translation applications.
    Type: Application
    Filed: March 27, 2003
    Publication date: February 12, 2004
    Inventors: Daniel Marcu, William Wong, Kevin Knight, Philipp Koehn
  • Publication number: 20030233222
    Abstract: A statistical machine translation (MT) system may use a large monolingual corpus to improve the accuracy of translated phrases/sentences. The MT system may produce alternative translations and use the large monolingual corpus to (re)rank the alternative translations.
    Type: Application
    Filed: March 26, 2003
    Publication date: December 18, 2003
    Inventors: Radu Soricut, Daniel Marcu, Kevin Knight
  • Publication number: 20030204400
    Abstract: A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.
    Type: Application
    Filed: March 26, 2003
    Publication date: October 30, 2003
    Inventors: Daniel Marcu, Kevin Knight, Dragos Stefan Munteanu, Philipp Koehn
  • Publication number: 20030138758
    Abstract: To automatically annotate an essay, a sentence of the essay is identified and a feature associated with the sentence is determined. In addition, a probability of the sentence being a discourse element is determined by mapping the feature to a model. The model having been generated by a machine learning application based on at least one annotated essay. Furthermore, the essay is annotated based on the probability.
    Type: Application
    Filed: June 24, 2002
    Publication date: July 24, 2003
    Inventors: Jill Burstein, Daniel Marcu
  • Publication number: 20030009322
    Abstract: A statistical translation memory (TMEM) may be generated by training a translation model with a naturally generated TMEM. A number of tuples may be extracted from each translation pair in the TMEM. The tuples may include a phrase in a source language and a corresponding phrase in a target language. The tuples may also include probability information relating to the phrases generated by the translation model.
    Type: Application
    Filed: May 17, 2002
    Publication date: January 9, 2003
    Inventor: Daniel Marcu
  • Publication number: 20020188439
    Abstract: A statistical machine translation (MT) system may include a translation memory (TMEM) and a decoder. The decoder may translate an input text segment using a statistical MT decoding algorithm, for example, a greedy decoding algorithm. The system may generate a cover of the input text segment from text segments in the TMEM. The decoder may use the cover as an initial translation in the decoding operation.
    Type: Application
    Filed: May 9, 2002
    Publication date: December 12, 2002
    Inventor: Daniel Marcu
  • Publication number: 20020142277
    Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.
    Type: Application
    Filed: January 23, 2002
    Publication date: October 3, 2002
    Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Sanford Chodorow, Claudia Leacock
  • Publication number: 20020046018
    Abstract: A discourse structure for an input text segment is determined by generating a set of one or more discourse parsing decision rules based on a training set, and determining a discourse structure for the input text segment by applying the generated set of discourse parsing decision rules to the input text segment. A tree structure is summarized by generating a set of one or more summarization decision rules based on a training set, and compressing the tree structure by applying the generated set of summarization decision rules to the tree structure. Alternatively, summarization is accomplished by parsing an input text segment to generate a parse tree for the input segment, generating a plurality of potential solutions, applying a statistical model to determine a probability of correctness for each potential solution, and extracting one or more high-probability solutions based on the solutions' respective determined probabilities of correctness.
    Type: Application
    Filed: May 11, 2001
    Publication date: April 18, 2002
    Inventors: Daniel Marcu, Kevin Knight
  • Publication number: 20020040292
    Abstract: Machine translation decoding is accomplished by receiving as input a text segment in a source language to be translated into a target language, generating an initial translation as a current target language translation, applying one or more modification operators to the current target language translation to generate one or more modified target language translations, determining whether one or more of the modified target language translations represents an improved translation in comparison with the current target language translation, setting a modified target language translation as the current target language translation, and repeating these steps until occurrence of a termination condition. Automatically generating a tree (e.g., either a syntactic tree or a discourse tree) can be accomplished by receiving as input a tree corresponding to a source language text segment, and applying one or more decision rules to the received input to generate a tree corresponding to a target language text segment.
    Type: Application
    Filed: May 11, 2001
    Publication date: April 4, 2002
    Inventor: Daniel Marcu
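
The decoding loop described in the abstract of publication 20020040292 (start from an initial translation, apply modification operators, keep a modified translation if it improves on the current one, repeat until a termination condition) can be illustrated with a minimal sketch. This is not the method claimed in the application. The function names (greedy_decode, swap_adjacent, toy_score), the bigram-counting score, and the iteration cap are all assumptions made up for this illustration, and the abstract does not say which operators or scoring model are used.

```python
from typing import Callable, Iterable, List

Translation = List[str]
Operator = Callable[[Translation], Iterable[Translation]]


def swap_adjacent(t: Translation) -> Iterable[Translation]:
    """Hypothetical modification operator: swap one adjacent word pair."""
    for i in range(len(t) - 1):
        yield t[:i] + [t[i + 1], t[i]] + t[i + 2:]


def greedy_decode(
    initial: Translation,
    operators: List[Operator],
    score: Callable[[Translation], float],
    max_iters: int = 100,
) -> Translation:
    """Hill-climb from an initial translation until nothing improves."""
    current = list(initial)
    for _ in range(max_iters):  # termination condition 1: iteration cap (assumed)
        # Generate every modified translation reachable with one operator.
        candidates = [c for op in operators for c in op(current)]
        best = max(candidates, key=score, default=current)
        if score(best) <= score(current):
            break  # termination condition 2: no candidate is an improvement
        current = best  # the improved translation becomes the current one
    return current


# Toy stand-in for a translation-quality score: count "good" bigrams.
GOOD_BIGRAMS = {("the", "red"), ("red", "house")}


def toy_score(t: Translation) -> int:
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(t, t[1:]))


# A word-for-word gloss with the wrong word order serves as the initial translation.
result = greedy_decode(["the", "house", "red"], [swap_adjacent], toy_score)
print(result)
```

With this toy score, a single swap already reaches the best-scoring word order, and the second pass stops because no further swap improves it. A real system would replace toy_score with translation and language model probabilities and would use a richer operator set.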