Patents by Inventor Daniel Marcu

Daniel Marcu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Weighted system of expressing language information using a compact notation

Publication number: 20070016400

Abstract: A special notation that extends the notion of IDL by weighted operators. The Weighted IDL or WIDL can be intersected with a language model, for example an n-gram language model or a syntax-based language model. The intersection is carried out by converting the IDL to a graph, and unfolding the graph in a way which maximizes its compactness.

Type: Application

Filed: June 21, 2005

Publication date: January 18, 2007

Inventors: Radu Soricut, Daniel Marcu
Automated annotation

Patent number: 7127208

Abstract: To automatically annotate an essay, a sentence of the essay is identified and a feature associated with the sentence is determined. In addition, a probability of the sentence being a discourse element is determined by mapping the feature to a model. The model having been generated by a machine learning application based on at least one annotated essay. Furthermore, the essay is annotated based on the probability.

Type: Grant

Filed: June 24, 2002

Date of Patent: October 24, 2006

Assignee: Educational Testing Service

Inventors: Jill Burstein, Daniel Marcu
Training for a text-to-text application which uses string to tree conversion for training and decoding

Publication number: 20060142995

Abstract: Training and translation using trees and/or subtrees as parts of the rules. A target language is word aligned with a source language, and at least one of the languages is parsed into trees. The trees are used for training, by aligning conversion steps, forming a reduced set of information representing the conversion steps and then learning rules from that reduced set. The rules include subtrees as parts thereof, and are used for decoding, along with an n-gram language model and a syntax-based language model.

Type: Application

Filed: October 12, 2005

Publication date: June 29, 2006

Inventors: Kevin Knight, Michel Galley, Mark Hopkins, Daniel Marcu, Ignacio Thayer
Discovery of parallel text portions in comparable collections of corpora and training using comparable texts

Publication number: 20050228643

Abstract: A translation training device which extracts from two nonparallel corpora a set of parallel sentences. The system finds parameters between different sentences or phrases, in order to find parallel sentences. The parallel sentences are then used for training a data-driven machine translation system. The process can be applied repetitively until sufficient data is collected or until the performance of the translation system stops improving.

Type: Application

Filed: March 22, 2005

Publication date: October 13, 2005

Inventors: Dragos Munteanu, Daniel Marcu
Method and system for determining text coherence

Publication number: 20050143971

Abstract: A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay based on the measured semantic similarities.

Type: Application

Filed: October 26, 2004

Publication date: June 30, 2005

Inventors: Jill Burstein, Derrick Higgins, Claudia Gentile, Daniel Marcu
Methods for automated essay analysis

Publication number: 20050042592

Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.

Type: Application

Filed: September 22, 2004

Publication date: February 24, 2005

Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Chodorow, Claudia Leacock
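
The abstract above describes scoring each sentence from the presence or absence of a fixed set of features and then choosing one sentence for the discourse element category. A minimal sketch of that idea follows. It assumes a naive-Bayes-style model with independent binary features, which the abstract does not specify; the feature names, probabilities, and function names are all hypothetical and chosen only for illustration.

```python
def posterior(present, prior, p_in, p_out):
    """Probability that a sentence belongs to the category.

    present: set of feature names observed in the sentence.
    prior:   assumed prior probability of the category.
    p_in:    assumed P(feature present | sentence in category).
    p_out:   assumed P(feature present | sentence not in category).
    """
    like_in, like_out = prior, 1.0 - prior
    for name in p_in:
        if name in present:
            like_in *= p_in[name]
            like_out *= p_out[name]
        else:
            # Absence of a feature is evidence too.
            like_in *= 1.0 - p_in[name]
            like_out *= 1.0 - p_out[name]
    return like_in / (like_in + like_out)


def choose_sentence(sentence_features, prior, p_in, p_out):
    """Return the index of the sentence with the highest probability."""
    scores = [posterior(f, prior, p_in, p_out) for f in sentence_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters for a "thesis statement" category.
P_IN = {"first_sentence": 0.8, "has_opinion_word": 0.7}
P_OUT = {"first_sentence": 0.1, "has_opinion_word": 0.3}
PRIOR = 0.2

essay = [
    set(),                                   # no features present
    {"first_sentence", "has_opinion_word"},  # both features present
    {"has_opinion_word"},                    # one feature present
]
best = choose_sentence(essay, PRIOR, P_IN, P_OUT)
```

In a real system the probabilities would be estimated from annotated essays rather than set by hand.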
Methods for automated essay analysis

Patent number: 6796800

Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.

Type: Grant

Filed: January 23, 2002

Date of Patent: September 28, 2004

Assignee: Educational Testing Service

Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Sanford Chodorow, Claudia Leacock
Phrase to phrase joint probability model for statistical machine translation

Publication number: 20040030551

Abstract: A machine translation (MT) system may utilize a phrase-based joint probability model. The model may be used to generate source and target language sentences simultaneously. In an embodiment, the model may learn phrase-to-phrase alignments from word-to-word alignments generated by a word-to-word statistical MT system. The system may utilize the joint probability model for both source-to-target and target-to-source translation applications.

Type: Application

Filed: March 27, 2003

Publication date: February 12, 2004

Inventors: Daniel Marcu, William Wong, Kevin Knight, Philipp Koehn
Statistical translation using a large monolingual corpus

Publication number: 20030233222

Abstract: A statistical machine translation (MT) system may use a large monolingual corpus to improve the accuracy of translated phrases/sentences. The MT system may produce alternative translations and use the large monolingual corpus to (re)rank the alternative translations.

Type: Application

Filed: March 26, 2003

Publication date: December 18, 2003

Inventors: Radu Soricut, Daniel Marcu, Kevin Knight
Constructing a translation lexicon from comparable, non-parallel corpora

Publication number: 20030204400

Abstract: A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.

Type: Application

Filed: March 26, 2003

Publication date: October 30, 2003

Inventors: Daniel Marcu, Kevin Knight, Dragos Stefan Munteanu, Philipp Koehn
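
The first step in the abstract above, collecting identically spelled words from two monolingual corpora as a seed lexicon, can be sketched as follows. This covers only the seed step, not the later scoring by context and frequency. The function name, the lowercasing, and the `min_length` filter are assumptions added for illustration and are not taken from the application.

```python
def seed_lexicon(source_tokens, target_tokens, min_length=4):
    """Map each word spelled identically in both corpora to itself.

    min_length is a hypothetical filter that drops very short words,
    which are more likely to match by accident.
    """
    source_vocab = {w.lower() for w in source_tokens if w.isalpha()}
    target_vocab = {w.lower() for w in target_tokens if w.isalpha()}
    shared = source_vocab & target_vocab
    return {w: w for w in sorted(shared) if len(w) >= min_length}


# Toy corpora; real corpora would be far larger.
german = "das Hotel hat ein Restaurant und ein Taxi".split()
english = "the hotel has a restaurant and a taxi".split()
seed = seed_lexicon(german, english)
```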
Automated annotation

Publication number: 20030138758

Abstract: To automatically annotate an essay, a sentence of the essay is identified and a feature associated with the sentence is determined. In addition, a probability of the sentence being a discourse element is determined by mapping the feature to a model. The model having been generated by a machine learning application based on at least one annotated essay. Furthermore, the essay is annotated based on the probability.

Type: Application

Filed: June 24, 2002

Publication date: July 24, 2003

Inventors: Jill Burstein, Daniel Marcu
Statistical method for building a translation memory

Publication number: 20030009322

Abstract: A statistical translation memory (TMEM) may be generated by training a translation model with a naturally generated TMEM. A number of tuples may be extracted from each translation pair in the TMEM. The tuples may include a phrase in a source language and a corresponding phrase in a target language. The tuples may also include probability information relating to the phrases generated by the translation model.

Type: Application

Filed: May 17, 2002

Publication date: January 9, 2003

Inventor: Daniel Marcu
Statistical memory-based translation system

Publication number: 20020188439

Abstract: A statistical machine translation (MT) system may include a translation memory (TMEM) and a decoder. The decoder may translate an input text segment using a statistical MT decoding algorithm, for example, a greedy decoding algorithm. The system may generate a cover of the input text segment from text segments in the TMEM. The decoder may use the cover as an initial translation in the decoding operation.

Type: Application

Filed: May 9, 2002

Publication date: December 12, 2002

Inventor: Daniel Marcu
Methods for automated essay analysis

Publication number: 20020142277

Abstract: An essay is analyzed automatically by accepting the essay and determining whether each of a predetermined set of features is present or absent in each sentence of the essay. For each sentence in the essay a probability that the sentence is a member of a certain discourse element category is calculated. The probability is based on the determinations of whether each feature in the set of features is present or absent. Furthermore, based on the calculated probabilities, a sentence is chosen as the choice for the discourse element category.

Type: Application

Filed: January 23, 2002

Publication date: October 3, 2002

Inventors: Jill Burstein, Daniel Marcu, Vyacheslav Andreyev, Martin Sanford Chodorow, Claudia Leacock
Discourse parsing and summarization

Publication number: 20020046018

Abstract: A discourse structure for an input text segment is determined by generating a set of one or more discourse parsing decision rules based on a training set, and determining a discourse structure for the input text segment by applying the generated set of discourse parsing decision rules to the input text segment. A tree structure is summarized by generating a set of one or more summarization decision rules based on a training set, and compressing the tree structure by applying the generated set of summarization decision rules to the tree structure. Alternatively, summarization is accomplished by parsing an input text segment to generate a parse tree for the input segment, generating a plurality of potential solutions, applying a statistical model to determine a probability of correctness for each potential solution, and extracting one or more high-probability solutions based on the solutions' respective determined probabilities of correctness.

Type: Application

Filed: May 11, 2001

Publication date: April 18, 2002

Inventors: Daniel Marcu, Kevin Knight
Machine translation techniques

Publication number: 20020040292

Abstract: Machine translation decoding is accomplished by receiving as input a text segment in a source language to be translated into a target language, generating an initial translation as a current target language translation, applying one or more modification operators to the current target language translation to generate one or more modified target language translations, determining whether one or more of the modified target language translations represents an improved translation in comparison with the current target language translation, setting a modified target language translation as the current target language translation, and repeating these steps until occurrence of a termination condition. Automatically generating a tree (e.g., either a syntactic tree or a discourse tree) can be accomplished by receiving as input a tree corresponding to a source language text segment, and applying one or more decision rules to the received input to generate a tree corresponding to a target language text segment.

Type: Application

Filed: May 11, 2001

Publication date: April 4, 2002

Inventor: Daniel Marcu
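
The decoding loop in the abstract above (start from an initial translation, apply modification operators, keep a modified translation if it is an improvement, repeat until a termination condition) can be sketched as a generic hill-climbing routine. Everything below is a toy illustration under stated assumptions: the adjacent-swap operator, the bigram-based score, the choice to take the best improvement in each round, and the iteration cap are hypothetical and are not taken from the application.

```python
def greedy_decode(initial, neighbors, score, max_iterations=100):
    """Improve a translation until no modification scores higher."""
    current, current_score = initial, score(initial)
    for _ in range(max_iterations):
        best, best_score = None, current_score
        for candidate in neighbors(current):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            break  # termination condition: no improving modification
        current, current_score = best, best_score
    return current


def swap_adjacent(words):
    """Hypothetical modification operator: swap two neighboring words."""
    for i in range(len(words) - 1):
        yield words[:i] + (words[i + 1], words[i]) + words[i + 2:]


# Hypothetical stand-in for a translation and language model score.
GOOD_BIGRAMS = {("the", "cat"), ("cat", "sleeps")}


def toy_score(words):
    """Count adjacent word pairs that appear in GOOD_BIGRAMS."""
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))


result = greedy_decode(("cat", "the", "sleeps"), swap_adjacent, toy_score)
```

Because the loop only ever accepts improvements, it can stop at a local optimum; the quality of the initial translation therefore matters.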
