Abstract: Index terms are drawn from text documents without the need for language-specific processes or training and are suitable as gists for the subject documents. Index terms are extracted on the basis of scores of constituent n-grams relative to n-gram counts in a corpus. A method of extracting joint index terms to represent a plurality of documents is also provided.
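The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the abstract does not give the scoring formula, so the ratio of document frequency to smoothed corpus frequency, the averaging of n-gram scores per word, the whitespace tokenization, and the names `ngrams` and `extract_index_terms` are all assumptions made for illustration. Joint index terms for several documents are not covered.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_index_terms(document, corpus, n=3, top_k=3):
    """Rank words of `document` by how unusual their n-grams are in `corpus`.

    Hypothetical scoring (an assumption, not taken from the patent):
    an n-gram's score is its relative frequency in the document divided
    by its add-one-smoothed relative frequency in the corpus; a word's
    score is the mean score of its constituent n-grams.
    """
    doc_counts = Counter(ngrams(document.lower(), n))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(ngrams(text.lower(), n))
    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())

    def ngram_score(gram):
        doc_freq = doc_counts[gram] / doc_total
        # Smoothing keeps n-grams unseen in the corpus from dividing by zero.
        corpus_freq = (corpus_counts[gram] + 1) / (corpus_total + 1)
        return doc_freq / corpus_freq

    scores = {}
    # Whitespace splitting is a simplification for this sketch only.
    for word in set(document.lower().split()):
        grams = ngrams(word, n)
        if grams:  # words shorter than n have no n-grams and are skipped
            scores[word] = sum(ngram_score(g) for g in grams) / len(grams)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Under these assumptions, a word whose n-grams are rare in the corpus ranks above common words, with no stop list, stemmer, or training step involved.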
Type: Grant
Filed: July 19, 1994
Date of Patent: May 12, 1998
Assignee: The United States of America as represented by the Secretary of NSA