Abstract: A system and method for binary classification of text units such as sentences, paragraphs and documents as either a rule of law (ROL) or not a rule of law (~ROL).
During a training phase of the system and method of the present invention, an initialized knowledge base and labeled or pre-classified sentences are used to build a trained knowledge base. The trained knowledge base contains an equation, a threshold, and a plurality of statistical values called Z values.
When inputting text documents for classification, a Z value is generated for each term or token in the input text. The Z values are input to the equation which calculates a score for each sentence. Each calculated score is then compared to the threshold to classify each sentence as either ROL or ~ROL.
Type:
Grant
Filed:
May 31, 2000
Date of Patent:
January 27, 2004
Assignee:
Lexis Nexis
Inventors:
Timothy L. Humphrey, X. Allan Lu, James S. Wiltshire, Jr., John T. Morelock, Spiro G. Collias, Salahuddin Ahmed
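The classification step described in the abstract above (per-token Z values, a scoring equation, and a threshold comparison) can be illustrated with a minimal sketch. The abstract does not disclose the equation, the Z values, or the threshold, so everything concrete here is an assumption: the `Z_VALUES` table, the `THRESHOLD` and `DEFAULT_Z` constants, whitespace tokenization, and the use of a simple mean as the scoring equation are all hypothetical stand-ins.

```python
from typing import Dict

# Hypothetical trained knowledge base. The Z values and threshold below are
# invented for illustration; the patent abstract does not state them.
Z_VALUES: Dict[str, float] = {
    "court": 1.2,
    "must": 2.0,
    "held": 1.5,
    "plaintiff": -0.8,
    "filed": -1.1,
}
THRESHOLD = 0.5   # assumed decision threshold
DEFAULT_Z = 0.0   # assumed Z value for tokens missing from the knowledge base


def score_sentence(sentence: str) -> float:
    """Assumed scoring equation: the mean Z value over the sentence's tokens."""
    tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
    if not tokens:
        return 0.0
    return sum(Z_VALUES.get(tok, DEFAULT_Z) for tok in tokens) / len(tokens)


def classify(sentence: str) -> str:
    """Compare the sentence score to the threshold: ROL if above, else ~ROL."""
    return "ROL" if score_sentence(sentence) > THRESHOLD else "~ROL"
```

Under these assumptions, `classify("court must held")` yields `"ROL"` and `classify("plaintiff filed")` yields `"~ROL"`. A mean is used so that long sentences are not favored merely for having more tokens; the actual equation in the patent may differ.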
Abstract: An economical, scalable machine learning system and process perform document (concept) classification with high accuracy using large topic schemes, including large hierarchical topic schemes. One or more highly relevant classification topics are suggested for a given document (concept) to be classified. The invention includes training and concept classification processes. The invention also provides methods that may be used as part of the training and/or concept classification processes, including: a method of scoring the relevance of features in training concepts, a method of ranking concepts based on relevance score, and a method of voting on topics associated with an input concept. In a preferred embodiment, the invention is applied to the legal (case law) domain, classifying legal concepts (rules of law) according to a proprietary legal topic classification scheme (a hierarchical scheme of areas of law).
Type:
Grant
Filed:
August 4, 2000
Date of Patent:
December 31, 2002
Assignee:
Lexis Nexis
Inventors:
James S. Wiltshire, Jr., John T. Morelock, Timothy L. Humphrey, X. Allan Lu, James M. Peck, Salahuddin Ahmed
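The three methods named in the abstract above (scoring feature relevance, ranking concepts by relevance score, and voting on topics) can be sketched as a small nearest-neighbor style pipeline. This is only one plausible reading: the abstract does not give the scoring formula or the voting rule, so the `TRAINING` data, the `FEATURE_WEIGHTS` table, the overlap-based `relevance` function, and the score-weighted vote in `suggest_topics` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical training concepts: (feature set, associated topics).
TRAINING: List[Tuple[Set[str], List[str]]] = [
    ({"negligence", "duty", "care"}, ["Torts"]),
    ({"duty", "breach", "damages"}, ["Torts", "Contracts"]),
    ({"offer", "acceptance", "contract"}, ["Contracts"]),
]

# Hypothetical per-feature relevance weights (invented for illustration).
FEATURE_WEIGHTS: Dict[str, float] = {
    "negligence": 2.0, "duty": 1.0, "care": 1.0, "breach": 1.5,
    "damages": 1.0, "offer": 1.5, "acceptance": 1.5, "contract": 2.0,
}


def relevance(input_features: Set[str], training_features: Set[str]) -> float:
    """Assumed relevance score: summed weights of the shared features."""
    shared = input_features & training_features
    return sum(FEATURE_WEIGHTS.get(f, 0.0) for f in shared)


def suggest_topics(input_features: Set[str], k: int = 2, n_topics: int = 1) -> List[str]:
    """Rank training concepts by relevance, then let the top k vote on topics."""
    ranked = sorted(TRAINING, key=lambda c: relevance(input_features, c[0]), reverse=True)
    votes: Counter = Counter()
    for features, topics in ranked[:k]:
        score = relevance(input_features, features)
        for topic in topics:
            votes[topic] += score  # each vote is weighted by the relevance score
    # Keep only topics that received a positive vote.
    return [topic for topic, v in votes.most_common(n_topics) if v > 0]
```

With this toy data, `suggest_topics({"negligence", "duty"})` returns `["Torts"]` and `suggest_topics({"offer", "contract"})` returns `["Contracts"]`. Weighting votes by relevance is a design choice made here so that closer training concepts count for more; a hierarchical topic scheme, as the abstract mentions, would need additional handling not shown.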