Abstract: A method for generating a grammar for a collection of sample documents or records that are marked up in SGML. Once completed, the grammar is reduced to provide a document type definition (DTD). In constructing the initial or large corpus grammar, SGML tags are extracted and missing tags are accounted for. The tags are then matched to develop a tag structure from which a corpus grammar is built. The corpus grammar is then reduced through a sequence of reduction procedures.
Type:
Grant
Filed:
August 22, 1994
Date of Patent:
December 10, 1996
Assignee:
OCLC Online Computer Library Center, Incorporated
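
The pipeline outlined in the abstract (extract tags, account for missing tags, match tags into a structure, build a corpus grammar, reduce it) can be illustrated with a minimal Python sketch. This is not the patented procedure: the abstract does not specify the reduction steps, so every function name here (`extract_tags`, `corpus_grammar`, `collapse_repeats`, `reduce_grammar`) is hypothetical. The sketch also assumes that an omitted end tag can be recovered when an enclosing element closes, and that a single reduction rule (collapsing repeated children into `name+`) is enough for illustration.

```python
import re
from collections import defaultdict

# Matches start tags (<p>, <p id=x>) and end tags (</p>); attributes are ignored.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def extract_tags(document):
    """Return (element_name, is_end_tag) pairs in document order."""
    return [(m.group(2).lower(), m.group(1) == "/")
            for m in TAG_RE.finditer(document)]


def _close(stack, rules):
    """Pop the innermost open element and record its observed child sequence."""
    name, children = stack.pop()
    rules[name].add(tuple(children))


def corpus_grammar(documents):
    """Map each element name to the set of child sequences seen in the corpus."""
    rules = defaultdict(set)
    for doc in documents:
        stack = []  # entries are [element_name, list_of_child_names]
        for name, is_end in extract_tags(doc):
            if not is_end:
                if stack:
                    stack[-1][1].append(name)
                stack.append([name, []])
            elif any(entry[0] == name for entry in stack):
                # Assumed handling of missing end tags: close every element
                # still open inside the one that is ending.
                while stack[-1][0] != name:
                    _close(stack, rules)
                _close(stack, rules)
            # An end tag with no matching start tag is ignored.
        while stack:  # elements never closed before the document ended
            _close(stack, rules)
    return rules


def collapse_repeats(sequence):
    """Illustrative reduction: a run of one child, (p, p, p), becomes (p+,)."""
    out = []
    for name in sequence:
        if out and out[-1].rstrip("+") == name:
            out[-1] = name + "+"
        else:
            out.append(name)
    return tuple(out)


def reduce_grammar(rules):
    """Apply collapse_repeats to every rule; identical results merge in the set."""
    return {name: {collapse_repeats(seq) for seq in seqs}
            for name, seqs in rules.items()}
```

Calling `reduce_grammar(corpus_grammar(samples))` on a list of marked-up strings yields, per element, the distinct child sequences that a DTD content model would need to cover. A fuller reduction would also merge alternatives such as `(title, p)` and `(title, p+)`, which this sketch leaves separate.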