Patents Represented by Attorney, Agent or Law Firm Jeanneatte Walden
  • Patent number: 6792576
    Abstract: A method for generating a wrapper grammar for files of a particular format includes providing at least one sample file of that format. Each sample file comprises a plurality of tokens (data strings), each of which may be actual data from the document, an HTML tag, or some other grammatical separator. The sample file is then processed by annotating attributable tokens with user-defined attributes (such as Author or Title) from a set of attributes to form an annotated sample set. The annotated sample set is then evaluated to determine whether wrapper grammar generation is possible; if it is, a wrapper grammar for files of the particular format is generated. Preferably, the annotated sample set is evaluated by determining whether all attributes in the annotated sample set are distinguishable from one another.
    Type: Grant
    Filed: July 26, 1999
    Date of Patent: September 14, 2004
    Assignee: Xerox Corporation
    Inventor: Boris Chidlovskii
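
The abstract of patent 6792576 outlines a pipeline: tokenize a sample file, annotate tokens with attributes, check that the attributes are distinguishable, and only then generate a wrapper. The Python sketch below is a loose illustration of that flow, not the patented method. It assumes, purely for illustration, that an attribute is identified by the separator token immediately preceding it, and that two attributes are distinguishable when they never share such a separator. All function names (`tokenize`, `annotate`, `generate_wrapper`, `extract`) are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(html):
    """Split a string into tag tokens and data-string tokens."""
    return [t for t in re.split(r"(<[^>]+>)", html) if t.strip()]


def annotate(tokens, labels):
    """Pair each token with a user-defined attribute, or None for separators.

    `labels` maps a data string to its attribute (a stand-in for manual annotation).
    """
    return [(tok, labels.get(tok)) for tok in tokens]


def attribute_contexts(annotated_samples):
    """Collect, per attribute, the set of tokens that immediately precede it."""
    contexts = defaultdict(set)
    for sample in annotated_samples:
        for i, (_tok, attr) in enumerate(sample):
            if attr is not None:
                prev = sample[i - 1][0] if i > 0 else None
                contexts[attr].add(prev)
    return contexts


def distinguishable(contexts):
    """Assumed criterion: no preceding token is shared by two attributes."""
    owner = {}
    for attr, ctxs in contexts.items():
        for ctx in ctxs:
            if owner.setdefault(ctx, attr) != attr:
                return False
    return True


def generate_wrapper(annotated_samples):
    """Return a {preceding token: attribute} map, or None if generation is not possible."""
    contexts = attribute_contexts(annotated_samples)
    if not distinguishable(contexts):
        return None
    return {ctx: attr for attr, ctxs in contexts.items() for ctx in ctxs}


def extract(wrapper, html):
    """Apply the wrapper to a new file of the same format."""
    tokens = tokenize(html)
    return [
        (wrapper[prev], tok)
        for prev, tok in zip(tokens, tokens[1:])
        if prev in wrapper and not tok.startswith("<")
    ]


sample = "<b>Ann Lee</b><i>On Wrappers</i>"
labels = {"Ann Lee": "Author", "On Wrappers": "Title"}
wrapper = generate_wrapper([annotate(tokenize(sample), labels)])
print(wrapper)
print(extract(wrapper, "<b>Bo Chen</b><i>Grammar Induction</i>"))
```

In this toy setup, a sample in which Author and Title both follow the same tag (for example, both wrapped in `<b>`) fails the distinguishability check, and `generate_wrapper` returns `None` rather than an ambiguous wrapper.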