Abstract: A system and method for describing target data as a sequence of pattern elements and pattern element groups that together make up an overall target pattern is described. Pattern elements may use regular expression syntax along with other metadata that describe the behavior of the element. A pattern element group may be a collection of fully defined pattern elements, at least one of which must match for the overall pattern to match. A pattern may contain both pattern elements and pattern element groups. The general process first performs optical character recognition (OCR) on the document, which produces a sequence of text tokens representing the lines of text on each page. The search algorithm may then apply each defined pattern to the entire document, capturing and/or extracting data that match each pattern's required elements and element groups.
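The matching process summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The class and function names (`PatternElement`, `PatternElementGroup`, `Pattern`, `search`) are hypothetical. The `required` flag stands in for the element "metadata". Groups use first-match semantics. Each pattern item is assumed to be matched against the next OCR line token in order. The OCR step itself is replaced by a hand-written, made-up list of line tokens.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class PatternElement:
    """Hypothetical pattern element: a regex plus behavioral metadata."""
    name: str
    regex: str
    required: bool = True  # assumed metadata: may this element be absent?

    def match(self, token: str) -> Optional[str]:
        m = re.search(self.regex, token)
        if m is None:
            return None
        # Return the first capture group if the regex has one, else the whole match.
        return m.group(1) if m.groups() else m.group(0)


@dataclass
class PatternElementGroup:
    """Hypothetical group: matches if at least one member element matches."""
    elements: List[PatternElement]

    def match(self, token: str) -> Optional[Dict[str, str]]:
        for element in self.elements:
            value = element.match(token)
            if value is not None:
                return {element.name: value}  # first matching member wins
        return None


Item = Union[PatternElement, PatternElementGroup]


@dataclass
class Pattern:
    """Hypothetical overall pattern: an ordered sequence of elements and groups."""
    name: str
    items: List[Item]

    def match_at(self, tokens: List[str], start: int) -> Optional[Dict[str, str]]:
        captured: Dict[str, str] = {}
        pos = start
        for item in self.items:
            token = tokens[pos] if pos < len(tokens) else None
            if isinstance(item, PatternElement):
                value = item.match(token) if token is not None else None
                if value is None:
                    if item.required:
                        return None
                    continue  # optional element absent: do not consume the token
                captured[item.name] = value
            else:
                values = item.match(token) if token is not None else None
                if values is None:
                    return None  # no member of the group matched
                captured.update(values)
            pos += 1  # each matched item consumes one line token
        return captured


def search(patterns: List[Pattern], tokens: List[str]) -> List[Tuple[str, int, Dict[str, str]]]:
    """Apply every pattern at every line position of the document."""
    results = []
    for pattern in patterns:
        for start in range(len(tokens)):
            captured = pattern.match_at(tokens, start)
            if captured is not None:
                results.append((pattern.name, start, captured))
    return results


# Made-up OCR output: one text token per line of the page.
tokens = [
    "ACME Corp",
    "Invoice No: 10442",
    "Date: 2021-03-05",
    "Total Due: $1,250.00",
]

invoice = Pattern("invoice", [
    PatternElement("invoice_no", r"Invoice No:\s*(\d+)"),
    PatternElement("date", r"Date:\s*(\d{4}-\d{2}-\d{2})", required=False),
    PatternElementGroup([
        PatternElement("total", r"Total Due:\s*\$([\d,.]+)"),
        PatternElement("total", r"Amount:\s*\$([\d,.]+)"),
    ]),
])

print(search([invoice], tokens))
# → [('invoice', 1, {'invoice_no': '10442', 'date': '2021-03-05', 'total': '1,250.00'})]
```

An optional element that fails to match does not consume a line, so the same pattern also matches a document that omits the date line. If no member of the group matches (no total of either form), the whole pattern fails at that position.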