Abstract: Embodiments of the present disclosure provide systems and methods for extracting entities from semi-structured enterprise documents. The method, performed by a server system, includes receiving an enterprise document in a semi-structured format. The method includes extracting document features from the enterprise document. The document features include structural, token-specific, and entity-specific features. Further, the method includes identifying candidate entities in the enterprise document based at least on a machine learning model that uses the document features. The candidate entities include candidate tabular entities and candidate non-tabular entities. The method includes computing probability scores for one or more tokens corresponding to the candidate non-tabular entities and the candidate tabular entities, based at least on the machine learning model.
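The flow described in the abstract (extract per-token features, score each token with a model, and separate tabular from non-tabular candidates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Token` fields, the feature names, the hand-set `WEIGHTS` standing in for a trained machine learning model, the logistic scoring function, and the 0.5 threshold. Entity-specific features are omitted for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    line: int
    in_table: bool  # structural feature, assumed to come from an upstream layout step


# Hypothetical weights standing in for a trained model (not from the patent).
WEIGHTS = {"bias": -2.0, "has_digit": 2.5, "is_capitalized": 0.5, "in_table": 1.0}


def extract_features(token: Token) -> dict:
    """Build a small feature vector: token-specific plus one structural feature."""
    return {
        "bias": 1.0,
        "has_digit": float(any(ch.isdigit() for ch in token.text)),
        "is_capitalized": float(token.text[:1].isupper()),
        "in_table": float(token.in_table),
    }


def score(features: dict) -> float:
    """Logistic score in (0, 1), used here as the token's probability score."""
    z = sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def find_candidates(tokens, threshold=0.5):
    """Split tokens scoring at or above the threshold into tabular and non-tabular candidates."""
    tabular, non_tabular = [], []
    for token in tokens:
        p = score(extract_features(token))
        if p >= threshold:
            (tabular if token.in_table else non_tabular).append((token.text, p))
    return tabular, non_tabular


tokens = [
    Token("Invoice", 0, False),
    Token("INV-1042", 0, False),
    Token("Widget", 3, True),
    Token("12.50", 3, True),
]
tabular, non_tabular = find_candidates(tokens)
print("tabular:", tabular)
print("non-tabular:", non_tabular)
```

In this toy setup the invoice number is kept as a non-tabular candidate and the price as a tabular candidate, each with its own probability score. A real system would learn the weights from labeled documents rather than set them by hand.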
Type:
Grant
Filed:
February 22, 2022
Date of Patent:
December 31, 2024
Assignee:
TAO AUTOMATION SERVICES PRIVATE LIMITED
Inventors:
Hariharamoorthy Theriappan, Amit Rajan, Nagaraju Pappu, Jawahar Bekay