Abstract: Systems and methods for segmentation of a report corpus using visual signatures are disclosed. According to one embodiment, a computer-implemented method comprises converting a document to a grayscale image and removing noise from the grayscale image by eroding isolated pixels. Connected regions in the grayscale image are determined, and a region of the grayscale image having a square shape is identified. An area of the region is computed, and if the area is larger than a threshold, the document is determined to contain a form.
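The form-detection steps in this abstract can be illustrated with a minimal pure-Python sketch. It is not the applicant's implementation. It assumes the document has already been rendered to RGB pixels. It replaces "eroding isolated pixels" with a simple rule: an ink pixel with no ink among its 8 neighbours is cleared. It also treats a region as "square" when its bounding box has nearly equal width and height. The names `contains_form`, `ink_threshold`, `tolerance`, and `area_threshold` are hypothetical.

```python
from collections import deque

# 8-connected neighbourhood offsets (assumption: the abstract does not specify connectivity).
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 gray values using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]

def binarize(gray, ink_threshold=128):
    """Mark dark pixels as ink (True); ink_threshold is a hypothetical parameter."""
    return [[v < ink_threshold for v in row] for row in gray]

def remove_isolated(ink):
    """Clear ink pixels that have no ink neighbours (a simple stand-in for erosion)."""
    h, w = len(ink), len(ink[0])
    out = [row[:] for row in ink]
    for y in range(h):
        for x in range(w):
            if ink[y][x] and not any(
                0 <= y + dy < h and 0 <= x + dx < w and ink[y + dy][x + dx]
                for dy, dx in NEIGHBORS
            ):
                out[y][x] = False
    return out

def connected_regions(ink):
    """Return one bounding box (top, left, bottom, right) per 8-connected ink region."""
    h, w = len(ink), len(ink[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def is_square(box, tolerance=0.1):
    """Treat a bounding box as square if height and width differ by at most `tolerance`."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return abs(height - width) <= tolerance * max(height, width)

def contains_form(gray, area_threshold):
    """Return True if any square region's bounding-box area exceeds area_threshold."""
    ink = remove_isolated(binarize(gray))
    for box in connected_regions(ink):
        if is_square(box):
            top, left, bottom, right = box
            if (bottom - top + 1) * (right - left + 1) > area_threshold:
                return True
    return False

if __name__ == "__main__":
    # 40x40 white page with a 20x20 square outline and one stray noise pixel.
    page = [[255] * 40 for _ in range(40)]
    for i in range(10, 30):
        page[10][i] = page[29][i] = page[i][10] = page[i][29] = 0
    page[2][2] = 0
    print(contains_form(page, area_threshold=300))
```

Measuring area from the bounding box lets an outlined box, which is how many forms draw their fields, count by its enclosed extent rather than by its few ink pixels. A production system would more likely use a library routine such as OpenCV's connected-component analysis.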
Abstract: A system and method for hybrid entity recognition are disclosed. According to one embodiment, a computer-implemented process comprises receiving an input sentence. The input sentence is preprocessed to remove extraneous information, correct spelling, and correct grammar, generating a cleaned input sentence. A part-of-speech (POS) tagger tags the parts of speech of the cleaned input sentence. A rules-based entity recognizer module identifies first-level entities in the cleaned input sentence. The cleaned input sentence is converted and translated into numeric vectors. Basic and composite entities are extracted from the cleaned input sentence using the numeric vectors.
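The hybrid pipeline in this abstract can be sketched end to end in Python under heavy simplifying assumptions; this is not the applicant's method. Several pieces are hypothetical stand-ins:

- The spelling and POS lexicons are tiny toy dictionaries, and grammar correction is not attempted.
- The regexes stand in for the rules-based recognizer.
- A three-feature hand-crafted vector and nearest-prototype matching stand in for whatever embeddings and models the application uses.
- "Composite" entities are assumed to be multi-token spans built by merging adjacent tokens that share a basic label.

All names (`SPELLING`, `RULES`, `PROTOTYPES`, `extract_entities`) are invented for illustration.

```python
import math
import re

# Hypothetical toy lexicons standing in for real spelling-correction and POS models.
SPELLING = {"recieved": "received", "paymnet": "payment", "invoce": "invoice"}
POS_LEXICON = {"received": "VERB", "payment": "NOUN", "invoice": "NOUN",
               "from": "ADP", "on": "ADP", "for": "ADP", "a": "DET", "of": "ADP"}

# First-level entities recognized by hand-written rules (assumed patterns).
RULES = {
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?"),
    "DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Hypothetical prototypes; features = [title_case, has_digit, is_proper_noun].
# "O" (no entity) is listed first so that min() resolves distance ties to "O".
PROTOTYPES = {"O": [0.0, 0.0, 0.0], "NAME": [1.0, 0.0, 1.0]}

def preprocess(sentence):
    """Drop extraneous characters and fix known misspellings (no grammar correction)."""
    sentence = re.sub(r"[^\w\s$.\-]", " ", sentence)
    tokens = [t.rstrip(".") for t in sentence.split()]
    return [SPELLING.get(t.lower(), t) for t in tokens if t]

def pos_tag(tokens):
    """Lexicon lookup with crude fallbacks for numbers and capitalized words."""
    tags = []
    for t in tokens:
        if t.lower() in POS_LEXICON:
            tags.append(POS_LEXICON[t.lower()])
        elif any(ch.isdigit() for ch in t):
            tags.append("NUM")
        else:
            tags.append("PROPN" if t[0].isupper() else "NOUN")
    return tags

def rule_entities(tokens):
    """Return (token_index, label) for every token fully matched by a rule."""
    return [(i, label) for i, t in enumerate(tokens)
            for label, pattern in RULES.items() if pattern.fullmatch(t)]

def vectorize(tokens, tags):
    """Map each token to a small numeric feature vector."""
    return [[float(t[0].isupper()), float(any(c.isdigit() for c in t)), float(tag == "PROPN")]
            for t, tag in zip(tokens, tags)]

def nearest_label(vec):
    """Label a vector with its nearest prototype by Euclidean distance."""
    return min(PROTOTYPES, key=lambda label: math.dist(vec, PROTOTYPES[label]))

def extract_entities(sentence):
    tokens = preprocess(sentence)
    tags = pos_tag(tokens)
    labels = ["O"] * len(tokens)
    for i, label in rule_entities(tokens):  # rule-based first-level entities take priority
        labels[i] = label
    for i, vec in enumerate(vectorize(tokens, tags)):
        if labels[i] == "O":
            labels[i] = nearest_label(vec)
    basic = [(label, tok) for tok, label in zip(tokens, labels) if label != "O"]
    composite, i = [], 0
    while i < len(tokens):  # merge runs of adjacent tokens sharing a non-"O" label
        j = i
        while j + 1 < len(tokens) and labels[j + 1] == labels[i]:
            j += 1
        if labels[i] != "O" and j > i:
            composite.append((labels[i], " ".join(tokens[i:j + 1])))
        i = j + 1
    return basic, composite

if __name__ == "__main__":
    print(extract_entities("Recieved paymnet of $120.50 from Acme Corp on 2020-07-09."))
```

Here the rules catch well-formed patterns such as amounts and dates, while the vector step labels the remaining tokens. A capitalized sentence-initial word that is missing from the lexicon would be mislabeled as a name. A trained model replacing `nearest_label` is meant to avoid that kind of error.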
Type: Application
Filed: December 19, 2019
Publication date: July 9, 2020
Applicant: Genpact Luxembourg S.à r.l
Inventors: Ravi Narayan, Sunil Kumar Khokhar, Vikas Mehta, Chirag Srivastava