Abstract: In some aspects, the disclosure is directed to methods and systems for automatic context-based annotation by leveraging a priori knowledge from annotations in template documents. A large library of template documents may be generated and pre-processed in many implementations to identify annotations or other inclusions commonly present on documents related to or conforming to the template. Newly scanned documents may be compared to these templates, and when a similar template is identified, annotation locations and types from the template may be applied to the newly scanned document to recognize and classify annotations and inclusions. To increase efficiency and provide scalability, comparisons of scanned documents and template documents may be distributed amongst a plurality of computing devices for processing in parallel, with similarity results aggregated.
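The comparison step described in this abstract can be pictured with a minimal sketch. It assumes each document is reduced to a set of tokens and compared by Jaccard overlap; the abstract does not name a similarity measure, so that choice, the 0.5 threshold, and all names below (`jaccard`, `score_chunk`, `best_template`, the sample templates) are hypothetical. A thread pool stands in for the plurality of computing devices.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a, b):
    """Token-set overlap; a stand-in for whatever similarity measure is used."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_chunk(doc_tokens, chunk):
    """Score one shard of the template library against the scanned document."""
    return [(name, jaccard(doc_tokens, tpl["tokens"])) for name, tpl in chunk]


def best_template(doc_tokens, templates, workers=4, threshold=0.5):
    """Return the name of the most similar template, or None if none is close enough."""
    items = list(templates.items())
    # Split the library into shards, one per worker (assumed sharding scheme).
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: score_chunk(doc_tokens, c), chunks))
    # Aggregate the per-shard similarity results and keep the best match.
    scores = [pair for part in partials for pair in part]
    if not scores:
        return None
    name, score = max(scores, key=lambda p: p[1])
    return name if score >= threshold else None


# Hypothetical template library with pre-identified annotation locations.
templates = {
    "deed_of_trust": {
        "tokens": {"deed", "trust", "borrower", "lender", "county"},
        "annotations": [{"type": "stamp", "box": (400, 40, 560, 120)}],
    },
    "promissory_note": {
        "tokens": {"note", "promise", "pay", "borrower", "interest"},
        "annotations": [{"type": "signature", "box": (80, 700, 300, 760)}],
    },
}

scanned = {"deed", "trust", "borrower", "lender", "recorded"}
match = best_template(scanned, templates)
# Annotation locations and types carried over from the matched template.
expected_annotations = templates[match]["annotations"] if match else []
```

Once a template is matched, its stored annotation boxes indicate where to look on the newly scanned page, which is the a priori knowledge the abstract refers to.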
Abstract: In some aspects, the disclosure is directed to methods and systems for detection and classification of stamps in documents. The system can receive image data and textual data of a document. The system can pre-process and filter that data, and convert the textual data to a term frequency-inverse document frequency (TF-IDF) vector. The system can detect the presence of a stamp on the document. The system can extract a subset of the image data including the stamp. The system can extract text from the subset of the image data. The system can classify the stamp using the extracted text, the image data, and the TF-IDF vector. The system can store the classification in a database.
Type:
Grant
Filed:
June 2, 2022
Date of Patent:
November 28, 2023
Assignee:
Nationstar Mortgage LLC
Inventors:
Won Lee, Goutam Venkatesh, Ankit Kumar Sinha, Sudhir Sundararam
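As an illustration of the TF-IDF conversion mentioned in the abstract above, the following is a minimal sketch using the plain `tf * log(N / df)` weighting. The abstract does not specify a weighting variant or tokenizer, so the whitespace tokenization, the function name `tfidf_vectors`, and the sample texts are assumptions; production libraries often use smoothed variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            [counts[t] / total * math.log(n / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical document texts after pre-processing and filtering.
docs = [
    "recorded county clerk",
    "paid in full",
    "recorded deed county",
]
vocab, vectors = tfidf_vectors(docs)
```

Terms that appear in few documents, such as "clerk" here, receive higher weights than terms shared across documents, which is what makes the vector useful as one input to a stamp classifier alongside the extracted stamp text and image data.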
Abstract: In some aspects, the disclosure is directed to methods and systems for machine learning-based data extraction using multiple string searching models. String extraction logic may differ depending on the type of document received. For documents identified to contain line item structures, broader searching models are applied to the document to account for the increased variability of data in the document inherent in data organized in line item structures. For documents identified to contain non-line item structures, stricter searching models are applied to the document to account for predictable data in the document associated with data organized in non-line item structures.
Type:
Application
Filed:
September 28, 2021
Publication date:
March 30, 2023
Applicant:
Nationstar Mortgage LLC, d/b/a/ Mr. Cooper
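The split between stricter and broader searching described in the abstract above can be sketched as follows. This is an assumption-laden illustration, not the claimed method: the strict model is approximated by an exact-label regular expression, the broad model by fuzzy label matching with `difflib.SequenceMatcher`, and the function names, document-type labels, and 0.7 cutoff are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def strict_extract(text, label):
    """Non-line-item documents: exact label, a colon, then the value."""
    m = re.search(rf"^{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE)
    return m.group(1).strip() if m else None


def broad_extract(lines, label, cutoff=0.7):
    """Line-item documents: tolerate label variation with fuzzy matching."""
    hits = []
    for line in lines:
        # Assumed row shape: free-text description followed by an amount.
        m = re.match(r"(.+?)\s+(-?[\d,]+\.\d{2})\s*$", line)
        if not m:
            continue
        desc, amount = m.group(1), m.group(2)
        ratio = SequenceMatcher(None, desc.lower(), label.lower()).ratio()
        if ratio >= cutoff:
            hits.append((desc, float(amount.replace(",", ""))))
    return hits


def extract(doc_type, text, label):
    """Pick the searching model based on the identified document structure."""
    if doc_type == "line_item":
        return broad_extract(text.splitlines(), label)
    return strict_extract(text, label)


# Hypothetical inputs for each document type.
form = "Loan Number: 12345\nBorrower: Jane Doe"
invoice = "Hazard Insurance 1,200.00\nCounty Tax 310.25"

loan_number = extract("form", form, "Loan Number")
insurance_rows = extract("line_item", invoice, "hazard insurance")
```

The strict path fails fast when a predictable field is absent, while the broad path accepts near-matches at the cost of needing a cutoff tuned to the documents at hand.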
Abstract: In some aspects, the disclosure is directed to methods and systems for detection and classification of stamps in documents. The system can receive image data and textual data of a document. The system can pre-process and filter that data, and convert the textual data to a term frequency-inverse document frequency (TF-IDF) vector. The system can detect the presence of a stamp on the document. The system can extract a subset of the image data including the stamp. The system can extract text from the subset of the image data. The system can classify the stamp using the extracted text, the image data, and the TF-IDF vector. The system can store the classification in a database.
Type:
Grant
Filed:
August 11, 2020
Date of Patent:
June 14, 2022
Assignee:
Nationstar Mortgage LLC
Inventors:
Won Lee, Goutam Venkatesh, Ankit Kumar Sinha, Sudhir Sundararam