Abstract: A method is described herein that comprises receiving scanned documents, wherein the scanned documents comprise unstructured data. The method includes performing optical character recognition of the scanned documents to produce text data for each page of the scanned documents, wherein the text data for each page comprises a sequence of words stored together with their location. The method includes dividing each page of the scanned documents into subsections. The method includes using the text data to identify a structure type of each subsection of a page, wherein the structure type includes at least one of a table and text paragraph. The method includes using the text data to label each subsection of a page with a semantic type, wherein the semantic type defines a context surrounding collection of information in a subsection. The method includes using the text data for each subsection of a page to identify medical concepts.
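The steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Word` record, the vertical-gap rule for dividing a page, the column-alignment test for tables, and the keyword and concept lookups are all hypothetical stand-ins chosen for illustration, and OCR output is simulated by a hand-written word list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One OCR word with its location (hypothetical layout: left x, line y)."""
    text: str
    x: float
    y: float


def split_into_subsections(words: List[Word], gap: float = 20.0) -> List[List[Word]]:
    """Start a new subsection whenever consecutive lines are more than `gap` apart."""
    sections: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if sections and w.y - sections[-1][-1].y <= gap:
            sections[-1].append(w)
        else:
            sections.append([w])
    return sections


def structure_type(section: List[Word], tol: float = 5.0) -> str:
    """Assumed heuristic: a table has 2+ lines whose words start at aligned x positions."""
    lines: Dict[float, List[float]] = {}
    for w in section:
        lines.setdefault(w.y, []).append(w.x)
    rows = [sorted(xs) for _, xs in sorted(lines.items())]
    if len(rows) < 2 or len(rows[0]) < 2:
        return "paragraph"
    first = rows[0]
    for row in rows[1:]:
        if len(row) != len(first):
            return "paragraph"
        if any(abs(a - b) > tol for a, b in zip(row, first)):
            return "paragraph"
    return "table"


# Hypothetical lookups; a real system would use far richer models or vocabularies.
SEMANTIC_KEYWORDS = {"medication": "medication list", "diagnosis": "diagnosis"}
CONCEPTS = {"hypertension", "aspirin", "diabetes"}


def semantic_type(section: List[Word]) -> str:
    """Label a subsection by the first keyword it contains, else 'unknown'."""
    for w in section:
        label = SEMANTIC_KEYWORDS.get(w.text.lower().strip(":"))
        if label:
            return label
    return "unknown"


def medical_concepts(section: List[Word]) -> List[str]:
    """Return the known concepts mentioned in a subsection, sorted by name."""
    tokens = {w.text.lower().strip(".,:") for w in section}
    return sorted(tokens & CONCEPTS)


# Simulated OCR output for one page: a short paragraph, then a two-column table.
words = [
    Word("Diagnosis:", 10, 10),
    Word("Patient", 10, 25), Word("has", 60, 25), Word("hypertension.", 90, 25),
    Word("Medication", 10, 100), Word("Dose", 120, 100),
    Word("Aspirin", 10, 115), Word("81mg", 120, 115),
]

results = [
    (structure_type(s), semantic_type(s), medical_concepts(s))
    for s in split_into_subsections(words)
]
```

Each entry in `results` pairs a subsection's structure type with its semantic label and the concepts found in it, mirroring the order of the steps in the abstract.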
Type: Grant
Filed: April 22, 2020
Date of Patent: January 30, 2024
Assignee: SELECT REHABILITATION, INC.
Inventors: Michael Gallagher, Michael Capstick, Matthew Moran