Patents by Inventor Vishal Prakash CHAVAN

Vishal Prakash CHAVAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for extracting information from input document comprising multi-format information

Patent number: 12032651

Abstract: Disclosed herein are a method and a system for extracting information from an input document comprising multi-format information. In an embodiment, a Hypertext Markup Language (HTML) document corresponding to the input document is created by analyzing the input document, which comprises documents of multiple data formats. Further, the HTML document is realigned based on the number of columns in each page of the HTML document. Furthermore, a document identifier (ID) associated with each of the documents is determined in the realigned HTML document by classifying information in each of the document pages using a pretrained Machine Learning (ML) model. Subsequently, a hierarchy configuration file corresponding to the realigned HTML document is generated based on the document ID. Finally, information associated with each document ID is extracted from the hierarchy configuration file by orchestrating one or more data extractors that extract data attributes from the hierarchy configuration file.

Type: Grant

Filed: June 15, 2022

Date of Patent: July 9, 2024

Assignee: Wipro Limited

Inventors: Swapnil Dnyaneshwar Belhe, Shishir Vivek Bhaskarwar, Vishal Prakash Chavan

Method and system for extracting information from input document comprising multi-format information

Publication number: 20230315799

Abstract: Disclosed herein are a method and a system for extracting information from an input document comprising multi-format information. In an embodiment, a Hypertext Markup Language (HTML) document corresponding to the input document is created by analyzing the input document, which comprises documents of multiple data formats. Further, the HTML document is realigned based on the number of columns in each page of the HTML document. Furthermore, a document identifier (ID) associated with each of the documents is determined in the realigned HTML document by classifying information in each of the document pages using a pretrained Machine Learning (ML) model. Subsequently, a hierarchy configuration file corresponding to the realigned HTML document is generated based on the document ID. Finally, information associated with each document ID is extracted from the hierarchy configuration file by orchestrating one or more data extractors that extract data attributes from the hierarchy configuration file.

Type: Application

Filed: June 15, 2022

Publication date: October 5, 2023

Inventors: Swapnil Dnyaneshwar BELHE, Shishir Vivek BHASKARWAR, Vishal Prakash CHAVAN
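
The flow described in the abstract above (realign pages by column count, classify each page to a document ID, group pages by that ID, then run the matching extractors) can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in and not the patented implementation: the `Page` type, the keyword-based `classify` function (used in place of the pretrained ML model), the plain dictionary used in place of the hierarchy configuration file, and the two regex extractors are all assumptions made for illustration. The HTML conversion step is skipped; pages are assumed to arrive already split into columns.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    # Hypothetical page model: each inner list is one column of text lines,
    # ordered left to right.
    columns: List[List[str]]


def realign(page: Page) -> str:
    """Flatten a multi-column page into single-column reading order."""
    lines = [line for column in page.columns for line in column]
    return "\n".join(lines)


def classify(text: str) -> str:
    """Keyword stand-in (assumption) for the pretrained ML page classifier."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "purchase order" in lowered:
        return "purchase_order"
    return "unknown"


def build_hierarchy(pages: List[Page]) -> Dict[str, List[str]]:
    """Group realigned page text by document ID (stand-in for the config file)."""
    hierarchy: Dict[str, List[str]] = {}
    for page in pages:
        text = realign(page)
        hierarchy.setdefault(classify(text), []).append(text)
    return hierarchy


def extract_total(text: str) -> Dict[str, str]:
    match = re.search(r"Total:\s*([\d.]+)", text)
    return {"total": match.group(1)} if match else {}


def extract_po_number(text: str) -> Dict[str, str]:
    match = re.search(r"PO Number:\s*(\S+)", text)
    return {"po_number": match.group(1)} if match else {}


# Hypothetical registry: which extractors run for which document ID.
EXTRACTORS: Dict[str, List[Callable[[str], Dict[str, str]]]] = {
    "invoice": [extract_total],
    "purchase_order": [extract_po_number],
}


def extract_all(pages: List[Page]) -> Dict[str, Dict[str, str]]:
    """Orchestrate the registered extractors for each document ID."""
    results: Dict[str, Dict[str, str]] = {}
    for doc_id, texts in build_hierarchy(pages).items():
        attributes: Dict[str, str] = {}
        for text in texts:
            for extractor in EXTRACTORS.get(doc_id, []):
                attributes.update(extractor(text))
        results[doc_id] = attributes
    return results
```

Calling `extract_all` on a two-column invoice page and a single-column purchase order page would return one attribute dictionary per document ID. Keeping the extractor registry keyed by document ID is one simple way to model the "orchestration" step; a real system would likely load that mapping from the hierarchy configuration file instead.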
