Patents by Inventor Vibhas Gejji

Vibhas Gejji has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Document spatial layout feature extraction to simplify template classification

Patent number: 11804056

Abstract: Image encoded documents are identified by recognizing known objects in each document with an object recognizer. The objects in each page are filtered to remove lower order objects. Known features in the objects are recognized by sequentially organizing each object in each filtered page into a one-dimensional array, where each object is positioned in a corresponding one-dimensional array as a function of location in the corresponding filtered page. The one-dimensional array is then compared to known arrays to classify the image document corresponding to the one-dimensional array.

Type: Grant

Filed: May 30, 2022

Date of Patent: October 31, 2023

Assignee: Automation Anywhere, Inc.

Inventors: Michael Sundell, Vibhas Gejji
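The classification step in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the object labels, the `row_height` bucketing, and the use of `difflib.SequenceMatcher` as the comparison are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical detected objects: (label, x, y), where (x, y) is the top-left corner.
def layout_sequence(objects, keep_labels, row_height=50):
    """Drop unwanted objects, then order the rest top-to-bottom, left-to-right."""
    kept = [o for o in objects if o[0] in keep_labels]
    # Bucket by row so small vertical jitter does not change the reading order.
    kept.sort(key=lambda o: (o[2] // row_height, o[1]))
    return [label for label, _, _ in kept]

def classify(sequence, known_templates):
    """Return the name of the known array most similar to the given sequence."""
    def score(name):
        return SequenceMatcher(None, sequence, known_templates[name]).ratio()
    return max(known_templates, key=score)

page = [
    ("logo", 40, 10),
    ("word", 300, 15),          # treated here as a lower-order object
    ("table", 30, 400),
    ("address_block", 40, 120),
    ("total_box", 420, 700),
]
keep = {"logo", "address_block", "table", "total_box"}
templates = {
    "invoice_a": ["logo", "address_block", "table", "total_box"],
    "invoice_b": ["address_block", "logo", "total_box", "table"],
}

seq = layout_sequence(page, keep)
best = classify(seq, templates)
print(seq, best)
```

Reducing a page to an ordered list of labels turns the comparison into a plain sequence match, which is presumably the simplification the title refers to.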

DOCUMENT SPATIAL LAYOUT FEATURE EXTRACTION TO SIMPLIFY TEMPLATE CLASSIFICATION

Publication number: 20220292862

Abstract: Image encoded documents are identified by recognizing known objects in each document with an object recognizer. The objects in each page are filtered to remove lower order objects. Known features in the objects are recognized by sequentially organizing each object in each filtered page into a one-dimensional array, where each object is positioned in a corresponding one-dimensional array as a function of location in the corresponding filtered page. The one-dimensional array is then compared to known arrays to classify the image document corresponding to the one-dimensional array.

Type: Application

Filed: May 30, 2022

Publication date: September 15, 2022

Inventors: Michael Sundell, Vibhas Gejji

Document spatial layout feature extraction to simplify template classification

Patent number: 11348353

Abstract: Image encoded documents are identified by recognizing known objects in each document with an object recognizer. The objects in each page are filtered to remove lower order objects. Known features in the objects are recognized by sequentially organizing each object in each filtered page into a one-dimensional array, where each object is positioned in a corresponding one-dimensional array as a function of location in the corresponding filtered page. The one-dimensional array is then compared to known arrays to classify the image document corresponding to the one-dimensional array.

Type: Grant

Filed: January 31, 2020

Date of Patent: May 31, 2022

Assignee: Automation Anywhere, Inc.

Inventors: Michael Sundell, Vibhas Gejji

METHOD AND SYSTEM FOR EXTRACTION OF TABLE DATA FROM DOCUMENTS FOR ROBOTIC PROCESS AUTOMATION

Publication number: 20220108107

Abstract: Improved techniques to access content from documents in an automated fashion. The improved techniques permit content of tables within documents to be retrieved and then used by computer systems operating various software programs (e.g., application programs). Consequently, Robotic Process Automation (RPA) systems are able to accurately understand the content of tables within documents so that users, application programs and/or software robots can operate on the documents with increased reliability and flexibility. The documents being received and processed can be electronic images of documents. For example, the documents can be business transaction documents which include tables, such as purchase orders, invoices, delivery receipts, bills of lading, etc.

Type: Application

Filed: January 27, 2021

Publication date: April 7, 2022

Inventors: Siddarth Sathi, Vibhas Gejji, Anish Hiranandani, Bruno Gomes Selva, Anjana Prabhakar

METHOD AND SYSTEM FOR EXTRACTION OF DATA FROM DOCUMENTS FOR ROBOTIC PROCESS AUTOMATION

Publication number: 20220108108

Abstract: Improved techniques to access content from documents in an automated fashion. The improved techniques permit extraction of data from documents, namely, images of documents. The extraction processing can be hierarchical, such as being performed in multiple levels (i.e., multi-leveled). At an upper level, numerous different objects within a document can be detected along with positional data for the objects and can be categorized based on a type of object. Then, at lower levels, the different objects can be processed differently depending on the type of object. As a result, data extraction from the document can be performed with greater reliability and precision.

Type: Application

Filed: January 27, 2021

Publication date: April 7, 2022

Inventors: Siddarth Sathi, Vibhas Gejji, Anish Hiranandani, Bruno Gomes Selva, Anjana Prabhakar

MACHINED LEARNING SUPPORTING DOCUMENT DATA EXTRACTION

Publication number: 20220108106

Abstract: Improved techniques to access content from documents in an automated fashion. The improved techniques permit content within documents to be retrieved and then used by computer systems operating various software programs (e.g., application programs), such as an extraction program. Documents, especially business transaction documents, often have various descriptors (or tables) and values that form key-value pairs. The improved techniques permit key-value pairs within documents to be recognized and extracted from documents. Consequently, RPA systems are able to accurately understand the content of tables within documents so that users and/or software robots can operate on the documents with increased reliability and flexibility.

Type: Application

Filed: January 27, 2021

Publication date: April 7, 2022

Inventors: Siddarth Sathi, Vibhas Gejji, Anish Hiranandani, Bruno Gomes Selva, Anjana Prabhakar

Region adjacent subgraph isomorphism for layout clustering in document images

Patent number: 11256760

Abstract: A computer system and computerized method that groups documents with similar image layout together. A document similarity metric based on locally connected subgraphs is employed. Region adjacency graphs are generated from word segments extracted from document images. Fuzzy attributed graph isomorphism is performed on subgraphs checking node and edge attribute similarity. Document similarity is then calculated on a normalized score between matching subgraphs of different documents. Unsupervised clustering of document layouts is performed to generate clusters of documents with similar structure.

Type: Grant

Filed: September 28, 2018

Date of Patent: February 22, 2022

Assignee: Automation Anywhere, Inc.

Inventors: Thomas Corcoran, Vibhas Gejji, Stephen Van Lare

DOCUMENT SPATIAL LAYOUT FEATURE EXTRACTION TO SIMPLIFY TEMPLATE CLASSIFICATION

Publication number: 20210240975

Abstract: Image encoded documents are identified by recognizing known objects in each document with an object recognizer. The objects in each page are filtered to remove lower order objects. Known features in the objects are recognized by sequentially organizing each object in each filtered page into a one-dimensional array, where each object is positioned in a corresponding one-dimensional array as a function of location in the corresponding filtered page. The one-dimensional array is then compared to known arrays to classify the image document corresponding to the one-dimensional array.

Type: Application

Filed: January 31, 2020

Publication date: August 5, 2021

Applicant: Automation Anywhere, Inc.

Inventors: Michael Sundell, Vibhas Gejji

Synthetic augmentation of document images

Patent number: 10984284

Abstract: A computerized method and system for adding distortions to a computer-generated image of a document stored in an image file. An original computer-generated image file is selected and is processed to generate one or more distorted image files for each original computer-generated image file by selecting one or more augmentation modules from a set of augmentation modules to form an augmentation sub-system. The original computer-generated image file is processed with the augmentation sub-system to generate an augmented image file by altering the original computer-generated image file to add distortions that simulate distortions introduced during scanning of a paper-based representation of a document represented in the original computer-generated image file.

Type: Grant

Filed: November 19, 2018

Date of Patent: April 20, 2021

Assignee: Automation Anywhere, Inc.

Inventors: Thomas Corcoran, Vibhas Gejji, Stephen Van Lare
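The idea of composing selected augmentation modules into a sub-system, as described in the abstract above, might look something like the sketch below. Everything here is an assumption made for illustration: the image is a plain list of grayscale rows, and the three modules (`speckle`, `shift_rows`, `fade`) are hypothetical stand-ins for scanner-like distortions, not the modules named in the patent.

```python
import random

def add_speckle(img, rng, rate=0.05):
    """Invert a random fraction of pixels to mimic scanner noise."""
    return [[255 - p if rng.random() < rate else p for p in row] for row in img]

def shift_rows(img, rng, max_shift=2, fill=255):
    """Shift each row left or right by a few pixels to mimic feed jitter."""
    out = []
    for row in img:
        s = rng.randint(-max_shift, max_shift)
        if s > 0:
            row = [fill] * s + row[:-s]
        elif s < 0:
            row = row[-s:] + [fill] * (-s)
        out.append(list(row))
    return out

def fade(img, rng, amount=40):
    """Lighten every pixel to mimic a washed-out scan (rng unused, kept for a uniform signature)."""
    return [[min(255, p + amount) for p in row] for row in img]

# Hypothetical registry of available augmentation modules.
MODULES = {"speckle": add_speckle, "shift_rows": shift_rows, "fade": fade}

def augment(img, module_names, seed=0):
    """Apply the selected modules in order; the input image is left untouched."""
    rng = random.Random(seed)
    for name in module_names:
        img = MODULES[name](img, rng)
    return img

# A tiny 4x8 checkerboard stands in for a computer-generated document image.
original = [[0 if (r + c) % 2 else 255 for c in range(8)] for r in range(4)]
distorted = augment(original, ["shift_rows", "speckle", "fade"], seed=1)
```

Calling `augment` several times with different seeds or module selections would yield several distorted files per original image, in the spirit of the abstract.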

Auto-correction of pattern defined strings

Patent number: 10963717

Abstract: A computer implemented method and system for correcting error produced by Optical Character Recognition (OCR) of text contained in an image encoded document. An error model representing frequency and type of errors produced by Optical Character Recognition Engine is generated. An OCR character string generated by OCR is retrieved. A user-defined pattern of a plurality of character strings is retrieved, where each character string represents a possible correct representation of characters in the OCR character string. The OCR character string is compared to each of the above generated character strings and a ‘likelihood score’ is calculated based on the information from the error model. The character string with the highest ‘likelihood score’ is presumed to be the corrected version of the OCR character string.

Type: Grant

Filed: December 21, 2018

Date of Patent: March 30, 2021

Assignee: Automation Anywhere, Inc.

Inventors: Thomas Corcoran, Vibhas Gejji, Stephen Van Lare
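A minimal sketch of the scoring idea in the abstract above, under several assumptions: the confusion probabilities are invented for the example, the "user-defined pattern" is reduced to one character class per position, and only character substitutions are modeled (no insertions or deletions). It is not the patented algorithm.

```python
from itertools import product

# Hypothetical error model: P(OCR emits `seen` | true character is `true`).
CONFUSION = {
    ("0", "O"): 0.10,
    ("1", "I"): 0.08,
    ("1", "l"): 0.08,
    ("5", "S"): 0.05,
    ("8", "B"): 0.04,
}
P_CORRECT = 0.80  # assumed probability that a character is read correctly
P_OTHER = 0.001   # assumed floor for confusions missing from the model

def char_likelihood(true, seen):
    if true == seen:
        return P_CORRECT
    return CONFUSION.get((true, seen), P_OTHER)

def likelihood(candidate, ocr):
    """Likelihood that `candidate` was the true string behind the OCR output."""
    if len(candidate) != len(ocr):
        return 0.0
    score = 1.0
    for true, seen in zip(candidate, ocr):
        score *= char_likelihood(true, seen)
    return score

def expand(pattern, classes):
    """Enumerate every string allowed by the pattern (one class key per position)."""
    return ("".join(chars) for chars in product(*(classes[k] for k in pattern)))

def correct(ocr, pattern, classes):
    """Pick the allowed string with the highest likelihood score."""
    return max(expand(pattern, classes), key=lambda c: likelihood(c, ocr))

# Pattern "ADDD": a letter from {A, B} followed by three digits.
classes = {"A": "AB", "D": "0123456789"}
fixed = correct("AO1S", "ADDD", classes)
print(fixed)
```

Enumerating every candidate is only practical for short patterns; a longer pattern would need a search that scores positions independently.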

Deep learning based document image embeddings for layout classification and retrieval

Patent number: 10963692

Abstract: Image documents that have a visually perceptible geometric structure and a plurality of visually perceptible key-value pairs are grouped. The image documents are processed to generate a corresponding textually encoded document. The textually encoded documents are each assigned into one of a plurality of layout groups, wherein all textually encoded documents in a particular layout group share a visually perceptible layout that is substantially similar. Triplets are selected from the layout groups, where two documents are from the same layout group and one document is from a different layout group. The triplets are processed with a convolutional neural network to generate a trained neural network that may be used to classify documents in a production environment such that a template designed on one image document in a group permits an extraction engine to extract all relevant fields on all image documents within the group.

Type: Grant

Filed: November 30, 2018

Date of Patent: March 30, 2021

Assignee: Automation Anywhere, Inc.

Inventors: Thomas Corcoran, Vibhas Gejji, Stephen Van Lare
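The triplet selection described in the abstract above (two documents from one layout group, one from a different group) can be sketched as follows. The group names, file names, and sampling strategy are hypothetical; training the convolutional network on the triplets is not shown.

```python
import random

def sample_triplets(groups, n, seed=0):
    """Return n (anchor, positive, negative) triplets.

    `groups` maps a layout-group name to a list of document identifiers.
    Anchor and positive come from the same group, negative from another one.
    """
    rng = random.Random(seed)
    positive_groups = [g for g, docs in groups.items() if len(docs) >= 2]
    non_empty = [g for g, docs in groups.items() if docs]
    if not positive_groups or len(non_empty) < 2:
        raise ValueError("need one group with two documents and a second non-empty group")
    triplets = []
    for _ in range(n):
        pos_group = rng.choice(positive_groups)
        neg_group = rng.choice([g for g in non_empty if g != pos_group])
        anchor, positive = rng.sample(groups[pos_group], 2)
        negative = rng.choice(groups[neg_group])
        triplets.append((anchor, positive, negative))
    return triplets

layout_groups = {
    "vendor_x_invoice": ["x1.png", "x2.png", "x3.png"],
    "vendor_y_invoice": ["y1.png", "y2.png"],
    "purchase_order": ["po1.png"],
}
triplets = sample_triplets(layout_groups, n=5, seed=42)
```

A group with a single document, such as `purchase_order` here, can still supply negatives even though it cannot supply an anchor and positive pair.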

Identification of key segments in document images

Patent number: 10699112

Abstract: A system and method of automatically learning new keywords in a document image based on context such as when a never before seen keyword exists surrounded by other key-value pairs. A machine learning based approach leverages subword embeddings and two-dimensional geometric contexts in a gradient boosted trees classifier. Keys may be composed of multi-word strings or single-word strings.

Type: Grant

Filed: September 28, 2018

Date of Patent: June 30, 2020

Assignee: Automation Anywhere, Inc.

Inventors: Thomas Corcoran, Vibhas Gejji, Stephen Van Lare