Patents Assigned to Greenhouse Software, Inc.

METHODS AND APPARATUS FOR EXTRACTING DATA FROM A DOCUMENT BY ENCODING IT WITH TEXTUAL AND VISUAL FEATURES AND USING MACHINE LEARNING

Publication number: 20250005952

Abstract: An apparatus including a processor caused to receive document images, each including representations of characters. The processor is caused to parse each document image to extract, based on structure type, subsets of characters, to generate a text encoding for that document image. For each document, the processor is caused to extract visual features to generate a visual encoding for that document image, each visual feature associated with a subset of characters. The processor is caused to generate parsed documents, each parsed document uniquely associated with a document image and based on the text and visual encoding for that document image. For each parsed document, the processor is caused to identify sections uniquely associated with section type. The processor is caused to train machine learning models, each machine learning model associated with one section type and trained using a portion of each parsed document associated with that section type.
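
The parsing half of this pipeline can be illustrated with a small Python sketch. It is hypothetical throughout, because the abstract does not say how characters are recognized, which structure types or visual features are used, or how sections are detected:

- The `Token` record stands in for OCR output.
- `structure_type`, `font_size`, `bold`, and `bbox` are invented example features.
- Sections are found with a simple keyword-plus-styling heuristic over made-up section names (`SECTION_KEYWORDS`).

The sketch only shows the shape of the data flow: a text encoding grouped by structure type, one visual feature record per character subset, one parsed document per image, and sections keyed uniquely by section type. It does not show the claimed method itself.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    # Hypothetical OCR output: one subset of characters plus invented visual features.
    text: str
    structure_type: str                      # e.g. "heading" or "line" (assumed labels)
    font_size: float
    bold: bool
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward (assumed)

@dataclass
class ParsedDocument:
    text_encoding: dict[str, list[str]]
    visual_encoding: list[dict]
    sections: dict[str, list[str]] = field(default_factory=dict)

# Hypothetical mapping from heading text to section type.
SECTION_KEYWORDS = {"experience": "experience", "education": "education", "skills": "skills"}

def text_encode(tokens):
    """Group character subsets by their structure type."""
    enc = {}
    for t in tokens:
        enc.setdefault(t.structure_type, []).append(t.text)
    return enc

def visual_encode(tokens):
    """Produce one visual feature record per character subset."""
    return [
        {"text": t.text, "font_size": t.font_size, "bold": t.bold,
         "x0": t.bbox[0], "y0": t.bbox[1]}
        for t in tokens
    ]

def identify_sections(visual):
    """Heuristic: bold or larger-than-body text matching a keyword starts a section."""
    if not visual:
        return {}
    body_size = min(v["font_size"] for v in visual)
    sections, current = {}, None
    # Read top-to-bottom, then left-to-right.
    for v in sorted(visual, key=lambda v: (v["y0"], v["x0"])):
        key = v["text"].strip().lower()
        if (v["bold"] or v["font_size"] > body_size) and key in SECTION_KEYWORDS:
            current = SECTION_KEYWORDS[key]
            sections.setdefault(current, [])  # one entry per section type
        elif current is not None:
            sections[current].append(v["text"])
    return sections

def parse_document(tokens):
    """Combine text and visual encodings into one parsed document."""
    visual = visual_encode(tokens)
    doc = ParsedDocument(text_encoding=text_encode(tokens), visual_encoding=visual)
    doc.sections = identify_sections(visual)
    return doc

# Usage with made-up tokens for a single document image.
tokens = [
    Token("Experience", "heading", 14.0, True, (50, 100, 200, 115)),
    Token("Engineer, Acme 2020-2023", "line", 10.0, False, (50, 120, 300, 132)),
    Token("Education", "heading", 14.0, True, (50, 150, 200, 165)),
    Token("BSc Physics", "line", 10.0, False, (50, 170, 200, 182)),
]
doc = parse_document(tokens)
print(doc.sections)
# {'experience': ['Engineer, Acme 2020-2023'], 'education': ['BSc Physics']}
```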

Type: Application

Filed: June 28, 2024

Publication date: January 2, 2025

Applicant: Greenhouse Software, Inc.

Inventor: Triantafyllos Xylouris

Methods and apparatus for extracting data from a document by encoding it with textual and visual features and using machine learning

Patent number: 12183106

Abstract: An apparatus including a processor caused to receive document images, each including representations of characters. The processor is caused to parse each document image to extract, based on structure type, subsets of characters, to generate a text encoding for that document image. For each document, the processor is caused to extract visual features to generate a visual encoding for that document image, each visual feature associated with a subset of characters. The processor is caused to generate parsed documents, each parsed document uniquely associated with a document image and based on the text and visual encoding for that document image. For each parsed document, the processor is caused to identify sections uniquely associated with section type. The processor is caused to train machine learning models, each machine learning model associated with one section type and trained using a portion of each parsed document associated with that section type.
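
The training step trains one model per section type, and each model sees only the matching portion of every parsed document. It can be sketched as follows, again under loudly stated assumptions. The abstract does not name a model family, features, or training targets.

In this hypothetical example, each training example is a `(section_type, line_text, field_label)` triple with invented labels. A from-scratch multinomial naive Bayes classifier is swapped in purely as a stand-in model. `NaiveBayes` and `train_section_models` are illustrative names, not part of any described API.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Stand-in model: multinomial naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n = sum(self.label_counts.values())
        v = len(self.vocab)

        def log_score(label):
            # log prior + sum of smoothed log likelihoods for each token
            s = math.log(self.label_counts[label] / n)
            for w in text.lower().split():
                s += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            return s

        return max(self.label_counts, key=log_score)

def train_section_models(examples):
    """Group examples by section type and train one model per type on its portion only."""
    grouped = defaultdict(lambda: ([], []))
    for section_type, text, label in examples:
        grouped[section_type][0].append(text)
        grouped[section_type][1].append(label)
    return {st: NaiveBayes().fit(texts, labels) for st, (texts, labels) in grouped.items()}

# Usage with hypothetical, hand-labeled section lines.
examples = [
    ("experience", "Senior Engineer", "title"),
    ("experience", "Software Engineer", "title"),
    ("experience", "Acme Corp", "company"),
    ("experience", "Globex Corp", "company"),
    ("education", "BSc Physics", "degree"),
    ("education", "MSc Computer Science", "degree"),
    ("education", "State University", "school"),
    ("education", "Tech University", "school"),
]
models = train_section_models(examples)
print(models["experience"].predict("Staff Engineer"))  # title
```

One plausible motivation for a separate model per section type is that each classifier only has to distinguish the labels that occur in its own section. The abstract itself does not state the rationale.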

Type: Grant

Filed: June 28, 2024

Date of Patent: December 31, 2024

Assignee: Greenhouse Software, Inc.

Inventor: Triantafyllos Xylouris