Patents by Inventor Christian Reisswig

Christian Reisswig has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DATA-DRIVEN STRUCTURE EXTRACTION FROM TEXT DOCUMENTS

Publication number: 20250103815

Abstract: Methods and apparatus are disclosed for extracting structured content, as graphs, from text documents. Graph vertices and edges correspond to document tokens and pairwise relationships between tokens. Undirected peer relationships and directed relationships (e.g. key-value or composition) are supported. Vertices can be identified with predefined fields, and thence mapped to database columns for automated storage of document content in a database. A trained neural network classifier determines relationship classifications for all pairwise combinations of input tokens. The relationship classification can differentiate multiple relationship types. A multi-level classifier extracts multi-level graph structure from a document. Disclosed embodiments support arbitrary graph structures with hierarchical and planar relationships. Relationships are not restricted by spatial proximity or document layout. Composite tokens can be identified interspersed with other content.
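As a rough illustration of the pairwise approach described in this abstract, the sketch below classifies every ordered pair of tokens and collects the results as directed and undirected graph edges. The label names, the `build_graph` helper, and the `toy_classifier` lookup table are assumptions made for illustration; the lookup table merely stands in for the trained neural network classifier and is not taken from the patent.

```python
from itertools import permutations

# Hypothetical relation labels, loosely following the abstract:
# peer relationships are undirected, key-value and composition are directed.
UNDIRECTED = {"peer"}
DIRECTED = {"key-value", "composition"}


def build_graph(tokens, classify_pair):
    """Classify every ordered pair of distinct tokens and collect edges."""
    directed, undirected = set(), set()
    for i, j in permutations(range(len(tokens)), 2):
        label = classify_pair(tokens[i], tokens[j])
        if label in DIRECTED:
            directed.add((i, j, label))
        elif label in UNDIRECTED:
            # A frozenset ignores direction, so (i, j) and (j, i) collapse.
            undirected.add((frozenset((i, j)), label))
    return directed, undirected


def toy_classifier(a, b):
    """Stand-in for the trained classifier (assumption: a fixed lookup)."""
    table = {("Total:", "42.00"): "key-value", ("Net", "Gross"): "peer"}
    return table.get((a, b), "none")


tokens = ["Total:", "42.00", "Net", "Gross"]
directed, undirected = build_graph(tokens, toy_classifier)
```

Because every pair is classified, an edge can link tokens that are far apart on the page, which matches the abstract's point that relationships are not restricted by spatial proximity.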

Type: Application

Filed: December 10, 2024

Publication date: March 27, 2025

Applicant: SAP SE

Inventor: Christian Reisswig

Data-driven structure extraction from text documents

Patent number: 12204860

Abstract: Methods and apparatus are disclosed for extracting structured content, as graphs, from text documents. Graph vertices and edges correspond to document tokens and pairwise relationships between tokens. Undirected peer relationships and directed relationships (e.g. key-value or composition) are supported. Vertices can be identified with predefined fields, and thence mapped to database columns for automated storage of document content in a database. A trained neural network classifier determines relationship classifications for all pairwise combinations of input tokens. The relationship classification can differentiate multiple relationship types. A multi-level classifier extracts multi-level graph structure from a document. Disclosed embodiments support arbitrary graph structures with hierarchical and planar relationships. Relationships are not restricted by spatial proximity or document layout. Composite tokens can be identified interspersed with other content.

Type: Grant

Filed: February 22, 2023

Date of Patent: January 21, 2025

Assignee: SAP SE

Inventor: Christian Reisswig

AUGMENTING ELECTRONIC DOCUMENTS TO GENERATE SYNTHETIC TRAINING DATA SETS

Publication number: 20230334309

Abstract: Systems, methods, and computer-readable media for generating a synthetic training data set from an original unstructured electronic document are disclosed. The synthetic training data set may be used to train a deep learning model to extract data from the original electronic document. The original electronic document may comprise annotated data fields. Each annotated data field may comprise a bounding box and a label. The original electronic document may comprise a header, a table, and a footer. Macro augmentation operations may be applied to the original electronic document to create sub-templates representative of distinct page layouts in the original electronic document. The synthetic training data set may be generated by applying geometric and semantic data augmentations to the sub-templates and the original electronic documents. The synthetic training data set may then be provided to the deep learning model for training.

Type: Application

Filed: April 14, 2022

Publication date: October 19, 2023

Inventors: Alexey Streltsov, Monit Shah Singh, Dhananjay Tomar, Christian Reisswig, Minh Duc Bui

Cascade pooling for natural language processing

Patent number: 11763094

Abstract: Natural language processing systems and methods are disclosed herein. In some embodiments, digital document information comprising text is received. The digital document information may be processed through word and character encoding operations to generate word and character vectors while retaining document location information for the words and characters. The data may then be processed by a series of convolution and maximum pooling operations to obtain maximum valued elements from the data. The document location information as well as the maximum valued element data may be further processed for semantic classification of the data using a semantic classifier and bounding box regression.

Type: Grant

Filed: May 13, 2021

Date of Patent: September 19, 2023

Assignee: SAP SE

Inventor: Christian Reisswig

DATA-DRIVEN STRUCTURE EXTRACTION FROM TEXT DOCUMENTS

Publication number: 20230206000

Abstract: Methods and apparatus are disclosed for extracting structured content, as graphs, from text documents. Graph vertices and edges correspond to document tokens and pairwise relationships between tokens. Undirected peer relationships and directed relationships (e.g. key-value or composition) are supported. Vertices can be identified with predefined fields, and thence mapped to database columns for automated storage of document content in a database. A trained neural network classifier determines relationship classifications for all pairwise combinations of input tokens. The relationship classification can differentiate multiple relationship types. A multi-level classifier extracts multi-level graph structure from a document. Disclosed embodiments support arbitrary graph structures with hierarchical and planar relationships. Relationships are not restricted by spatial proximity or document layout. Composite tokens can be identified interspersed with other content.

Type: Application

Filed: February 22, 2023

Publication date: June 29, 2023

Applicant: SAP SE

Inventor: Christian Reisswig

Data-driven structure extraction from text documents

Patent number: 11615246

Abstract: Methods and apparatus are disclosed for extracting structured content, as graphs, from text documents. Graph vertices and edges correspond to document tokens and pairwise relationships between tokens. Undirected peer relationships and directed relationships (e.g. key-value or composition) are supported. Vertices can be identified with predefined fields, and thence mapped to database columns for automated storage of document content in a database. A trained neural network classifier determines relationship classifications for all pairwise combinations of input tokens. The relationship classification can differentiate multiple relationship types. A multi-level classifier extracts multi-level graph structure from a document. Disclosed embodiments support arbitrary graph structures with hierarchical and planar relationships. Relationships are not restricted by spatial proximity or document layout. Composite tokens can be identified interspersed with other content.

Type: Grant

Filed: June 3, 2020

Date of Patent: March 28, 2023

Assignee: SAP SE

Inventor: Christian Reisswig

PSEUDO-LABEL GENERATION USING AN ENSEMBLE MODEL

Publication number: 20230075369

Abstract: Systems and methods include training of each of a plurality of models based on a first set of training data comprising a first plurality of pairs, each of the first plurality of pairs comprising a feature and a corresponding label, inputting of each of a plurality of features into each of the plurality of trained models to generate, for each feature of the plurality of features, a plurality of output labels, determining, for each of the plurality of features, a pseudo-label based on the plurality of output labels generated for the feature, determining a second set of training data comprising a second plurality of pairs, each of the second plurality of pairs comprising one of the plurality of features and a pseudo-label determined for the one of the plurality of features, and training an inference model to output an inferred label based on the first set of training data and the second set of training data.
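A minimal sketch of the pseudo-labeling step described in this abstract follows. It assumes the ensemble members are already trained and are exposed as plain callables, and it uses a majority vote to derive each pseudo-label; the abstract only says the pseudo-label is "based on" the output labels, so the voting rule and the names `pseudo_label` and `build_second_training_set` are illustrative assumptions.

```python
from collections import Counter


def pseudo_label(models, feature):
    """Return the label most ensemble members agree on (assumed rule)."""
    votes = Counter(model(feature) for model in models)
    return votes.most_common(1)[0][0]


def build_second_training_set(models, unlabeled_features):
    """Pair each unlabeled feature with its ensemble-derived pseudo-label."""
    return [(f, pseudo_label(models, f)) for f in unlabeled_features]


# Toy stand-ins for trained models: three threshold classifiers.
models = [
    lambda x: "high" if x > 3 else "low",
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
]

second_set = build_second_training_set(models, [2, 6, 9])
```

The resulting pairs would then be combined with the original labeled pairs to train the final inference model, which this sketch leaves out.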

Type: Application

Filed: September 8, 2021

Publication date: March 9, 2023

Inventors: Sohyeong Kim, Christian Reisswig

Model-independent confidence values for extracted document information using a convolutional neural network

Patent number: 11557140

Abstract: Disclosed herein are system, method, and computer program product embodiments for correcting extracted document information based on generated confidence and correctness scores. In an embodiment, a document correcting system may receive a document and document information that represents information extracted from the document. The document correcting system may determine the correctness of the document information by processing the document to generate a character grid representing textual information and spatial arrangements for the text within the document. The document correcting system may apply a convolutional neural network to the character grid and the document information. The convolutional neural network may output corrected document information, a correctness value indicating the possible errors in the document information, and a confidence value indicating a likelihood of the possible errors.

Type: Grant

Filed: November 30, 2020

Date of Patent: January 17, 2023

Assignee: SAP SE

Inventor: Christian Reisswig

Targeted document information extraction

Patent number: 11514489

Abstract: Disclosed herein are various embodiments for targeted document information extraction. An embodiment operates by receiving a document associated with a particular customer of a plurality of customers. It is determined whether to use a global processor or template processor to analyze the document based on whether one or more customer templates are associated with the particular customer. Which of the one or more templates associated with the particular customer correspond to the document is identified. The document is compared to the identified template associated with the customer. Information is extracted from the document based on the identified template and the identified plurality of variations. The extracted information for the document is output.

Type: Grant

Filed: January 6, 2021

Date of Patent: November 29, 2022

Assignee: SAP SE

Inventors: Ying Jiang, Christian Reisswig

MODEL-INDEPENDENT CONFIDENCE VALUE PREDICTION MACHINE LEARNED MODEL

Publication number: 20220366301

Abstract: In an example embodiment, a confidence score is computed for a predicted label (from a first model) for information extracted from a document. The confidence score is computed using a machine learned model different than the first model which is based on a Sliding-Window method. The Sliding-Window method may be based on convolutional neural network classification, using sliding windows. It receives as input (1) the string of extracted information from an independent previous information extraction step (the “input text”), (2) the string's predicted class label, (3) the string's coordinate location in the document, and (4) the text of the document (for additional context information). The Sliding-Window method's task is to predict the confidence score to determine the correctness of the predicted label for the information.

Type: Application

Filed: June 22, 2021

Publication date: November 17, 2022

Inventors: Nurzat Rakhmanberdieva, Alexey Streltsov, Christian Reisswig

CASCADE POOLING FOR NATURAL LANGUAGE PROCESSING

Publication number: 20220366144

Abstract: Natural language processing systems and methods are disclosed herein. In some embodiments, digital document information comprising text is received. The digital document information may be processed through word and character encoding operations to generate word and character vectors while retaining document location information for the words and characters. The data may then be processed by a series of convolution and maximum pooling operations to obtain maximum valued elements from the data. The document location information as well as the maximum valued element data may be further processed for semantic classification of the data using a semantic classifier and bounding box regression.

Type: Application

Filed: May 13, 2021

Publication date: November 17, 2022

Inventor: Christian Reisswig

Adaptive high-resolution digital image processing with neural networks

Patent number: 11488020

Abstract: Technologies are described for performing adaptive high-resolution digital image processing using neural networks. For example, a number of different regions can be defined representing portions of a digital image. One of the regions covers the entire digital image at a reduced resolution. The other regions cover less than the entire digital image at resolutions higher than the region covering the entire digital image. Neural networks are then used to process each of the regions. The neural networks share information using prolongation and restriction operations. Prolongation operations propagate activations from a neural network operating on a lower resolution region to context zones of a neural network operating on a higher resolution region. Restriction operations propagate activations from the neural network operating on the higher resolution region back to the neural network operating on the lower resolution region.

Type: Grant

Filed: June 2, 2020

Date of Patent: November 1, 2022

Assignee: SAP SE

Inventors: Christian Reisswig, Shachar Klaiman

TARGETED DOCUMENT INFORMATION EXTRACTION

Publication number: 20220215446

Abstract: Disclosed herein are various embodiments for targeted document information extraction. An embodiment operates by receiving a document associated with a particular customer of a plurality of customers. It is determined whether to use a global processor or template processor to analyze the document based on whether one or more customer templates are associated with the particular customer. Which of the one or more templates associated with the particular customer correspond to the document is identified. The document is compared to the identified template associated with the customer. Information is extracted from the document based on the identified template and the identified plurality of variations. The extracted information for the document is output.

Type: Application

Filed: January 6, 2021

Publication date: July 7, 2022

Inventors: Ying Jiang, Christian Reisswig

MODEL-INDEPENDENT CONFIDENCE VALUES FOR EXTRACTED DOCUMENT INFORMATION USING A CONVOLUTIONAL NEURAL NETWORK

Publication number: 20220171967

Abstract: Disclosed herein are system, method, and computer program product embodiments for correcting extracted document information based on generated confidence and correctness scores. In an embodiment, a document correcting system may receive a document and document information that represents information extracted from the document. The document correcting system may determine the correctness of the document information by processing the document to generate a character grid representing textual information and spatial arrangements for the text within the document. The document correcting system may apply a convolutional neural network to the character grid and the document information. The convolutional neural network may output corrected document information, a correctness value indicating the possible errors in the document information, and a confidence value indicating a likelihood of the possible errors.

Type: Application

Filed: November 30, 2020

Publication date: June 2, 2022

Inventor: Christian Reisswig

QUERYING SEMANTIC DATA FROM UNSTRUCTURED DOCUMENTS

Publication number: 20220092328

Abstract: Disclosed herein are system, method, and computer program product embodiments for querying document terms and identifying target data from documents. In an embodiment, a document processing system may receive a document and a query string. The document processing system may perform optical character recognition to obtain character information and positioning information for the characters of the document. The document processing system may generate a two-dimensional character grid for the document. The document processing system may apply a convolutional neural network to the character grid and the query string to identify target data from the document corresponding to the query string. The convolutional neural network may then produce a segmentation mask and/or bounding boxes to identify the targeted data.

Type: Application

Filed: September 23, 2020

Publication date: March 24, 2022

Inventors: Johannes Hoehne, Christian Reisswig

Querying semantic data from unstructured documents

Patent number: 11281928

Abstract: Disclosed herein are system, method, and computer program product embodiments for querying document terms and identifying target data from documents. In an embodiment, a document processing system may receive a document and a query string. The document processing system may perform optical character recognition to obtain character information and positioning information for the characters of the document. The document processing system may generate a two-dimensional character grid for the document. The document processing system may apply a convolutional neural network to the character grid and the query string to identify target data from the document corresponding to the query string. The convolutional neural network may then produce a segmentation mask and/or bounding boxes to identify the targeted data.

Type: Grant

Filed: September 23, 2020

Date of Patent: March 22, 2022

Assignee: SAP SE

Inventors: Johannes Hoehne, Christian Reisswig

Positional embeddings for document processing

Patent number: 11275934

Abstract: Disclosed herein are system, method, and computer program product embodiments for generating document labels using positional embeddings. In an embodiment, a label system may identify tokens, such as words, of a document image. The label system may apply a position vector neural network to the document image to analyze the pixels and determine positional embedding vectors corresponding to the words. The label system may then combine the positional embedding vectors with corresponding word vectors for use as an input to a neural network trained to generate document labels. This combination may embed the positional information with the corresponding word information in a serialized manner for processing by the document label neural network. Using this formatting, the label system may generate document labels in a light-weight and fast manner while still preserving spatial relationships between words.

Type: Grant

Filed: November 20, 2019

Date of Patent: March 15, 2022

Assignee: SAP SE

Inventors: Christian Reisswig, Stefan Klaus Baur

Two-dimensional document processing

Patent number: 11244208

Abstract: Disclosed herein are system, method, and computer program product embodiments for processing a document. In an embodiment, a document processing system may receive a document. The document processing system may perform optical character recognition to obtain character information and positioning information for the characters. The document processing system may generate a down-sampled two-dimensional character grid for the document. The document processing system may apply a convolutional neural network to the character grid to obtain semantic meaning for the document. The convolutional neural network may produce a segmentation mask and bounding boxes to correspond to the document.
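To make the character-grid step in this abstract more concrete, here is a small sketch that rasterizes OCR character boxes into a down-sampled two-dimensional grid of integer character indices. The input tuple layout `(char, x0, y0, x1, y1)`, the `cell` down-sampling factor, and the use of 0 for background are assumptions for illustration, not details from the patent; the convolutional network that would consume the grid is omitted.

```python
def char_grid(ocr_chars, width, height, cell=8):
    """Rasterize OCR character boxes into a coarse grid of character indices.

    ocr_chars: iterable of (char, x0, y0, x1, y1) pixel boxes (assumed format).
    cell: down-sampling factor, one grid cell per cell x cell pixel block.
    """
    vocab = {}                                   # character -> integer index
    rows = -(-height // cell)                    # ceiling division
    cols = -(-width // cell)
    grid = [[0] * cols for _ in range(rows)]     # 0 marks background
    for ch, x0, y0, x1, y1 in ocr_chars:
        idx = vocab.setdefault(ch, len(vocab) + 1)
        # Fill every grid cell that the character's bounding box overlaps.
        for r in range(y0 // cell, min(rows, -(-y1 // cell))):
            for c in range(x0 // cell, min(cols, -(-x1 // cell))):
                grid[r][c] = idx
    return grid, vocab


# Two characters side by side on a 24 x 16 pixel page.
ocr = [("A", 0, 0, 8, 16), ("B", 8, 0, 16, 16)]
grid, vocab = char_grid(ocr, width=24, height=16, cell=8)
```

Each cell keeps the index of the character covering it, so the grid preserves the page's spatial layout at a fraction of the pixel resolution.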

Type: Grant

Filed: December 12, 2019

Date of Patent: February 8, 2022

Assignee: SAP SE

Inventors: Christian Reisswig, Anoop Raveendra Katti, Steffen Bickel, Johannes Hoehne, Jean Baptiste Faddoul

DATA-DRIVEN STRUCTURE EXTRACTION FROM TEXT DOCUMENTS

Publication number: 20210383067

Abstract: Methods and apparatus are disclosed for extracting structured content, as graphs, from text documents. Graph vertices and edges correspond to document tokens and pairwise relationships between tokens. Undirected peer relationships and directed relationships (e.g. key-value or composition) are supported. Vertices can be identified with predefined fields, and thence mapped to database columns for automated storage of document content in a database. A trained neural network classifier determines relationship classifications for all pairwise combinations of input tokens. The relationship classification can differentiate multiple relationship types. A multi-level classifier extracts multi-level graph structure from a document. Disclosed embodiments support arbitrary graph structures with hierarchical and planar relationships. Relationships are not restricted by spatial proximity or document layout. Composite tokens can be identified interspersed with other content.

Type: Application

Filed: June 3, 2020

Publication date: December 9, 2021

Applicant: SAP SE

Inventor: Christian Reisswig

ADAPTIVE HIGH-RESOLUTION DIGITAL IMAGE PROCESSING WITH NEURAL NETWORKS

Publication number: 20210374548

Abstract: Technologies are described for performing adaptive high-resolution digital image processing using neural networks. For example, a number of different regions can be defined representing portions of a digital image. One of the regions covers the entire digital image at a reduced resolution. The other regions cover less than the entire digital image at resolutions higher than the region covering the entire digital image. Neural networks are then used to process each of the regions. The neural networks share information using prolongation and restriction operations. Prolongation operations propagate activations from a neural network operating on a lower resolution region to context zones of a neural network operating on a higher resolution region. Restriction operations propagate activations from the neural network operating on the higher resolution region back to the neural network operating on the lower resolution region.

Type: Application

Filed: June 2, 2020

Publication date: December 2, 2021

Applicant: SAP SE

Inventors: Christian Reisswig, Shachar Klaiman
