Patents Assigned to LEVERTON HOLDING LLC

Text line image splitting with different font sizes

Patent number: 11869259

Abstract: A method for splitting text line images includes receiving a text line image and identifying that the text line image comprises a plurality of zones, wherein each zone includes text whose font differs from the text of adjacent zones. The method further includes selecting a splitting position between multiple zones and splitting the text line image at the splitting position into a plurality of image segments, wherein each image segment contains at least one zone of the text line image, and performing optical character recognition on each image segment to recognize a text segment of the image segment. In certain implementations, the method further includes generating one or more confidence measurements and selecting a splitting position that corresponds to a large gradient in the confidence measurement.

Type: Grant

Filed: October 18, 2021

Date of Patent: January 9, 2024

Assignee: LEVERTON HOLDING LLC

Inventors: Florian Kuhlmann, Michael Kieweg
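
Illustrative sketch (not taken from the patent): the abstract above mentions selecting a splitting position at a large gradient in a confidence measurement. The Python below shows one way that idea could look, assuming one confidence value per horizontal position of the text line. The names select_split_position and split_line, the per-position confidence list, and the choice of the single largest absolute jump are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_split_position(confidences: Sequence[float]) -> int:
    """Return the index where the confidence changes most sharply.

    Hypothetical helper: the returned index i means "split between
    position i - 1 and position i".
    """
    if len(confidences) < 2:
        raise ValueError("need at least two confidence values")
    # gradients[k] is the absolute step from position k to position k + 1.
    gradients = [
        abs(confidences[i] - confidences[i - 1])
        for i in range(1, len(confidences))
    ]
    largest = max(range(len(gradients)), key=gradients.__getitem__)
    return largest + 1


def split_line(columns: List[str], confidences: Sequence[float]) -> Tuple[List[str], List[str]]:
    """Split a line (modelled here as a list of columns) into two segments."""
    pos = select_split_position(confidences)
    return columns[:pos], columns[pos:]


if __name__ == "__main__":
    # Assumed example: confidence drops where the font changes.
    conf = [0.90, 0.92, 0.91, 0.30, 0.28]
    left, right = split_line(list("ABCde"), conf)
    print(left, right)
```

In this sketch each resulting segment would then be passed to an OCR step separately; that step is omitted because the abstract does not say how it is implemented.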

Named entity recognition with convolutional networks

Patent number: 11816571

Abstract: Methods and systems for recognizing named entities within the text of a document are provided. The methods and systems may include receiving a document image and recognized text of the document image. A feature map of the document image may be created, a tagged map may be created, and locations of tags within the tagged map may be estimated using a machine learning model. Named entities within the recognized text may be recognized based on the one or more locations of the tags. In some embodiments, the machine learning model is a convolutional neural network. In further embodiments, creating the feature map may include determining, for a subset of the cells of the feature map, one or more features of the recognized text contained in a corresponding portion of the document image.

Type: Grant

Filed: October 4, 2021

Date of Patent: November 14, 2023

Assignee: LEVERTON HOLDING LLC

Inventor: Christian Schäfer

Text line normalization systems and methods

Patent number: 11704476

Abstract: A method for estimating text heights of text line images includes estimating a text height with a sequence recognizer. The method further includes normalizing a vertical dimension and/or position of text within a text line image based on the text height. The method may further include calculating a feature of the text line image. In some examples, the sequence recognizer estimates the text height with a machine learning model.

Type: Grant

Filed: July 12, 2021

Date of Patent: July 18, 2023

Assignee: LEVERTON HOLDING LLC

Inventors: Florian Kuhlmann, Michael Kieweg, Saurabh Shekhar Verma

Post-filtering of named entities with machine learning

Patent number: 11687719

Abstract: A method for identifying errors associated with named entity recognition includes recognizing a candidate named entity within a text and extracting a chunk from the text containing the candidate named entity. The method further includes creating a feature vector associated with the chunk and analyzing the feature vector for an indication of an error associated with the candidate named entity. The method also includes correcting the error associated with the candidate named entity.

Type: Grant

Filed: March 1, 2021

Date of Patent: June 27, 2023

Assignee: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Michael Kieweg, Florian Kuhlmann

METHODS AND SYSTEMS FOR AUTOMATED TABLE DETECTION WITHIN DOCUMENTS

Publication number: 20230021040

Abstract: Methods and systems for detecting tables within documents are provided. The methods and systems may include receiving a text of the document that includes a plurality of words depicted in the document image. Feature sets may be calculated for the words and may contain one or more features of a corresponding word of the text. Candidate table words may then be identified based on the feature sets, and may then be used to identify a table location within the document image. In some cases, the candidate table words may be identified using a machine learning model.

Type: Application

Filed: September 19, 2022

Publication date: January 19, 2023

Applicant: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Michael Kieweg
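
Illustrative sketch (not taken from the application): the abstract above describes computing a feature set per word, identifying candidate table words, and deriving a table location from them. The Python below walks through those three steps under simplifying assumptions. The Word class, the two features, the 0.5 digit-ratio threshold, and the bounding-box rule are hypothetical; the abstract notes that a machine learning model may identify the candidates, and a fixed rule stands in for it here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Word:
    text: str
    x0: float  # left edge in the document image
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def feature_set(word: Word) -> Dict[str, float]:
    """Compute a small, hypothetical feature set for one word."""
    digits = sum(ch.isdigit() for ch in word.text)
    return {
        "digit_ratio": digits / len(word.text) if word.text else 0.0,
        "width": word.x1 - word.x0,
    }


def is_candidate_table_word(features: Dict[str, float]) -> bool:
    """Stand-in rule (assumption): mostly numeric words are table candidates."""
    return features["digit_ratio"] >= 0.5


def table_location(words: List[Word]) -> Optional[Tuple[float, float, float, float]]:
    """Return the bounding box (x0, y0, x1, y1) around all candidate words."""
    candidates = [w for w in words if is_candidate_table_word(feature_set(w))]
    if not candidates:
        return None
    return (
        min(w.x0 for w in candidates),
        min(w.y0 for w in candidates),
        max(w.x1 for w in candidates),
        max(w.y1 for w in candidates),
    )


if __name__ == "__main__":
    words = [
        Word("Rent", 10, 10, 50, 20),
        Word("1,200", 100, 40, 140, 50),
        Word("2019", 160, 60, 190, 70),
    ]
    print(table_location(words))
```

A single bounding box is the simplest possible reading of "table location"; a document with several tables would need the candidates grouped first, which this sketch does not attempt.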

Methods and systems for automated table detection within documents

Patent number: 11450125

Abstract: Methods and systems for detecting tables within documents are provided. The methods and systems may include receiving a text of the document that includes a plurality of words depicted in the document image. Feature sets may be calculated for the words and may contain one or more features of a corresponding word of the text. Candidate table words may then be identified based on the feature sets, and may then be used to identify a table location within the document image. In some cases, the candidate table words may be identified using a machine learning model.

Type: Grant

Filed: December 2, 2019

Date of Patent: September 20, 2022

Assignee: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Michael Kieweg

TEXT LINE IMAGE SPLITTING WITH DIFFERENT FONT SIZES

Publication number: 20220108555

Abstract: A method for splitting text line images includes receiving a text line image and identifying that the text line image comprises a plurality of zones, wherein each zone includes text whose font differs from the text of adjacent zones. The method further includes selecting a splitting position between multiple zones and splitting the text line image at the splitting position into a plurality of image segments, wherein each image segment contains at least one zone of the text line image, and performing optical character recognition on each image segment to recognize a text segment of the image segment. In certain implementations, the method further includes generating one or more confidence measurements and selecting a splitting position that corresponds to a large gradient in the confidence measurement.

Type: Application

Filed: October 18, 2021

Publication date: April 7, 2022

Applicant: LEVERTON HOLDING LLC

Inventors: Florian Kuhlmann, Michael Kieweg

NAMED ENTITY RECOGNITION WITH CONVOLUTIONAL NETWORKS

Publication number: 20220100994

Abstract: Methods and systems for recognizing named entities within the text of a document are provided. The methods and systems may include receiving a document image and recognized text of the document image. A feature map of the document image may be created, a tagged map may be created, and locations of tags within the tagged map may be estimated using a machine learning model. Named entities within the recognized text may be recognized based on the one or more locations of the tags. In some embodiments, the machine learning model is a convolutional neural network. In further embodiments, creating the feature map may include determining, for a subset of the cells of the feature map, one or more features of the recognized text contained in a corresponding portion of the document image.

Type: Application

Filed: October 4, 2021

Publication date: March 31, 2022

Applicant: Leverton Holding LLC

Inventor: Christian Schäfer

TEXT LINE NORMALIZATION SYSTEMS AND METHODS

Publication number: 20210334573

Abstract: A method for estimating text heights of text line images includes estimating a text height with a sequence recognizer. The method further includes normalizing a vertical dimension and/or position of text within a text line image based on the text height. The method may further include calculating a feature of the text line image. In some examples, the sequence recognizer estimates the text height with a machine learning model.

Type: Application

Filed: July 12, 2021

Publication date: October 28, 2021

Applicant: Leverton Holding LLC

Inventors: Florian Kuhlmann, Michael Kieweg, Saurabh Shekhar Verma

Text line image splitting with different font sizes

Patent number: 11151371

Abstract: A method for splitting text line images includes receiving a text line image and identifying that the text line image comprises a plurality of zones, wherein each zone includes text whose font differs from the text of adjacent zones. The method further includes selecting a splitting position between multiple zones and splitting the text line image at the splitting position into a plurality of image segments, wherein each image segment contains at least one zone of the text line image, and performing optical character recognition on each image segment to recognize a text segment of the image segment. In certain implementations, the method further includes generating one or more confidence measurements and selecting a splitting position that corresponds to a large gradient in the confidence measurement.

Type: Grant

Filed: August 21, 2019

Date of Patent: October 19, 2021

Assignee: LEVERTON HOLDING, LLC

Inventors: Florian Kuhlmann, Michael Kieweg

Named entity recognition with convolutional networks

Patent number: 11138425

Abstract: Methods and systems for recognizing named entities within the text of a document are provided. The methods and systems may include receiving a document image and recognized text of the document image. A feature map of the document image may be created, a tagged map may be created, and locations of tags within the tagged map may be estimated using a machine learning model. Named entities within the recognized text may be recognized based on the one or more locations of the tags. In some embodiments, the machine learning model is a convolutional neural network. In further embodiments, creating the feature map may include determining, for a subset of the cells of the feature map, one or more features of the recognized text contained in a corresponding portion of the document image.

Type: Grant

Filed: September 25, 2019

Date of Patent: October 5, 2021

Assignee: LEVERTON HOLDING LLC

Inventor: Christian Schäfer

Text line normalization systems and methods

Patent number: 11062164

Abstract: A method for estimating text heights of text line images includes estimating a text height with a sequence recognizer. The method further includes normalizing a vertical dimension and/or position of text within a text line image based on the text height. The method may further include calculating a feature of the text line image. In some examples, the sequence recognizer estimates the text height with a machine learning model.

Type: Grant

Filed: July 16, 2019

Date of Patent: July 13, 2021

Assignee: LEVERTON HOLDING LLC

Inventors: Florian Kuhlmann, Michael Kieweg, Saurabh Shekhar Verma

POST-FILTERING OF NAMED ENTITIES WITH MACHINE LEARNING

Publication number: 20210182494

Abstract: A method for identifying errors associated with named entity recognition includes recognizing a candidate named entity within a text and extracting a chunk from the text containing the candidate named entity. The method further includes creating a feature vector associated with the chunk and analyzing the feature vector for an indication of an error associated with the candidate named entity. The method also includes correcting the error associated with the candidate named entity.

Type: Application

Filed: March 1, 2021

Publication date: June 17, 2021

Applicant: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Michael Kieweg, Florian Kuhlmann

DATA STYLE TRANSFORMATION WITH ADVERSARIAL MODELS

Publication number: 20210166125

Abstract: Systems and methods for transforming data between multiple styles are provided. In one embodiment, a system is provided that includes a generator model, a discriminator model, and a preserver model. The generator model may be configured to receive data in a first style and generate converted data in a second style. The discriminator model may be configured to receive the converted data from the generator model, compare the converted data to original data in the second style, and compute a resemblance measure based on the comparison. The preserver model may be configured to receive the converted data from the generator model and compute an information measure of the converted data. The generator model may also be trained to optimize the resemblance measure and the information measure.

Type: Application

Filed: December 3, 2020

Publication date: June 3, 2021

Applicant: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Florian Kuhlmann

Post-filtering of named entities with machine learning

Patent number: 10936820

Abstract: A method for identifying errors associated with named entity recognition includes recognizing a candidate named entity within a text and extracting a chunk from the text containing the candidate named entity. The method further includes creating a feature vector associated with the chunk and analyzing the feature vector for an indication of an error associated with the candidate named entity. The method also includes correcting the error associated with the candidate named entity.

Type: Grant

Filed: May 20, 2019

Date of Patent: March 2, 2021

Assignee: LEVERTON HOLDING LLC

Inventors: Christian Schäfer, Michael Kieweg, Florian Kuhlmann

NAMED ENTITY RECOGNITION WITH CONVOLUTIONAL NETWORKS

Publication number: 20200097718

Abstract: Methods and systems for recognizing named entities within the text of a document are provided. The methods and systems may include receiving a document image and recognized text of the document image. A feature map of the document image may be created, a tagged map may be created, and locations of tags within the tagged map may be estimated using a machine learning model. Named entities within the recognized text may be recognized based on the one or more locations of the tags. In some embodiments, the machine learning model is a convolutional neural network. In further embodiments, creating the feature map may include determining, for a subset of the cells of the feature map, one or more features of the recognized text contained in a corresponding portion of the document image.

Type: Application

Filed: September 25, 2019

Publication date: March 26, 2020

Applicant: LEVERTON HOLDING LLC

Inventor: Christian Schäfer

TEXT LINE IMAGE SPLITTING WITH DIFFERENT FONT SIZES

Publication number: 20200065574

Abstract: A method for splitting text line images includes receiving a text line image and identifying that the text line image comprises a plurality of zones, wherein each zone includes text whose font differs from the text of adjacent zones. The method further includes selecting a splitting position between multiple zones and splitting the text line image at the splitting position into a plurality of image segments, wherein each image segment contains at least one zone of the text line image, and performing optical character recognition on each image segment to recognize a text segment of the image segment. In certain implementations, the method further includes generating one or more confidence measurements and selecting a splitting position that corresponds to a large gradient in the confidence measurement.

Type: Application

Filed: August 21, 2019

Publication date: February 27, 2020

Applicant: LEVERTON HOLDING LLC

Inventors: Florian Kuhlmann, Michael Kieweg