Patents by Inventor Stanislav Semenov

Stanislav Semenov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

IDENTIFICATION OF KEY-VALUE ASSOCIATIONS IN DOCUMENTS USING NEURAL NETWORKS

Publication number: 20240169752

Abstract: Aspects of the disclosure provide for mechanisms for identification of text fields in documents using neural networks. A method of the disclosure includes obtaining vectors representative of objects in a document and processing the vectors to generate key hypotheses associating key(s) with one or more objects and value hypotheses associating value(s) with zero or more objects. The method further includes generating key-value association (KVA) hypotheses, each associating a selected key hypothesis with a selected value hypothesis and characterized by a KVA likelihood score that is based on at least a key likelihood score associated with the selected key hypothesis and a value likelihood score associated with the selected value hypothesis. The method further includes identifying one or more target KVAs of the document using the KVA likelihood scores of the generated KVA hypotheses.

Type: Application

Filed: November 21, 2022

Publication date: May 23, 2024

Inventor: Stanislav Semenov
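The scoring step described in the abstract above can be illustrated with a minimal sketch. The `Hypothesis` type, the product rule for combining scores, and the `threshold` parameter are assumptions made for illustration; the abstract only states that the KVA likelihood score is based on at least the key and value likelihood scores.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hypothesis:
    label: str         # key name or candidate value text
    likelihood: float  # score in [0, 1], assumed to come from an upstream model


def rank_kva_hypotheses(keys, values, threshold=0.5):
    """Pair every key hypothesis with every value hypothesis and keep likely pairs.

    Assumption: the KVA score is the product of the two likelihood scores.
    """
    scored = [
        (k.label, v.label, k.likelihood * v.likelihood)
        for k, v in product(keys, values)
    ]
    kept = [kva for kva in scored if kva[2] >= threshold]
    return sorted(kept, key=lambda kva: kva[2], reverse=True)


# Hypothetical hypotheses for a toy invoice.
keys = [Hypothesis("Invoice No.", 0.9), Hypothesis("Date", 0.8)]
values = [Hypothesis("INV-001", 0.95), Hypothesis("2022-11-21", 0.6)]
target_kvas = rank_kva_hypotheses(keys, values)
```

A real system would presumably also score how well a particular key fits a particular value; this toy version ranks pairs on the two independent scores alone.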
EXTRACTING INFORMATION FROM DOCUMENTS USING AUTOMATIC MARKUP BASED ON HISTORICAL DATA

Publication number: 20240143632

Abstract: Mechanisms for document processing and analysis can include receiving a document and identifying, in a data structure, a record corresponding to the document. The record can include one or more entries, where each entry contains data reflecting a respective item of information extracted from a corresponding part of the document. The mechanisms can include determining, for each entry of the record, a corresponding degree of association between the entry and a respective item of information referenced by the entry. They can further include updating the corresponding degrees of association, and selecting, among the corresponding degrees of association, a set of corresponding degrees of association whose aggregate degree of association satisfies a criterion.

Type: Application

Filed: October 28, 2022

Publication date: May 2, 2024

Inventor: Stanislav Semenov

RELIABLE DETERMINATION OF FIELD VALUES IN DOCUMENTS WITH REMOVAL OF STATIC FIELD ELEMENTS

Publication number: 20240144711

Abstract: Aspects and implementations provide for mechanisms of detection of fields in electronic documents and determination of values of the detected fields. The disclosed techniques include obtaining an input into a machine learning model (MLM), the input including a first image of a field extracted from a document and depicting one or more static elements of the field and a field value, the input further including a second image of the field. The input may be processed using the MLM to identify one or more static regions that correspond to static elements of the field. The identified static regions may be used to modify the first image so that the static regions are removed or have reduced visibility. The modified image may be used to determine the field value.

Type: Application

Filed: October 31, 2022

Publication date: May 2, 2024

Inventors: Ivan Zagaynov, Stanislav Semenov, Alena Dedigurova

METHODS AND SYSTEMS OF FIELD DETECTION IN A DOCUMENT

Publication number: 20240078826

Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.

Type: Application

Filed: November 10, 2023

Publication date: March 7, 2024

Inventors: Stanislav Semenov, Mikhail Lanin
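The heat-map idea in the abstract above can be sketched in a few lines. The function names, the use of rectangular boxes as the field metadata, and returning a single bounding box as the candidate region are assumptions for illustration; the abstract specifies only per-pixel counters and a threshold condition.

```python
def build_heat_map(field_boxes, height, width):
    """Count, per pixel, how many training images have the field covering it.

    field_boxes holds one (top, left, bottom, right) box per training image,
    with bottom and right exclusive. Boxes stand in for the field metadata.
    """
    heat = [[0] * width for _ in range(height)]
    for top, left, bottom, right in field_boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                heat[y][x] += 1
    return heat


def candidate_region(heat, min_count):
    """Bounding box of all pixels whose counter is at least min_count, or None."""
    hits = [
        (y, x)
        for y, row in enumerate(heat)
        for x, count in enumerate(row)
        if count >= min_count
    ]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


# Three hypothetical training images with slightly shifted field positions.
boxes = [(1, 1, 3, 4), (1, 2, 3, 5), (2, 2, 4, 5)]
heat = build_heat_map(boxes, height=5, width=6)
region = candidate_region(heat, min_count=2)
```

Raising `min_count` shrinks the candidate region toward the pixels that the field covers in most training images.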
DETECTING FIELDS IN DOCUMENT IMAGES

Publication number: 20240078828

Abstract: A method of detecting fields in document images includes: receiving a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors; calculating, based on a set of user labeled document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified labeled field with respect to the visual word; loading a document image for extraction of target fields; calculating a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detecting, using the trained model, fields in the document image based on the calculated statistical predicate.

Type: Application

Filed: November 6, 2023

Publication date: March 7, 2024

Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov, Aleksandr Valiukov

Optimization and use of codebooks for document analysis

Patent number: 11893818

Abstract: A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.

Type: Grant

Filed: July 26, 2021

Date of Patent: February 6, 2024

Assignee: ABBYY Development Inc.

Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov, Aleksandr Valiukov
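The mutual information criterion mentioned in the abstract above can be illustrated with a small sketch. The binary presence indicators, the toy observations, and the idea of ranking visual words by MI are assumptions for illustration, not the patented optimization procedure.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_visual_words(word_presence, field_present):
    """Sort visual words by MI with the target-field indicator, highest first."""
    scores = {
        word: mutual_information(observed, field_present)
        for word, observed in word_presence.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one entry per document region (hypothetical observations).
field_present = [1, 1, 0, 0]
word_presence = {
    "word_a": [1, 1, 0, 0],  # always co-occurs with the field
    "word_b": [1, 0, 1, 0],  # statistically independent of the field
}
ranking = rank_visual_words(word_presence, field_present)
```

In this toy data `word_a` carries one full bit of information about the field and `word_b` carries none, so a codebook pruned by MI would keep `word_a` first.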
Methods and systems of field detection in a document

Patent number: 11861925

Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.

Type: Grant

Filed: December 21, 2020

Date of Patent: January 2, 2024

Assignee: ABBYY Development Inc.

Inventors: Stanislav Semenov, Mikhail Lanin

Identification of fields in documents with neural networks without templates

Patent number: 11816165

Abstract: Aspects of the disclosure provide for mechanisms for identification of fields in documents using neural networks. A method of the disclosure includes obtaining a layout of a document, the document having a plurality of fields, identifying the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents, identifying a plurality of symbol sequences of the document, and processing, by a processing device, the plurality of symbol sequences of the document using a first neural network associated with the first type of documents to determine an association of a first field of the plurality of fields with a first symbol sequence of the plurality of symbol sequences of the document.

Type: Grant

Filed: November 22, 2019

Date of Patent: November 14, 2023

Assignee: ABBYY Development Inc.

Inventor: Stanislav Semenov

Document clusterization using neural networks

Patent number: 11816909

Abstract: An example method of document classification comprises: detecting a set of keypoints in an input image; generating a set of keypoint vectors, wherein each keypoint vector of the set of keypoint vectors is associated with a corresponding keypoint of the set of keypoints; extracting a feature map from the input image; producing a combination of the set of keypoint vectors with the feature map; transforming the combination into a set of keypoint mapping vectors according to a predefined mapping scheme; estimating, based on the set of keypoint mapping vectors, a plurality of importance factors associated with the set of keypoints; and classifying the input image based on the set of keypoints and the plurality of importance factors.

Type: Grant

Filed: August 9, 2021

Date of Patent: November 14, 2023

Assignee: ABBYY Development Inc.

Inventors: Ivan Zagaynov, Stanislav Semenov

Identification of table partitions in documents with neural networks using global document context

Patent number: 11775746

Abstract: Aspects of the disclosure provide for mechanisms for identification of table partitions in documents using neural networks. A method of the disclosure includes obtaining a plurality of symbol sequences of a document having at least one table, determining a plurality of vectors representative of symbol sequences having at least one alphanumeric character or a table graphics element, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors, determining an association between a first recalculated vector and a second recalculated vector, wherein the first recalculated vector is representative of an alphanumeric sequence and the second recalculated vector is associated with a table partition, and determining, based on the association between the first recalculated vector and the second recalculated vector, an association between the alphanumeric sequence and the table partition.

Type: Grant

Filed: July 23, 2021

Date of Patent: October 3, 2023

Assignee: ABBYY Development Inc.

Inventor: Stanislav Semenov

Identification of blocks of associated words in documents with complex structures

Patent number: 11741734

Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.

Type: Grant

Filed: January 13, 2022

Date of Patent: August 29, 2023

Assignee: ABBYY Development Inc.

Inventor: Stanislav Semenov

EXTRACTING STRUCTURED INFORMATION FROM DOCUMENT IMAGES

Publication number: 20230206671

Abstract: An example method of extracting structured information from document images comprises: receiving a document image; detecting a tabular structure within the document image; identifying a plurality of rows of the tabular structure, wherein each row of the plurality of rows comprises one or more lines; for each row of the plurality of rows, identifying a set of field types of one or more fields comprised by each line of the one or more lines comprised by the respective row; detecting, in each line of the one or more lines, a set of fields corresponding to a respective set of field types; and extracting information from the set of fields.

Type: Application

Filed: December 27, 2021

Publication date: June 29, 2023

Inventors: Mikhail Lanin, Stanislav Semenov

CONTINUOUS LEARNING FOR DOCUMENT PROCESSING AND ANALYSIS

Publication number: 20230138491

Abstract: A document processing method includes receiving one or more documents, performing optical character recognition on the one or more documents to detect words comprising symbols in the one or more documents, and determining an encoding value for each of the symbols. It further includes applying a first hash function to each encoding value to generate a first set of hashed symbol values, applying a second hash function to each hashed symbol value to generate a vector array including a second set of hashed symbol values, and applying a linear transformation to each value of the second set of hashed symbol values of the vector array. The method also includes applying an irreversible non-linear activation function to the vector array to obtain abstract values associated with the symbols and saving the abstract values to train a neural network to detect fields in an input document.

Type: Application

Filed: November 3, 2021

Publication date: May 4, 2023

Inventor: Stanislav Semenov
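The hashing pipeline in the abstract above (encode, hash twice, apply a linear transformation, then a non-invertible activation) can be sketched as follows. The choice of Unicode code points as encoding values, salted SHA-256 as both hash functions, the `scale` and `bias` constants, and ReLU as the activation are all assumptions for illustration; the abstract does not name specific functions.

```python
import hashlib


def _hash_to_int(data: bytes, salt: bytes, modulus: int) -> int:
    """Salted SHA-256 reduced to an integer in [0, modulus)."""
    digest = hashlib.sha256(salt + data).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def abstract_values(word, buckets=1024, scale=0.01, bias=-5.0):
    """Map each symbol of a word to an abstract, non-invertible value."""
    values = []
    for symbol in word:
        encoding = ord(symbol)  # encoding value: Unicode code point (assumption)
        first = _hash_to_int(encoding.to_bytes(4, "big"), b"first", 2**32)
        second = _hash_to_int(first.to_bytes(4, "big"), b"second", buckets)
        linear = scale * second + bias   # linear transformation
        values.append(max(0.0, linear))  # ReLU: negative inputs all collapse to 0
    return values


vector = abstract_values("Total")
```

Because ReLU maps every negative input to zero, the original symbol cannot be recovered from the stored value in general, which is the property the abstract calls irreversible.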
CONTINUOUS LEARNING FOR DOCUMENT PROCESSING AND ANALYSIS

Publication number: 20230134218

Abstract: A document processing method includes receiving one or more sets of documents, and assigning each document to one or more basic clusters based on the metadata of the document. It further includes, for each cluster, training a respective basic cluster model detecting one or more visual element types, and responsive to a first threshold criterion measure related to the one or more basic clusters being satisfied, generating one or more superclusters based on an attribute shared by documents comprised by the plurality of basic clusters. The method also includes training a respective supercluster model detecting the one or more element types and generating a generalized cluster from the one or more superclusters. It includes training a generalized model for the generalized cluster, receiving an input document, assigning the input document to corresponding clusters, and detecting visual elements by processing the input document by each of the corresponding models.

Type: Application

Filed: November 3, 2021

Publication date: May 4, 2023

Inventors: Stanislav Semenov, Andrei Zyuzin

DOCUMENT CLUSTERIZATION USING NEURAL NETWORKS

Publication number: 20230038097

Abstract: An example method of document classification comprises: detecting a set of keypoints in an input image; generating a set of keypoint vectors, wherein each keypoint vector of the set of keypoint vectors is associated with a corresponding keypoint of the set of keypoints; extracting a feature map from the input image; producing a combination of the set of keypoint vectors with the feature map; transforming the combination into a set of keypoint mapping vectors according to a predefined mapping scheme; estimating, based on the set of keypoint mapping vectors, a plurality of importance factors associated with the set of keypoints; and classifying the input image based on the set of keypoints and the plurality of importance factors.

Type: Application

Filed: August 9, 2021

Publication date: February 9, 2023

Inventors: Ivan Zagaynov, Stanislav Semenov

DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS

Publication number: 20230028992

Abstract: A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.

Type: Application

Filed: July 26, 2021

Publication date: January 26, 2023

Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov

FUZZY SEARCHING USING WORD SHAPES FOR BIG DATA APPLICATIONS

Publication number: 20220335073

Abstract: Aspects of the disclosure provide for systems and methods for word shape-assisted searches in big data applications. The systems and methods of the disclosure enable operations that identify a mapping scheme in which words are represented via word shapes, with the same word shape capable of representing different words. Operations further include forming hypotheses that prospectively associate words in a document with target entries in a database, and eliminating at least some of the formed hypotheses based on a mismatch between sets of word shapes corresponding to the words of the formed hypotheses and word shapes of various database entries.

Type: Application

Filed: April 22, 2021

Publication date: October 20, 2022

Inventor: Stanislav Semenov
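The word-shape filtering described in the abstract above can be sketched as follows. The particular mapping scheme (uppercase to `X`, lowercase to `x`, digits to `9`, runs collapsed) and the rule that shape sets must match exactly are assumptions for illustration; the abstract leaves the mapping scheme and the mismatch test unspecified.

```python
def word_shape(word):
    """Map a word to a coarse shape; many different words share one shape."""
    shape = []
    for ch in word:
        if ch.isdigit():
            code = "9"
        elif ch.isalpha():
            code = "X" if ch.isupper() else "x"
        else:
            code = ch
        if not shape or shape[-1] != code:  # collapse runs of the same class
            shape.append(code)
    return "".join(shape)


def prune_hypotheses(hypotheses, database):
    """Keep (words, entry) hypotheses whose set of word shapes matches the entry's."""
    entry_shapes = {
        entry: {word_shape(w) for w in entry.split()} for entry in database
    }
    return [
        (words, entry)
        for words, entry in hypotheses
        if {word_shape(w) for w in words} == entry_shapes[entry]
    ]


# Hypothetical database entries and OCR output containing recognition errors.
database = ["Acme Corp", "Unit 42"]
hypotheses = [
    (["Acrne", "Corp"], "Acme Corp"),  # OCR error, but the shapes still match
    (["Acrne", "Corp"], "Unit 42"),    # shapes differ, eliminated
    (["Unit", "4Z"], "Unit 42"),       # "4Z" has shape "9X", eliminated
]
survivors = prune_hypotheses(hypotheses, database)
```

Shape comparison is cheap, so it can discard most hypotheses before a slower character-level fuzzy match is run on the survivors.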
METHODS AND SYSTEMS OF FIELD DETECTION IN A DOCUMENT

Publication number: 20220198182

Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.

Type: Application

Filed: December 21, 2020

Publication date: June 23, 2022

Inventors: Stanislav Semenov, Mikhail Lanin

DOCUMENT CLUSTERIZATION

Publication number: 20220156491

Abstract: A computer-implemented method for document clusterization, comprising: receiving an input document; determining, by evaluating a document similarity function, a plurality of similarity measures, wherein each similarity measure of the plurality of similarity measures reflects a degree of similarity between the input document and a corresponding cluster of documents of a plurality of clusters of documents; based on the plurality of similarity measures, determining that the input document does not belong to any of the clusters of documents of the plurality of clusters of documents; creating a new cluster of documents; and associating the input document with the new cluster of documents.

Type: Application

Filed: November 18, 2020

Publication date: May 19, 2022

Inventors: Stanislav Semenov, Alexandra Antonova, Aleksey Misyurev
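The clusterization flow in the abstract above (score the input against every cluster, open a new cluster when none is similar enough) can be sketched as follows. Jaccard similarity over token sets, the `min_similarity` threshold, and comparing against the closest cluster member are assumptions for illustration; the abstract refers only to a document similarity function.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_to_cluster(doc_tokens, clusters, min_similarity=0.5):
    """Add a document to the most similar cluster, or to a new one.

    clusters is a list of clusters, each a list of token sets; it is
    modified in place. Returns the index of the cluster that was used.
    """
    doc = set(doc_tokens)
    similarities = [
        max(jaccard(doc, member) for member in cluster) for cluster in clusters
    ]
    if similarities and max(similarities) >= min_similarity:
        best = similarities.index(max(similarities))
    else:
        clusters.append([])  # no cluster is similar enough: create a new one
        best = len(clusters) - 1
    clusters[best].append(doc)
    return best


clusters = []
first = assign_to_cluster(["invoice", "total", "due"], clusters)
second = assign_to_cluster(["invoice", "total", "date"], clusters)
third = assign_to_cluster(["passport", "name", "birth"], clusters)
```

Here the two invoice-like documents share half of their combined tokens and land in the same cluster, while the passport-like document opens a second cluster.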
IDENTIFICATION OF BLOCKS OF ASSOCIATED WORDS IN DOCUMENTS WITH COMPLEX STRUCTURES

Publication number: 20220139098

Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.

Type: Application

Filed: January 13, 2022

Publication date: May 5, 2022

Inventor: Stanislav Semenov
