Patents by Inventor Stanislav Semenov

Stanislav Semenov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240169752
    Abstract: Aspects of the disclosure provide for mechanisms for identification of text fields in documents using neural networks. A method of the disclosure includes obtaining vectors representative of objects in a document and processing the vectors to generate key hypotheses associating key(s) with one or more objects and value hypotheses associating value(s) with zero or more objects. The method further includes generating key-value association (KVA) hypotheses associating a selected key hypothesis with a selected value hypothesis and characterized by a KVA likelihood score that is based on at least a key likelihood score associated with the selected key hypothesis and a value likelihood score associated with the selected value hypothesis. The method further includes identifying one or more target KVAs of the document using the KVA likelihood scores of the generated KVA hypotheses.
    Type: Application
    Filed: November 21, 2022
    Publication date: May 23, 2024
    Inventor: Stanislav Semenov
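
The key-value association step in the abstract above can be illustrated with a small sketch. It is not the patented method: the abstract only says the KVA likelihood score is based on at least the key and value scores, so multiplying the two scores is an assumption here, as are the tuple layout, the `min_score` cutoff, and the function names `kva_hypotheses` and `select_targets`.

```python
from itertools import product


def kva_hypotheses(key_hyps, value_hyps):
    """Pair key hypotheses with value hypotheses for the same field.

    Each hypothesis is a (field, object_text, likelihood) tuple.
    Assumption: the KVA score is the product of the two likelihoods.
    """
    hyps = []
    for (k_field, k_obj, k_score), (v_field, v_obj, v_score) in product(key_hyps, value_hyps):
        if k_field == v_field:
            hyps.append((k_field, k_obj, v_obj, k_score * v_score))
    return hyps


def select_targets(hyps, min_score=0.5):
    """Keep, per field, the best-scoring KVA hypothesis at or above min_score."""
    best = {}
    for field, k_obj, v_obj, score in hyps:
        if score >= min_score and (field not in best or score > best[field][2]):
            best[field] = (k_obj, v_obj, score)
    return best


# Hypothetical scores, as a neural network might produce them.
keys = [("total", "Total:", 0.9), ("total", "Subtotal:", 0.4)]
values = [("total", "$120.00", 0.95), ("total", "$100.00", 0.5)]
print(select_targets(kva_hypotheses(keys, values)))
```

A product rewards pairs where both the key and the value are plausible; a weighted sum or a learned scorer would fit the same interface.
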
  • Publication number: 20240143632
    Abstract: Mechanisms for document processing and analysis can include receiving a document and identifying, in a data structure, a record corresponding to the document. The record can include one or more entries, where each entry contains data reflecting a respective item of information extracted from a corresponding part of the document. The mechanisms can include determining, for each entry of the record, a corresponding degree of association between the entry and a respective item of information referenced by the entry. They can further include updating the corresponding degrees of association, and selecting, among the corresponding degrees of association, a set of corresponding degrees of association whose aggregate degree of association satisfies a criterion.
    Type: Application
    Filed: October 28, 2022
    Publication date: May 2, 2024
    Inventor: Stanislav Semenov
  • Publication number: 20240144711
    Abstract: Aspects and implementations provide for mechanisms of detection of fields in electronic documents and determination of values of the detected fields. The disclosed techniques include obtaining an input into a machine learning model (MLM), the input including a first image of a field extracted from a document and depicting one or more static elements of the field and a field value, the input further including a second image of the field. The input may be processed using the MLM to identify one or more static regions that correspond to static elements of the field. The identified static regions may be used to modify the first image in which the static regions are removed or have a reduced visibility. The modified image may be used to determine the field value.
    Type: Application
    Filed: October 31, 2022
    Publication date: May 2, 2024
    Inventors: Ivan Zagaynov, Stanislav Semenov, Alena Dedigurova
  • Publication number: 20240078826
    Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.
    Type: Application
    Filed: November 10, 2023
    Publication date: March 7, 2024
    Inventors: Stanislav Semenov, Mikhail Lanin
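
The heat-map approach described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions: field locations are given as axis-aligned boxes, the threshold condition is "counter at least `min_count`", and the candidate region is reported as the bounding box of the qualifying pixels. None of these details come from the abstract, and the function names are hypothetical.

```python
def build_heat_map(height, width, field_boxes):
    """Count, per pixel, how many training images have the field covering it.

    field_boxes holds one (top, left, bottom, right) box per training image,
    with bottom and right exclusive.
    """
    heat = [[0] * width for _ in range(height)]
    for top, left, bottom, right in field_boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                heat[y][x] += 1
    return heat


def candidate_region(heat, min_count):
    """Bounding box of pixels whose counter is at least min_count, or None."""
    hits = [
        (y, x)
        for y, row in enumerate(heat)
        for x, count in enumerate(row)
        if count >= min_count
    ]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


# Three hypothetical training images of a 6x8-pixel form.
boxes = [(1, 2, 3, 6), (1, 3, 3, 7), (2, 2, 4, 6)]
heat = build_heat_map(6, 8, boxes)
print(candidate_region(heat, min_count=2))  # (1, 2, 3, 6)
```

Raising `min_count` shrinks the region toward pixels that nearly every training image agrees on.
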
  • Publication number: 20240078828
    Abstract: A method of detecting fields in document images includes: receiving a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors; calculating, based on a set of user labeled document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified labeled field with respect to the visual word; loading a document image for extraction of target fields; calculating a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detecting, using the trained model, fields in the document image based on the calculated statistical predicate.
    Type: Application
    Filed: November 6, 2023
    Publication date: March 7, 2024
    Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov, Aleksandr Valiukov
  • Patent number: 11893818
    Abstract: A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
    Type: Grant
    Filed: July 26, 2021
    Date of Patent: February 6, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov, Aleksandr Valiukov
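
The mutual information (MI) criterion named in the abstract above can be made concrete with a short sketch. It computes the standard MI estimate, in bits, between two discrete sequences. How the patent encodes "visual word" and "target field" as variables is not stated in the abstract, so the binary encoding in the example (word present in an image patch, field present in the same patch) is purely an assumption.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y]).
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical observations over four patches: 1 = present, 0 = absent.
field_present = [1, 1, 0, 0]
word_a_present = [1, 1, 0, 0]  # always co-occurs with the field
word_b_present = [1, 0, 1, 0]  # unrelated to the field
print(mutual_information(word_a_present, field_present))  # 1.0
print(mutual_information(word_b_present, field_present))  # 0.0
```

Under this encoding, a codebook optimizer could favor visual words such as `word_a_present`, whose occurrence is informative about where the field is.
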
  • Patent number: 11861925
    Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: January 2, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Stanislav Semenov, Mikhail Lanin
  • Patent number: 11816165
    Abstract: Aspects of the disclosure provide for mechanisms for identification of fields in documents using neural networks. A method of the disclosure includes obtaining a layout of a document, the document having a plurality of fields, identifying the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents, identifying a plurality of symbol sequences of the document, and processing, by a processing device, the plurality of symbol sequences of the document using a first neural network associated with the first type of documents to determine an association of a first field of the plurality of fields with a first symbol sequence of the plurality of symbol sequences of the document.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: November 14, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 11816909
    Abstract: An example method of document classification comprises: detecting a set of keypoints in an input image; generating a set of keypoint vectors, wherein each keypoint vector of the set of keypoint vectors is associated with a corresponding keypoint of the set of keypoints; extracting a feature map from the input image; producing a combination of the set of keypoint vectors with the feature map; transforming the combination into a set of keypoint mapping vectors according to a predefined mapping scheme; estimating, based on the set of keypoint mapping vectors, a plurality of importance factors associated with the set of keypoints; and classifying the input image based on the set of keypoints and the plurality of importance factors.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: November 14, 2023
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Stanislav Semenov
  • Patent number: 11775746
    Abstract: Aspects of the disclosure provide for mechanisms for identification of table partitions in documents using neural networks. A method of the disclosure includes obtaining a plurality of symbol sequences of a document having at least one table, determining a plurality of vectors representative of symbol sequences having at least one alphanumeric character or a table graphics element, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors, determining an association between a first recalculated vector and a second recalculated vector, wherein the first recalculated vector is representative of an alphanumeric sequence and the second recalculated vector is associated with a table partition, and determining, based on the association between the first recalculated vector and the second recalculated vector, an association between the alphanumeric sequence and the table partition.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: October 3, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 11741734
    Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.
    Type: Grant
    Filed: January 13, 2022
    Date of Patent: August 29, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Publication number: 20230206671
    Abstract: An example method of extracting structured information from document images comprises: receiving a document image; detecting a tabular structure within the document image; identifying a plurality of rows of the tabular structure, wherein each row of the plurality of rows comprises one or more lines; for each row of the plurality of rows, identifying a set of field types of one or more fields comprised by each line of the one or more lines comprised by the respective row; detecting, in each line of the one or more lines, a set of fields corresponding to a respective set of field types; and extracting information from the set of fields.
    Type: Application
    Filed: December 27, 2021
    Publication date: June 29, 2023
    Inventors: Mikhail Lanin, Stanislav Semenov
  • Publication number: 20230138491
    Abstract: A document processing method includes receiving one or more documents, performing optical character recognition on the one or more documents to detect words comprising symbols in the one or more documents, and determining an encoding value for each of the symbols. It further includes applying a first hash function to each encoding value to generate a first set of hashed symbol values, applying a second hash function to each hashed symbol value to generate a vector array including a second set of hashed symbol values, and applying a linear transformation to each value of the second set of hashed symbol values of the vector array. The method also includes applying an irreversible non-linear activation function to the vector array to obtain abstract values associated with the symbols and saving the abstract values to train a neural network to detect fields in an input document.
    Type: Application
    Filed: November 3, 2021
    Publication date: May 4, 2023
    Inventor: Stanislav Semenov
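
The symbol-abstraction pipeline in the abstract above (encode, hash twice, transform linearly, apply a non-invertible activation) can be sketched as follows. The concrete choices are all assumptions made for illustration: Unicode code points as encoding values, salted SHA-256 as both hash functions, a linear map into [-1, 1), and ReLU as the irreversible activation. The helper names are hypothetical.

```python
import hashlib


def _hash_int(value, salt, modulus):
    """Deterministic salted hash of an integer into range(modulus)."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def abstract_values(word, buckets=1024):
    """Map each symbol of a recognized word to an abstract training value."""
    codes = [ord(ch) for ch in word]                       # encoding values
    first = [_hash_int(c, "h1", 1 << 32) for c in codes]   # first hash function
    second = [_hash_int(h, "h2", buckets) for h in first]  # second hash: the vector array
    linear = [2.0 * v / buckets - 1.0 for v in second]     # linear map into [-1, 1)
    # ReLU discards every negative input, so the original symbol
    # cannot be recovered from the output in general.
    return [max(0.0, x) for x in linear]


print(abstract_values("Total"))
```

Because the mapping is deterministic, identical symbols always yield identical abstract values, which keeps the values usable as training features while obscuring the recognized text.
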
  • Publication number: 20230134218
    Abstract: A document processing method includes receiving one or more sets of documents, and assigning each document to one or more basic clusters based on the metadata of the document. It further includes, for each cluster, training a respective basic cluster model detecting one or more visual element types, and responsive to a first threshold criterion measure related to the one or more basic clusters being satisfied, generating one or more superclusters based on an attribute shared by documents comprised by the plurality of basic clusters. The method also includes training a respective supercluster model detecting the one or more element types and generating a generalized cluster from the one or more superclusters. It includes training a generalized model for the generalized cluster, receiving an input document, assigning the input document to corresponding clusters, and detecting visual elements by processing the input document by each of the corresponding models.
    Type: Application
    Filed: November 3, 2021
    Publication date: May 4, 2023
    Inventors: Stanislav Semenov, Andrei Zyuzin
  • Publication number: 20230038097
    Abstract: An example method of document classification comprises: detecting a set of keypoints in an input image; generating a set of keypoint vectors, wherein each keypoint vector of the set of keypoint vectors is associated with a corresponding keypoint of the set of keypoints; extracting a feature map from the input image; producing a combination of the set of keypoint vectors with the feature map; transforming the combination into a set of keypoint mapping vectors according to a predefined mapping scheme; estimating, based on the set of keypoint mapping vectors, a plurality of importance factors associated with the set of keypoints; and classifying the input image based on the set of keypoints and the plurality of importance factors.
    Type: Application
    Filed: August 9, 2021
    Publication date: February 9, 2023
    Inventors: Ivan Zagaynov, Stanislav Semenov
  • Publication number: 20230028992
    Abstract: A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
    Type: Application
    Filed: July 26, 2021
    Publication date: January 26, 2023
    Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov
  • Publication number: 20220335073
    Abstract: Aspects of the disclosure provide for systems and methods for word shape-assisted searches in big data applications. The systems and methods of the disclosure enable operations that identify a mapping scheme in which words are represented via word shapes with same word shapes capable of representing different words. Operations further include forming hypotheses that prospectively associate words in a document with target entries in a database, and eliminating at least some of the formed hypotheses based on mismatch between sets of word shapes corresponding to the words of the formed hypotheses and word shapes of various database entries.
    Type: Application
    Filed: April 22, 2021
    Publication date: October 20, 2022
    Inventor: Stanislav Semenov
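
The word-shape filtering idea in the abstract above can be illustrated with a small sketch. The mapping scheme used here (uppercase letter to `A`, lowercase letter to `a`, digit to `0`, other characters kept) is an assumed example of a scheme in which different words share a shape. The subset test used to discard hypotheses is likewise an assumption, and the function names are hypothetical.

```python
def word_shape(word):
    """Map a word to a coarse shape; many different words share one shape."""
    shape = []
    for ch in word:
        if ch.isdigit():
            shape.append("0")
        elif ch.isalpha():
            shape.append("A" if ch.isupper() else "a")
        else:
            shape.append(ch)
    return "".join(shape)


def prune_hypotheses(document_words, database_entries):
    """Drop entries having a word shape that never occurs in the document."""
    doc_shapes = {word_shape(w) for w in document_words}
    survivors = []
    for entry in database_entries:
        entry_shapes = {word_shape(w) for w in entry.split()}
        if entry_shapes <= doc_shapes:
            survivors.append(entry)
    return survivors


words = ["Invoice", "from", "Acme", "Corp", "2021"]
entries = ["Acme Corp", "Apex Corp", "ACME Corporation", "Initech LLC"]
print(prune_hypotheses(words, entries))  # ['Acme Corp', 'Apex Corp']
```

"Apex Corp" survives because it has the same shapes as "Acme Corp", so shape matching only narrows the candidates; the survivors still need an exact comparison against the document text.
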
  • Publication number: 20220198182
    Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.
    Type: Application
    Filed: December 21, 2020
    Publication date: June 23, 2022
    Inventors: Stanislav Semenov, Mikhail Lanin
  • Publication number: 20220156491
    Abstract: A computer-implemented method for document clusterization, comprising: receiving an input document; determining, by evaluating a document similarity function, a plurality of similarity measures, wherein each similarity measure of the plurality of similarity measures reflects a degree of similarity between the input document and a corresponding cluster of documents of a plurality of clusters of documents; based on the plurality of similarity measures, determining that the input document does not belong to any of the clusters of documents of the plurality of clusters of documents; creating a new cluster of documents; and associating the input document with the new cluster of documents.
    Type: Application
    Filed: November 18, 2020
    Publication date: May 19, 2022
    Inventors: Stanislav Semenov, Alexandra Antonova, Aleksey Misyurev
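
The clusterization flow in the abstract above (score the input against every cluster, and open a new cluster when none is similar enough) can be sketched as follows. The abstract does not specify the document similarity function, so Jaccard similarity over lowercase word sets, the best-member comparison, and the `threshold` value are assumptions chosen only to make the flow runnable.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign(document, clusters, threshold=0.5):
    """Add the document to its most similar cluster, or to a new one.

    clusters is a list of clusters; each cluster is a non-empty list of
    token sets. Returns the index of the cluster the document joined.
    """
    tokens = set(document.lower().split())
    similarities = [
        max(jaccard(tokens, member) for member in cluster) for cluster in clusters
    ]
    if similarities and max(similarities) >= threshold:
        index = similarities.index(max(similarities))
    else:
        # No existing cluster is similar enough: create a new cluster.
        clusters.append([])
        index = len(clusters) - 1
    clusters[index].append(tokens)
    return index


clusters = []
for text in ["invoice number total due",
             "invoice number total paid",
             "passport name date of birth"]:
    print(assign(text, clusters))
```

The two invoice-like texts share three of five distinct words (similarity 0.6) and land in one cluster, while the passport text shares none and starts a second cluster.
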
  • Publication number: 20220139098
    Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.
    Type: Application
    Filed: January 13, 2022
    Publication date: May 5, 2022
    Inventor: Stanislav Semenov