Patents Assigned to ABBYY DEVELOPMENT INC.
  • Patent number: 12205391
    Abstract: An example method of extracting structured information from document images comprises: receiving a document image; detecting a tabular structure within the document image; identifying a plurality of rows of the tabular structure, wherein each row of the plurality of rows comprises one or more lines; for each row of the plurality of rows, identifying a set of field types of one or more fields comprised by each line of the one or more lines comprised by the respective row; detecting, in each line of the one or more lines, a set of fields corresponding to a respective set of field types; and extracting information from the set of fields.
    Type: Grant
    Filed: December 27, 2021
    Date of Patent: January 21, 2025
    Assignee: ABBYY Development Inc.
    Inventors: Mikhail Lanin, Stanislav Semenov
  • Patent number: 12190622
    Abstract: A computer-implemented method for document clusterization, comprising: receiving an input document; determining, by evaluating a document similarity function, a plurality of similarity measures, wherein each similarity measure of the plurality of similarity measures reflects a degree of similarity between the input document and a corresponding cluster of documents of a plurality of clusters of documents; based on the plurality of similarity measures, determining that the input document does not belong to any of the clusters of documents of the plurality of clusters of documents; creating a new cluster of documents; and associating the input document with the new cluster of documents.
    Type: Grant
    Filed: November 18, 2020
    Date of Patent: January 7, 2025
    Assignee: ABBYY Development Inc.
    Inventors: Stanislav Semenov, Alexandra Antonova, Aleksey Misyurev
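The assign-or-create step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: documents are modeled as sets of tokens, the similarity function is Jaccard similarity, a cluster's similarity is the best match against any of its members, and the names `jaccard`, `assign_or_create`, and `threshold` are hypothetical.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity function: Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_or_create(doc: Set[str],
                     clusters: List[List[Set[str]]],
                     threshold: float = 0.5) -> int:
    """Place doc in the most similar cluster, or open a new cluster.

    Returns the index of the cluster that now contains doc.
    """
    # One similarity measure per existing cluster (assumption: the
    # best match against any member document of that cluster).
    scores = [max(jaccard(doc, member) for member in cluster)
              for cluster in clusters]

    if scores and max(scores) >= threshold:
        best = scores.index(max(scores))
        clusters[best].append(doc)
        return best

    # The document does not belong to any existing cluster:
    # create a new cluster and associate the document with it.
    clusters.append([doc])
    return len(clusters) - 1
```

With two starting clusters, a document sharing half of its tokens with the first cluster joins it at the default threshold, while a document with no overlap opens a third cluster.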
  • Patent number: 12158900
    Abstract: Mechanisms for document processing and analysis can include receiving a document and identifying, in a data structure, a record corresponding to the document. The record can include one or more entries, where each entry contains data reflecting a respective item of information extracted from a corresponding part of the document. The mechanisms can include determining for each entry of the record, a corresponding degree of association between the entry and a respective item of information referenced by the entry. They can further include updating the corresponding degrees of association, and selecting, among the corresponding degrees of association, a set of corresponding degrees of association whose aggregate degree of association satisfies a criterion.
    Type: Grant
    Filed: October 28, 2022
    Date of Patent: December 3, 2024
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 12118816
    Abstract: A document processing method includes receiving one or more sets of documents, and assigning each document to one or more basic clusters based on the metadata of the document. It further includes for each cluster, training a respective basic cluster model detecting one or more visual element types, and responsive to a first threshold criterion measure related to the one or more basic clusters being satisfied, generating one or more superclusters based on an attribute shared by documents comprised by the plurality of basic clusters. The method also includes training a respective supercluster model detecting the one or more element types and generating a generalized cluster from the one or more superclusters. It includes training a generalized model for the generalized cluster, receiving an input document, assigning the input document to corresponding clusters, and detecting visual elements by processing the input document by each of the corresponding models.
    Type: Grant
    Filed: November 3, 2021
    Date of Patent: October 15, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Stanislav Semenov, Andrei Zyuzin
  • Patent number: 12118813
    Abstract: A document processing method includes receiving one or more documents, performing optical character recognition on the one or more documents to detect words comprising symbols in the one or more documents, and determining an encoding value for each of the symbols. It further includes applying a first hash function to each encoding value to generate a first set of hashed symbol values, applying a second hash function to each hashed symbol value to generate a vector array including a second set of hashed symbol values, and applying a linear transformation to each value of the second set of hashed symbol values of the vector array. The method also includes applying an irreversible non-linear activation function to the vector array to obtain abstract values associated with the symbols and saving the abstract values to train a neural network to detect fields in an input document.
    Type: Grant
    Filed: November 3, 2021
    Date of Patent: October 15, 2024
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
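The chain of operations in the abstract above (encoding value, two hash functions, linear transformation, non-linear activation) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the encoding value is taken to be the Unicode code point, both hash functions are salted SHA-256 reduced to a bucket count, the linear transformation is a scalar `weight * x + bias`, and the activation is ReLU (which is not invertible for negative inputs). The names `_hash` and `abstract_values` are hypothetical.

```python
import hashlib
from typing import List


def _hash(value: int, salt: bytes, buckets: int) -> int:
    """Assumed hash: salted SHA-256 folded into a fixed number of buckets."""
    digest = hashlib.sha256(salt + value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def abstract_values(word: str,
                    weight: float = 0.5,
                    bias: float = -100.0,
                    buckets1: int = 2 ** 16,
                    buckets2: int = 512) -> List[float]:
    """Map each symbol of a recognized word to a non-negative abstract value."""
    values = []
    for symbol in word:
        code = ord(symbol)                       # encoding value (assumed: code point)
        h1 = _hash(code, b"first", buckets1)     # first hash function
        h2 = _hash(h1, b"second", buckets2)      # second hash function
        linear = weight * h2 + bias              # linear transformation
        values.append(max(0.0, linear))          # ReLU: non-linear, not invertible
    return values
```

Identical symbols always map to identical values, so the output stays usable as a training feature while the original characters cannot be read back directly from it.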
  • Patent number: 12086647
    Abstract: A method for dynamically generating and executing tasks can include executing a worker execution stream, where the worker execution stream includes multiple execution threads associated with a workflow of the workflow service, receiving, by the worker execution stream, from a workflow service, a definition of a task, and responsive to determining that the definition of the task satisfies a predefined criterion, dividing the task into a set of sub-tasks. The method further includes generating a definition of a sub-task workflow for the set of sub-tasks, and causing the workflow service to distribute, based on the definition of the sub-task workflow, the sub-tasks of the set to one or more workers for execution.
    Type: Grant
    Filed: December 16, 2022
    Date of Patent: September 10, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Vladimir Demidov, Vladimir Bukin, Vladimir Yunev, Alexander Subbotin
  • Patent number: 12046016
    Abstract: A method of the disclosure includes receiving, by a processing device, a document image, dividing the document image into a plurality of patches and determining, for each patch, whether the patch is monochromatic or polychromatic. It further includes clusterizing a plurality of monochromatic patches into a plurality of clusters within a color space, wherein each cluster corresponds to a color layer of a plurality of color layers of the document image, and segmenting each polychromatic patch into a corresponding plurality of monochromatic segments. The method also includes, for each polychromatic patch, associating each monochromatic segment of the corresponding plurality of monochromatic segments with a cluster of the plurality of clusters, and utilizing the plurality of clusters for performing an information extraction task on the document image.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: July 23, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Vadim Mikhonov, Ivan Zagaynov
  • Patent number: 12008431
    Abstract: Aspects and implementations provide for mechanisms of detection and decoding of barcodes in images. The disclosed techniques include estimating dimensions of a module of a barcode based on geometric characteristics of a barcode image, forming hypotheses that group modules into barcode symbols, and assessing viability of formed hypotheses. Various operations of the techniques may involve the use of neural networks, including estimation of module dimensions and assessment of groupings of modules into lines and lines into barcode symbols. The techniques may be used for decoding of barcodes captured in images of unfavorable conditions, including blur, perspective, sub-optimal lighting, barcode deformation, and the like. The techniques may be applied to decoding linear one-dimensional barcodes, two-dimensional barcodes, and stacked linear barcodes.
    Type: Grant
    Filed: May 16, 2022
    Date of Patent: June 11, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Dmitry Zvonarev, Maksim Baranchikov
  • Patent number: 11972626
    Abstract: System and method for document image detection, comprising: producing, using a neural network, a superpixel segmentation map of an input image; generating a superpixel binary mask by associating each superpixel of the superpixel segmentation map with a class of a predetermined set of classes; identifying one or more connected components in the superpixel binary mask; for each connected component of the superpixel binary mask, identifying a corresponding minimum bounding polygon; creating one or more image dividing lines based on the minimum bounding polygons; and defining boundaries of one or more objects of interest based on at least a subset of the image dividing lines.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: April 30, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Aleksandra Stepina
  • Patent number: 11960966
    Abstract: Aspects and implementations provide for mechanisms of detection and decoding of barcodes in images. The disclosed techniques include estimating dimensions of a module of a barcode based on geometric characteristics of a barcode image, forming hypotheses that group modules into barcode symbols, and assessing viability of formed hypotheses. Various operations of the techniques may involve the use of neural networks, including estimation of module dimensions and assessment of groupings of modules into lines and lines into barcode symbols. The techniques may be used for decoding of barcodes captured in images of unfavorable conditions, including blur, perspective, sub-optimal lighting, barcode deformation, and the like. The techniques may be applied to decoding linear one-dimensional barcodes, two-dimensional barcodes, and stacked linear barcodes.
    Type: Grant
    Filed: May 16, 2022
    Date of Patent: April 16, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Dmitry Zvonarev, Aleksandr Riashchikov
  • Patent number: 11948385
    Abstract: A computer-implemented method for image capture by a mobile device, comprising: receiving, by a video capturing application running on a mobile device, a video stream from a camera of the mobile device; identifying a specific frame of the video stream; generating a plurality of hypotheses defining image borders within the specific frame; selecting, by a neural network, a particular hypothesis among the plurality of hypotheses; producing a candidate image by applying the particular hypothesis to the specific frame; determining a value of a quality metric of the candidate image; determining that the value of the quality metric of the candidate image exceeds one or more values of the quality metric of one or more previously processed images extracted from the video stream; wherein the image capture application is a zero-footprint application.
    Type: Grant
    Filed: May 23, 2022
    Date of Patent: April 2, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Stepan Lobastov, Juri Katkov, Vasily Shahov, Olga Titova, Ivan Khintsitskiy
  • Patent number: 11893818
    Abstract: A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
    Type: Grant
    Filed: July 26, 2021
    Date of Patent: February 6, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Vasily Loginov, Stanislav Semenov, Aleksandr Valiukov
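The mutual information criterion named in the abstract above is a standard quantity, and a minimal sketch of computing it from paired observations is shown here. The setup is an assumption for illustration: `xs` records whether a given visual word is present in each document image and `ys` records a discrete property of the target field; the abstract does not specify how these variables are defined. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information, in bits, between two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y

    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi
```

A visual word whose presence perfectly predicts a balanced binary target scores 1 bit, and a word that is independent of the target scores 0, so ranking visual words by this value is one plausible way to decide which ones to keep.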
  • Patent number: 11893784
    Abstract: Aspects of the disclosure provide for systems and processes for assessing image quality for optical character recognition (OCR), including but not limited to: segmenting an image into patches, providing the segmented image as an input into a first machine learning model (MLM), obtaining, using the first MLM, for each patch, first feature vectors representative of a reduction of imaging quality in a respective patch, and second feature vectors representative of a text content of the respective patch, providing to a second MLM the first feature vectors and the second feature vectors, and obtaining, using the second MLM, an indication of suitability of the image for OCR.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: February 6, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Dmitry Rodin, Vasily Loginov
  • Patent number: 11861925
    Abstract: Systems and methods are disclosed to receive a training data set comprising a plurality of document images, wherein each document image of the plurality of document images is associated with respective metadata identifying a document field containing a variable text; generate, by processing the plurality of document images, a first heat map represented by a data structure comprising a plurality of heat map elements corresponding to a plurality of document image pixels, wherein each heat map element stores a counter of a number of document images in which the document field contains a document image pixel associated with the heat map element; receive an input document image; and identify, within the input document image, a candidate region comprising the document field, wherein the candidate region comprises a plurality of input document image pixels corresponding to heat map elements satisfying a threshold condition.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: January 2, 2024
    Assignee: ABBYY Development Inc.
    Inventors: Stanislav Semenov, Mikhail Lanin
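The heat map described in the abstract above can be sketched as a per-pixel counter. This is a simplified illustration under stated assumptions: each training image contributes one axis-aligned field box given as `(top, left, bottom, right)` with exclusive bottom and right edges, the threshold condition is a minimum count, and the candidate region is reported as the bounding box of the qualifying pixels. The names `build_heat_map` and `candidate_region` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def build_heat_map(height: int, width: int,
                   field_boxes: Sequence[Box]) -> List[List[int]]:
    """Count, per pixel, how many training images place the field there."""
    heat = [[0] * width for _ in range(height)]
    for top, left, bottom, right in field_boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                heat[y][x] += 1
    return heat


def candidate_region(heat: List[List[int]], min_count: int) -> Optional[Box]:
    """Bounding box of pixels whose counter meets the threshold, if any."""
    hits = [(y, x)
            for y, row in enumerate(heat)
            for x, count in enumerate(row)
            if count >= min_count]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)
```

Raising `min_count` narrows the candidate region to pixels where the field appears in most training images, and lowering it widens the search area on the input image.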
  • Patent number: 11816909
    Abstract: An example method of document classification comprises: detecting a set of keypoints in an input image; generating a set of keypoint vectors, wherein each keypoint vector of the set of keypoint vectors is associated with a corresponding keypoint of the set of keypoints; extracting a feature map from the input image; producing a combination of the set of keypoint vectors with the feature map; transforming the combination into a set of keypoint mapping vectors according to a predefined mapping scheme; estimating, based on the set of keypoint mapping vectors, a plurality of importance factors associated with the set of keypoints; and classifying the input image based on the set of keypoints and the plurality of importance factors.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: November 14, 2023
    Assignee: ABBYY Development Inc.
    Inventors: Ivan Zagaynov, Stanislav Semenov
  • Patent number: 11816165
    Abstract: Aspects of the disclosure provide for mechanisms for identification of fields in documents using neural networks. A method of the disclosure includes obtaining a layout of a document, the document having a plurality of fields, identifying the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents, identifying a plurality of symbol sequences of the document, and processing, by a processing device, the plurality of symbol sequences of the document using a first neural network associated with the first type of documents to determine an association of a first field of the plurality of fields with a first symbol sequence of the plurality of symbol sequences of the document.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: November 14, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 11790675
    Abstract: In one embodiment, a system receives an image depicting a line of text. The system segments the image into two or more fragment images. For each of the two or more fragment images, the system determines a first hypothesis to segment the fragment image into a first plurality of grapheme images and a first fragmentation confidence score. The system determines a second hypothesis to segment the fragment image into a second plurality of grapheme images and a second fragmentation confidence score. The system determines that the first fragmentation confidence score is greater than the second fragmentation confidence score. The system translates the first plurality of grapheme images defined by the first hypothesis to symbols. The system assembles the symbols of each fragment image to derive the line of text.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: October 17, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Andrei Upshinskii
  • Patent number: 11775746
    Abstract: Aspects of the disclosure provide for mechanisms for identification of table partitions in documents using neural networks. A method of the disclosure includes obtaining a plurality of symbol sequences of a document having at least one table, determining a plurality of vectors representative of symbol sequences having at least one alphanumeric character or a table graphics element, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors, determining an association between a first recalculated vector and a second recalculated vector, wherein the first recalculated vector is representative of an alphanumeric sequence and the second recalculated vector is associated with a table partition, and determining, based on the association between the first recalculated vector and the second recalculated vector, an association between the alphanumeric sequence and the table partition.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: October 3, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 11741734
    Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.
    Type: Grant
    Filed: January 13, 2022
    Date of Patent: August 29, 2023
    Assignee: ABBYY Development Inc.
    Inventor: Stanislav Semenov
  • Patent number: 11715008
    Abstract: Systems and methods for neural network training utilizing loss functions reflecting neighbor token dependencies.
    Type: Grant
    Filed: December 29, 2018
    Date of Patent: August 1, 2023
    Assignee: ABBYY Development Inc.
    Inventors: Eugene Indenbom, Daniil Anastasiev