Patents Assigned to ABBYY Production LLC
  • Patent number: 11232299
    Abstract: Aspects of the disclosure provide for mechanisms for identification of blocks of associated words in documents using neural networks. A method of the disclosure includes obtaining a plurality of words of a document, the document having a first block of associated words, determining a plurality of vectors representative of the plurality of words, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors having values based on the plurality of vectors, determining a plurality of association values corresponding to connections between at least two words of the document, and identifying, using the plurality of recalculated vectors and the plurality of association values, the first block of associated symbol sequences.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: January 25, 2022
    Assignee: ABBYY Production LLC
    Inventor: Stanislav Semenov
  • Patent number: 11170249
    Abstract: Mechanisms for identification of text fields in documents using neural networks are described. Identification of text fields includes obtaining a plurality of symbol sequences of a document having a plurality of text fields, determining a plurality of vectors representative of one of the plurality of symbol sequences, processing the plurality of vectors using a first neural network to obtain, based on values of the plurality of vectors, a plurality of recalculated vectors, determining an association between a first recalculated vector of the plurality of recalculated vectors and a first text field of the plurality of text fields, the first recalculated vector being representative of a first symbol sequence of the plurality of symbol sequences, and determining, based on the association between the first recalculated vector and the first text field, an association between the first symbol sequence and the first text field.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: November 9, 2021
    Assignee: ABBYY Production LLC
    Inventor: Stanislav Semenov
  • Patent number: 11170248
    Abstract: A data capture component receives a video stream comprising a plurality of frames, wherein each frame comprises a data field. One or more text regions in a selected frame of the plurality of frames are identified. One of the one or more identified text regions that corresponds to a set of attributes associated with the data field is selected. The data of the one of the one or more identified text regions of the selected frame are compared with data of one or more text regions of a subsequent frame. Responsive to determining that the data of the one or more text regions of the subsequent frame is a closer match to the set of attributes, the data of the one of the one or more identified text regions of the selected frame are updated. The data of the one of the one or more identified text regions is then provided to a client device.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: November 9, 2021
    Assignee: ABBYY Production LLC
    Inventor: Andrey Isaev
  • Patent number: 11164035
    Abstract: Systems and methods for neural-network-based optical character recognition using specialized confidence functions. An example method comprises: receiving a grapheme image; computing, by a neural network, a feature vector representing the grapheme image in a space of image features; and computing a confidence vector associated with the grapheme image, wherein each element of the confidence vector reflects a distance, in the space of image features, between the feature vector and a center of a class of a set of classes, wherein the class is identified by an index of the element of the confidence vector.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: November 2, 2021
    Assignee: ABBYY Production LLC
    Inventor: Aleksey Zhuravlev
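    Illustrative sketch (not part of the patent record): the abstract above describes a confidence vector whose i-th element reflects the distance between an image's feature vector and the center of class i. The snippet below shows one way such a vector could be computed. The mapping exp(-distance), the Euclidean metric, and the names `confidence_vector` and `centers` are assumptions made for illustration; the abstract does not specify them, and the feature vector would come from a neural network rather than being hard-coded.

```python
import math


def confidence_vector(feature, centers):
    """Return one confidence value per class.

    Assumption: confidence = exp(-Euclidean distance to the class center),
    so a feature vector sitting exactly on a center gets confidence 1.0.
    """
    return [math.exp(-math.dist(feature, center)) for center in centers]


# Hypothetical 2-D feature space with two class centers.
centers = [(0.0, 0.0), (3.0, 4.0)]
feature = (0.0, 0.0)  # stand-in for a network-produced feature vector

conf = confidence_vector(feature, centers)
# The index of the largest element identifies the most likely class.
best_class = max(range(len(conf)), key=conf.__getitem__)
```

    Here the element index doubles as the class identifier, mirroring the abstract's statement that the class is identified by the index of the element.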
  • Patent number: 11157779
    Abstract: A classification engine generates, using a weighted graph, a plurality of sets of confused graphemes based on recognition data for a plurality of document images; receives an input grapheme image associated with a document image comprising a plurality of grapheme images; determines a set of recognition options for the input grapheme image, where the set of recognition options comprises a set of target characters that are similar to the input grapheme image; identifies a neural network trained to recognize a first set of confused graphemes, where the first set of confused graphemes comprises at least a portion of the set of recognition options for the input grapheme image; and determines a grapheme class for the input grapheme image using the identified neural network.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: October 26, 2021
    Assignee: ABBYY Production LLC
    Inventors: Aleksey Alekseevich Zhuravlev, Vladimir Rybkin, Konstantin Vladimirovich Anisimovich, Azat Aydarovich Davletshin
  • Patent number: 11106931
    Abstract: Systems and methods for performing OCR of an image depicting text symbols and imaging a document having a plurality of planar regions are disclosed. An example method comprises: receiving a first image of a document having a plurality of planar regions and one or more second images of the document; identifying a plurality of coordinate transformations corresponding to each of the planar regions of the first image of the document; identifying, using the plurality of coordinate transformations, a cluster of symbol sequences of the text in the first image and in the one or more second images; and producing a resulting OCR text comprising a median symbol sequence for the cluster of symbol sequences.
    Type: Grant
    Filed: August 22, 2019
    Date of Patent: August 31, 2021
    Assignee: ABBYY Production LLC
    Inventor: Aleksey Kalyuzhny
  • Patent number: 11107202
    Abstract: The subject matter of this specification can be implemented in, among other things, a method including identifying one or more blocks in an electronic image that depicts text characters. The method includes identifying one or more text blocks among the blocks that depict the text characters. The method includes identifying a text contrast value for each of the text blocks. The method includes identifying a type for each pixel in each of the text blocks based on the text contrast value. The method includes determining, for each pixel in each of the text blocks, a brightness for the pixel based on the identified type. The method includes storing, in at least one memory, the electronic image including the determined brightness for each pixel in each of the text blocks.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: August 31, 2021
    Assignee: ABBYY Production LLC
    Inventors: Vasily Vasilyevich Loginov, Ivan Germanovich Zagaynov
  • Patent number: 11087093
    Abstract: Systems and methods for using autoencoders for training natural language classifiers. An example method comprises: producing, by a computer system, a plurality of feature vectors, wherein each feature vector represents a natural language text of a text corpus, wherein the text corpus comprises a first plurality of annotated natural language texts and a second plurality of un-annotated natural language texts; training, using the plurality of feature vectors, an autoencoder represented by an artificial neural network; producing, by the autoencoder, an output of the hidden layer, by processing a training data set comprising the first plurality of annotated natural language texts; and training, using the training data set, a text classifier that accepts an input vector comprising the output of the hidden layer and yields a degree of association, with a certain text category, of a natural language text utilized to produce the output of the hidden layer.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: August 10, 2021
    Assignee: ABBYY Production LLC
    Inventors: Konstantin Vladimirovich Anisimovich, Evgenii Mikhailovich Indenbom, Ivan Ivanovich Ivashnev
  • Patent number: 11074442
    Abstract: Aspects of the disclosure provide for mechanisms for identification of table partitions in documents using neural networks. A method of the disclosure includes obtaining a plurality of symbol sequences of a document having at least one table, determining a plurality of vectors representative of symbol sequences having at least one alphanumeric character or a table graphics element, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors, determining an association between a first recalculated vector and a second recalculated vector, wherein the first recalculated vector is representative of an alphanumeric sequence and the second recalculated vector is associated with a table partition, and determining, based on the association between the first recalculated vector and the second recalculated vector, an association between the alphanumeric sequence and the table partition.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: July 27, 2021
    Assignee: ABBYY Production LLC
    Inventor: Stanislav Semenov
  • Patent number: 11023764
    Abstract: Systems and methods for performing OCR of a series of images depicting text symbols.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: June 1, 2021
    Assignee: ABBYY Production LLC
    Inventors: Aleksey Ivanovich Kalyuzhny, Aleksey Yevgenyevich Lebedev
  • Patent number: 10977511
    Abstract: Systems and methods for performing OCR of a series of images depicting text symbols. An example method comprises performing OCR of a series of images to produce a current symbol sequence and corresponding symbol sequence quadrangle; associating the current symbol sequence with a previous symbol sequence for a previously received image; identifying a median string; determining a median symbol sequence quadrangle; and displaying, using the median symbol sequence quadrangle, a resulting OCR text representing at least a portion of the original document.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: April 13, 2021
    Assignee: ABBYY Production LLC
    Inventor: Aleksey Ivanovich Kalyuzhny
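    Illustrative sketch (not part of the patent record): the abstract above relies on a median string chosen from symbol sequences recognized in different frames. The snippet below approximates this with a set median, that is, the member of the cluster with the smallest total Levenshtein distance to all other members. Restricting the median to cluster members, the choice of Levenshtein distance, and the sample OCR readings are assumptions; the abstract does not say how the median is computed.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def median_string(cluster):
    """Return the cluster member minimizing the sum of distances to all members."""
    return min(cluster, key=lambda s: sum(edit_distance(s, t) for t in cluster))


# Hypothetical OCR readings of the same line in four consecutive frames.
frames = ["lnvoice 1024", "Invoice 1024", "Invoice 1O24", "Invoice 1024"]
print(median_string(frames))  # prints "Invoice 1024"
```

    Isolated per-frame recognition errors (an "l" read for "I", an "O" read for "0") are outvoted because the erroneous readings are farther from the rest of the cluster.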
  • Patent number: 10963647
    Abstract: Systems and methods are disclosed to predict probability of occurrence of a string. A sequence of vectors is generated based at least on a maximum length of word for each symbol in the string. The sequence of vectors is provided to a machine learning unit for the string. A probability of occurrence of the string is obtained from the machine learning unit.
    Type: Grant
    Filed: May 18, 2020
    Date of Patent: March 30, 2021
    Assignee: ABBYY Production LLC
    Inventors: Evgenii Mikhaylovich Indenbom, Daniil Garryevich Anastasiev
  • Patent number: 10872271
    Abstract: Systems and methods for training image processing neural networks by synthetic photorealistic indicia-bearing images. An example method comprises: generating an initial set of images, wherein each image of the initial set of images comprises a rendering of a text string; producing an augmented set of images by processing the initial set of images to introduce, into each image of the initial set of images, at least one simulated image defect; generating a training dataset comprising a plurality of pairs of images, wherein each pair of images comprises a first image selected from the initial set of images and a second image selected from the augmented set of images; and training, using the training dataset, a convolutional neural network for image processing.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: December 22, 2020
    Assignee: ABBYY Production LLC
    Inventors: Ivan Germanovich Zagaynov, Pavel Valeryevich Borin
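    Illustrative sketch (not part of the patent record): the abstract above pairs each initial image with an augmented copy that carries a simulated defect. The snippet below builds such pairs for tiny grayscale images stored as nested lists. The salt-noise defect, the `rate` parameter, and the function names are assumptions chosen for brevity; real text rendering and the defect types used in the patent are not reproduced here.

```python
import random


def add_salt_noise(image, rate, rng):
    """Return a copy of a grayscale image with random pixels set to white (255)."""
    return [[255 if rng.random() < rate else px for px in row] for row in image]


def make_training_pairs(initial_images, rate=0.1, seed=0):
    """Pair every initial image with a defect-bearing copy of itself."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [(img, add_salt_noise(img, rate, rng)) for img in initial_images]


# Hypothetical stand-ins for rendered text strings: two 2x3 dark images.
initial = [[[0, 0, 0], [0, 0, 0]],
           [[10, 20, 30], [40, 50, 60]]]
pairs = make_training_pairs(initial, rate=0.5)
```

    Each tuple holds the initial image first and its augmented counterpart second, which is the pair structure the abstract describes for the training dataset.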
  • Patent number: 10867169
    Abstract: Aspects of the disclosure provide for mechanisms for character recognition using neural networks. A method of the disclosure includes assigning, using a first-level classifier of a grapheme classifier, an input grapheme image to a first grapheme cluster of a plurality of grapheme clusters, wherein the first grapheme cluster comprises a first plurality of graphemes; selecting, by a processing device, a classifier from a plurality of second-level classifiers of the grapheme classifier based on the first grapheme cluster, wherein the selected classifier is trained to recognize the first plurality of graphemes; and processing the input grapheme image using the selected classifier to recognize at least one character in the input grapheme image.
    Type: Grant
    Filed: June 22, 2018
    Date of Patent: December 15, 2020
    Assignee: ABBYY Production LLC
    Inventor: Aleksey Alekseevich Zhuravlev
  • Patent number: 10762389
    Abstract: Systems and methods are disclosed to receive an image depicting at least a part of a document and identify a plurality of partition points dividing the image into potential segments; generate a linear partition graph (LPG) comprising a plurality of vertices using the plurality of partition points and a plurality of arcs connecting the plurality of vertices; identify a path of the LPG having a value of a quality metric above a threshold value, wherein the path is selected from a plurality of paths of the LPG and comprises one or more arcs and the value of the quality metric is derived using a neural network classifying each of a plurality of pixels of the image; and generate one or more blocks of the image wherein each of the one or more blocks corresponds to an arc of the identified path and represents a portion of the image associated with a type of an object.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: September 1, 2020
    Assignee: ABBYY Production LLC
    Inventors: Konstantin Zuev, Dmitry Deryagin, Mikhail Atroshchenko
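    Illustrative sketch (not part of the patent record): the abstract above builds a linear partition graph (LPG) whose vertices are partition points and whose arcs are candidate segments, then selects a path by a quality metric. The snippet below finds the highest-scoring path from the first to the last partition point with dynamic programming and compares its score with a threshold. The additive metric, the toy `arc_score` function (a stand-in for the neural-network-derived quality), and the threshold value are assumptions.

```python
def best_path(points, arc_score):
    """Return (score, path) for the best path from points[0] to points[-1].

    Every pair i < j is treated as an arc; a path's quality is assumed to be
    the sum of its arc scores.
    """
    n = len(points)
    best = [float("-inf")] * n
    back = [None] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            score = best[i] + arc_score(points[i], points[j])
            if score > best[j]:
                best[j], back[j] = score, i
    # Walk the back-pointers from the last vertex to recover the path.
    path, k = [], n - 1
    while k is not None:
        path.append(points[k])
        k = back[k]
    return best[-1], path[::-1]


def arc_score(start, end):
    """Toy stand-in for a learned quality: prefer segments 10 units wide."""
    return -abs((end - start) - 10)


points = [0, 4, 10, 13, 20]  # hypothetical partition points along one axis
score, path = best_path(points, arc_score)
THRESHOLD = -1.0  # hypothetical acceptance threshold
accepted = score > THRESHOLD
blocks = list(zip(path, path[1:]))  # one block per arc of the chosen path
```

    With these toy inputs the chosen path is 0, 10, 20, so the image would be split into the blocks (0, 10) and (10, 20).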
  • Patent number: 10726557
    Abstract: The current document is directed to methods and systems that acquire an image containing text with curved text lines to generate a corresponding corrected image in which the text lines are straightened and have a rectilinear organization. The method may include identifying a page sub-image within the text-containing image, generating a text-line-curvature model for the page sub-image that associates inclination angles with pixels in the page sub-image, generating local displacements, using the text-line-curvature model, for pixels in the page sub-image, and transferring pixels from the page sub-image to a corrected page-sub-image using the local displacements to construct a corrected page sub-image in which the text lines are straightened and in which the text characters and symbols have a rectilinear arrangement.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: July 28, 2020
    Assignee: ABBYY Production LLC
    Inventors: Olga Arnoldova Kacher, Ivan Germanovich Zagaynov, Vladimir Rybkin
  • Patent number: 10713515
    Abstract: The subject matter of this specification can be implemented in, among other things, a method that includes receiving a first image from a first camera depicting a first view of a physical item, where the physical item displays a plurality of characters. The method includes receiving a second image from a second camera depicting a second view of the physical item. The method includes performing optical character recognition on the first image to identify first characters and a first layout in the first image and on the second image to identify second characters and a second layout in the second image. The method includes combining the first characters with the second characters by comparing the first characters with the second characters and the first layout with the second layout. The method includes storing the combined first and second characters.
    Type: Grant
    Filed: September 25, 2017
    Date of Patent: July 14, 2020
    Assignee: ABBYY Production LLC
    Inventors: Aleksey Ivanovich Kalyuzhny, Aleksey Yevgen'yevich Lebedev
  • Patent number: 10706320
    Abstract: Disclosed are systems and methods for determining document type of a digital document.
    Type: Grant
    Filed: October 30, 2018
    Date of Patent: July 7, 2020
    Assignee: ABBYY Production LLC
    Inventor: Irina Zosimovna Filimonova
  • Patent number: 10706369
    Abstract: Systems and methods for utilizing user-verified data for training confidence level models.
    Type: Grant
    Filed: January 30, 2017
    Date of Patent: July 7, 2020
    Assignee: ABBYY Production LLC
    Inventors: Anna Pospelova, Elmira Rakhmatulina
  • Patent number: 10699109
    Abstract: The present disclosures provide methods of optical character recognition for a patterned document having one static element and one information field. Systems and methods are disclosed to identify in each of a current and a previous image of a series of images of an original document overlapping with each other, a corresponding plurality of base points, wherein each base point is associated with one textual artifact in each of the current image and the previous image using an OCR text of the current image; identify parameters of a coordinate transformation converting coordinates of the previous image into coordinates of the current image; associate a part of the OCR text with a cluster of a plurality of clusters of symbol sequences; identify a median string representing the cluster of symbol sequences; and produce a resulting OCR text representing at least a portion of the original document.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: June 30, 2020
    Assignee: ABBYY Production LLC
    Inventor: Aleksey Ivanovich Kalyuzhny