Abstract: A system and method for training a neural network model include obtaining, by a processing device, a document image containing raw text; tokenizing the raw text in the document image to obtain tokens located in a plurality of rows; identifying a first token in one of the plurality of rows; calculating a horizontal language feature of the first token based on the first token and one or more tokens in the row, and encoding, using a first encoder, the horizontal language feature into a horizontal language embedding; calculating a vertical language feature of the first token based on the first token and one or more tokens in rows above or below the row, and encoding, using a second encoder, the vertical language feature into a vertical language embedding; and training the neural network model using the horizontal language embeddings and the vertical language embeddings.
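The abstract does not say how the horizontal and vertical features are computed, so the following is only a minimal sketch of one plausible reading: the horizontal context is the token's neighbors in its own row, and the vertical context is the horizontally nearest token in each adjacent row. The `Token` class, the `window` parameter, and both function names are hypothetical, and the encoders and training step are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    row: int    # index of the text row the token sits in
    x: float    # horizontal position of the token in the image


def horizontal_context(token, tokens, window=1):
    """Texts of the token and its neighbors within the same row (assumed rule)."""
    row = sorted((t for t in tokens if t.row == token.row), key=lambda t: t.x)
    i = row.index(token)
    return [t.text for t in row[max(0, i - window): i + window + 1]]


def vertical_context(token, tokens, window=1):
    """Text of the horizontally nearest token in each row above and below (assumed rule)."""
    context = []
    for r in range(token.row - window, token.row + window + 1):
        candidates = [t for t in tokens if t.row == r]
        if candidates:
            nearest = min(candidates, key=lambda t: abs(t.x - token.x))
            context.append(nearest.text)
    return context


tokens = [
    Token("Invoice", 0, 0.0), Token("Date", 0, 10.0),
    Token("INV-7", 1, 0.0), Token("2024-01-05", 1, 10.0),
    Token("Total", 2, 0.0), Token("$40", 2, 10.0),
]
first = tokens[3]
h = horizontal_context(first, tokens)   # input to a first (horizontal) encoder
v = vertical_context(first, tokens)     # input to a second (vertical) encoder
```

In this toy layout the vertical context links the date value to the "Date" header above it, which is the kind of column relationship a purely row-wise reading would miss.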
Abstract: A system and method for constructing a training dataset and training a neural network include obtaining a searchable portable document format (PDF) document, identifying a bounding box defining a region in a background image that is associated with an overlaying text object defined in the PDF document, determining an image crop of the PDF document according to the bounding box, and generating a training data sample for the training dataset, the training data sample comprising a data pair of the image crop and the associated text object.
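As a rough illustration of the data-pair construction, the sketch below assumes the PDF has already been parsed into a background image (a plain 2D list of pixel values here) and a list of text objects with pixel-space bounding boxes. The `TextObject` class, `crop`, and `build_samples` are hypothetical names; no real PDF library is used.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive (assumed convention)


def crop(image, bbox):
    """Cut the region given by bbox out of a row-major 2D image."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


def build_samples(image, text_objects):
    """Pair each image crop with the text object that overlays it."""
    return [(crop(image, obj.bbox), obj.text) for obj in text_objects]


# Toy 4x6 "background image" whose pixel value encodes its position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
samples = build_samples(image, [TextObject("Hi", (1, 1, 3, 2))])
```

Each element of `samples` is an (image crop, text) pair, which is the shape of training sample the abstract describes.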
Abstract: A system and method relate to a processing device implementing a master artificial intelligence (AI) engine to receive, from each of one or more real-time AI engines, a machine learning algorithm, parameters associated with the machine learning algorithm, and features employed to train the parameters, receive labeled data used to train the parameters associated with the machine learning algorithm, and construct, based on a combination rule, a master machine learning model using the features, the machine learning algorithm, and the parameters associated with the machine learning algorithm received from each of the one or more real-time AI engines.
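The abstract leaves the combination rule unspecified. The sketch below assumes one possible rule, an accuracy-weighted vote, and assumes each real-time engine reports a simple linear classifier; `EngineReport`, `accuracy`, and `build_master` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EngineReport:
    features: list   # feature names the engine trained on
    weights: list    # one learned weight per feature (linear model assumed)
    bias: float

    def predict(self, sample):
        score = self.bias + sum(
            w * sample.get(f, 0.0) for f, w in zip(self.features, self.weights)
        )
        return 1 if score > 0 else 0


def accuracy(report, labeled):
    """Fraction of labeled samples the engine's model classifies correctly."""
    return sum(report.predict(x) == y for x, y in labeled) / len(labeled)


def build_master(reports, labeled):
    """Assumed combination rule: each engine votes with weight equal to its accuracy."""
    votes = [(r, accuracy(r, labeled)) for r in reports]

    def master(sample):
        total = sum(a for _, a in votes)
        positive = sum(a for r, a in votes if r.predict(sample) == 1)
        return 1 if positive * 2 > total else 0

    return master


reports = [
    EngineReport(["has_digit"], [1.0], -0.5),
    EngineReport(["is_upper"], [1.0], -0.5),
]
labeled = [
    ({"has_digit": 1.0, "is_upper": 0.0}, 1),
    ({"has_digit": 0.0, "is_upper": 1.0}, 0),
]
master = build_master(reports, labeled)
```

Here the labeled data serves only to weight the engines; other combination rules (for example, averaging parameters) would fit the abstract equally well.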
Abstract: A system and method for machine learning training provide a master AI subsystem for training a machine learning processing pipeline, the machine learning processing pipeline including machine learning components to process an input document, where each of at least two of the machine learning components is provided with at least two candidate implementations, and the master AI subsystem is to train the machine learning processing pipeline by selectively deploying the at least two candidate implementations for each of the at least two machine learning components.
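One way to picture "selectively deploying" candidate implementations is an exhaustive search over every combination, scored against a few labeled examples. This is a sketch under that assumption, not the claimed training procedure; the component names, the toy implementations, and `select_pipeline` are all hypothetical.

```python
from itertools import product

# Two pipeline components, each with two candidate implementations.
candidates = {
    "tokenize": [str.split, lambda s: s.split(",")],
    "normalize": [
        lambda toks: [t.lower() for t in toks],
        lambda toks: [t.strip() for t in toks],
    ],
}


def run_pipeline(impls, document):
    """Feed the document through the chosen implementation of each component in order."""
    out = document
    for impl in impls:
        out = impl(out)
    return out


def select_pipeline(candidates, examples):
    """Try every combination of implementations and keep the best-scoring one."""
    best_score, best_combo = -1, None
    for combo in product(*candidates.values()):
        score = sum(run_pipeline(combo, doc) == expected for doc, expected in examples)
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score


examples = [("Total, Due", ["Total", "Due"]), ("A, B", ["A", "B"])]
best, score = select_pipeline(candidates, examples)
```

Exhaustive search grows multiplicatively with the number of components and candidates, so a real system would likely need a cheaper selection strategy.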
Abstract: A system and method for real-time machine learning include an interface device and a processing device to, responsive to receiving a document, identify tokens in a document object model (DOM) tree associated with the document, present, on a user interface of the interface device, the document including the identified tokens, label, based on user actions on the user interface, one or more of the tokens in the DOM tree as one of a strong positive, a strong negative, a weak positive, or a weak negative token, and provide the DOM tree including the labeled tokens to train a machine learning model.
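The abstract does not state how user actions map to strong and weak labels. The sketch below assumes that an explicit select or reject action yields a strong label, and that unlabeled sibling tokens of a strongly labeled token inherit the matching weak label. The action names, the `label` attribute, and `label_tokens` are hypothetical; the DOM is modeled with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

STRONG = {"select": "strong_positive", "reject": "strong_negative"}
WEAK = {"strong_positive": "weak_positive", "strong_negative": "weak_negative"}

doc = ET.fromstring(
    "<html><body><table>"
    "<tr><td>Total</td><td>40.00</td></tr>"
    "<tr><td>Tax</td><td>4.00</td></tr>"
    "</table><p>Thanks</p></body></html>"
)


def label_tokens(root, actions):
    """Annotate token elements in place; actions maps token text to a UI action."""
    for parent in root.iter():
        # Treat every child element that carries text as a token.
        tokens = [c for c in parent if c.text and c.text.strip()]
        strong_labels = set()
        for tok in tokens:
            action = actions.get(tok.text.strip())
            if action:
                tok.set("label", STRONG[action])
                strong_labels.add(STRONG[action])
        # Assumed rule: siblings of an unambiguous strong label get the weak version.
        if len(strong_labels) == 1:
            weak = WEAK[strong_labels.pop()]
            for tok in tokens:
                if "label" not in tok.attrib:
                    tok.set("label", weak)
    return root


label_tokens(doc, {"40.00": "select", "Thanks": "reject"})
labels = {el.text: el.get("label") for el in doc.iter() if el.get("label")}
```

The labeled tree can then be handed to a training step as is, since the labels live on the DOM nodes themselves.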