Abstract: The invention provides a system and method for analyzing similarity of natural language data. The system comprises a neural network subsystem adapted for reading graph format input data comprising a plurality of nodes having node values, and a similarity estimation subsystem utilizing the neural network subsystem and being trained for estimating similarity of a first graph and a second graph, the similarity estimation subsystem being capable of producing at least one similarity value. In addition, there is provided a similarity explainability subsystem adapted to calculate importance values for a plurality of nodes or subgraphs of the second graph, which are used to create a reduced second graph or to indicate sub-blocks of the second block of natural language.
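The explainability step described above can be illustrated with a minimal sketch. The abstract does not state how importance values are calculated, so this example assumes a simple leave-one-out (occlusion) scheme: the importance of a node of the second graph is the drop in similarity when that node is removed. The names `Graph`, `toy_similarity`, `node_importances` and `reduce_graph` are hypothetical, and `toy_similarity` (a Jaccard overlap of node values) merely stands in for the trained neural network subsystem.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Graph:
    """Hypothetical graph format: node id -> node value, plus directed edges."""
    nodes: Dict[str, str]
    edges: List[Tuple[str, str]] = field(default_factory=list)


def toy_similarity(first: Graph, second: Graph) -> float:
    """Stand-in for the trained model: Jaccard overlap of node values."""
    a, b = set(first.nodes.values()), set(second.nodes.values())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def remove_node(graph: Graph, node_id: str) -> Graph:
    """Return a copy of the graph without the node and its incident edges."""
    return Graph(
        nodes={n: v for n, v in graph.nodes.items() if n != node_id},
        edges=[(p, c) for p, c in graph.edges if node_id not in (p, c)],
    )


def node_importances(
    first: Graph,
    second: Graph,
    similarity: Callable[[Graph, Graph], float] = toy_similarity,
) -> Dict[str, float]:
    """Assumed scheme: importance = similarity drop when the node is removed."""
    base = similarity(first, second)
    return {
        n: base - similarity(first, remove_node(second, n))
        for n in second.nodes
    }


def reduce_graph(graph: Graph, importances: Dict[str, float], keep: int) -> Graph:
    """Keep the `keep` most important nodes and the edges between them."""
    ranked = sorted(graph.nodes, key=lambda n: importances[n], reverse=True)
    kept = set(ranked[:keep])
    return Graph(
        nodes={n: v for n, v in graph.nodes.items() if n in kept},
        edges=[(p, c) for p, c in graph.edges if p in kept and c in kept],
    )


# Example: the first graph might come from a claim, the second from a description.
first = Graph(nodes={"a": "sensor", "b": "housing"}, edges=[("a", "b")])
second = Graph(
    nodes={"x": "sensor", "y": "housing", "z": "battery", "w": "antenna"},
    edges=[("x", "y"), ("y", "z"), ("x", "w")],
)
importances = node_importances(first, second)
reduced_second = reduce_graph(second, importances, keep=2)
```

In this sketch the nodes of the second graph that also occur in the first graph receive the highest importance and survive the reduction. Mapping the retained nodes back to sub-blocks of the source text would require node-to-text offsets, which are not modeled here.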
Abstract: The invention provides a method and system for training a machine learning-based patent search or novelty evaluation system. The method comprises providing a plurality of patent documents each having a computer-identifiable claim block and specification block, the specification block including at least part of the description of the patent document. The method also comprises providing a machine learning model and training the machine learning model using a training data set comprising data from said patent documents for forming a trained machine learning model. According to the invention, the training comprises using pairs of claim blocks and specification blocks originating from the same patent document as training cases of said training data set.
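The pairing of claim blocks and specification blocks described above can be sketched as follows. This is a minimal illustration and not the claimed method: the names `PatentDocument` and `build_training_cases` are hypothetical, and the optional negative cases (a claim block paired with the specification block of a different document) are an assumption added for illustration, since the abstract only mentions pairs originating from the same document.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatentDocument:
    """Hypothetical container for the computer-identifiable blocks."""
    doc_id: str
    claim_block: str
    specification_block: str


# (claim block, specification block, label); label 1 = same document.
TrainingCase = Tuple[str, str, int]


def build_training_cases(
    docs: List[PatentDocument], add_negatives: bool = True
) -> List[TrainingCase]:
    """Pair each claim block with the specification of its own document.

    Assumption: when `add_negatives` is set, each claim block is also paired
    with the specification block of the next document in the list (label 0).
    """
    cases: List[TrainingCase] = []
    for i, doc in enumerate(docs):
        # Positive case: both blocks originate from the same patent document.
        cases.append((doc.claim_block, doc.specification_block, 1))
        if add_negatives and len(docs) > 1:
            other = docs[(i + 1) % len(docs)]
            cases.append((doc.claim_block, other.specification_block, 0))
    return cases


docs = [
    PatentDocument("D1", "A sensor comprising a housing.", "The sensor has a housing ..."),
    PatentDocument("D2", "A method of charging a battery.", "The battery is charged by ..."),
]
training_cases = build_training_cases(docs)
```

The appeal of same-document pairs is that they need no manual labeling, because the specification of a patent document is expected to support its own claims. The resulting list could then be passed to whichever machine learning model is being trained.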