Abstract: The present invention provides a method and a system for identifying relevant information in a data set. The method involves identifying nodes of interest in a tree structure. A node of interest is a node that contains information relevant to a pre-defined context. The method further involves iteratively extracting sub-trees from the tree structure and identifying records in the extracted sub-trees. Each sub-tree is a hierarchical structure that shows the relationship of each node of interest with its ancestor nodes in the tree structure. Each record is a group of sub-tree nodes and contains at least one node of interest.
Type:
Grant
Filed:
April 25, 2005
Date of Patent:
March 2, 2010
Assignee:
IM2, Inc.
Inventors:
Alex Meyer, Shashikant Khandelwal, Dhiraj Pardasani, Ranjit Padmanabhan
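As a rough illustration of the sub-tree idea in this record's abstract, the sketch below keeps only the nodes of interest and their ancestors, then treats each top-level branch of the pruned tree as one record. The abstract does not say how nodes of interest are detected or how records are delimited, so the `Node` class, the `is_of_interest` predicate, and the branch-per-record rule are all hypothetical assumptions, and the iterative aspect of the claimed method is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a label plus child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)


def extract_subtree(node: Node, is_of_interest: Callable[[Node], bool]) -> Optional[Node]:
    """Return a copy of the tree holding only nodes of interest and their ancestors."""
    kept = []
    for child in node.children:
        pruned = extract_subtree(child, is_of_interest)
        if pruned is not None:
            kept.append(pruned)
    # Keep this node if it is of interest or if it leads to a node that is.
    if kept or is_of_interest(node):
        return Node(node.label, kept)
    return None


def flatten(node: Node) -> List[str]:
    """List the labels of a branch in pre-order."""
    labels = [node.label]
    for child in node.children:
        labels.extend(flatten(child))
    return labels


def identify_records(subtree: Node) -> List[List[str]]:
    """Assumed rule: each top-level branch of the pruned sub-tree is one record."""
    return [flatten(child) for child in subtree.children]


# Example: price-like labels are the (assumed) nodes of interest.
page = Node("page", [
    Node("nav", [Node("link")]),
    Node("item1", [Node("name"), Node("$10")]),
    Node("item2", [Node("name"), Node("$20")]),
])
subtree = extract_subtree(page, lambda n: n.label.startswith("$"))
records = identify_records(subtree)
```

Because every leaf of the pruned tree is a node of interest by construction, each record produced this way contains at least one node of interest, matching the property stated in the abstract.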
Abstract: The present invention provides a method and a system for extracting information related to a pre-defined context from data sets written in semi-structured or unstructured form, such as natural-language text. The information related to the pre-defined context is stored in an information store in accordance with a pre-defined structural arrangement. Further, the individual data values in the extracted information are assigned weights depending on their relevance to attributes of the pre-defined context. Assigning weights to the structured information provides a measure for comparing the relevance of multiple sets of structurally arranged information to the attributes of the pre-defined context.
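The weighting step in the abstract above can be pictured with a minimal sketch. The abstract does not disclose how weights are computed, so everything here is an assumption made for illustration: the context is modeled as a mapping from attribute names to keyword sets, a value's weight is the fraction of its tokens that match the attribute's keywords, and a record's relevance is the sum of its value weights.

```python
from typing import Dict, Set


def value_weight(value: str, keywords: Set[str]) -> float:
    """Assumed weight: share of the value's tokens found in the attribute keywords."""
    tokens = value.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


def score_record(record: Dict[str, str], context: Dict[str, Set[str]]) -> float:
    """Sum the weights of a record's values over all attributes of the context."""
    return sum(
        value_weight(record.get(attribute, ""), keywords)
        for attribute, keywords in context.items()
    )


# Hypothetical context and two structured records extracted from text.
context = {"color": {"red", "blue"}, "type": {"camera"}}
record_a = {"color": "red", "type": "digital camera"}
record_b = {"color": "large", "type": "camera"}

score_a = score_record(record_a, context)
score_b = score_record(record_b, context)
```

Under these assumptions the two scores give a single number per record, which is the kind of comparison measure the abstract refers to.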
Abstract: The invention provides a method and system to compare data objects. Each data object is converted into a directed acyclic graph forest, which comprises one or more directed acyclic graphs. The directed acyclic graph forests corresponding to data objects are then compared to calculate a similarity score between the data objects. The similarity score is then used as a measure to determine the extent of similarity between the data objects.
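To make the comparison in the abstract above concrete, here is a small sketch that scores two directed acyclic graph forests. The abstract does not state which similarity measure is used, so this example substitutes Jaccard similarity over labeled edges as a stand-in. The adjacency-mapping representation and the function names are likewise hypothetical, and the step that converts a data object into a forest is not shown.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: each node label maps to the labels of its children.
Forest = Dict[str, List[str]]


def edge_set(forest: Forest) -> Set[Tuple[str, str]]:
    """Collect every directed (parent, child) edge in the forest."""
    return {(parent, child) for parent, children in forest.items() for child in children}


def similarity(a: Forest, b: Forest) -> float:
    """Jaccard similarity of the two edge sets (a stand-in measure, not the patented one)."""
    edges_a, edges_b = edge_set(a), edge_set(b)
    if not edges_a and not edges_b:
        return 1.0  # two empty forests are treated as identical
    return len(edges_a & edges_b) / len(edges_a | edges_b)


# Two hypothetical data objects already converted to forests.
forest_a = {"camera": ["lens", "sensor"], "lens": ["zoom"]}
forest_b = {"camera": ["lens", "battery"], "lens": ["zoom"]}

score = similarity(forest_a, forest_b)
```

A score of 1.0 means the forests share every edge and 0.0 means they share none, so the value can serve as the extent-of-similarity measure the abstract describes.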