SYSTEM AND METHOD FOR MINING PATTERNS FROM RELATIONSHIP SEQUENCES EXTRACTED FROM BIG DATA

The various embodiments herein provide a system and method for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data The system comprises a data repository for collecting and storing the big data. An Entity Store for collecting and storing a plurality of entities from the big data, an Entity Hierarchy for representing a hierarchical structure of entities, a Relationship Store for collecting and storing relationship instances between the pluralities of entities, a Relationship Hierarchy for representing a hierarchical structure of relationship, a language/domain model for organizing entities and relationships in a hierarchical manner, a pattern query Processing Module (PQPM) for processing, a pattern query related to finding patterns in relationships and entities, and a Pattern Generation Module (PGM) to generate frequent patterns and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns.

Skip to: Description  ·  Claims  · Patent History  ·  Patent History
Description
CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority of Indian provisional application serial number 3286/C14E12012 filed on Aug. 10, 2012, and that application is incorporated in its entirety at least by reference.

BACKGROUND

1. Technical Field

The embodiments herein generally relate to data mining and particularly relates to a mining patterns from structured, unstructured and semi-structured data from heterogeneous sources. The embodiments herein more particularly relates to a system and method for mining patterns in relationship sequences extracted from big data.

2. Description of the Related Art

Information explosion within and outside the organization has led to exponential increase in unstructured data, while the systems currently used are especially meant for processing structured data. With the advent of big data systems such as columnar databases, map reduce frameworks such as Hadoop, it is now possible to store heterogeneous data at one point. A big data is any one or a combination of an unstructured data source, a semi-structured data source and a structured data source. However, making information available for analytics or deriving new perspectives from big data to enable analytics is something that is not understood clearly yet.

Pattern mining in structured data is a fairly well understood problem; however, pattern mining on unstructured data is much less understood. The approaches for pattern mining in structured and unstructured data are completely different. In both structured and unstructured pattern mining, the co-occurrence of entities decides the pattern, given that the entities can share multiple relationships among themselves. However, the co-occurrence of entities alone does not ensure that patterns are bound to the correct context.

One of the existing prior art provides a system and method for the automatic mining of new relationships which employs the use of “association rule mining” in discovering new relationships. The “association rule mining” technique basically uses the co-occurrence of words that are used to describe a relationship to find new relationships. The objective of this prior art is to discover new relationships between entities given that a statistical module asserts the significance of the relationship, and a relationship does not match existing relationships between the pair of entities already in the relation database. The prior art adds new relationships after trying to resolve it with the existing relationships.

Another existing prior art provides a state of art method for effective pattern discovery for text mining which follow a term based approach for closed sequence pattern mining. This effort too examines sequences that are formed by term occurrences. The prior art considers only the text data and the method of extracting patterns cannot be extended to other forms of data. Also, the prior art limits itself to mining patterns in entity space, where every term is considered as entity.

There exist many limitations in existing prior arts which explain pattern mining in relationship space. The existing systems attempt to perform pattern-mining on either structured or on unstructured data and not on amalgamation of both. Also, the approach of existing pattern-mining is based on co-occurrence of two or more entities, and mines patterns in entity space only. These methods do not ensure contextual resolution of entities, as same entities can co-occur in different contexts he existing pattern-mining approaches do not mine patterns upon resolution of both entities and relationships, although certain aspects of entity resolution have been addressed. Further, many forms of representation of relationships that occur between entities are rather complex and require expensive logical inference mechanism for realizing a hierarchy of the relationships. In unstructured data context, it is important to arrive at a suitable representation of relationship that facilitates easy resolution of relationships.

In view of the foregoing, there is a need for a system and method for mining patterns in relationship sequences extracted from big data. There is also a need for system and method for finding patterns based on co-occurring relationships. Further there exists a need for a system and method which can extract frequent patterns in relationship space from relationship sequences.

The abovementioned shortcomings, disadvantages and problems are addressed herein and which will be understood by reading and studying the following specification.

SUMMARY

The primary object of the embodiments herein is to provide a system and method for mining patterns in a relationship space from a collection of structured, unstructured and semi-structured, data.

Another object of the embodiments herein is to provide, a system and method for enabling pattern extraction in relationship space by storing entities and relationships, and maintaining entity hierarchy and relationship hierarchy respectively.

Yet another object of the embodiments herein is to provide a system and method for building relationship sequences from heterogeneous data sources to represent the order in which the relationships occur to facilitate pattern mining.

Yet another object of the embodiments herein is to provide a system and method for extracting relevant relationship sequences from stored relationships using entity and relationship hierarchies for pattern-mining.

Yet another object of the embodiments herein is to provide a system and method for generating most frequent patterns in relationship space from relationship sequences.

Yet another object of the embodiments herein is to provide a system and method for deriving new perspectives from big data to enable analytics of the derived data.

These and other objects and advantages of the present invention will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.

The various embodiments herein provide a system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data. The system comprising a data repository for collecting and storing the big data, an entity store for collecting and storing a plurality of entities from the big data, an entity hierarchy for representing a hierarchical structure re of entities, a relationship store for collecting and storing relationship instances between the plurality of entities from the big data, a relationship hierarchy for representing a hierarchical structure of relationships, a language/domain model for organizing entities and relationships in a hierarchical manner, a Pattern Query Processing Module (PQPM) for processing a pattern query related, to finding patterns in relationships and entities, a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns. The pattern generation module performs frequent pattern mining by gathering relevant data sources using the entity hierarchy and the relationship hierarchy. It generates relationship sequences with respect to each of the data source and extracts the most frequent patterns in the collection of relationship sequences.

According to an embodiment herein, the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources.

According to an embodiment herein, the entity store is a collection of entities extracted from the big data. The entity store stores specific information with respect to each entity.

According to an embodiment herein, the entity hierarchy represents a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with a support of the Language and Knowledge Models.

According to an embodiment herein, the relationship store is adapted to store information related to each relationship instance.

According to an embodiment herein, the Relationship Hierarchy represents a hierarchical arrangement of relationships by resolving, the relationships through at least one of a word-sense disambiguation technique, syntactic resolution and semantic resolution in conjunction with the language/domain model for context resolution.

According to an embodiment herein, the Pattern Query Processing Module (PQPM) processes the pattern query by expanding the pattern query in terms of entities after consulting with the entity store and the hierarchy of entities. The pattern query is a list comprising entities and relationships of the entities.

According to an embodiment herein, the Pattern Query Processing Module (PQPM) performs a query expansion of the pattern query to provide a relevant result by disambiguation and resolution of the entities in the pattern query. The disambiguation of the entities in the pattern query is conducted by identifying explicit and implicit similar entities and ignoring the dissimilar entities.

According to an embodiment herein, the Pattern Generation Module (PGM) comprises a document retriever to collect documents pertaining to the entities and relationships suggested by the query expansion, a Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents, and a Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.

According to an embodiment herein, the Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item. Each relationship sequence comprises the relationships in the order of appearance in the data source.

According to an embodiment herein, the Frequent Pattern Growth Pattern-Mining Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

The embodiments herein further provide a method for mining frequent patterns from a plurality of relationship sequences extracted from a big data. The method comprising, extracting a plurality of entities from the big data, storing the extracted plurality of entities in an entity store, extracting and storing one or more relationships among the plurality of entities, building an entity hierarchy by arranging the plurality of entities in a hierarchical manner, creating a relationship hierarchy by arranging the relationships in a hierarchical manner, inputting a pattern query; where the pattern query is a list of entities and the relationship of entities, processing the pattern query to find patterns in relationships and entities, retrieving relevant data sources from data using the entity hierarchy and the relationship hierarchy based on the pattern query, building relationship sequences with respect to one or more retrieved data sources and extracting frequent patterns from the relationship sequences and displaying the frequent patterns on a frequent pattern display module.

According to an embodiment herein, the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.

According to an embodiment herein, generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

According to an embodiment herein, the method of extracting frequent patterns comprises collecting data sources pertaining to one or more entities and relationships contained in a pattern query, building a relationship sequence pertaining to each of the data source by handling each relationship as an item in an item-set that represents a relationship sequence, building a relationship sequence in an order the relationships appear in the document and identifying the frequent relationship sequences,

According, to an embodiment herein, the method of processing the pattern query comprises extracting the hierarchy of the plurality of entities, expanding the pattern query in terms of entities based on the entity hierarchy and expanding the pattern query in terms of relationships based on the relationship hierarchy.

According to an embodiment herein, expanding the pattern query in terms of entity comprises disambiguating the entities the pattern query, including synonyms and implied entities in the query expansion and perforating context resolution by including similar entities and discarding dissimilar entities.

According to an embodiment of the present invention, expanding the pattern query in terms of relationships comprises resolving relationships according to the context, including the relationship which implies context similarity, including the relationships that are implied within the syntactic and semantic similarity and discarding the semantically and syntactically dissimilar relationships.

These and the other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.

BRIEF DESCRIPTION OF THE DRAWINGS

The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment: and the accompanying drawings in which:

FIG. 1 is a block diagram illustrating a system for frequent pattern mining in relationship space, according to an embodiment of the present disclosure.

FIG. 2 illustrates a flow chart of a method for performing frequent pattern mining in relationship space, according to an embodiment of the present disclosure.

FIG. 3 is a flow diagram illustrating a method for extracting frequent patterns, according to an embodiment of the present disclosure.

FIG. 4 is a flow chart illustrating a method for processing the pattern query, according to an embodiment of the present disclosure.

Although the specific features of the present invention are shown in some drawings and not in others. This is done for convenience only as each feature may be combined with any or all of the other features in accordance with the present invention.

DETAILED DESCRIPTION OF DRAWINGS

In the following detailed description, a reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that may be practiced is shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments and it is to be understood that the logical, mechanical and other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.

The various embodiments herein provide a system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data. The system comprising a data repository for collecting and storing the big data an entity store for collecting and storing a plurality of entities from the big data an entity hierarchy that represents a hierarchical structure of entities, a relationship store for collecting and storing relationship instances between the plurality of entities from the big data, a relationship hierarchy that represents a hierarchical structure of relationships and a language/domain model for organizing entities and relationships in a hierarchical manner. The system further comprises a Pattern Query Processing Module (PQPM) far processing, a pattern query related to finding patterns in relationships and entities, a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns. The pattern generation nodule performs frequent pattern mining by extracting relevant relationship sequences from the relationship store using the entity hierarchy and the relationship hierarchy.

The big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.

The entity store is a collection of entities extracted from the big data. The entity store stores specific information that enables in distinguishing with one or more entities to retrieve one or more documents containing relevant entities corresponding to the pattern query. The entity hierarchy represents a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with the support of the language/domain models.

The relationship store is adapted to store information related to each relationship instance for distinguishing with one or more relationship instances. The Relationship Hierarchy represents a hierarchical arrangement of relationships by resolving the relationships through at least one of a word-sense disambiguation technique and context resolution technique in conjunction with the language/domain model.

The Pattern Query Processing Module (PQPM) processes the pattern query by expanding the pattern query in terms of entities alter consulting with the entity store and the hierarchy of the entity. The pattern query is a list comprising entities and relationships of the entities.

The Pattern Query Processing Module (PQPM) performs a context resolution of the pattern query to provide a relevant result by disambiguation of the entities in the pattern query. The disambiguation of the entities in the pattern query is conducted by considering synonyms and implied entities obtained during expansion of pattern query where similar entities are included and dissimilar entities are excluded.

The Pattern Generation Module (PGM) comprises a document retriever to collect documents pertaining to the entities and relationships contained in the pattern query. A Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents. A Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.

The Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item. Each relationship sequence comprises the relationships in the order of appearance in the document.

The Frequent Pattern Growth Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

The method for mining frequent patterns from a plurality of relationship sequences extracted from a big data. The method comprising, extracting a plurality of entities from the big data. An entity refers to concepts comprising language unit having an independent meaning. The plurality of entities extracted from the big data is stored in an entity store and the extracted entities are arranged in a hierarchical manner. Similarly one or more relationships among the plurality of entities are extracted and stored and a relationship hierarchy is created by arranging the relationships in a hierarchical manner. Further a pattern query is inputted to a pattern query recognition module which processes the pattern query to find patterns in relationships and entities, retrieve relevant data sources from data using the entity hierarchy and the relationship hierarchy based on the pattern query, build relationship sequences with respect to one or more retrieved data sources, extract frequent patterns from the relationship sequences and display the frequent patterns on a frequent pattern display module. The pattern query is a list of entities and the relationship of entities. Here generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

The method of extracting frequent patterns comprises collecting data sources pertaining to one or more entities and relationships contained in a pattern query. Then relationship sequence pertaining to each of the data source is built by handling each relationship as an item in an item-set that represents a relationship sequence. Further relationship sequence is built in an order the relationships appear in the document and finally the frequent relationship sequences are identified.

Similarly the method of processing the pattern query comprises extracting the hierarchy of the plurality of entities expanding the pattern query in terms of entities based on the entity hierarchy and expanding the pattern query in terms of relationships based on the relationship hierarchy.

Here the pattern query in terms of entity comprises disambiguating the entities the pattern query, including synonyms and implied entities in the query expansion and performing context resolution by including similar entities and discarding dissimilar entities. Similarly, expanding the pattern query in terms of relationships comprises resolving relationships according to the context, including the relationship which implies context similarity, including the relationships that are implied within the syntactic similarity and discarding the contextually and syntactically dissimilar relationships.

FIG. 1 is a block diagram illustrating a system for frequent pattern mining in relationship space, according to an embodiment of the present disclosure. The system comprises a data repository 101, a Language/Domain Models 102, an entity store 103, an entity hierarchy 104, a relationship store 105, a relationship hierarchy 106, aquery interface 107, a Pattern Query Processing Module (PQRM) 108, a Pattern Generation Module (PGM) 109 and a Frequent Pattern Display Module (FPDM) 110.

The data repository 101 is adopted for collecting and storing big data. The big data is a collection of all forms of data comprising structured, semi-structured and unstructured data from heterogeneous sources and a language/domain model 102 to resolve and organize entities and relationships in a hierarchy. The language/domain model 102 is used to disambiguate sense in an unstructured data. The language/domain model 102 also disambiguates sense in the structured and semi-structured data contexts from data repository 101.

The entity store 103 is a collection of entities extracted from the data repository 101. The entity store 103 also stores certain specific information relating to entities that helps in distinguishing other entities. The entity store 103 is used only to retrieve the documents containing the relevant entities corresponding to a pattern query 108. The entity hierarchy 104 is built using the Language/Domain Model 102. The entity hierarchy is a hierarchical structure of entities that is built using Natural Language Processing (NLP) techniques with the support of the Language/Domain Model 102. The the Language/Domain Model 102 is used to resolve and organize entities and relationships in a hierarchy. The Language/Domain Model 102 is especially used to disambiguate sense in an unstructured, it is useful to disambiguate sense in the structured and semi-structured data contexts also. After generation of the entity hierarchy, the entity hierarchy is made available to a pattern query Processing Module (PQRM) 108.

The relationship store 105 includes a collection of relationship instances that also stores certain information specific to relationship instances. The relationship hierarchy 106 is a hierarchical arrangement of relationships that are contextually resolved by word-sense disambiguation with the help of the Language/Domain Model 102. The relationship store 105 and the relationship hierarchy 106 functions in conjunction with the Pattern Query Processing Module (PQRM) 108.

The Pattern Query Processing Module (PQPM) 108 receives a pattern query inputted through a query interface 107 and performs processing as per the required information. The pattern query comprises a list of entities and relationships. The PQPM 108 consults the entity store 103 and the entity hierarchy and expands the pattern query in terms of entities. This entity expansion process involves disambiguating the entities in the pattern query, including the synonyms and implied entities in query expansion, making a context resolution to include the similar and exclude the dissimilar entities.

The Pattern Generation Module (PGM) 109 comprises a Document Retriever 109a, a Relationship Sequence Generator 109b and a Frequent Pattern Growth Pattern Mining Module (FPGMM) 109c. The document retriever 109a collects all documents pertaining to the entities/relationships contained in the pattern query. The Relationship Sequence Generator 109b generates a relationship sequence with respect to each of document or data by treating each relationship as an item. The Relationship Sequence Generator 109b builds a relationship sequence in the order of appearance in the document. The Frequent Pattern Growth Pattern-mining module (FPGMM) module uses a Frequent Pattern Growth algorithm (FPG) for processing the pattern query. The FPG algorithm treats the relationship sequences like item-sets and extracts the most frequent item-sets/relationship sequences. The Frequent Pattern Display Module (FPDM) 110 provides for in visualizing the most frequent patterns extracted from relationship sequences in conjunction with the entity.

FIG. 2 illustrates a flow chart of a method for performing frequent pattern mining in relationship space, according to an embodiment of the present disclosure. The method comprises frequent pattern mining in relationship space. In particular, the method comprises processing of big data for recognizing plurality of entities. The plurality of entities are then extracted and stored in an entity store. The entity store, stores meaningful entities extracted out of big data irrespective of the form from which the entity originates (201). Entities are objects that make independent sense. Entities are a named and unnamed object which includes names of living and non living things, concepts, theories or simply the language units that make independent sense. Entities is any one of named entities such as names of places, people etc., or concepts that is represented by one or more terms (example, “Purchase power’, ‘Purchase’ as noun and ‘Purchase’ as verb is three different concepts). In brief, the entity refers to named entities and concepts (language unit with independent meaning). An entity hierarchy is then built by arranging, the plurality of entities in a hierarchical manner (202). Further a set of relationships among a plurality of entities is extracted and stored in a relationship store (203), and a relationship hierarchy is created by arranging the relationships in a hierarchical manner (204).

The method involves the use of the entity hierarchy arid the relationship hierarchy during response to the pattern query. In case of a pattern query, the pattern query is inputted to a Pattern Query Processing Module (PQPM) for finding frequent patterns related to entities and relationships in the query (205).

The document collector collects the documents that are relevant to the pattern query (206). Based on the contents of the pattern query, the Relationship Sequence Generator generates a relationship sequence for each of the retrieved document (207). The PGM adopts a Frequent Pattern Growth Module (FPGM) for identifying the frequent patterns among the relationship sequences (208). Finally, the identified patterns are displayed on a Frequent Pattern Display Module (FPDM) (209).

FIG. 3 is a flow diagram illustrating a method for extracting frequent patterns, according to an embodiment of the present disclosure. The method comprises receiving a pattern query in a pattern query Processing Module (PQPM). The PQPM processes the pattern query and communicates with a Pattern Generation Module (PGM). The PGM comprises three subunits as Document Retriever, a Relationship Sequence Generator and a Frequent Pattern Growth Module (FPGM). Once the PGM receives the command from the PQPM, the document retriever starts collecting, one or more documents (301). The one or more documents are related to the one or more entities and relationships contained in the pattern query. Once the related documents are collected, the Relationship Sequence Generator builds a relationship sequence in an order in which the relationships appear in the document (302). The relationship sequences that appear like “item-sets” enable frequent item set mining. The item-sets comprise relationship sequences in an orderly manner for easy processing. Once an ordered item-set is built, the Frequent Pattern Growth Module (FPGM) mines for the required pattern as desired by the pattern query (303). The result of the frequent relationships sequences are then displayed by Frequent Pattern Display Module (FPDM).

FIG. 4 is a flow chart illustrating a method for processing the pattern query, according to an embodiment of the present disclosure. The pattern query is raised by a user which is inputted to a Pattern Query Processing Module (PQPM). Depending on the content of the pattern query, the PQPM expands the pattern query in terms of entities on referring the entity list and the entity hierarchy (401). Expanding the pattern query in terms of entity includes steps of disambiguating the entities in the pattern query, including synonyms and implied entities in the query expansion and performing context resolution by including similar entities and discarding dissimilar entities. The PQPM then expands the pattern query in terms of relationships based on the relationship hierarchy (402). Here expanding the pattern query in terms of relationship includes resolving relationships according to the context, including the relationships which implies context similarity, including the relationships that are implied within the syntactic similarity and discarding the contextually and syntactically dissimilar relationships.

The embodiments of the present invention disclose an approach that looks for patterns in the relationship space. The embodiments of the present disclosure, provides a robust approach to find patterns and ensures context resolution effectively. The entities and relationships among the entities assist in understanding the big data. All the entities and relationships are derived and collected. This collection of entities and relationships serves as input to all intelligent processing of data. Data mining and data analysis applications, forecasting, predictive analytics applications and machine learning applications make use of the patterns to learn further insights. The embodiments herein enable an enterprise that intends to facilitate processing of big data and build applications on top. The embodiment herein also allows building of domain specific, niche applications that harness big data. The embodiments herein provides immense benefit to following sectors but is not limited to retail, health and pharmaceutical services, banking and insurance.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to he comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification.

Claims

1. A system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data, the system comprising:

a data repository for collecting and storing the big data;
an entity store for collecting and storing a plurality of entities from the big data;
an entity hierarchy for representing a hierarchical structure of entities;
a relationship store for collecting and storing relationship instances between the plurality of entities from the big data;
a relationship hierarchy for representing a hierarchical structure of relationships;
a language/domain model for organizing entities and relationships in a hierarchical manner;
a pattern query Processing Module (PQPM) for expanding a pattern query related to finding patterns in relationships and entities;
a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query; and
a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns;
where the pattern generation module performs frequent pattern mining by extracting relevant relationship sequences from the relationship store using the entity hierarchy and the relationship hierarchy.

2. The system according to claim 1, wherein the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources.

3. The system according to claim 1, wherein the entity store is a collection of entities extracted from the big data, wherein the entity store stores information specific to each entity.

4. The system according to claim 1, wherein the entity hierarchy is a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with a support of the language/domain model.

5. The system according to claim 1, wherein the relationship store is adapted to store information related to each relationship instance.

6. The system according to claim 1, wherein the Relationship Hierarchy provides a hierarchical arrangement of relationships by resolving the relationships through at least one of a word-sense disambiguation technique, syntactic and semantic similarity and context resolution technique in conjunction with the language/domain model.

7. The system according to claim 1, wherein the pattern query Processing Module (PQPM) processes the pattern query by expanding, the pattern query in terms of entities after consulting the entity hierarchy, wherein the pattern query is a list comprising entities and relationships of the entities.

8. The system according to claim 1, wherein the pattern query Processing Module (PQPM) performs a expansion of the pattern query to provide a relevant result by disambiguation of the entities in the pattern query, where the disambiguation of the entities in the pattern query is conducted by identifying explicit and implicit similar entities and ignoring the dissimilar entities

9. The system according to claim 1, the Pattern Generation Module (PGM) comprises:

a document retriever to collect documents pertaining to the entities and relationships suggested by the query expansion;
a Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents;
a Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.

10. The system according to claim 9, wherein the Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item, where each relationship sequence comprises the relationships in the order of appearance in the document.

11. The system according to claim 9, wherein the Frequent Pattern Growth Pattern-Mining Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

12. A method for mining frequent patterns from a plurality of relationship sequences extracted from a big data, the method comprising:

extracting a plurality of entities from the big data, where an entity refers to concepts comprising language unit having an independent meaning;
storing the extracted plurality of entities in an entity store;
extracting and storing one or more relationships among the plurality of entities;
building an entity hierarchy by arranging the plurality of entities in a hierarchical manner;
creating a relationship hierarchy by arranging the relationships in a hierarchical manner;
inputting a pattern query, where the pattern query is a list of entities and the relationship of entities;
expanding the pattern query to include most relevant entities and relationships and ignore irrelevant patterns and relationships;
retrieving relevant data sources from data using the pattern query;
building relationship sequences with respect to one or more retrieved data sources;
extracting frequent patterns from the relationship sequences; and
displaying the frequent patterns on a frequent pattern display module.

13. The method according to claim 12, wherein the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.

14. The method according to claim 12, wherein generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

15. The method according to claim 12, wherein the method of extracting frequent patterns comprises:

collecting data sources pertaining to one or more entities and relationships contained in a pattern query;
building a relationship sequence pertaining to each of the data source by handling each relationship as an item in an item-set that represents a relationship sequence;
building a relationship sequence in an order the relationships appear in the document; and
identifying the frequent relationship sequences.

16. The method according to claim 12, wherein the method of processing the pattern query comprises:

extracting the hierarchy of the plurality of entities;
expanding the pattern query in terms of entities based on the entity hierarchy; and
expanding the pattern query m terms of relationships based on the relationship hierarchy.

17. The method according to claim 16, expanding the pattern query in terms of entity comprises:

disambiguating the entities the pattern query;
including synonyms and implied entities in the query expansion; and
discarding dissimilar entities.

18. The method according to claim 16, expanding the pattern query in terms of relationships comprises:

resolving relationships according, to the context;
including the relationships which implies context similarity;
including the relationships that are implied within the syntactic similarity; and
discarding the contextually and syntactically dissimilar relationships.
Patent History
Publication number: 20140046977
Type: Application
Filed: Jan 31, 2013
Publication Date: Feb 13, 2014
Applicant: XURMO TECHNOLOGIES PVT. LTD. (BANGALORE)
Inventors: SRIDHAR GOPALAKRISHNAN (Bangalore), SUJATHA RAVIPRASAD UPADHYAYA (BANGALORE)
Application Number: 13/755,047
Classifications
Current U.S. Class: Data Mining (707/776)
International Classification: G06F 17/30 (20060101);