Abstract: According to embodiments of the present disclosure, a method and a distributed processing system are provided to discover statistically significant patterns in arbitrarily large data sets by statistical analysis. The present disclosure provides a new distributed system and algorithm for detecting statistical patterns of different orders. The present disclosure also provides an efficient way of traversing the data domain to generate pattern candidates, one that supports a multi-agent distributed computation model. By increasing or decreasing the number of agents, the system can handle larger or smaller problems. Further, the present disclosure provides a scheme for partitioning data in distributed storage more efficiently for statistical analysis.
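The abstract does not name the significance statistic or the counting scheme, so the following is only a minimal sketch under stated assumptions. Each record is assumed to be a dict of attribute values, second-order patterns are scored with Haberman's adjusted residual (a swapped-in, commonly used test, not necessarily the disclosed one), and in-memory lists stand in for distributed storage partitions. The function names `agent_count`, `merge_counts`, `adjusted_residual` and `significant_pairs` are hypothetical. The point it illustrates is that per-agent counts are additive, so the result does not depend on how many agents share the data.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def agent_count(partition):
    """Hypothetical agent task: count records, single events and event pairs in one partition.

    Each record is assumed to be a dict {attribute: value}; an event is an (attribute, value) pair.
    """
    n, singles, pairs = 0, Counter(), Counter()
    for record in partition:
        events = sorted(record.items())
        n += 1
        singles.update(events)
        pairs.update(combinations(events, 2))  # second-order pattern candidates
    return n, singles, pairs

def merge_counts(results):
    """Counts are additive, so partial results from any number of agents merge exactly."""
    n, singles, pairs = 0, Counter(), Counter()
    for part_n, part_singles, part_pairs in results:
        n += part_n
        singles.update(part_singles)
        pairs.update(part_pairs)
    return n, singles, pairs

def adjusted_residual(observed, n, p_a, p_b):
    """Haberman's adjusted residual for the co-occurrence of two events (assumed statistic)."""
    expected = n * p_a * p_b
    variance = expected * (1 - p_a) * (1 - p_b)
    if variance == 0:
        return 0.0  # an event present in every record carries no association signal
    return (observed - expected) / sqrt(variance)

def significant_pairs(partitions, threshold=1.96):
    """Return event pairs whose adjusted residual exceeds `threshold` (positive associations only)."""
    # Simulated sequentially here; each agent_count call could run on a separate worker.
    n, singles, pairs = merge_counts(agent_count(p) for p in partitions)
    result = {}
    for (a, b), observed in pairs.items():
        r = adjusted_residual(observed, n, singles[a] / n, singles[b] / n)
        if r > threshold:
            result[(a, b)] = r
    return result

# Toy data: A=x always occurs with B=y, and A=z always with B=w.
records = [{"A": "x", "B": "y"}] * 5 + [{"A": "z", "B": "w"}] * 5
one_agent = significant_pairs([records])
four_agents = significant_pairs([records[i::4] for i in range(4)])
print(one_agent == four_agents, one_agent)
```

Splitting the same records across one or four agents yields identical patterns, which is the property that lets the number of agents scale with problem size. The threshold 1.96 is the 95% two-sided normal critical value; a real system would also need a candidate-pruning strategy for higher orders, which this sketch does not attempt.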
Abstract: Representing a large number of association patterns in the form of data events in a computer system is accomplished using a unified framework based on attributed hypergraphs (AHGs). Data relationships are stored as attributed hypergraphs in a computer or computer network, ready for querying and further analysis. This invention is simple, yet general enough to directly encode association patterns of different orders discovered from large databases or raw data relations with arbitrary properties. Both qualitative relations (whether A and B are related) and quantitative relations (A and B are related k% of the time) are represented as attributed hyperedges. This representation is lucid and transparent for visualization. It supports ad hoc and complex associative queries while requiring no physical pre-design or restructuring. Thus, a computer storage and retrieval system (e.g., a database) can readily be implemented to store and manipulate huge numbers of relations in accordance with an AHG representation.
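As a rough illustration of the idea, here is a minimal in-memory sketch, not the disclosed storage design. It assumes nodes are data events, each hyperedge is a set of nodes with a free-form attribute dict, and a node-to-edge inverted index answers associative queries. The class name `AttributedHypergraph`, its methods, and the attribute keys `relation` and `rate` are hypothetical choices made for this example.

```python
from collections import defaultdict

class AttributedHypergraph:
    """Hypothetical minimal AHG: nodes are data events, hyperedges carry attribute dicts."""

    def __init__(self):
        self._edges = {}                # edge id -> (frozenset of nodes, attributes)
        self._index = defaultdict(set)  # node -> ids of hyperedges containing it

    def add_edge(self, nodes, **attrs):
        """Add a hyperedge of any order; no schema has to be declared beforehand."""
        edge_id = len(self._edges)
        members = frozenset(nodes)
        self._edges[edge_id] = (members, dict(attrs))
        for node in members:
            self._index[node].add(edge_id)
        return edge_id

    def query(self, contains=(), where=None):
        """Return hyperedges that contain every node in `contains` and satisfy `where`."""
        wanted = set(contains)
        if wanted:
            ids = set.intersection(*(self._index.get(n, set()) for n in wanted))
        else:
            ids = set(self._edges)
        return [self._edges[i] for i in sorted(ids)
                if where is None or where(self._edges[i][1])]

hg = AttributedHypergraph()
hg.add_edge({"A", "B"}, relation="qualitative")  # A and B are related
hg.add_edge({"A", "C"}, rate=0.7)                 # A and C are related 70% of the time
hg.add_edge({"A", "B", "C"}, rate=0.4)            # a third-order pattern

# Associative query: patterns involving both A and C with a rate of at least 0.5.
print(hg.query({"A", "C"}, where=lambda a: a.get("rate", 0) >= 0.5))
```

Qualitative and quantitative relations, and patterns of any order, share one edge type and differ only in their attributes, so adding a new kind of relation needs no restructuring of the store.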