METHOD AND SYSTEM FOR VISUALIZING INFORMATION EXTRACTED FROM BIG DATA
The various embodiments herein describe a method for providing information visualization comprising identifying a plurality of events from big data, calculating a temporal distance between at least two events, calculating a semantic distance between the at least two events, storing the calculated semantic distance and the temporal distance with respect to the at least two events in a data structure, providing the semantic distance and the temporal distance calculated between the at least two events on a visual representation, identifying one or more relevant events with respect to a target event based on the semantic distance and the temporal distance through visualization, selecting a plurality of events that influence the target event, and examining a pattern of the plurality of events to realize possible collinearity of the events to further reduce the influencing events in order to facilitate feature selection through visualization.
The present application claims priority of Indian provisional application serial number 3286/CHE/2012 filed on Aug. 10, 2012, and that application is incorporated in its entirety at least by reference.
BACKGROUND
1. Technical Field
The embodiments herein generally relate to data mining and particularly relate to a method for extracting and processing events from a large collection of data. The embodiments herein more particularly relate to a method and system for visualizing information by realizing the influence of one or more events on a target event.
2. Description of the Related Art
An entity is a unit of data which has an independent, self-explanatory meaning, and is also referred to as an object that makes independent sense. A relationship is a property which describes an association between two or more entities. The relationship between two or more entities helps in understanding the characteristics and behavior of the entities. An event is a relationship that occurs between an entity and a time entity; put simply, it is a relationship with respect to time. Big data is a large collection of data that comes from structured, unstructured and semi-structured data sources. In an analytics context, the entities, relationships and events manifest as variables or features.
In big data analytics, the influence of other variables or features on a given variable is often studied to make a prediction of the value or state of the variable. This is a typical feature selection problem that is magnified in the context of big data analytics because of the large number of features/variables available. The existing technology discusses various feature selection techniques specific to different contexts. Feature selection is quite a complex process, and therefore considerable effort has gone into addressing specific problems related to feature selection in different domains.
The current feature selection procedures are mostly based on machine learning/statistics/data mining techniques. All these efforts require a good understanding of machine learning, statistics techniques and also of the problem domain. However, the existing techniques do not provide a simple, generic strategy that can be applied to all contexts.
In big data analytics, the problem of predicting the occurrence of events is often encountered. The occurrence of a given event is greatly influenced by the occurrences of many other events that happen simultaneously. However, not all of the events bear equal influence on the target event. To predict the occurrence or the state of a target event, it is important to identify the events that bear high influence on the target event. This reduces the problem to that of feature selection or dimension reduction. Feature selection typically requires utilization of domain knowledge combined with knowledge of statistics, data mining and machine learning.
Hence, there is a need for a method and system for visualizing the influence of one or more events on a target event interactively. Also, there is a need for a method and system for performing feature selection without the need for deep understanding of statistics, machine learning or the problem domain. Further, there is a need for a method and system for providing effective visualization of the feature selection. Moreover, there is a need for a method and system for enabling feature selection from various context perspectives.
The abovementioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.
SUMMARY
The primary object of the embodiments herein is to provide a method and system for visualizing the influence of one or more events on a target event interactively.
Another object of the embodiments herein is to provide a method and system for simple and intuitive feature selection which does not require an understanding of statistics, machine learning or the problem domain.
Another object of the embodiments herein is to provide a method and system which enables effective feature selection to identify the relevant events that influence the target event.
Another object of the embodiments herein is to provide a method and system for visualizing information which employs an information dimension for representing information based on semantic and temporal relatedness.
Another object of the embodiments herein is to provide a method and system for computing semantic distance and temporal distance between one or more influencing events and a target event.
Another object of the embodiments herein is to provide a method and system for representing information on semantic vs. temporal distance axes.
These and other objects and advantages of the present embodiments will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
The various embodiments herein describe a method for providing information visualization comprising identifying a plurality of events from big data, calculating a temporal distance between at least two events, calculating a semantic distance between the at least two events, storing the calculated semantic distance and the temporal distance with respect to the at least two events in a data structure, providing the semantic distance and the temporal distance calculated between the at least two events on a visual representation, identifying one or more relevant events with respect to a target event based on the semantic distance and the temporal distance through visualization, selecting a plurality of events that influence the target event, and examining a pattern of the plurality of events to realize possible collinearity of the events to limit the influencing events considered for feature analysis.
According to an embodiment herein, realizing the collinearity of the events comprises identifying at least two events sharing an exact relationship and reducing the number of variables to ensure effective feature selection.
According to an embodiment herein, an event is defined as a relationship which occurred at an instant of time.
According to an embodiment herein, the method for providing information visualization further comprises selecting a new target event and re-visualizing the influence of the selected features on the new target event.
According to an embodiment herein, the feature selection is defined as selecting a plurality of variables having an influence on the occurrence of the event.
According to an embodiment herein, calculating the temporal distance between at least two events comprises computing a temporal correlation between the at least two events. The temporal correlation is a measure of correlation of the events across time.
According to an embodiment herein, the temporal distance is calculated on the basis of time series data obtained from big data.
According to an embodiment herein, calculating the semantic distance between the two events comprises at least one of measuring a contextual distance between one or more words if the plurality of events are described by the words, measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model and measuring the contextual or the semantic similarity based on analyzing the relationships and the entities.
According to an embodiment herein, the semantic distance is calculated using one of structured data, unstructured data, or a combination of structured data and unstructured data.
According to an embodiment herein, the method for providing information visualization further comprises storing the semantic distance and the temporal distances between the at least two events in a data structure capable of storing both the semantic distances and the temporal distances separately.
According to an embodiment herein, a value corresponding to the semantic distance and the temporal distance is in a preset numerical range or corresponds to a discrete range of values, with an event being closest to itself both temporally and semantically.
According to an embodiment herein, the representation of the visualization comprises events represented by a predefined shape, negatively correlated events differentiated by the shape, different events or event types represented by different colors, highly influential events arranged around the target event, and less influential events arranged away from the target event as they approach the origin.
According to an embodiment herein, the method for providing information visualization further comprises setting limits for semantic distances and the temporal distances for the plurality of events influencing the target event and selecting the events falling in the defined limit as the highly influential events.
The various embodiments herein describe a system for providing information visualization comprising an event extractor to extract a plurality of events from big data and a semantic distance estimator to calculate a semantic distance between at least two events. The system further comprises a temporal distance estimator to calculate a temporal distance between the at least two events, a data structure to store the calculated temporal distance and the semantic distance with respect to a pair of events, a user interface provided on a user device to display one or more relevant events with respect to a target event based on the semantic distance and the temporal distance and to provide an interactive input to the data structure to select a plurality of events that influence the target event, and a display unit for visualizing the influence of the selected features on the target event.
According to an embodiment herein, the temporal distance estimator calculates the temporal distance between a pair of events by computing a temporal correlation between the pair of events across time.
According to an embodiment herein, the semantic distance estimator calculates the semantic distance between the two events by at least one of measuring a contextual distance between one or more words if the plurality of events are described by the words, measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model and measuring the contextual similarity or the semantic similarity based on analyzing the relationships and the entities.
According to an embodiment herein, the semantic distances and the temporal distances are stored in a data structure which allows separate storage and easy retrieval of each of the semantic distance and the temporal distance with respect to an event pair.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:
Although the specific features of the present embodiments are shown in some drawings and not in others, this is done for convenience only, as each feature may be combined with any or all of the other features in accordance with the present embodiments.
DETAILED DESCRIPTION OF THE DRAWINGS
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that may be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that logical, mechanical and other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
The various embodiments herein describe a method for providing information visualization comprising identifying a plurality of events from big data, calculating a temporal distance between at least two events, calculating a semantic distance between the at least two events, storing the calculated semantic distance and the temporal distance with respect to the at least two events in a data structure, providing the semantic distance and the temporal distance calculated with respect to a target event on a visual representation, identifying one or more relevant events with respect to the target event based on the semantic distance and the temporal distance through visualization, selecting a plurality of events that influence the target event, and examining a pattern of the plurality of events to realize possible collinearity among the events, so as to reduce the number of influencing events and assist feature selection. Here, the collinearity of the plurality of events is suggested by approximately the same semantic and temporal distances on the visualization scheme. The collinear events as defined herein are likely to have the same semantic and temporal distance from the target event and hence appear very close to each other, or one on top of the other, in the visualization.
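By way of illustration only, and not of limitation, the collinearity check described above may be sketched as grouping events whose semantic and temporal values relative to the target event nearly coincide. The function names `group_collinear` and `representatives`, the tolerance `tol` and its default value are assumptions made for this sketch and are not specified by the embodiments.

```python
from typing import Dict, List, Tuple


def group_collinear(
    scores: Dict[str, Tuple[float, float]], tol: float = 0.05
) -> List[List[str]]:
    """Group events whose (semantic, temporal) values relative to the target
    differ by at most `tol` on both axes (hypothetical tolerance)."""
    groups: List[List[str]] = []
    for name in sorted(scores):
        sem, tmp = scores[name]
        for group in groups:
            # Compare against the first member of each existing group.
            ref_sem, ref_tmp = scores[group[0]]
            if abs(sem - ref_sem) <= tol and abs(tmp - ref_tmp) <= tol:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def representatives(groups: List[List[str]]) -> List[str]:
    """Keep one event per group, reducing the number of candidate features."""
    return [group[0] for group in groups]
```

Events that land in the same group would appear nearly on top of one another in the visualization, so a user may choose to retain only one of them as a feature.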
The method for providing information visualization further comprises selecting a new target event and re-visualizing the influence of the plurality of events on the new target event. Here an event is defined as a relationship which occurred at an instant of time and feature selection is defined as selecting a plurality of variables having an influence on the occurrence of the event.
The method of calculating the temporal distance between at least two events comprises computing a temporal correlation between the at least two events. The temporal correlation is a measure of correlation of the events across time. The temporal distance is calculated on the basis of time series data obtained from one of structured data, unstructured data, or a combination of structured data and unstructured data.
The method of calculating the semantic distance between the two events comprises at least one of measuring a contextual distance between one or more words if the plurality of events are described by the words, measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model, and measuring the contextual or the semantic similarity based on analyzing the relationships and the entities. The semantic distance is calculated on the basis of one of structured data, unstructured data, or a combination of structured data and unstructured data.
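As a minimal sketch under stated assumptions, the temporal correlation may be illustrated with a Pearson correlation over two equal-length time series of event counts. The embodiments do not name a particular correlation measure; the choice of Pearson correlation, the use of its absolute value as a zero-to-one score with one being closest, and the function names `pearson` and `temporal_score` are all assumptions of this example.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def temporal_score(x: Sequence[float], y: Sequence[float]) -> Tuple[float, bool]:
    """Return (score in [0, 1], is_negatively_correlated).

    Assumption: the magnitude of the correlation is used as the score, and the
    sign is kept separately so that negatively correlated events can be drawn
    with a different shape.
    """
    r = pearson(x, y)
    return abs(r), r < 0
```

Under this assumption a series compared with itself scores one, which agrees with the statement that an event is closest to itself temporally.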
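For the case where events are described by words, a deliberately simple illustration is given below. It uses word-set overlap (Jaccard similarity) as a stand-in for the contextual distance; the embodiments contemplate domain models and language models, which this sketch does not implement. The function name `semantic_score` and the zero-to-one scale with one being closest are assumptions of this example.

```python
def semantic_score(description_a: str, description_b: str) -> float:
    """Jaccard overlap of the words describing two events, in [0, 1].

    A stand-in for a domain or language model: 1.0 means the descriptions
    use identical word sets, 0.0 means they share no words.
    """
    words_a = set(description_a.lower().split())
    words_b = set(description_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)
```

For example, "customer buys phone" and "customer returns phone" share two of four distinct words, giving a score of 0.5.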
The method for providing information visualization further comprises storing the semantic distance and the temporal distances between the at least two events in a data structure capable of storing both the semantic distances and the temporal distances separately.
The value corresponding to the semantic distance and the temporal distance is in a preset numerical range or corresponds to a discrete range of values, with an event being closest to itself both temporally and semantically.
The representation of the visualization comprises events represented by a predefined shape, negatively correlated events differentiated by the shape, different events and different types of events represented by different colors, highly influential events arranged around the target event, and less influential events arranged away from the target event as they approach the origin.
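The representation described above may be sketched, purely as an illustration, by mapping each event to a marker specification that a plotting layer could render. The names `EventPoint`, `build_markers` and `PALETTE`, the particular shapes and colors, and the placement of the semantic value on the x axis and the temporal value on the y axis are assumptions of this sketch. If values run from zero to one with one being closest, the target event would sit at (1.0, 1.0) and less influential events would fall toward the origin.

```python
from typing import Dict, List, NamedTuple


class EventPoint(NamedTuple):
    name: str
    event_type: str
    semantic: float   # value relative to the target event
    temporal: float   # value relative to the target event
    negative: bool    # True if negatively correlated with the target


# Hypothetical palette; any set of distinguishable colors would do.
PALETTE = ["blue", "orange", "green", "red"]


def build_markers(points: List[EventPoint]) -> List[Dict[str, object]]:
    """Assign one color per event type and a distinct shape to negative correlations."""
    type_colors: Dict[str, str] = {}
    markers: List[Dict[str, object]] = []
    for p in points:
        if p.event_type not in type_colors:
            type_colors[p.event_type] = PALETTE[len(type_colors) % len(PALETTE)]
        markers.append(
            {
                "label": p.name,
                "x": p.semantic,
                "y": p.temporal,
                "shape": "triangle" if p.negative else "circle",
                "color": type_colors[p.event_type],
            }
        )
    return markers
```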
The method for providing information visualization further comprises setting limits for semantic distances and the temporal distances for the plurality of events influencing the target event and selecting the events falling in the pre-set limit as the highly influential events.
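The limit-based selection may be illustrated as a simple filter over the stored values. This is a sketch only: the function name `select_influential`, the default limits of 0.5, and the convention that higher values mean closer to the target event are assumptions of this example.

```python
from typing import Dict, List, Tuple


def select_influential(
    scores: Dict[str, Tuple[float, float]],
    min_semantic: float = 0.5,
    min_temporal: float = 0.5,
) -> List[str]:
    """Return events whose semantic and temporal values both meet the limits."""
    return sorted(
        name
        for name, (semantic, temporal) in scores.items()
        if semantic >= min_semantic and temporal >= min_temporal
    )
```

In an interactive setting the two limits would be adjusted by the user, and the selection recomputed each time.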
The various embodiments herein describe a system for providing information visualization comprising an event extractor to extract a plurality of events from big data and a semantic distance estimator to calculate a semantic distance between at least two events. At least one event among the events is a target event. The system further comprises a temporal distance estimator to calculate a temporal distance between the at least two events, a data structure to store the calculated semantic distance and the temporal distance with respect to the at least two events, a user interface provided on a user device to display one or more relevant events with respect to the target event based on the semantic distance and the temporal distance and to provide an interactive input to the data structure to select a plurality of events that influence the target event, and a display unit for visualizing the influence of the selected features on the target event.
The temporal distance estimator calculates the temporal distance between at least two events by computing a temporal correlation between the at least two events across time.
The semantic distance estimator calculates the semantic distance between the two events by at least one of: measuring a contextual distance between one or more words if the plurality of events are described by the words, measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model and measuring the contextual similarity or the semantic similarity based on analyzing the relationships and the entities.
The semantic distances and the temporal distances are stored in a data structure which allows separate storage and easy retrieval of each of the semantic distance and the temporal distance with respect to an event-target event pair.
According to an embodiment herein, the data structure storing the semantic distance and the temporal distance with respect to an event-target event pair is in a matrix form, in which the lower and upper triangles hold the semantic and the temporal distances respectively. The values corresponding to the semantic and temporal distances in the matrix form vary from zero to one, with one being the closest and representing the semantic/temporal distance of an event to itself.
The embodiments herein provide a simple, intuitive, interactive and easy-to-deploy visualization process. The interactive visualization uses the semantic and temporal distances as guiding principles to identify the most relevant variables. The procedure works on simple, heuristic and intuitive principles, making the results easy to understand. The results are displayed with a few clicks, without requiring user input, and visualization of influences on an event is made more interactive and intuitive. While the computations of semantic and temporal distances are complex, the complexities are hidden from the user. The embodiments of the present disclosure provide immense benefit in Retail, Healthcare, Education, Governance, etc.
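The matrix form described above may be sketched as a single square array whose lower triangle holds semantic values, whose upper triangle holds temporal values, and whose diagonal is fixed at one. The class name `DistanceMatrix` and its method names are hypothetical, and unset cells defaulting to zero is an assumption of this example.

```python
from typing import Dict, List, Tuple


class DistanceMatrix:
    """Lower triangle: semantic values. Upper triangle: temporal values."""

    def __init__(self, events: List[str]) -> None:
        self.index: Dict[str, int] = {name: i for i, name in enumerate(events)}
        n = len(events)
        # Diagonal is 1.0: an event is closest to itself on both measures.
        self.cells = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def _ordered(self, a: str, b: str) -> Tuple[int, int]:
        i, j = self.index[a], self.index[b]
        return (i, j) if i <= j else (j, i)

    def put(self, a: str, b: str, semantic: float, temporal: float) -> None:
        low, high = self._ordered(a, b)
        if low == high:
            raise ValueError("the diagonal is fixed at 1.0")
        self.cells[high][low] = semantic  # row > column: lower triangle
        self.cells[low][high] = temporal  # row < column: upper triangle

    def semantic(self, a: str, b: str) -> float:
        low, high = self._ordered(a, b)
        return self.cells[high][low]

    def temporal(self, a: str, b: str) -> float:
        low, high = self._ordered(a, b)
        return self.cells[low][high]

    def relative_to(self, target: str) -> Dict[str, Tuple[float, float]]:
        """(semantic, temporal) of every other event with respect to a target."""
        return {
            name: (self.semantic(name, target), self.temporal(name, target))
            for name in self.index
            if name != target
        }
```

Because both values for every pair are stored once, selecting a new target event only requires calling `relative_to` with a different name; nothing is recomputed.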
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification.
Claims
1. A method for providing information visualization comprises:
- identifying a plurality of events from a big data;
- calculating a temporal distance between at least two events;
- calculating a semantic distance between the at least two events;
- storing the calculated semantic distance and the temporal distance with respect to the at least two events in a data structure;
- providing the semantic distance and the temporal distances calculated between the at least two events on a visual representation;
- identifying one or more relevant events with respect to the target event based on the semantic distance and the temporal distance through visualization;
- selecting a plurality of events that influence the target event; and
- examining a pattern of the plurality of events to realize a collinearity of the events to limit the influencing events considered for feature analysis.
2. The method of claim 1, wherein realizing the collinearity of the events comprises identifying at least two events sharing an exact relationship and reducing the number of variables to facilitate effective feature selection.
3. The method of claim 1, further comprises:
- selecting a new target event; and
- re-visualizing the influence of the selected features on the new target event.
4. The method of claim 1, wherein an event is defined as a relationship which occurred at an instant of time.
5. The method of claim 1, wherein the feature selection is defined as selecting a plurality of variables having an influence on the occurrence of the event.
6. The method of claim 1, wherein calculating the temporal distance between at least two events comprises computing temporal correlation between the at least two events; wherein the temporal correlation is a measure of correlation of the events across time.
7. The method of claim 1, wherein the temporal distance is calculated on the basis of time series data obtained from one of a structured data, an unstructured data or a combination of structured data and unstructured data.
8. The method of claim 1, wherein calculating the semantic distance between the two events comprises at least one of:
- measuring a contextual distance between one or more words if the plurality of events are described by the words;
- measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model; and
- measuring the contextual or the semantic similarity based on analyzing the relationships and the entities.
9. The method of claim 1, wherein the semantic distance is calculated on the basis of one of a structured data, an unstructured data or a combination of structured data and unstructured data.
10. The method of claim 1, further comprises storing the semantic distance and the temporal distances between the at least two events in a data structure capable of storing both the semantic distances and the temporal distances separately.
11. The method of claim 1, wherein a value corresponding to the semantic distance and the temporal distance is in a preset numerical range or corresponds to a discrete range of values, with an event being closest to itself both temporally and semantically.
12. The method of claim 1, wherein the representation of the visualization comprises:
- events represented by a predefined shape;
- negatively correlated events differentiated by the shape;
- different events represented by different colors;
- different event types represented by different colors;
- highly influential events rendered around the target event; and
- less influential events rendered away from the target event as they approach the origin.
13. The method of claim 1, further comprises:
- setting limits for semantic distances and the temporal distances for the plurality of events influencing the target event; and
- selecting the events falling in a defined limit as the highly influential events.
14. A system for providing information visualization comprises:
- an event extractor to extract a plurality of events from a big data;
- a semantic distance estimator to calculate a semantic distance between at least two events;
- a temporal distance estimator to calculate a temporal distance between the at least two events;
- a data structure to store the calculated semantic distance and the temporal distance with respect to the at least two events;
- a user interface provided on a user device to: display one or more relevant events with respect to the target event based on the semantic distance and the temporal distance; and provide an interactive input to the data structure to select a plurality of events that influence the target event; and
- a display unit for visualizing the influence of the selected features on the target event.
15. The system of claim 14, wherein the temporal distance estimator calculates the temporal distance between at least two events by computing a temporal correlation between the at least two events across time.
16. The system of claim 14, wherein the semantic distance estimator calculates the semantic distance between the two events by at least one of:
- measuring a contextual distance between one or more words if the plurality of events are described by the words;
- measuring a contextual similarity or a semantic similarity as provided by a domain model and a language model; and
- measuring the contextual similarity or the semantic similarity based on analyzing the relationships and the entities.
17. The system of claim 14, wherein the semantic distances and the temporal distances are stored in a data structure which allows separate storage and easy retrieval of each of the semantic distance and the temporal distance with respect to an event-target event pair.
Type: Application
Filed: Jan 31, 2013
Publication Date: Feb 13, 2014
Applicant: XURMO TECHNOLOGIES PVT. LTD. (BANGALORE)
Inventors: SRIDHAR GOPALAKRISHNAN (BANGALORE), SUJATHA RAVIPRASAD UPADHYAYA (BANGALORE)
Application Number: 13/755,059
International Classification: G06N 5/02 (20060101);