Method and system for detection, classification and prediction of user behavior trends

A method and system for detection, classification and prediction of user behavior trends using correspondence analysis is disclosed. The method and system reduce the n-dimensional feature space to a lower-dimensional space, which eases processing, improves the quality of the emerging clusters and increases prediction accuracy. The method applies correspondence analysis so that each user is assigned a new coordinate in the lower-dimensional space that preserves the similarities, differences and relationships between the variables. Once the correspondence analysis is completed, the coordinates are clustered so that users with similar trends are grouped together. Unlabeled cluster members are then assigned class membership in proportion to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends observed in the labeled clusters.

Description

The present application is based on, and claims priority from, IN Application Number 2581/CHE/2013, filed on 13 Jun. 2013, the disclosure of which is hereby incorporated by reference herein.

TECHNICAL FIELD

Embodiments herein relate to the field of predictive analytics and, more particularly, to a method and system for detection, classification and prediction of behaviour trends using correspondence analysis.

BACKGROUND

In competitive business environments, companies frequently wish to forecast events that influence business metrics and performance indicators, and this ability is often important for effective decision making. Information obtained from accurate event forecasts results in more efficient operations and cost savings for the business. For example, a business that forecasts particular requirements in the near future can make profitable adjustments to its business practices based on this information. As another example, if a business can accurately predict potential failures or inefficiencies in its business processes, the requirements can be analyzed to mitigate such failures.

By recognizing future trends, companies can potentially increase efficiency and gain competitive advantage. Accurate recognition of such trends also results in significant cost savings and improved business processes.

In certain business applications, there are many situations where the behavior of users should be predicted and analyzed so that actions can be taken according to behavioral trends. The events generated by users are sources of valuable information about their behavior, interactions and preferences, as well as about temporal changes in that behavior and those preferences. Currently, marketers are unable to take advantage of the large amounts of user data that are available. This prevents service providers and marketers from offering accurate service personalization, customized personal offers and similar services based on user behavior trends. For large data sets, predicting the behavior of each and every user individually would be complex and expensive.

Existing methods of trend recognition and prediction based on numerical time series data operate on individual users, treating each user as an independent entity. Representing and grouping millions of users (for example, users in a telecommunications network) based on such time series data is expensive in terms of space and time complexity. Existing systems lack a mechanism for a low-dimensional representation of time series that captures the global trending pattern of a data set.

BRIEF DESCRIPTION OF THE FIGURES

Embodiments herein are illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings, in which:

FIG. 1 illustrates an overview for detection and classification of user behavior trends using correspondence analysis, according to the embodiments as disclosed herein;

FIG. 2 illustrates a flow diagram explaining the various steps involved in predicting the user behavior trends using the correspondence analysis, according to the embodiments as disclosed herein;

FIG. 3 depicts the process of reducing the dimensions of data, according to embodiments as disclosed herein;

FIG. 4 depicts the process of clustering, according to embodiments as disclosed herein;

FIG. 5 is a flowchart illustrating the process of optimizing campaigns and performing product bundling for a user based on clusters, according to embodiments as disclosed herein;

FIG. 6 is a graph showing the representation of users in a low dimensional feature space, according to the embodiments as disclosed herein;

FIG. 7 is a graph showing the grouping of users having similar trends over certain time period, according to the embodiments as disclosed herein; and

FIG. 8 illustrates a computing environment implementing the method and system for detection and classification of user behavior trends using correspondence analysis, according to the embodiments as disclosed herein.

DETAILED DESCRIPTION OF EMBODIMENTS

The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.

Provided herein is a scalable mechanism for grouping users based on similar trends in n-dimensional space using correspondence analysis. The method provides a framework for clustering or grouping the users, representing their trends in an n-dimensional space, using correspondence analysis. The method reduces the n-dimensional feature space to a lower-dimensional space for easy processing, better interpretation and superior quality clusters. Further, the method applies correspondence analysis so that each user is assigned a new coordinate in the lower dimension that maintains the similarities, differences and relationships between the variables.

Once the correspondence analysis is done, clustering or grouping of the coordinates based on the similar trends of the users is performed. Further, unlabeled cluster members are assigned class membership proportional to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends that are observed from the labeled clusters. Completely unlabeled clusters may be inspected by an administrator for the purpose of manual analysis, labeling and mapping to predicted trends and actions.

The embodiments herein achieve a method and system that provides a scalable mechanism for grouping the users based on similar trends in n-dimensional space using correspondence analysis.

Further, the method and system are applicable to any user transaction based system (for example, a telecom network, a banking system and so on). The method provides a framework for clustering or grouping the users representing similar trends in the n-dimensional space using correspondence analysis.

The correspondence analysis is used to recognize the trends or nature of the users on the basis of their numerical attributes as well as temporal variation of such attributes.

The method and system disclosed herein use correspondence analysis to reduce the n-dimensional feature space to a lower-dimensional space for easy processing and interpretation, without losing the trend information of each user. Each user is assigned a new coordinate in the lower dimension that maintains the similarities, differences and relationships between the variables as they existed in the higher-dimensional space.

Once the correspondence analysis is done, clustering or grouping of the coordinates based on the similar trends of the users is performed.

Further, unlabeled cluster members are assigned class membership proportional to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends that are observed from the labeled clusters.

The principal object of the embodiments herein is to provide a scalable method and system for detection, classification and prediction of behaviour trends using correspondence analysis.

Another object of the embodiments herein is to provide a scalable method and system for effectively reducing the dimensional space using correspondence analysis on numerical multinomial data for reduction of complexity in cluster analysis and to improve quality of emerging clusters, along with superior prediction accuracies.

Referring now to the drawings and more particularly to FIGS. 1 through 5 where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.

FIG. 1 illustrates an overview for detection and classification of user behavior trends using correspondence analysis, according to the embodiments as disclosed herein. As depicted in the FIG. 1, consider a group of users performing transactions with a system (for example a telecom network, banking system and so on). The transactions of the users are recorded in the network 101. Further, the network 101 maintains the raw transaction logs of all the users in a file server 102.

In an embodiment, the file server 102 comprises the files that have all the transactional details of the users.

The raw transactional logs that are present in the file server 102 are uploaded by scheduled data upload jobs orchestrated by cluster master 103 into a distributed file system 105.

Jobs orchestrated by the cluster master 103 cluster the users whose transactions exhibit similar trends, in a distributed fashion, over the worker nodes 104.

In an embodiment, the n-dimensional feature space of the raw transactional logs is reduced to a lower-dimensional space for easy processing and interpretation, without losing the trend information of each user.

In an embodiment, correspondence analysis is used for trend recognition and dimensionality reduction of the raw transactional data of the users.

Typically, the correspondence analysis is a descriptive technique that is designed to analyze simple two-way and multi-way tables containing some measure of correspondence between rows and columns.

The correspondence analysis is used to recognize the trend of users on the basis of temporal variations of their numerical attributes. Each column of the correspondence table represents a numerical attribute and all the columns will be the observations of the same variable over time at different time instances.

In an embodiment, the cluster master 103 maintains the uploaded files over the worker nodes 104 and distributed file system 105 (any distributed file system or memory). The raw transactional data logs are distributed across multiple machines and the correspondence analysis is applied on the data.

Once the correspondence analysis is completed, clustering or grouping of the coordinates based on the similar trends of the users is performed. Further, unlabeled cluster members are assigned class membership proportional to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends that are observed from the labeled clusters.

The cluster master 103 further applies association rule mining on the clusters discovered in a lower dimensional space. The cluster master 103 further uses the discovered rules for user targeted applications, such as optimizing advertising campaigns, performing product bundling, pricing and so on.

The cluster master 103 may be a standalone device, may comprise a plurality of devices implemented using a distributed architecture, or may be implemented on the cloud.

FIG. 2 illustrates a flow diagram explaining the various steps involved in predicting the user behavior trends using the correspondence analysis, according to the embodiments as disclosed herein. As depicted in the flow diagram 200, initially the method obtains (201) the raw data from the network that corresponds to a particular domain. For example, the domain may include but is not limited to a telecommunications network or a banking system. In a telecom domain all the transactions are recorded and stored in a network and in a banking system, all the transactions of the users are stored in a bank server.

The data format used herein as an example is as follows: for each of the subjects U = 1, 2, 3 . . . u, the numerical value of an attribute is measured at each time instance T = 1, 2, 3 . . . t. In table format, the data looks like this:

          T1     T2     T3     T4     . . .  Tt
User 1    X11    X12    X13    X14    . . .  X1t
User 2    X21    X22    X23    X24    . . .  X2t
. . .     . . .  . . .  . . .  . . .  . . .  . . .
User u    Xu1    Xu2    Xu3    Xu4    . . .  Xut

Here Xij can be the value of any numerical attribute observed at different time instances. The data in this case has u×t dimensions; that is, each subject is measured in a t-dimensional space.
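A minimal sketch of how raw log records could be pivoted into the u×t table above, assuming each log record has already been parsed into a (user, period, value) triple and that several values for the same user and period are summed (for example, call records adding up to monthly minutes of usage). The record layout and the name build_trend_table are hypothetical, not part of the described system.

```python
def build_trend_table(records, users, periods):
    """Pivot raw (user, period, value) records into a users x periods table.

    Values for the same user and period are summed, e.g. several call records
    in one month add up to that month's minutes of usage.
    """
    row = {u: i for i, u in enumerate(users)}
    col = {t: j for j, t in enumerate(periods)}
    table = [[0.0] * len(periods) for _ in users]
    for user, period, value in records:
        if user in row and period in col:  # ignore users/periods not selected
            table[row[user]][col[period]] += value
    return table


records = [
    ("user1", "T1", 10.0), ("user1", "T1", 5.0), ("user1", "T2", 7.5),
    ("user2", "T2", 3.0),
]
table = build_trend_table(records, ["user1", "user2"], ["T1", "T2"])
```

Here user1 gets 15.0 in T1 and 7.5 in T2, and user2 gets 3.0 in T2 only. Each row is one subject and each column one time instance, matching the layout Xij.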

The transactional data of the users can be obtained from a network designed for storing such data (for example, the telecom network or the bank server). Once the transactional data (raw data) is obtained, the method performs (202) pre-processing and feature selection on the raw data. In an embodiment, the pre-processing and feature selection comprise determining the attributes of the users. One such attribute is minutes of usage (for example, network usage in the telecommunications domain).

Further, the method obtains (203) trend data from the raw transactional logs. The trend data includes the values that change over time.

Once the features are selected and the trend data is obtained from the raw transactional logs, the method reduces (204) the dimensionality of the raw data (which is multinomial, n-dimensional data) using correspondence analysis (301, 302), as depicted in FIG. 3. In the reduced space, the new coordinates are such that users who follow similar trends in the multidimensional time series domain are closer to each other than to users whose trends are dissimilar. For example, if users in a t-dimensional space can be mapped to a 2- or 3-dimensional space without losing much information about the trends of the subscribers, the data becomes easier to interpret and analyze, and is represented more efficiently than in t dimensions.

Correspondence analysis is an exploratory data analysis technique for contingency tables and multivariate or multinomial data. It emphasizes graphical representation of the result in a lower dimension for easy interpretation, while maintaining the similarity or dissimilarity between the rows and the columns of the table. Embodiments herein apply correspondence analysis in applications that require the trends of high-dimensional user data with numerical attributes in the time series domain. Correspondence analysis is used to determine similarities and differences among the trends of users with respect to their behavior over time, and to depict them graphically in a joint low-dimensional space. It assigns each user a coordinate in the lower dimension that maintains the similarities, differences and relationships between the variables in the rows and columns of the table: rows with similar trends lie close to each other in the new low-dimensional space, and rows with dissimilar trends lie far apart. Because correspondence analysis is based on the eigenvalue decomposition of a matrix, it can be used for dimension reduction in a manner similar to principal component analysis, which makes the results easier to interpret. The similarity between users in the new low-dimensional space can be visualized graphically.
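The following is a minimal sketch of classical correspondence analysis, assuming NumPy is available and the input is a non-negative users × periods table with no all-zero rows or columns. It takes the singular value decomposition of the standardized residuals, which yields the same axes as the eigenvalue decomposition mentioned above, and returns row principal coordinates (one point per user). The function names are hypothetical, and the coordinates it produces for the ARPU data of Table 1 may differ in sign and scaling from those listed in Table 2. project_rows is an assumed extra showing how a new user's row could be placed on the same axes as a supplementary point.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """Classical correspondence analysis of a non-negative users x periods table.

    Returns (row principal coordinates, column standard coordinates,
    principal inertias) for the first `n_dims` dimensions.
    """
    N = np.asarray(table, dtype=float)
    if (N < 0).any() or (N.sum(axis=1) == 0).any() or (N.sum(axis=0) == 0).any():
        raise ValueError("need non-negative data with no all-zero rows or columns")
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (one per user)
    c = P.sum(axis=0)                        # column masses (one per period)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    # Right singular vectors of S are the eigenvectors of S^T S.
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_dims
    rows = U[:, :k] * sigma[:k] / np.sqrt(r)[:, None]  # principal coordinates
    cols = Vt[:k].T / np.sqrt(c)[:, None]              # standard coordinates
    return rows, cols, sigma[:k] ** 2


def project_rows(new_table, col_std):
    """Place new (supplementary) users on the existing axes via their row profiles."""
    N = np.asarray(new_table, dtype=float)
    profiles = N / N.sum(axis=1, keepdims=True)
    return profiles @ col_std


# Monthly ARPU of the ten users in Table 1.
arpu = [
    [473.05, 740, 439, 0], [247, 100, 99, 0], [372, 508, 282, 0],
    [80, 105.1, 55, 30], [235, 334, 50, 120.17], [409, 309, 9, 500],
    [73.01, 75.05, 0, 144.01], [105, 176, 129, 509], [65, 0, 0, 10],
    [200, 0, 0, 50],
]
coords, col_std, inertia = correspondence_analysis(arpu, n_dims=2)
```

Users 1 and 3, whose ARPU rises in month 2 and drops to zero in month 4, end up much closer to each other than to user 9, whose ARPU is concentrated in month 1. Distances between rows approximate chi-square distances between their row profiles, so users with proportionally similar trends cluster together regardless of their absolute spend.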

In an embodiment, the correspondence analysis is used to recognize the trend of users (subscribers) on the basis of their numerical attributes. Once the correspondence analysis is applied, the correspondence table is generated. Each column of the correspondence table represents a numerical attribute and all the columns will be the observations of the same variable over time at different time instances.

In an embodiment, the method obtains the number of target dimensions (for example, it can be 2-dimensional or 3-dimensional based on the requirement) as an input for reducing the dimensionality of transactional data of the users.

Once the dimensionality of the data is reduced using correspondence analysis, the method performs (205) clustering of the users' attributes, based on clustering parameters, to obtain unlabeled clusters based on trend similarity. Clustering groups the users that have similar trends. In an embodiment, the method obtains the clustering parameters as an input. In an embodiment, standard clustering techniques such as DBSCAN (for density-based clusters) and the k-means clustering algorithm can be used to group users in the lower dimension so that users with similar trends are placed in the same cluster.

Embodiments disclosed herein first apply DBSCAN clustering to obtain (401) density-based clusters. DBSCAN treats users whose trends differ from those of the majority as noise, because they lie in regions of lower density. To avoid losing this data, the noise points are further clustered (402) using the k-means clustering algorithm before the final clusters are obtained (403), as depicted in FIG. 4.
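The two-stage clustering of FIG. 4 can be sketched in plain Python as shown below. This is a minimal illustration, assuming the points are the low-dimensional coordinates produced by correspondence analysis, and that eps, min_pts and k_noise are hypothetical user-supplied clustering parameters. DBSCAN counts a point as its own neighbour (as scikit-learn does), and k-means here is plain Lloyd iteration with seeded random initialization; production use would more likely rely on a library implementation.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds, k = list(neighbours), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # core point: keep growing the cluster
                seeds.extend(more)
    return labels


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with random initial centroids; returns a label per point."""
    centroids = random.Random(seed).sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign


def dbscan_then_kmeans(points, eps, min_pts, k_noise):
    """DBSCAN first; the points it marks as noise are re-clustered with k-means."""
    labels = dbscan(points, eps, min_pts)
    noise = [i for i, label in enumerate(labels) if label == -1]
    if noise:
        offset = max(labels) + 1  # new cluster ids follow the DBSCAN ones
        sub = kmeans([points[i] for i in noise], min(k_noise, len(noise)))
        for i, a in zip(noise, sub):
            labels[i] = offset + a
    return labels


pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20), (-20, -20)]
labels = dbscan_then_kmeans(pts, eps=0.5, min_pts=3, k_noise=2)
```

The two dense groups become DBSCAN clusters 0 and 1, while the two isolated users, which DBSCAN alone would discard as noise, are kept as extra clusters 2 and 3.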

The clusters formed in the lower dimension retain the properties (similarities, differences and relationships) that existed in the n-dimensional space.

Based on trend similarity with the labeled samples, the clusters are assigned (206) labels using the label information of users, according to the actions they took previously (historical data). The clusters may be further divided into classes based on at least one other feature, and each user in a cluster may be assigned membership of at least one class. Each user may then be assigned a confidence level for each predicted action, based on the class to which the user belongs.

Further, the method predicts (207) the future actions of the users based on the trends of attributes observed in the labeled samples. In an embodiment, the prediction step forecasts the future actions of users based on the past trends of attribute values observed in the labeled samples. The prediction may take the form of rules consisting of predicates and the relationships among them, augmented with statistics such as confidence measures that indicate the degree of algorithmic confidence in each rule. For example, if a churn file lists the users who have churned, the trends these users exhibited before churning can be used to label other users who exhibit similar trends as potential churn candidates. There can be multiple labeled lists corresponding to user actions observed in the past (for example, churning, switching from postpaid to prepaid and so on). In each unlabeled cluster that emerges, the number of users present from each labeled list can be counted. A cluster containing many users from a labeled list (representing a class) is a strong indication that the cluster represents a group of users who could potentially exhibit the same behavior. The various actions in flow diagram 200 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 2 may be omitted.
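The proportional class membership described above can be sketched as shown below, assuming the clustering step yields a user-to-cluster mapping and the labeled lists (such as a churn file) are merged into a user-to-class mapping. The function name and the class names are hypothetical. Each unlabeled user receives, for every class, the share of the cluster's labeled users that belong to that class; users in clusters with no labeled members get an empty result, leaving them for the manual analysis mentioned earlier.

```python
from collections import Counter, defaultdict


def class_membership(cluster_of, known_labels):
    """Proportional class membership for unlabelled users.

    cluster_of:   user -> cluster id (output of the clustering step)
    known_labels: user -> class for users on a labelled list (e.g. a churn file)
    Returns user -> {class: share of the cluster's labelled users in that class}.
    """
    counts = defaultdict(Counter)
    for user, label in known_labels.items():
        counts[cluster_of[user]][label] += 1
    membership = {}
    for user, cluster in cluster_of.items():
        if user in known_labels:
            continue
        tally = counts.get(cluster)
        if not tally:
            membership[user] = {}  # wholly unlabelled cluster: left for manual analysis
            continue
        total = sum(tally.values())
        membership[user] = {label: n / total for label, n in tally.items()}
    return membership


clusters = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}
labels = {"a": "churn", "b": "churn", "c": "postpaid_to_prepaid"}
membership = class_membership(clusters, labels)
```

User "d" shares cluster 0 with two churners and one postpaid-to-prepaid switcher, so it receives a churn membership of 2/3. These shares can serve as the per-action confidence levels.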

FIG. 5 is a flowchart illustrating the process of optimizing campaigns and performing product bundling for a user based on clusters, according to embodiments as disclosed herein. After the clusters are formed in the lower-dimensional space, association rule mining can be applied to each cluster, and the discovered rules can then be used automatically for user-targeted applications such as optimizing advertising campaigns, performing product bundling, pricing and so on. Association rule mining is a method for discovering interesting relations between variables in large databases. Given historical purchase data (market-basket analysis), it finds associations between items. For example, if all people who buy bread and milk also buy butter, a rule {milk, bread} -> {butter} [support=5%, confidence=100%] may be discovered. The support of an item set is the fraction of all purchases in which that item set appears. For example, if there are 100 purchases (each of which may contain multiple items such as bread, jam, butter, oil, juice, milk etc.) and 20 of them contain both bread and butter, then the support of bread->butter is 20%. The confidence of a rule is the number of purchases in which both items appear, divided by the number of purchases containing the first item (the antecedent). For example, the confidence of bread->butter is 1.0 if butter is purchased together with bread in every purchase that contains bread.
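The support and confidence definitions can be checked with a small sketch. Here, pair_rules is a deliberately simple brute-force search over single-item rules a -> b, not the Apriori algorithm or any specific library's API. The function names and the basket data are illustrative assumptions.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search over single-item rules a -> b (not Apriori)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, {a, b})
        if s >= min_support:
            conf = confidence(transactions, {a}, {b})
            if conf >= min_confidence:
                rules.append((a, b, s, conf))
    return rules


baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
```

Bread and butter appear together in 2 of 4 baskets (support 0.5). Bread->butter has confidence 2/3 because one basket holds bread alone, while butter->bread has confidence 1.0, which shows that confidence depends on the direction of the rule.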

After the clusters are formed in the lower-dimensional space, the method uses other features of the users within each cluster, together with the campaigns that were historically sent to them. It finds (501) the relationships between the features of users within the cluster (for example, the ARPU of the user, the number of SMSs sent by the user, the number of international calls made by the user and so on) and the features of users who were converted by previously run campaigns, and mines (502) the hidden rules underlying these relationships using association rule mining. The rules obtained are combined (503) to suggest user-targeted applications such as optimizing advertising campaigns, performing product bundling, pricing and so on.

In an example, each attribute of the users is discretized into bins (e.g. ARPU (Average Revenue Per User) can be high, medium or low). Each conversion for each campaign can then be treated like a “purchase”, so the top association rules are mined for each campaign. Within each cluster, conversion information corresponding to several campaigns can be obtained. The discovered rules can be ranked by how many times they occur within the cluster, and the top-ranking rules can be combined to generate new rules, which can form the basis for designing a new campaign or optimizing an existing one.
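The binning and ranking steps can be sketched as shown below, under stated assumptions. The bin thresholds are hypothetical parameters, and the rule strings stand in for whatever rule representation the mining step produces. The code counts how many campaigns within one cluster produced each rule, which is one reasonable reading of "how many times they occur within the cluster".

```python
from collections import Counter


def discretize(value, low_max, medium_max):
    """Bin a numeric attribute such as ARPU into low / medium / high."""
    if value <= low_max:
        return "low"
    if value <= medium_max:
        return "medium"
    return "high"


def rank_rules(rules_per_campaign, top_n):
    """Rank rules by the number of campaigns (within one cluster) that produced them."""
    counts = Counter(rule for rules in rules_per_campaign for rule in set(rules))
    return [rule for rule, _ in counts.most_common(top_n)]


# Hypothetical rules mined for three campaigns sent to users of one cluster.
campaign_rules = [
    ["arpu=high -> converted", "sms=high -> converted"],
    ["arpu=high -> converted"],
    ["arpu=high -> converted", "intl_calls=low -> converted"],
]
top = rank_rules(campaign_rules, top_n=1)
```

The rule "arpu=high -> converted" recurs in all three campaigns and ranks first. The top-ranked rules would then be combined to design or tune a campaign for that cluster.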

The various actions in flow diagram 500 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 5 may be omitted.

FIG. 6 is a graph showing the representation of users in a low dimensional feature space, according to the embodiments as disclosed herein. The graph shown in the figure depicts a two dimensional feature space with X and Y axes. The graph is obtained by reducing the n-dimensional feature space and applying the correspondence analysis on the numerical time series data.

Consider, as an example, a sample of ten users in a telecom network. The transactional data of all ten users is recorded in the telecom network. The transactional data (raw data) is represented for users U = 1, 2, 3 . . . u; for each user, the numerical value of the attribute is measured at each time instance T = 1, 2, 3 . . . t. Representing each user's attribute value at each time instance in this way forms multinomial data, that is, an array having n dimensions (for example, u×n).

The first step involved in classification and detection of user behavior trends using correspondence analysis is the reduction of n-dimensional space to lower dimensional space.

The dimensionality reduction of the multinomial data is performed for easy processing and interpretation of the data without losing the trend information of each user. The multinomial data can be reduced to a lower dimension (for example, 2-dimensional or 3-dimensional, based on the requirement). In the lower-dimensional feature space (2-dimensional in the graph), the new coordinates are such that users who follow similar trends in the multidimensional time series domain are closer to each other than to users whose trends are dissimilar, as shown in the graph.

FIG. 7 is a graph showing the grouping of users having similar trends over certain time period, according to the embodiments as disclosed herein. Once the multinomial data is reduced to a lower dimension (2-dimensional) as described in FIG. 3, the users of the telecom network can be grouped or clustered as shown in the graph. Clustering or grouping of the coordinates is performed based on the similar trends of the users. These groups or clusters contain the users who are similar in their trends over certain time period. These clusters are used for group based prediction or further analysis on the group.

Further, unlabeled cluster members are assigned class membership proportional to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends that are observed from the labeled clusters.

From the transactional data or historical data of the users in the telecom network, the actions performed by the users following a similar trend can be predicted. This information is used for predicting the actions of new users of similar trend.
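One simple way to apply this to a new user is nearest-centroid assignment in the reduced space, as in the sketch below. It assumes that the new user's low-dimensional coordinate has already been obtained (for example, by projecting the user's row profile onto the existing correspondence analysis axes as a supplementary point) and that each cluster already carries a predicted action. The grouping of users 1-3 and 9-10 from Table 2 into two clusters and the action names are hypothetical.

```python
import math


def cluster_centroids(coords, labels):
    """Mean low-dimensional coordinate of each cluster."""
    groups = {}
    for point, label in zip(coords, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(sum(x) / len(pts) for x in zip(*pts))
            for label, pts in groups.items()}


def predict_for_new_user(point, centroids, cluster_action):
    """Nearest-centroid assignment; returns (cluster id, that cluster's action)."""
    nearest = min(centroids, key=lambda c: math.dist(point, centroids[c]))
    return nearest, cluster_action.get(nearest)


# Coordinates of users 1-3 and 9-10 from Table 2, grouped into two clusters.
coords = [(0.715, 1.706), (0.14, 1.975), (0.785, 1.2), (2.051, -1.46), (2.6, -0.9)]
centroids = cluster_centroids(coords, [0, 0, 0, 1, 1])
actions = {0: "no_change", 1: "churn"}  # hypothetical cluster labels
result = predict_for_new_user((2.3, -1.2), centroids, actions)
```

A new user placed at (2.3, -1.2) lies next to users 9 and 10 and inherits that cluster's predicted action. Because prediction happens per cluster rather than per user, the per-user cost is one distance computation per cluster.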

Consider a group of 10 users; Table 1 below depicts the ARPU of each user over four months.

TABLE 1 ARPU
          Month 1   Month 2   Month 3   Month 4
User 1    473.05    740       439       0
User 2    247       100       99        0
User 3    372       508       282       0
User 4    80        105.1     55        30
User 5    235       334       50        120.17
User 6    409       309       9         500
User 7    73.01     75.05     0         144.01
User 8    105       176       129       509
User 9    65        0         0         10
User 10   200       0         0         50

Applying correspondence analysis to this type of numerical time series data maps it to a two-dimensional feature space, assigning new coordinates to the users such that users following similar trends are close to each other in the new space, as indicated in Table 2.

TABLE 2 ARPU, two-dimensional coordinates
          Dimension 1   Dimension 2
User 1     0.715         1.706
User 2     0.14          1.975
User 3     0.785         1.2
User 4    −0.854        −0.967
User 5    −1.047        −0.134
User 6    −1.488        −0.656
User 7    −1.266        −0.083
User 8    −1.705        −0.622
User 9     2.051        −1.46
User 10    2.6          −0.9

In another example, consider brands whose historical stock prices can be expressed as time series (change in price over time). Suppose it is required to identify brands that are similar in terms of their stock value trends over a period of time. The historical stock prices are reduced from multi-dimensional time series data to a 2D space, followed by clustering. This results in clusters of similar brands (for example, brands like Yahoo and Amazon may fall in one cluster, and so on). Once grouping is done, time series models (e.g. ARMA models) can be learned at the cluster level to predict future stock values.
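As a simplified stand-in for the cluster-level ARMA models mentioned above, the sketch below averages the series of a cluster's members and fits an AR(1) model x_t = a + b·x_{t-1} by ordinary least squares. This is not a full ARMA fit (there is no moving-average term and no order selection). The function names are hypothetical, and the input series are illustrative numbers, not stock data.

```python
def cluster_mean_series(member_series):
    """Average the time series of all members (users, brands) of one cluster."""
    n = len(member_series)
    return [sum(values) / n for values in zip(*member_series)]


def fit_ar1(series):
    """Least-squares fit of x_t = a + b * x_{t-1}; returns (a, b)."""
    prev, curr = series[:-1], series[1:]
    mp = sum(prev) / len(prev)
    mc = sum(curr) / len(curr)
    var = sum((p - mp) ** 2 for p in prev)
    if var == 0:
        return mc, 0.0  # flat history: predict the mean
    b = sum((p - mp) * (q - mc) for p, q in zip(prev, curr)) / var
    return mc - b * mp, b


def forecast(series, steps):
    """Roll the fitted AR(1) model forward `steps` periods."""
    a, b = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out


mean_series = cluster_mean_series([[1, 2, 3, 4, 5], [3, 4, 5, 6, 7]])
prediction = forecast(mean_series, steps=2)
```

Fitting one model per cluster, rather than one per member, is what keeps this step cheap when there are many members.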

Embodiments disclosed herein may also be used for video segmentation, as in the following example. Consider that an unsupervised segmentation of objects/users in a video needs to be performed based on the similarity of their motion. This may be for safety management of large gatherings (big crowds) in public areas, to find moving areas in a scene for efficient video compression, to detect unusual events, for video surveillance, or to analyze video further for specific purposes.

To find the trend of object movement in a video, the magnitude of pixel movement over frames is used as the attribute for trend recognition. Optical flow is used to obtain the subsequent position of each pixel from frame to frame. If a pixel is at position (u1, v1) in one frame and moves to position (u2, v2) in the next frame, the magnitude of its movement is calculated as the Euclidean distance between the two positions. Once the optical flow is obtained, the magnitude of pixel displacement is calculated for each pair of consecutive frames over all n frames, which yields the trend of each pixel as a time series in (n−1) dimensions. Correspondence analysis maps the pixel movement data from n−1 dimensions to 2 dimensions such that pixels belonging to similar object movements are closer to each other than to pixels with dissimilar motion. After the pixels are clustered, all pixels within each cluster represent a similar trend.
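Turning tracked pixel positions into the (n−1)-dimensional displacement series described above can be sketched as shown below. The sketch assumes that an optical-flow step (not shown) has already produced, for each pixel, its (u, v) position in every one of the n frames. The function names are hypothetical.

```python
import math


def displacement_series(track):
    """Frame-to-frame displacement magnitudes for one pixel.

    `track` holds the pixel's (u, v) position in each of n frames, as produced
    by an optical-flow tracker (not shown); the result has n - 1 values.
    """
    return [math.dist(a, b) for a, b in zip(track, track[1:])]


def displacement_table(tracks):
    """Rows = pixels, columns = the n - 1 frame-to-frame transitions."""
    return [displacement_series(t) for t in tracks]


tracks = [
    [(0, 0), (3, 4), (3, 4)],     # moves 5 pixels, then stops
    [(10, 10), (10, 10), (13, 14)],  # stays still, then moves 5 pixels
]
table = displacement_table(tracks)
```

Each row can then serve as one row of the correspondence table. A pixel that never moves produces an all-zero row, which correspondence analysis cannot normalize, so such pixels would need to be set aside (for example, treated as static background) before the reduction.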

Gatherings often involve the movement of crowds in confined spaces such as city streets, overhead bridges or narrow passageways. Because of the small space and large crowd, catastrophic events can occur. If the usual motion at these places is known a priori, it is possible to predict the locations of possible stampedes and hence improve safety management in those areas.

Unusual events may be detected by identifying areas where object motion is irregular or deviates from normal behavior.

FIG. 8 illustrates a computing environment implementing the method and system for detection and classification of user behavior trends using correspondence analysis, according to the embodiments as disclosed herein. The computing environment may consist of a plurality of such units forming a distributed cluster, over which the algorithms are executed in a scalable fashion. As depicted, the computing environment 801 comprises at least one processing unit 804 equipped with a control unit 802 and an Arithmetic Logic Unit (ALU) 803, a memory 805, a storage unit 806, a plurality of networking devices 808 and a plurality of input/output (I/O) devices 807. The processing unit 804 is responsible for processing the instructions of the algorithm. The processing unit 804 receives commands from the control unit 802 in order to perform its processing. Further, any logical and arithmetic operations involved in the execution of the instructions are computed with the help of the ALU 803.

The overall computing environment 801 can be composed of multiple homogeneous and/or heterogeneous cores, multiple CPUs of different kinds, special media and other accelerators. Further, a plurality of processing units 804 may be located on a single chip or over multiple chips. Further, a plurality of nodes such as 801 may be interconnected over a network to form a distributed computing environment in which the described method is executed in a distributed fashion.

The algorithm, comprising the instructions and code required for the implementation, is stored in the memory unit 805, the storage 806, or both. At the time of execution, the instructions may be fetched from the corresponding memory 805 and/or storage 806, and executed by the processing unit 804.

In the case of any hardware implementation, various networking devices 808 or external I/O devices 807 may be connected to the computing environment to support the implementation through the networking unit and the I/O device unit.

Embodiments disclosed herein enable compression of large amounts of temporal data related to users into smaller, more manageable amounts of data, thereby reducing the time required to process the data and the complexity of the computing system required.

Embodiments disclosed herein enable detection of unusual events based on the raw data. An unusual event may be a behaviour of a user that does not match his history and/or the cluster of users to which he belongs. For example, an unusual event may be a user of a telecommunication network sending a large number of SMS messages within a short period of time, whereas he previously sent only a few.
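One simple way to realise such a check, offered here only as an assumed example since the description does not specify the test used, is to compare a user's count in the current period with the mean and standard deviation of his own history and to flag values lying more than a chosen number of standard deviations above the mean. The function name is_unusual, the default threshold of 3.0 and the sample counts are hypothetical.

```python
from statistics import mean, pstdev

def is_unusual(history, current, threshold=3.0):
    """Return True if `current` exceeds the history mean by more than `threshold` std devs."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of past periods
    if sigma == 0:
        # Perfectly constant history: treat any increase as unusual.
        return current > mu
    return (current - mu) / sigma > threshold

# Hypothetical daily SMS counts for one subscriber.
past_sms = [3, 5, 4, 2, 6]
print(is_unusual(past_sms, 120))  # True: a burst of SMS messages
print(is_unusual(past_sms, 7))    # False: within normal variation
```

Passing the values observed for the user's cluster peers as the history, instead of his own past values, would give a cluster-based variant of the same check.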

Embodiments disclosed herein account for temporal changes in user behaviour.
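As an assumed illustration of how temporal change might be captured (the description does not prescribe a method), each user's usage series over consecutive periods can be reduced to a least-squares slope and then binned into a coarse trend category. Categorical trend values of this kind could then be counted into a contingency table for correspondence analysis. The function name trend_labels, the tolerance tol and the sample data are hypothetical.

```python
import numpy as np

def trend_labels(usage, tol=0.5):
    """usage: users x periods array. Returns one trend category per user."""
    usage = np.asarray(usage, dtype=float)
    periods = np.arange(usage.shape[1])
    # np.polyfit fits every column of a 2-D y at once; row 0 holds the slopes.
    slopes = np.polyfit(periods, usage.T, deg=1)[0]
    labels = []
    for s in slopes:
        if s > tol:
            labels.append("increasing")
        elif s < -tol:
            labels.append("decreasing")
        else:
            labels.append("stable")
    return labels

# Hypothetical monthly data-session counts for three users over four months.
usage = [[1, 2, 3, 4], [5, 5, 5, 5], [9, 7, 5, 3]]
print(trend_labels(usage))  # ['increasing', 'stable', 'decreasing']
```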

The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in FIGS. 1 and 5 include blocks which can be at least one of a hardware device, or a combination of a hardware device and a software module.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should be and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

Claims

1. A method for detection of user behaviour trends, wherein the method comprises:

performing pre-processing and feature selection on raw data by a cluster master, wherein the raw data comprises data related to temporal behaviour of a user;
obtaining trend data from the raw data by the cluster master;
reducing dimensionality of the raw data by the cluster master to a lower dimension using correspondence analysis, wherein the data with the lower dimension causes users with similar behaviour to be closer to each other than those who are dissimilar;
performing clustering on the data with the lower dimension by the cluster master based on attributes of the user; and
assigning at least one label to the clustered data by the cluster master.

2. The method, as claimed in claim 1, wherein pre-processing and feature selection on the raw data comprises determining the attributes of the users.

3. The method, as claimed in claim 1, wherein trend data comprises behaviour that changes over time.

4. The method, as claimed in claim 1, wherein assigning at least one label to the clustered data is based on label information of previous users according to actions taken by the users previously.

5. The method, as claimed in claim 1, wherein the method further comprises predicting future actions of the users based on the labeled clustered data.

6. The method, as claimed in claim 5, wherein the method further comprises augmenting the predictions of future actions by generating confidence measures based on class membership proportional to the labeled clustered data.
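For illustration only, the proportional class membership may be sketched as follows, assuming that each user already carries a cluster identifier and that known outcomes are available for only some users (None marks an unlabeled user). Each unlabeled user receives the majority class among the labeled members of his cluster, and that class's share of the labeled members serves as the confidence measure. The function name predict_with_confidence, the class names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def predict_with_confidence(cluster_ids, labels):
    """Return {user_index: (predicted_class, confidence)} for unlabeled users."""
    # Count the known labels inside each cluster.
    per_cluster = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            per_cluster[cid][lab] += 1

    predictions = {}
    for i, (cid, lab) in enumerate(zip(cluster_ids, labels)):
        if lab is not None:
            continue
        counts = per_cluster.get(cid)
        if not counts:
            # No labeled members in this cluster: no basis for a prediction.
            predictions[i] = (None, 0.0)
            continue
        cls, n = counts.most_common(1)[0]
        predictions[i] = (cls, n / sum(counts.values()))
    return predictions

cluster_ids = [0, 0, 0, 0, 1, 1, 1, 2]
labels = ["converted", "converted", "not_converted", None,
          "not_converted", None, None, None]
print(predict_with_confidence(cluster_ids, labels))
```

In this example user 3 is predicted as "converted" with confidence 2/3, users 5 and 6 as "not_converted" with confidence 1.0, and user 7, whose cluster has no labeled members, receives no prediction.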

7. The method, as claimed in claim 1, wherein the method further comprises:

applying association rule mining on the clustered data to discover at least one rule; and
using the at least one discovered rule for user targeted applications.

8. The method, as claimed in claim 7, wherein applying association rule mining on the clustered data to discover at least one rule comprises:

finding relationships between features of users in a cluster, features of users who were previously converted by historical campaigns, and features of the previous campaigns themselves;
mining underlying rules in the clustered data; and
discovering the defining attributes of each campaign and the relationships among the attributes of each campaign, other attributes of the campaign, and previously converted users.
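To illustrate the kind of rule referred to in claims 7 and 8, the following brute-force search (a simple stand-in for algorithms such as Apriori, not the disclosed procedure) scores every small set of user or campaign attributes within one cluster as a candidate antecedent of conversion, keeping rules whose support and confidence meet assumed thresholds. Each transaction is the set of attributes of one user. The attribute names, the function mine_rules and its threshold values are hypothetical.

```python
from itertools import combinations

def mine_rules(transactions, consequent, min_support=0.2, min_confidence=0.6, max_len=2):
    """Brute-force search for rules antecedent -> consequent.

    Returns a list of (antecedent tuple, support, confidence).
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t} - {consequent})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            a = set(antecedent)
            covered = [t for t in transactions if a <= t]  # users having the antecedent
            if not covered:
                continue
            hits = sum(1 for t in covered if consequent in t)
            support = hits / n                # share of all users matching the whole rule
            confidence = hits / len(covered)  # share of antecedent users who converted
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules

# Hypothetical attributes of five users in one cluster.
transactions = [
    {"night_sms", "campaign_A", "converted"},
    {"night_sms", "campaign_A", "converted"},
    {"night_sms", "campaign_B"},
    {"data_heavy", "campaign_A", "converted"},
    {"data_heavy", "campaign_B"},
]
for rule in mine_rules(transactions, "converted"):
    print(rule)
```

Here, for instance, the rule {campaign_A} → converted holds with support 0.6 and confidence 1.0, which would mark campaign_A as a candidate for targeting similar users. Exhaustive enumeration is only practical for few attributes; a production system would prune candidates as Apriori does.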

9. The method, as claimed in claim 1, wherein the method further comprises detecting unusual events based on the raw data.

10. The method, as claimed in claim 1, wherein the raw data is at least one of: numerical multinomial data; and an n-dimensional array, wherein the raw data comprises continuous individual features.

11. A computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium, said computer executable program code, when executed, causing performance of a method for detection, classification and prediction of user behaviour trends, the method comprising:

performing pre-processing and feature selection on raw data, wherein the raw data comprises data related to temporal behaviour of a user;
obtaining trend data from the raw data;
reducing dimensionality of the raw data to a lower dimension using correspondence analysis, wherein the data with the lower dimension causes users with similar behaviour to be closer to each other than those who are dissimilar;
performing clustering on the data with the lower dimension based on attributes of the user; and
assigning at least one label to the clustered data.

12. The computer program product, as claimed in claim 11, wherein pre-processing and feature selection on the raw data comprises determining the attributes of the users.

13. The computer program product, as claimed in claim 11, wherein trend data comprises behaviour that changes over time.

14. The computer program product, as claimed in claim 11, wherein assigning at least one label to the clustered data is based on label information of previous users according to actions taken by the users previously.

15. The computer program product, as claimed in claim 11, wherein the method further comprises predicting future actions of the users based on the labeled clustered data.

16. The computer program product, as claimed in claim 15, wherein the method further comprises augmenting the predictions of future actions by generating confidence measures based on class membership proportional to the labeled clustered data.

17. The computer program product, as claimed in claim 11, wherein the method further comprises:

applying association rule mining on the clustered data to discover at least one rule; and
using the at least one discovered rule for user targeted applications.

18. The computer program product, as claimed in claim 17, wherein applying association rule mining on the clustered data to discover at least one rule comprises:

finding relationships between features of users in a cluster, features of users who were previously converted by historical campaigns, and features of the previous campaigns themselves;
mining underlying rules in the clustered data; and
discovering the defining attributes of each campaign and the relationships among the attributes of each campaign, other attributes of the campaign, and previously converted users.

19. The computer program product, as claimed in claim 11, wherein the method further comprises detecting unusual events based on the raw data.

20. The computer program product, as claimed in claim 11, wherein the raw data is at least one of: numerical multinomial data; and an n-dimensional array, wherein the raw data comprises continuous individual features.

Patent History
Publication number: 20140372175
Type: Application
Filed: Jun 13, 2014
Publication Date: Dec 18, 2014
Inventors: Noopur Jain (Bahraich), Santanu Chaudhury (New Delhi), Jobin Wilson (Kerala), Prateek Kapadia (Mumbai)
Application Number: 14/303,621
Classifications
Current U.S. Class: Market Prediction Or Demand Forecasting (705/7.31)
International Classification: G06Q 30/02 (20060101);